  • doctorsmonsters

Find Relevant Top Hashtags Using Python - Part 1



In the age of social media, it is all about reaching a larger audience. One of the many ways of doing so is using hashtags. If you know which hashtags are popular, you can include them in your posts and potentially be discovered by more people who are browsing posts associated with those hashtags.

The source code can be found here.


The outline:

Here, we will write code that takes some text input from the user, converts it to a hashtag, and then retrieves tweets with that hashtag from Twitter. The code then scrapes the retrieved tweets for other hashtags and finally returns a list of all the hashtags for the user to review and use.


Getting user input

We will ask the user for input and save it to a variable "tag". In Python 3, input() already returns a string, so no extra conversion is needed.

tag = input("Please enter your hashtag/text: ")

Cleaning user input

Hashtags cannot contain spaces, and the same hashtag may be typed with different capitalization, so we will work with lower-case text. Since the user may provide text that is not all lower case or that contains spaces, we will write a function that takes the text, removes all spaces and converts it to lower case. It will also remove a leading "#" in case the user entered a hashtag.

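def clean_input(tag):
    # Remove spaces so multi-word input becomes a single hashtag
    tag = tag.replace(" ", "")
    # Drop a leading '#' if the user typed a hashtag, then lower-case it
    if tag.startswith("#"):
        return tag[1:].lower()
    return tag.lower()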
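A quick way to check the cleaning logic is to call `clean_input` on a few inputs. The function is repeated here so the snippet runs on its own; the sample inputs are our own.

```python
def clean_input(tag):
    """Remove spaces, strip one leading '#', and lower-case the text."""
    tag = tag.replace(" ", "")
    if tag.startswith("#"):
        return tag[1:].lower()
    return tag.lower()


examples = ["Machine Learning", "#DataScience", "python"]
for text in examples:
    print(repr(text), "->", repr(clean_input(text)))
# 'Machine Learning' -> 'machinelearning'
# '#DataScience' -> 'datascience'
# 'python' -> 'python'
```

Note that only the space character is removed; tabs or other whitespace would survive. If that matters, `"".join(tag.split())` removes all whitespace instead.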

Extracting hashtags from tweets

Before we talk about retrieving the tweets, let’s write a function that extracts hashtags from tweets. Here is the code:

def return_all_hashtags(tweets, tag):
    all_hashtags = []
    for tweet in tweets:
        for word in tweet.split():
            # Keep words starting with '#', skipping the user's own hashtag
            if word.startswith("#") and word.lower() != "#" + tag.lower():
                all_hashtags.append(word.lower())
    return all_hashtags

The above function takes a list of tweets and the user's hashtag. It loops through the list of tweets, splits each tweet into words, and then loops through the words. If a word starts with a hash (#) and is not the same as the user's hashtag, it is added, in lower case, to the list of hashtags. Finally, the function returns that list.
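Splitting on whitespace keeps any punctuation attached to a hashtag, so "#python," and "#python" end up counted as different tags. One possible refinement, sketched below as our own assumption rather than the approach used in this post, is to pull hashtags out with a regular expression that matches `#` followed by word characters. The name `extract_hashtags_regex` is hypothetical.

```python
import re

# Matches '#' followed by one or more word characters (letters, digits, underscore).
# This is a simplification: it would also match fragments such as '/#section' in URLs,
# and Twitter's own hashtag rules are more involved.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags_regex(tweets, tag):
    """Return lower-cased hashtags found in tweets, excluding the search tag itself."""
    search = "#" + tag.lower()
    all_hashtags = []
    for tweet in tweets:
        for match in HASHTAG_RE.findall(tweet):
            hashtag = "#" + match.lower()
            if hashtag != search:
                all_hashtags.append(hashtag)
    return all_hashtags


tweets = ["Loving #Python, and #AI!", "#python tips: use #AI."]
print(extract_hashtags_regex(tweets, "python"))
# ['#ai', '#ai']
```

With the split-based version, "#Python," would not match "#python" and would be kept as "#python,"; the regex strips the trailing comma before comparing.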


Retrieving tweets

We are going to use the Python library "tweepy" to retrieve tweets. To access the Twitter API, you will need to register as a developer and get your consumer key, consumer secret, access token and access token secret. Please refer to the tweepy authentication and Twitter API documentation for how to obtain these credentials.

Authenticating and setting up your access codes:

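import tweepy as tw

# Replace the placeholder strings with your own credentials
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"

auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit=True makes tweepy wait when the rate limit is reached
api = tw.API(auth, wait_on_rate_limit=True)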
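Pasting keys directly into the script makes them easy to leak when the code is shared. A common alternative, shown here as a sketch, is to read them from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_credentials` are our own choices, not something tweepy requires.

```python
import os

# Hypothetical environment variable names; use whatever names suit your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=os.environ):
    """Return the four credentials as a tuple, failing loudly if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

The returned tuple can be unpacked into `consumer_key, consumer_secret, access_token, access_token_secret` before creating the `OAuthHandler`.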

Now let's put everything together in a function that takes the tag as input and, using our previous functions, cleans it, retrieves relevant tweets, extracts hashtags from them, and returns a dictionary of the hashtags and their frequencies, sorted from most to least frequent.

import tweepy as tw

def get_hashtags(tag):
    search_tag = clean_input(tag)
    # In tweepy 4.x the search method is api.search_tweets (api.search in tweepy 3.x)
    tweets = tw.Cursor(api.search_tweets,
                       q="#" + search_tag,
                       lang="en").items(200)
    tweets_list = [tweet.text for tweet in tweets]
    all_tags = return_all_hashtags(tweets_list, search_tag)
    # Count how many times each hashtag appears
    frequency = {}
    for item in set(all_tags):
        frequency[item] = all_tags.count(item)
    # Sort by count, highest first
    return {k: v for k, v in sorted(frequency.items(),
                                    key=lambda item: item[1], reverse=True)}
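The frequency loop calls `all_tags.count(item)` once per distinct tag, which rescans the whole list each time. The standard library's `collections.Counter` does the same tally in a single pass, and its `most_common()` method already returns pairs sorted by count, highest first. The sketch below runs offline on a hand-written list of hashtags (our own sample data) so the counting step can be checked without the Twitter API; `count_hashtags` is a hypothetical name.

```python
from collections import Counter


def count_hashtags(all_tags):
    """Return a dict of hashtag -> count, ordered from most to least frequent."""
    # most_common() sorts by count in descending order; dicts preserve insertion order.
    return dict(Counter(all_tags).most_common())


sample = ["#ai", "#ml", "#ai", "#data", "#ai", "#ml"]
print(count_hashtags(sample))
# {'#ai': 3, '#ml': 2, '#data': 1}
```

One small difference: with `Counter`, tags that tie on count come out in the order they were first seen, whereas in the loop-based version the order of ties depends on how the `set` happens to iterate.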

Finally, we call get_hashtags on the user's input and print each hashtag with its count.

all_tags = get_hashtags(tag)
for hashtag, count in all_tags.items():
    print(hashtag, count)
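With a couple of hundred tweets the printed list can get long. If only the most frequent tags are wanted, the count-sorted dictionary can be trimmed with `itertools.islice`. The helper name `top_hashtags` and the default of 10 are our own choices, and the counts below are made-up sample data.

```python
from itertools import islice


def top_hashtags(sorted_counts, n=10):
    """Return the first n (hashtag, count) pairs from a dict already sorted by count."""
    return list(islice(sorted_counts.items(), n))


counts = {"#ai": 3, "#ml": 2, "#data": 1}
for hashtag, count in top_hashtags(counts, n=2):
    print(hashtag, count)
# #ai 3
# #ml 2
```

This relies on the dictionary already being in descending order of count, which is what get_hashtags returns.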

Next

Why limit yourself to Twitter? In Part 2, we will add code for scraping Instagram posts for hashtags.
