Do you ever want to check out what people are saying about the latest movie release? You may want to know whether a certain movie is recommended, whether it’s a box-office hit, or whether it is suitable for a family audience, but you may not know how to find out. Don’t worry, this article has you covered. We are going to pull live data from Twitter with the help of the Tweepy library, pre-process the tweets, and build a word cloud from the data.
Twitter is a micro-blogging platform and social media app where people express their opinions and emotions and support others’ opinions. Twitter has a user base of around 217 million people around the world. If many users talk about a particular topic, using either a #hashtag or a recurring word, the topic becomes a trending one. By searching for a trending #hashtag or a frequently occurring word, we can filter tweets and analyze the filtered data.
Tweepy is a Python library for accessing the Twitter APIs. With the help of these APIs, the data of interest can be pulled out for analysis.
Before jumping into how to use Tweepy, we first need keys to access the Twitter APIs. To get them, log in to https://developer.twitter.com/. Once logged in, you need to provide your name, your country, and the purpose for which you are going to use the API. After you click the next button, you will get essential access.
After you get access, you need to create a project to use the v2 endpoints (which give access to tweets). Once you click on create project, you have to provide the project name, use case, and project description. A one-line explanation is enough for each of these. Finally, clicking the app setup makes the project ready to use.
After creating the project, you need to create the app by giving it a name. Once that is done, you get the keys to the fort: the keys and tokens necessary for accessing the API. They can be viewed only once, so store them someplace safe. If you lose the keys, you can always regenerate them, which revokes the old ones.
At this point the user has essential access. That is good, but not enough to pull more tweets or to run more projects in parallel. For that, the user has to apply for elevated access. Both essential and elevated access are free of charge.
To apply for elevated access, the user needs to answer some very straightforward questions with basic information. Next, the user needs to describe the use case for requesting elevated access; any honest answer will do. Finally, the user needs to review the information provided and accept the terms. At this point, the user’s access level is moved to elevated.
In this section, we see how to extract tweet information with the Python library Tweepy.
First, we import the necessary libraries for this project.
import tweepy
from wordcloud import WordCloud
import matplotlib.pyplot as plt
Second, we copy the keys generated in the Twitter developer account into four variables, as shown below.
CONSUMER_KEY = 'paste_the code_here'
CONSUMER_SECRET = 'paste_the code_here'
ACCESS_TOKEN = 'paste_the code_here'
ACCESS_TOKEN_SECRET = 'paste_the code_here'
Once we have imported the necessary libraries and saved the keys and secrets in the four variables, we need to authenticate with the API. Once authentication is successful, we can make API calls to extract the data.
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
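Pasting keys directly into a script makes it easy to leak them by accident, for example when sharing the file. As a minimal sketch of one alternative, the keys can be read from environment variables instead. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper names load_credentials and build_api are made up for this illustration and are not part of Tweepy; wait_on_rate_limit is an optional Tweepy setting that is not used in the snippet above.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a tuple, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in ENV_NAMES)


def build_api(env=None):
    """Create an authenticated Tweepy API client from the environment."""
    import tweepy  # imported lazily so load_credentials works without Tweepy

    consumer_key, consumer_secret, access_token, access_token_secret = (
        load_credentials(env)
    )
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit makes Tweepy pause instead of failing on rate limits
    return tweepy.API(auth, wait_on_rate_limit=True)
```

With the four variables set in the shell, api = build_api() would take the place of the three authentication lines above.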
Below is the snippet for extracting tweet data based on a query term of our choice. Once we have the tweet text from Twitter, we remove some stop words so that they do not show up when the tweets are processed.
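import re  # needed for the whole-word removal below

def tweetSearch(query, limit=1000, language="en", remove=None):
    # Collect the lower-cased text of up to `limit` tweets matching the query
    texts = []
    cursor = tweepy.Cursor(api.search_tweets, q=query, lang=language)
    for tweet in cursor.items(limit):
        texts.append(tweet.text.lower())
    # Join with spaces so the last word of one tweet does not fuse with the next
    text = " ".join(texts)
    # Remove URL leftovers plus any caller-supplied words, as whole words only,
    # so that removing "co" does not also eat into words such as "collection"
    removeWords = ["https", "co"] + list(remove or [])
    for word in removeWords:
        text = re.sub(r"\b" + re.escape(word) + r"\b", "", text)
    return text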
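The cleaning step can be tried out without calling Twitter at all. The following is a minimal sketch, assuming the tweets are already available as plain strings. The function name clean_tweets and the sample tweets are made up for illustration, and dropping whole URLs and @mentions is an extra step that is not part of the snippet above. Unwanted words are matched as whole words, so removing "co" leaves a word like "collection" intact.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweets(tweets, remove=()):
    """Lower-case tweets, drop URLs, mentions and unwanted whole words."""
    cleaned = []
    for tweet in tweets:
        text = tweet.lower()
        text = URL_RE.sub(" ", text)
        text = MENTION_RE.sub(" ", text)
        for word in remove:
            # \b anchors the match to word boundaries
            text = re.sub(r"\b" + re.escape(word.lower()) + r"\b", " ", text)
        # Collapse the runs of whitespace left behind by the removals
        cleaned.append(" ".join(text.split()))
    return " ".join(cleaned)


# Made-up sample tweets, for illustration only
sample = [
    "RT @fan Collection is HUGE https://t.co/abc123",
    "Box office record! #KGF2",
]
print(clean_tweets(sample, remove=["rt"]))
# prints: collection is huge box office record! #kgf2
```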
Below is the snippet for visualizing the extracted tweet data as a word cloud. Once we build the word cloud, we can plot it and save it as an image.
search = tweetSearch('KGF2')
# Build the word cloud from the combined tweet text
wordcloud = WordCloud(width=800, height=600).generate(search)
# Plot on a black background, hide the axes, and save the figure to disk
plt.figure(figsize=(15, 15), facecolor='k')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.savefig('KGF2.png', facecolor='k', bbox_inches='tight')
From the word cloud, words and phrases such as box office, cr, and show full, among many others, suggest that the movie was well received by the audience and was a big box-office hit during its release period.
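A word cloud shows relative size but not exact numbers. To back up a reading like the one above, the same text can be summarized as plain word counts. This is a minimal sketch using only the standard library; the function name top_words and the sample text are made up for illustration, and it counts single words only, so a phrase such as box office shows up as two separate entries.

```python
import re
from collections import Counter


def top_words(text, n=10, min_length=2):
    """Return the n most frequent words of at least min_length characters."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    return counts.most_common(n)


# Made-up text standing in for the string returned by the tweet search
sample_text = "box office hit box office record box show full box office show"
for word, count in top_words(sample_text, n=3):
    print(word, count)
```

Passing the text returned by the tweet search in place of sample_text would list the words that dominate the cloud together with their counts.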
In this article, we saw how to get API keys from the Twitter developer account, extract tweet data using Tweepy, and build a visualization of the tweets retrieved for a particular hashtag or search term.