Five steps to search and store tweets by keywords

This episode of the tutorial teaches you how to download tweets that include a set of keywords.

Published in: Education, Technology
  • can any one help me
  • processing id 1/2/home/python/global/set/local/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning. InsecurePlatformWarning Error reading id %g uncontrol, exception: Twitter API returned a 401 (Unauthorized), Timestamp out of bounds. processing id 2/2 Error reading id %g unviolence, exception: Twitter API returned a 401 (Unauthorized), Timestamp out of bounds.
  • @cosmopolitanvan sure...it will be really helpful if you can send a test mail to me from your id...my email id is pallab.sarkar59@gmail.com
  • @Pallab Sarkar Pallab, can you send me the script you edited? Sometimes, nuanced tweaking in a script can cause problems.
  • First of all...awesome tutorial. But while running the embedded code for extracting data for a keyword...its throwing me an error at the getData function.

  1. Five Steps to Search and Store Tweets by Keywords
     • Created by The Curiosity Bits Blog (curiositybits.com)
     • With the support from Dr. Gregory D. Saxton (http://social-metrics.org/)
  2. The output you will get…
     Let’s say I want to study Twitter discussions of the missing Malaysian airliner MH370. I plan to gather all tweets that include the keywords MH370 or Malaysian. You will get an ample amount of metadata for each tweet. Here is a breakdown of each metadata type (name: definition):
     • tweet_id: The unique identifier for a tweet
     • inserted_date: When the tweet was downloaded into your database
     • language: The language of the tweet
     • retweeted_status: Is the tweet a RETWEET?
     • content: The content of the tweet
     • from_user_screen_name: The screen name of the tweet sender
  3. Metadata types, continued (name: definition):
     • from_user_followers_count: The number of followers the sender has
     • from_user_friends_count: The number of users the sender is following
     • from_user_listed_count: How many times the sender is listed
     • from_user_statuses_count: The number of tweets sent by the sender
     • from_user_description: The profile bio of the sender
     • from_user_location: The location of the sender
     • from_user_created_at: When the Twitter account was created
     • retweet_count: How many times the tweet is retweeted
     • entities_urls: The URLs included in the tweet
     • entities_urls_count: The number of URLs included in the tweet
     • entities_hashtags: The hashtags included in the tweet
     • entities_hashtags_count: The number of hashtags in the tweet
     • entities_mentions: The screen names mentioned in the tweet
  4. Metadata types, continued (name: definition):
     • in_reply_to_screen_name: The screen name of the user the sender is replying to
     • in_reply_to_status_id: The unique identifier of the tweet being replied to
     • entities_expanded_urls: Complete URLs extracted from short URLs
     • json_output: The ENTIRE metadata in JSON format, including metadata not parsed into columns
     • entities_media_count: NA
     • media_expanded_url: NA
     • media_url: NA
     • media_type: NA
     • video_link: NA
     • photo_link: NA
     • twitpic: NA
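
The column list above can be illustrated with a short parsing sketch. The tutorial's own parsing code is not shown in the slides, so this is only an approximation: it assumes each tweet arrives as a dictionary in the Twitter API v1.1 JSON layout (`id`, `text`, `lang`, `user`, `entities`), and `parse_tweet` and `sample` are hypothetical names. Only a subset of the columns is filled in.

```python
import json
from datetime import datetime, timezone


def parse_tweet(tweet):
    """Flatten one tweet (a dict in Twitter API v1.1 layout) into column values.

    Hypothetical helper: the column names follow the slides, the parsing
    logic is an assumption about how they could be derived.
    """
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    urls = entities.get("urls", [])
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    mentions = [m["screen_name"] for m in entities.get("user_mentions", [])]
    return {
        "tweet_id": tweet["id"],
        "inserted_date": datetime.now(timezone.utc).isoformat(),
        "language": tweet.get("lang"),
        # A retweet carries the original tweet under "retweeted_status".
        "retweeted_status": "retweeted_status" in tweet,
        "content": tweet.get("text"),
        "from_user_screen_name": user.get("screen_name"),
        "from_user_followers_count": user.get("followers_count"),
        "retweet_count": tweet.get("retweet_count"),
        "entities_urls": ", ".join(u["url"] for u in urls),
        "entities_urls_count": len(urls),
        "entities_expanded_urls": ", ".join(u.get("expanded_url") or "" for u in urls),
        "entities_hashtags": ", ".join(hashtags),
        "entities_hashtags_count": len(hashtags),
        "entities_mentions": ", ".join(mentions),
        "in_reply_to_screen_name": tweet.get("in_reply_to_screen_name"),
        "in_reply_to_status_id": tweet.get("in_reply_to_status_id"),
        # Keep the full record so unparsed metadata is not lost.
        "json_output": json.dumps(tweet),
    }


# Made-up sample record, for illustration only.
sample = {
    "id": 1,
    "lang": "en",
    "text": "Any news on #MH370? http://t.co/abc",
    "retweet_count": 3,
    "user": {"screen_name": "example_user", "followers_count": 10},
    "entities": {
        "urls": [{"url": "http://t.co/abc", "expanded_url": "http://example.com/news"}],
        "hashtags": [{"text": "MH370"}],
        "user_mentions": [],
    },
}
row = parse_tweet(sample)
```

Storing `json_output` alongside the parsed columns means a column that was missed at download time can still be recovered later from the raw record.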
  5. Step 1: Checklist
     • Do you know how to install necessary Python libraries? If not, please review p. 8 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
     • Do you know how to browse and edit SQLite database through SQLite Database Browser? If not, please review pp. 10-14 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
     Download the code: https://drive.google.com/file/d/0Bwwg6GLCW_IPdm1mcHNXeU85Nkk/edit?usp=sharing
  6. Step 1: Checklist
     Have you installed these necessary Python libraries?
  7. Step 1: Checklist
     Most importantly, we need to install a Twitter mining library called Twython (https://twython.readthedocs.org/en/latest/index.html)
  8. Step 2: enter the search terms
     You can enter multiple search terms, separated by commas. Please notice that the last search term also ends with a comma. You can enter non-English search terms, but make sure the Python script starts with the following block of code:
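
The code block referred to on this slide is not part of the transcript. As a rough sketch, and purely as an assumption about what it contained, a Python 2 script with non-English literals usually opens with a PEP 263 encoding declaration; the variable names and the Chinese search term below are hypothetical.

```python
# -*- coding: utf-8 -*-
# The first line is a PEP 263 encoding declaration. Python 2 needs it before
# it will accept non-ASCII characters in the source file; Python 3 already
# assumes UTF-8, so there the line is optional.

# Search terms separated by commas, with a comma after the last term too.
search_terms = ("MH370", "Malaysian", "马航",)

# The trailing comma matters most when there is only one term:
single_term = ("MH370",)   # a tuple holding one search term
not_a_tuple = ("MH370")    # just a string; looping over it yields letters

for term in search_terms:
    print(term)
```

Getting into the habit of ending every term with a comma avoids the single-term case silently turning into a plain string.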
  9. Step 3: enter your API keys
     • API key
     • API secret
     • Access token
     • Access token secret
     Enter each key inside the quotation marks.
  10. Step 3: enter your API keys
      • Set up your API keys - 1
      First, go to https://dev.twitter.com/ and sign in with your Twitter account. Go to the "My applications" page to create an application.
  11. Step 3: enter your API keys
      • Set up your API keys - 2
      • Enter any name that makes sense to you.
      • Enter any text that makes sense to you.
      • You can enter any legitimate URL; here, I put in the URL of my institution.
      • Same as above: you can enter any legitimate URL; here, I put in the URL of my institution.
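
Once the four values from Step 3 are available, they might be wired up roughly as follows. This is a sketch rather than the tutorial's script: the variable names, `missing_keys`, and `make_client` are hypothetical, and the only library call used is the `Twython` constructor, which accepts the app key, app secret, OAuth token, and OAuth token secret in that order.

```python
# Paste each value inside the quotation marks (placeholders left empty here).
credentials = {
    "app_key": "",             # API key
    "app_secret": "",          # API secret
    "oauth_token": "",         # Access token
    "oauth_token_secret": "",  # Access token secret
}


def missing_keys(creds):
    """Return the names of credentials that are still empty or blank."""
    return [name for name, value in creds.items() if not value.strip()]


def make_client(creds):
    """Build a Twython client; refuses to run with incomplete credentials."""
    missing = missing_keys(creds)
    if missing:
        raise ValueError("Fill in these keys first: " + ", ".join(missing))
    from twython import Twython  # third-party: pip install twython
    return Twython(
        creds["app_key"],
        creds["app_secret"],
        creds["oauth_token"],
        creds["oauth_token_secret"],
    )


print(missing_keys(credentials))
```

Checking for blank values before connecting gives a clearer message than the 401 (Unauthorized) response the API returns for bad credentials.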
  12. Step 4: change the parameter
      The result_type parameter is defined in the Twitter API documentation. Here it is set to recent; it can also be set to mixed or popular.
  13. Step 4: change the parameter
      Here is a list of parameters you can tweak or add: https://dev.twitter.com/docs/api/1.1/get/search/tweets
      For example, if you want to limit the search to Chinese, you can add lang = 'zh'
  14. Step 4: change the parameter
      For another example, if you want to limit the search to tweets sent until April 1, 2014, you can add until = '2014-04-01'
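
The three parameters discussed in Step 4 can be collected in one place before the search call. The sketch below is an illustration under assumptions: `build_search_params` is a hypothetical helper, and the commented-out last line assumes a Twython client named `twitter`, whose `search` method passes keyword arguments through to the search/tweets endpoint.

```python
from datetime import datetime

VALID_RESULT_TYPES = ("recent", "mixed", "popular")


def build_search_params(term, result_type="recent", lang=None, until=None, count=100):
    """Assemble keyword arguments for a search/tweets request."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError("result_type must be recent, mixed, or popular")
    params = {"q": term, "result_type": result_type, "count": count}
    if lang is not None:
        params["lang"] = lang  # language code, e.g. 'zh' for Chinese
    if until is not None:
        # The API expects YYYY-MM-DD; strptime raises ValueError otherwise.
        datetime.strptime(until, "%Y-%m-%d")
        params["until"] = until
    return params


params = build_search_params("MH370", lang="zh", until="2014-04-01")
print(params)

# With a Twython client (not created here), the request would look like:
# results = twitter.search(**params)
```

Validating `until` locally catches a mistyped date such as '2014- 04-01' before any request is sent.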
  15. Step 5: set up SQLite database
      • When you type in just a file name, the database will be saved in the same folder as the Python script. You can also use a full file path such as sqlite:///C:/xxxx/xxx/MH370.sqlite.
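
The file-name-versus-full-path behavior can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the tutorial script takes a `sqlite:///` style address, whereas `sqlite3.connect` takes a plain file path, and `resolve_db_path`, the temporary folder, and the shortened table definition are all assumptions made for the demonstration.

```python
import os
import sqlite3
import tempfile


def resolve_db_path(db_name, script_dir):
    """Bare file names land next to the script; absolute paths are used as-is."""
    if os.path.isabs(db_name):
        return db_name
    return os.path.join(script_dir, db_name)


# A temporary folder stands in for the folder that holds the Python script.
script_dir = tempfile.mkdtemp()
db_path = resolve_db_path("MH370.sqlite", script_dir)

# Connecting creates the database file if it does not exist yet.
conn = sqlite3.connect(db_path)
conn.execute(
    """CREATE TABLE IF NOT EXISTS tweets (
           tweet_id INTEGER PRIMARY KEY,
           content TEXT,
           from_user_screen_name TEXT,
           json_output TEXT
       )"""
)
conn.commit()
conn.close()

print(db_path)
```

The resulting .sqlite file can be opened in SQLite Database Browser to confirm that the table exists.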
  16. Hit RUN!
  17. Are we getting all the tweets?
      If you run the script daily or twice a day, you should cover all tweets generated on that day, plus tweets a few days old. But historical tweets are EXPENSIVE! Tweets older than a week can be purchased through http://gnip.com/
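
Running the script once or twice a day means consecutive runs will return many of the same tweets. One way to keep the database free of duplicates, shown here as a sketch and not as the tutorial's own method, is to make tweet_id the primary key and let SQLite skip rows it already has. `store_tweets` and the sample rows are hypothetical.

```python
import sqlite3


def store_tweets(conn, rows):
    """Insert (tweet_id, content) rows, skipping ids already stored.

    Returns the number of rows that were actually new.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (tweet_id, content) VALUES (?, ?)", rows
    )
    conn.commit()
    # Ignored inserts do not count as changes, so the difference is the new rows.
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")  # in-memory database for the demonstration
conn.execute("CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, content TEXT)")

# Morning run returns two tweets; the evening run overlaps on tweet 2.
morning = [(1, "first tweet"), (2, "second tweet")]
evening = [(2, "second tweet"), (3, "third tweet")]

new_in_morning = store_tweets(conn, morning)
new_in_evening = store_tweets(conn, evening)
total = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(new_in_morning, new_in_evening, total)
```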