Five steps to get tweets sent by a list of users

In this episode of our Python tutorial, we show you how to gather tweets sent by a list of Twitter users. Download the script at https://drive.google.com/file/d/0Bwwg6GLCW_IPVmNBMUV4bVhUU0U/edit?usp=sharing

Published in: Education, Technology

  1. Collecting Tweets Sent by a List of Users. This Python tutorial is brought to you by CuriosityBits.com, with generous support from Dr. Gregory D. Saxton (http://social-metrics.org/).
  2. Five Steps
       1. Install Python and the necessary Python libraries.
       2. Set up Twitter API keys.
       3. Prepare a list of Twitter handles (screen names) in .csv format.
       4. Create a SQLite database using SQLite Browser, and import the Twitter handle list.
       5. Modify the Python script and run it to get results!
     Download the Python script: https://drive.google.com/file/d/0Bwwg6GLCW_IPVmNBMUV4bVhUU0U/edit?usp=sharing
  3. The results you will get. You will get ample metadata for each tweet collected. Here is a breakdown of some important output variables (name: definition):
       tweet_id: The unique identifier of a tweet
       inserted_date: When the tweet was downloaded into your database
       language: The language of the tweet
       retweeted_status: Whether the tweet is a retweet
       content: The content of the tweet
       from_user_screen_name: The Twitter handle of the sender
       created_at: When the tweet was sent
  4. Output variables, continued (name: definition):
       from_user_followers_count: The number of followers the sender has
       from_user_friends_count: The number of users the sender is following
       from_user_listed_count: How many times the sender is listed by other users
       from_user_statuses_count: The number of tweets sent by the sender
       from_user_description: The profile bio of the sender
       from_user_location: The location of the sender
       from_user_created_at: When the sender's Twitter account was created
       retweet_count: How many times the tweet has been retweeted
       entities_urls: The URLs included in the tweet
       entities_urls_count: The number of URLs included in the tweet
       entities_hashtags: The hashtags included in the tweet
       entities_hashtags_count: The number of hashtags in the tweet
       entities_mentions: The Twitter handles mentioned in the tweet
  5. Output variables, continued (name: definition):
       in_reply_to_screen_name: The Twitter handle the sender is replying to
       in_reply_to_status_id: The unique identifier of the tweet the sender is replying to
       entities_expanded_urls: Complete URLs extracted from short URLs
       json_output: The ENTIRE metadata in JSON format, including metadata not parsed into columns
       entities_media_count: NA
       media_expanded_url: NA
       media_url: NA
       media_type: NA
       video_link: NA
       photo_link: NA
       twitpic: NA
  6. Step 1: Install Python and necessary libraries. Download Anaconda Python 2.7 to run Python scripts. Anaconda is free to download. Once you’ve installed Anaconda, you can modify scripts in Spyder.
  7. Step 1: Install Python and necessary libraries. Do you know how to install the necessary Python libraries? If not, please review pg. 8 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/ Install the following libraries:
       • Simplejson (https://pypi.python.org/pypi/simplejson)
       • Sqlite3 (http://sqlite.org/)
       • Sqlalchemy (http://www.sqlalchemy.org/)
       • Twython (https://twython.readthedocs.org/en/latest/index.html)
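One quick way to confirm that the libraries listed above are installed is to try importing each of them. The sketch below is only an illustration written for Python 3; the import names in `REQUIRED` are the usual module names for these libraries and are an assumption here, and `missing_libraries` is a hypothetical helper, not part of the tutorial's script.

```python
import importlib

# Usual import names for the libraries on the slide (assumed; adjust if
# your installation uses different names).
REQUIRED = ["simplejson", "sqlite3", "sqlalchemy", "twython"]


def missing_libraries(names):
    """Return the names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# An empty list means every library was found.
print("Missing libraries:", missing_libraries(REQUIRED))
```

Any name that shows up in the printed list still needs to be installed before running the script.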
  8. Step 2: Set up Twitter API keys. First, go to https://dev.twitter.com/ and sign in with your Twitter account. Then go to the My applications page to create an application.
  9. Step 2: Set up Twitter API keys. Fill in the application form:
       • Enter any name that makes sense to you.
       • Enter any text that makes sense to you.
       • You can enter any legitimate URL; here, I put in the URL of my institution.
       • Same as above: you can enter any legitimate URL; here, I put in the URL of my institution.
  10. Step 2: Set up Twitter API keys. Then go to the API Keys page, scroll down to the bottom, and click Create my access token. Wait a few minutes and refresh the page, and you will see all your keys. You need four of them: API Key, API Secret, Access token, and Access token secret.
  11. Step 3: Prepare a Twitter handle list. Create a list of the Twitter handles whose tweets you are interested in collecting. You can create the list in Excel and save it in .csv format. The list should contain three columns (in accordance with the configuration in the Python script). The first column lists sequential numbers beginning with 1. The second column lists Twitter handles. For the third column, I entered 1 throughout, but you can leave it blank.
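If you would rather generate the handle list with Python than with Excel, the three-column layout described above can be sketched as follows. This is only an illustration written for Python 3; `write_handle_list` and the file name in the usage note are hypothetical and not part of the tutorial's script.

```python
import csv


def write_handle_list(csv_path, handles, user_type="1"):
    """Write one row per handle: sequence number, handle, user type.

    The sequence numbers start at 1, as the slide asks. The third column
    repeats `user_type` on every row (the slide uses 1 throughout).
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for number, handle in enumerate(handles, start=1):
            writer.writerow([number, handle, user_type])
```

For example, `write_handle_list("handles.csv", ["example_user_a", "example_user_b"])` would produce a two-row file with no header row.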
  12. Step 4: Create a SQLite database. Go to http://sqlitebrowser.sourceforge.net/ and download SQLite Database Browser. It allows you to view and edit SQLite databases.
  13. Step 4: Create a SQLite database.
       • Use File > New Database to create a new database.
       • Remember the database filename you enter.
       • The default file extension is .sqlite. To prevent future complications, add the extension .sqlite when typing the filename.
  14. Step 4: Create a SQLite database. Use File > Import > Table From CSV File to import the .csv file you’ve saved. Name the imported table accounts. This table name corresponds to the one used in the Python script. After you click Create, the .csv list will be loaded into the database, and you can browse it in Browse Data. Lastly, remember to save the database. Stay in the database you’ve just created.
  15. Step 4: Create a SQLite database. Modify the imported table: go to Edit > Modify Table and use Edit field to change the column names. To correspond to the Python script, name the first column rowid with Field Type Integer, the second column screen_name with Field Type String, and the third column user_type with Field Type String. In the end, the table should be defined as shown in the screenshot.
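The same accounts table can also be built without the browser tool, using Python's built-in sqlite3 module. The sketch below is one possible approach written for Python 3, not the tutorial's own method. It assumes the .csv file has no header row, `build_accounts_table` is a hypothetical helper, and TEXT is used as SQLite's string type in place of the String field type chosen in SQLite Database Browser.

```python
import csv
import sqlite3


def build_accounts_table(db_path, csv_path):
    """Create the accounts table in db_path and load the rows of csv_path.

    Returns the number of rows inserted.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Column names follow the slide: rowid, screen_name, user_type.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts ("
            "rowid INTEGER, screen_name TEXT, user_type TEXT)"
        )
        with open(csv_path, newline="") as f:
            # Pad short rows so a missing third column becomes an empty string.
            rows = [tuple((row + ["", ""])[:3]) for row in csv.reader(f) if row]
        conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```

Calling `build_accounts_table("tweets.sqlite", "handles.csv")` (both file names are placeholders) would create the database file if it does not exist yet.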
  16. Step 5: Modify the script and run. Find this block of code, and enter your API keys: API Key, API Secret, Access token, and Access token secret.
  17. Step 5: Modify the script and run. Find this block of code, and enter the filename and file path of the SQLite database you have created. You need to match the file path and file name to the SQLite database you’ve created (RECOMMENDED). If the Python script file and the created SQLite database are in the same folder, just paste your database name here.
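To illustrate the difference between pasting just the database name and giving a full path, here is a small sketch. It assumes the script opens the database through SQLAlchemy, which expects a URL of the form `sqlite:///` followed by the file path; that is an assumption about the script, and `sqlite_url` and the file names are hypothetical.

```python
import os


def sqlite_url(db_file, folder=None):
    """Build an SQLAlchemy-style SQLite URL.

    With no folder, the database is looked up relative to the folder the
    script runs from, which matches the "same folder" case on the slide.
    """
    path = db_file if folder is None else os.path.join(folder, db_file)
    return "sqlite:///" + path


# Same folder as the script: only the database name is needed.
print(sqlite_url("tweets.sqlite"))
```

If the database lives elsewhere, pass its folder as the second argument so the URL carries the full path.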
  18. Step 5: Specify search criteria. You can refine the search criteria, e.g. count, which specifies the number of tweets to try to retrieve for each Twitter handle. The maximum value is 200. More at https://dev.twitter.com/docs/api/1/get/statuses/user_timeline
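As a rough sketch of how the count setting is passed along, the function below requests a user's timeline and caps count at the maximum of 200. It assumes `client` is a Twython object created with your four keys, for instance `Twython(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)`; `fetch_timeline` is a hypothetical helper, and the tutorial's script may structure this call differently.

```python
MAX_COUNT = 200  # upper limit for count, as stated on the slide


def fetch_timeline(client, screen_name, count=MAX_COUNT):
    """Ask `client` for up to `count` recent tweets sent by `screen_name`.

    `client` is assumed to offer Twython's get_user_timeline method.
    Values of count outside 1..200 are clamped into that range.
    """
    count = max(1, min(int(count), MAX_COUNT))
    return client.get_user_timeline(screen_name=screen_name, count=count)
```

Because the client is passed in as an argument, the function can be tried out with a stand-in object before spending any of your API quota.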
  19. Step 5: Modify the script and run. In Spyder, go to Run and choose Execute in a new dedicated Python interpreter. The first option, Execute in current Python or IPython interpreter, does not work on my end, but it may work on your computer.
  20. Some issues you may encounter: "Too many values to unpack" errors. Don’t panic! It is almost certain that you will hit roadblocks when learning Python, so be prepared to debug. This error probably occurs because you’ve saved the Python script file somewhere other than the default Python folders. But what is a default Python folder?
  21. Find your default Python folders. A simple way to find your default Python folders on a Windows machine: in the Start menu, right-click Computer and choose Properties.
  22. Find the default Python folder. The folders listed here are your default Python folders.
  23. On my machine, C:\Anaconda\Lib\site-packages is one of the default Python folders. If the Python script runs successfully, you should see output like that in the screenshot.
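Python can also list these folders itself. The sketch below assumes that "default Python folders" refers to the folders Python searches when importing modules, which are stored in `sys.path`; that reading is an assumption, and `default_python_folders` is a hypothetical helper.

```python
import sys


def default_python_folders():
    """Return the folders Python searches when importing modules."""
    # Skip the empty entry that stands for the current folder.
    return [folder for folder in sys.path if folder]


for folder in default_python_folders():
    print(folder)
```

On an Anaconda installation, a site-packages folder would normally appear somewhere in this list.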
  24. Some issues you may encounter: Oops! Another error! The Twitter API has a rate limit, which restricts how many tweets you can get within a time frame. With the current script, you can cover roughly 300 users in a 15-minute window. Once you hit the limit, you will see the error message pop up. There are two ways to get around the restriction: 1. wait 15 minutes before starting another run; 2. create multiple Twitter apps to get multiple sets of API keys. Once you use up the quota in one run, paste in a new set of keys to start a new run!
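To get a feel for how long a large list will take under the first option (waiting between runs), here is a back-of-the-envelope sketch. It takes the rough figures above (about 300 users per 15-minute window) as defaults; `estimate_runs` is a hypothetical helper and the real numbers depend on Twitter's current limits.

```python
import math


def estimate_runs(total_users, users_per_window=300, window_minutes=15):
    """Return (number of runs, minutes spent waiting between runs)."""
    runs = int(math.ceil(total_users / float(users_per_window)))
    # No waiting is needed before the first run, only between runs.
    waiting = max(runs - 1, 0) * window_minutes
    return runs, waiting


runs, waiting = estimate_runs(1000)
print(runs, "runs,", waiting, "minutes of waiting")
```

With these defaults, a list of 1,000 handles needs four runs and about 45 minutes of waiting in total.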
  25. Some issues you may encounter: Pay attention to the block of code shown above. The number 0 means that the script starts with the user listed in the first row. Because you will hit the rate limit, you will need to run the code multiple times to finish crawling all users’ tweets, so make sure to change the starting row number each time! For example, if the first run covered user (0) to user (150) before running into the rate limit, put 151 as the starting number in the second run.
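The idea of restarting from a given row can be sketched with a plain SQLite query. This is an illustration only, not the tutorial's code: `handles_from` is a hypothetical helper, and it assumes the accounts table with rowid and screen_name columns from Step 4 and a zero-based starting number, as in the example above.

```python
import sqlite3


def handles_from(conn, start=0):
    """Return screen names from the accounts table, skipping `start` rows.

    start=0 begins with the first row; start=151 skips users 0 to 150.
    """
    cursor = conn.execute(
        # LIMIT -1 means "no limit" in SQLite, so only OFFSET applies.
        "SELECT screen_name FROM accounts ORDER BY rowid LIMIT -1 OFFSET ?",
        (start,),
    )
    return [row[0] for row in cursor]
```

With a connection such as `sqlite3.connect("tweets.sqlite")` (a placeholder file name), `handles_from(conn, 151)` would return only the handles that the first run in the example did not reach.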
  26. Load the SQLite data into Excel. You can export the data in the SQLite database to Excel: use File > Export > Table as CSV to export the data into .csv format. Make sure to add the .csv file extension to the file name.
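The export can also be done from Python instead of the browser tool. The sketch below, written for Python 3, writes every row of a table to a .csv file with a header row; `export_table_to_csv` is a hypothetical helper, and you would need to pass the name of the table your script actually stores tweets in, which this transcript does not state.

```python
import csv
import sqlite3


def export_table_to_csv(conn, table, csv_path):
    """Write all rows of `table` to csv_path, with column names as header."""
    # Table names cannot be passed as SQL parameters, so quote the name.
    quoted = '"{}"'.format(table.replace('"', '""'))
    cursor = conn.execute("SELECT * FROM " + quoted)
    headers = [column[0] for column in cursor.description]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cursor)
    return csv_path
```

The resulting file opens directly in Excel, just like the one exported from SQLite Database Browser.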