Guess the Country - Playing with Twitter Streaming API

Using the Twitter statuses sample API to build a name<->country database

  1. Guess the Country: Playing with Twitter Streaming API
     Chris Birchall, #m3dev Tech Talk, 2014/7/11
  2. It started with an idle tweet...
     https://twitter.com/cbirchall/status/466197512143912961
  3. Let’s use Twitter for something (slightly) useful! The plan:
     ● Collect geo-tagged tweets from the Twitter Streaming API
     ● Use them to build a name⇔country DB
     ● Build a simple search UI as a proof of concept
     ● (crowbar Spark in there somewhere because it’s cool)
  4. Implementation
     [Architecture diagram. Components, in order of appearance: Twitter Streaming API, EC2 (Twitter4j, .log), Fluentd, S3, EC2 (Spark), Postgres (RDS), Heroku (Rails)]
     Code: https://github.com/cb372/guess-the-country
  5. Collecting tweets
     ● Ran the collector for 13 days
     ● Collected 285,340 geo-tagged tweets
     ● 205,798 distinct users
     ● Only collected names and countries, threw everything else away
     Processing
     ● Used Spark to filter out duplicate users
  6. Stats
     Distinct countries = 204
     Distinct first names = 40,689
     Distinct last names = 81,674

     Top 10 countries by user count

               country           | percentage
     -----------------------------+------------
      United States               |       39.4
      United Kingdom              |       10.1
      Indonesia                   |        8.9
      Brasil                      |        8.1
      Türkiye                     |        3.9
      España                      |        2.4
      México                      |        2.2
      Republic of the Philippines |        2.0
      Canada                      |        1.8
      Malaysia                    |        1.8

     Most popular first names

      first_name
     ------------
      chris
      alex
      david
      michael
      sarah

     Most popular surnames

      second_name
     -------------
      smith
      jones
      garcia
      williams
      johnson
  7. Results
     It works surprisingly well! (well, it worked for my name, anyway)
     Note for the pedantic: since the original data is geo-tagged tweets, strictly speaking we only know where a user is, not where they come from.
  8. Try for yourself
     Demo: http://guess-the-country.herokuapp.com/
     Code: https://github.com/cb372/guess-the-country
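
The slides describe the processing only in outline: drop duplicate users, keep names and countries, then look a name up. The sketch below shows one way those steps could fit together in memory. It is not the author's implementation, which per the slides uses Spark, Postgres and Rails. The sample records, the function names, the first/last token split and the "sum the first-name and surname counts" scoring are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (user_id, display_name, country).
# These rows are invented for the example, not taken from the talk.
SAMPLE_RECORDS = [
    (1, "Chris Taylor", "United Kingdom"),
    (1, "Chris Taylor", "United Kingdom"),  # same user tweeting twice
    (2, "Chris Smith", "United States"),
    (3, "Chris Jones", "United States"),
    (4, "Maria Garcia", "España"),
    (5, "Jose Garcia", "México"),
    (6, "Ana Garcia", "México"),
]


def split_name(display_name):
    """Return (first, last) tokens, lowercased; either may be None."""
    tokens = display_name.lower().split()
    if not tokens:
        return None, None
    return tokens[0], (tokens[-1] if len(tokens) > 1 else None)


def dedupe_users(records):
    """Keep only the first record seen for each user id."""
    seen = set()
    unique = []
    for user_id, name, country in records:
        if user_id not in seen:
            seen.add(user_id)
            unique.append((user_id, name, country))
    return unique


def build_indexes(records):
    """Count, per first name and per surname, how many users are in each country."""
    first_index = defaultdict(Counter)
    last_index = defaultdict(Counter)
    for _, name, country in records:
        first, last = split_name(name)
        if first:
            first_index[first][country] += 1
        if last:
            last_index[last][country] += 1
    return first_index, last_index


def guess_country(first_index, last_index, name, top_n=3):
    """Rank countries by the summed first-name and surname counts (assumed scoring)."""
    first, last = split_name(name)
    scores = Counter()
    if first in first_index:
        scores.update(first_index[first])
    if last in last_index:
        scores.update(last_index[last])
    return scores.most_common(top_n)


if __name__ == "__main__":
    users = dedupe_users(SAMPLE_RECORDS)
    first_index, last_index = build_indexes(users)
    print(guess_country(first_index, last_index, "Chris Smith"))
```

As the note on slide 7 points out, counts like these reflect where users were tweeting from, so any guess is about location rather than origin.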
Jul. 11, 2014
