SlideShare a Scribd company logo
Guess the Country
Playing with Twitter Streaming API
Chris Birchall
#m3dev Tech Talk 2014/7/11
It started with an idle tweet...
https://twitter.com/cbirchall/status/466197512143912961
Let’s use Twitter for something
(slightly) useful!
The plan:
● Collect geo-tagged tweets from Twitter
Streaming API
● Use them to build a name⇔country DB
● Build a simple search UI as a proof of
concept
● (crowbar Spark in there somewhere
because it’s cool)
Implementation
Twitter
Streaming
API
EC2
https://github.com/cb372/guess-the-country
Twitter4j
.log
Fluentd
S3
EC2
Spark
Postgres
(RDS)
Heroku
Rails
Collecting tweets
● Ran the collector for 13 days
● Collected 285,340 geo-tagged tweets
● 205,798 distinct users
● Only collected names and countries,
threw everything else away
● Used Spark to filter out duplicate users
Processing
Stats
Top 10 countries by user count
Distinct countries = 204
Distinct first names = 40,689
Distinct last names = 81,674
country | percentage
-----------------------------+------------
United States | 39.4
United Kingdom | 10.1
Indonesia | 8.9
Brasil | 8.1
Türkiye | 3.9
España | 2.4
México | 2.2
Republic of the Philippines | 2.0
Canada | 1.8
Malaysia | 1.8
first_name
------------
chris
alex
david
michael
sarah
second_name
-------------
smith
jones
garcia
williams
johnson
Most popular first names
Most popular surnames
Results
It works surprisingly well!
(well, it worked for my name, anyway)
Note for the pedantic:
Since the original data is geo-tagged tweets, strictly speaking we only know
where a user is, not where they come from.
Try for yourself
Demo
http://guess-the-country.herokuapp.com/
Code
https://github.com/cb372/guess-the-country

More Related Content

Viewers also liked

Writeexcelについて
WriteexcelについてWriteexcelについて
Writeexcelについて
asa 999
 
Coursera experience
Coursera experienceCoursera experience
Coursera experience
Brian Hooper
 
テストの運用について #m3dev
テストの運用について #m3devテストの運用について #m3dev
テストの運用について #m3dev
Kazuhiro Sera
 
Load testing with gatling
Load testing with gatlingLoad testing with gatling
Load testing with gatling
Chris Birchall
 

Viewers also liked (19)

問題が起こった時、変えるのは人かそれともプロセスか?
問題が起こった時、変えるのは人かそれともプロセスか?問題が起こった時、変えるのは人かそれともプロセスか?
問題が起こった時、変えるのは人かそれともプロセスか?
 
DevOpsハッカソン参加レポート
DevOpsハッカソン参加レポートDevOpsハッカソン参加レポート
DevOpsハッカソン参加レポート
 
ScalaCache: simple caching in Scala
ScalaCache: simple caching in ScalaScalaCache: simple caching in Scala
ScalaCache: simple caching in Scala
 
多分モダンなWebアプリ開発
多分モダンなWebアプリ開発多分モダンなWebアプリ開発
多分モダンなWebアプリ開発
 
マイクロサービス運用の所感 #m3dev
マイクロサービス運用の所感 #m3devマイクロサービス運用の所感 #m3dev
マイクロサービス運用の所感 #m3dev
 
Writeexcelについて
WriteexcelについてWriteexcelについて
Writeexcelについて
 
Coursera experience
Coursera experienceCoursera experience
Coursera experience
 
Phone Home: A client-side error collection system
Phone Home: A client-side error collection systemPhone Home: A client-side error collection system
Phone Home: A client-side error collection system
 
Skinny Controllers, Skinny Models
Skinny Controllers, Skinny ModelsSkinny Controllers, Skinny Models
Skinny Controllers, Skinny Models
 
テストの運用について #m3dev
テストの運用について #m3devテストの運用について #m3dev
テストの運用について #m3dev
 
Branching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by AbstractionBranching Strategies: Feature Branches vs Branch by Abstraction
Branching Strategies: Feature Branches vs Branch by Abstraction
 
Load testing with gatling
Load testing with gatlingLoad testing with gatling
Load testing with gatling
 
Skinny Framework 1.0.0
Skinny Framework 1.0.0Skinny Framework 1.0.0
Skinny Framework 1.0.0
 
マイクロサービス化の障壁
マイクロサービス化の障壁マイクロサービス化の障壁
マイクロサービス化の障壁
 
複数サービスを共存させるために 試行錯誤したこと
複数サービスを共存させるために 試行錯誤したこと複数サービスを共存させるために 試行錯誤したこと
複数サービスを共存させるために 試行錯誤したこと
 
20161206 うるう秒社内勉強会 社外向け資料
20161206 うるう秒社内勉強会 社外向け資料20161206 うるう秒社内勉強会 社外向け資料
20161206 うるう秒社内勉強会 社外向け資料
 
akka-streamのマイクロサービスへの適用
akka-streamのマイクロサービスへの適用akka-streamのマイクロサービスへの適用
akka-streamのマイクロサービスへの適用
 
Scala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGSScala.js & friends: SCALA ALL THE THINGS
Scala.js & friends: SCALA ALL THE THINGS
 
Python 機械学習プログラミング データ分析ライブラリー解説編
Python 機械学習プログラミング データ分析ライブラリー解説編Python 機械学習プログラミング データ分析ライブラリー解説編
Python 機械学習プログラミング データ分析ライブラリー解説編
 

Similar to Guess the Country - Playing with Twitter Streaming API

SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
Chester Chen
 

Similar to Guess the Country - Playing with Twitter Streaming API (20)

Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Scaling spark on kubernetes at Lyft
Scaling spark on kubernetes at LyftScaling spark on kubernetes at Lyft
Scaling spark on kubernetes at Lyft
 
Scaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at LyftScaling Apache Spark on Kubernetes at Lyft
Scaling Apache Spark on Kubernetes at Lyft
 
Big data with Python on kubernetes (pyspark on k8s) - Big Data Spain 2018
Big data with Python on kubernetes (pyspark on k8s) - Big Data Spain 2018Big data with Python on kubernetes (pyspark on k8s) - Big Data Spain 2018
Big data with Python on kubernetes (pyspark on k8s) - Big Data Spain 2018
 
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk""Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
 
Apache Spark Best Practices Meetup Talk
Apache Spark Best Practices Meetup TalkApache Spark Best Practices Meetup Talk
Apache Spark Best Practices Meetup Talk
 
Hey Terraform, build me GCP Infrastructure
Hey Terraform, build me GCP InfrastructureHey Terraform, build me GCP Infrastructure
Hey Terraform, build me GCP Infrastructure
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
 
PySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March MeetupPySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March Meetup
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
Are general purpose big data systems eating the world?
Are general purpose big data systems eating the world?Are general purpose big data systems eating the world?
Are general purpose big data systems eating the world?
 
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
Collaborative environment with data science notebook
Collaborative environment with data science notebook Collaborative environment with data science notebook
Collaborative environment with data science notebook
 
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
 
Scalable Clusters On Demand
Scalable Clusters On DemandScalable Clusters On Demand
Scalable Clusters On Demand
 
How to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large ClustersHow to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large Clusters
 
How to performance tune spark applications in large clusters
How to performance tune spark applications in large clustersHow to performance tune spark applications in large clusters
How to performance tune spark applications in large clusters
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 

More from Chris Birchall (6)

Rust 超入門
Rust 超入門Rust 超入門
Rust 超入門
 
Tour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache KafkaTour of Distributed Systems 3 - Apache Kafka
Tour of Distributed Systems 3 - Apache Kafka
 
Tour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - CassandraTour of distributed systems 2 - Cassandra
Tour of distributed systems 2 - Cassandra
 
Tour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeperTour of distributed systems 1 - ZooKeeper
Tour of distributed systems 1 - ZooKeeper
 
Hydra
HydraHydra
Hydra
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES Systems
 

Recently uploaded

Recently uploaded (20)

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

Guess the Country - Playing with Twitter Streaming API