Chief Technology Officer at Digital Reasoning Systems
Feb. 10, 2014•0 likes•2,904 views
Why Twitter Is All the Rage: A Data Miner's Perspective
A presentation on data mining with Twitter that was originally presented as an O'Reilly webinar. See http://oreillynet.com/pub/e/2928 for the archived webinar video.
1. 1
Why Twitter Is All The Rage:
A Data Miner's Perspective
Matthew A. Russell
O'Reilly Webcast
15 Oct 2013
2. 2
Hello, My Name Is ... Matthew
Educated as a Computer Scientist
CTO @ Digital Reasoning Systems
Data mining; machine learning
Author @ O'Reilly Media
5 published books on technology
Principal @ Zaffra
Selective boutique consulting
3. 3
Transforming Curiosity Into Insight
An open source software (OSS) project
http://bit.ly/MiningTheSocialWeb2E
A book
http://bit.ly/135dHfs
Accessible to (virtually) everyone
Virtual machine with turn-key coding
templates for data science experiments
Think of the book as "premium" support for
the OSS project
6. 6
Data Science
Data => Actionable information
Highly interdisciplinary
Nascent
Necessary
http://wikipedia.org/wiki/Data_science
7. 7
Digital Signal Explosion
A model for the world: signals and sinks
Growth in data exhaust is accelerating
Digital fingerprints
"Software is eating the world"
Data mining opportunities galore...
8. 8
Digital Data Stats
100 terabytes of data are uploaded daily to Facebook.
Brands and organizations on Facebook receive 34,722 Likes every minute of the day.
According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day.
30 billion pieces of content are shared on Facebook every month.
Data production will be 44 times greater in 2020 than it was in 2009.
According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.
See http://wikibon.org/blog/big-data-statistics
9. 9
Social Media Is All the Rage
World population: ~7B people
Facebook: 1.15B users
Twitter: 500M users
Google+: 343M users
LinkedIn: 238M users
~200M+ blogs (conservative estimate)
10. 10
Why Does Social Media Matter?
It's the frontier for predictive analytics
Understanding world events
Swaying political elections
Modeling human behavior
Analyzing sentiment
Making intelligent recommendations
11. 11
Twitter Is All the Rage
It satisfies fundamental human desires
We want to be heard
We want to satisfy our curiosity
We want it easy
We want it now
Accessible, rich, and (mostly) "open" data
RESTful APIs and JSON responses
Great proving ground for predictive analytics
12. 12
Twitter's Network Dynamics
500M curious users
100M curious users actively engaging
Real-time communication
Short, sweet, ... and fast
Asymmetric Following Model
An interest graph
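The asymmetric following model can be pictured as a directed graph: an edge from A to B means A follows B, and says nothing about B following A. The sketch below is illustrative only; the account names and the `FOLLOWS` edge list are made up, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (follower, followed) pairs; each edge points one way only.
FOLLOWS = [
    ("alice", "nasa"),
    ("bob", "nasa"),
    ("alice", "bob"),
    ("bob", "alice"),
]

def build_graph(edges):
    """Return (following, followers) adjacency maps for a directed graph."""
    following = defaultdict(set)
    followers = defaultdict(set)
    for src, dst in edges:
        following[src].add(dst)
        followers[dst].add(src)
    return following, followers

def is_mutual(following, a, b):
    """True only when both accounts follow each other."""
    return b in following.get(a, set()) and a in following.get(b, set())

following, followers = build_graph(FOLLOWS)
print(sorted(followers["nasa"]))               # accounts interested in "nasa"
print(is_mutual(following, "alice", "bob"))    # reciprocal relationship
print(is_mutual(following, "alice", "nasa"))   # one-way interest
```

Reading the one-way edges as "is interested in" is what turns the follower graph into an interest graph.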
14. 14
What's in a Tweet?
140 Characters ...
... Plus ~5KB of metadata!
Authorship
Time & location
Tweet "entities"
Replying, retweeting, favoriting, etc.
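To make the metadata concrete, here is a minimal sketch that pulls authorship, time, location, and entities out of a tweet's JSON. The sample payload is hand-written and heavily trimmed (a real tweet carries far more fields); its field names follow the Twitter REST API v1.1 layout as an assumption, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written, heavily trimmed stand-in for a tweet payload (not real data).
SAMPLE_TWEET = """
{
  "created_at": "Tue Oct 15 17:00:00 +0000 2013",
  "text": "Mining the social web with #Python, thanks @example_friend http://t.co/xyz",
  "user": {"screen_name": "example_user"},
  "coordinates": null,
  "retweet_count": 3,
  "entities": {
    "hashtags": [{"text": "Python"}],
    "user_mentions": [{"screen_name": "example_friend"}],
    "urls": [{"expanded_url": "http://example.com/"}]
  }
}
"""

def summarize(tweet):
    """Collect authorship, time, location, and entity fields from a tweet dict."""
    entities = tweet.get("entities", {})
    return {
        "author": tweet["user"]["screen_name"],
        "created_at": tweet["created_at"],
        "coordinates": tweet.get("coordinates"),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "retweets": tweet.get("retweet_count", 0),
    }

summary = summarize(json.loads(SAMPLE_TWEET))
print(summary["author"], summary["hashtags"], summary["mentions"])
```

Because the entities arrive already parsed, there is no need to pick hashtags or mentions out of the 140-character text with regular expressions.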
21. 21
~3 Months on Twitter
                                     Aug 2013   Sept 2013   % Change
Johnny Araya                           14,573      15,506      6.40%
Otto Guevara Guth                         114         159     39.47%
José María Villalta Florez-Estrada      8,160       8,990     10.17%
Dr. Rodolfo Hernández                     745         858     15.17%
Luis Guillermo Solís Rivera             1,192       1,487     24.75%
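The % Change figures are plain relative growth between the two monthly counts: (Sept - Aug) / Aug x 100. The short sketch below recomputes three of them from the counts on the slide; the `percent_change` helper is a hypothetical name introduced here.

```python
# (Aug 2013, Sept 2013) counts copied from the slide for three of the accounts.
COUNTS = {
    "Johnny Araya": (14573, 15506),
    "Otto Guevara Guth": (114, 159),
    "José María Villalta": (8160, 8990),
}

def percent_change(before, after):
    """Relative growth from `before` to `after`, as a percentage rounded to 2 places."""
    return round((after - before) / before * 100, 2)

for name, (aug, sept) in COUNTS.items():
    print(f"{name}: {percent_change(aug, sept):.2f}%")
```

Note how the smallest account shows the largest percentage gain: relative change alone says little without the absolute counts beside it.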
26. 26
Considerations for Measuring Influence
Spam bot accounts that effectively are zombies and can’t be harnessed for any utility at all
Inactive or abandoned accounts that can’t influence or be influenced since they are not in use
Accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero
The network effects of retweets by accounts that are active and can be influenced to spread a message
See also http://wp.me/p3QiJd-2a
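One way to act on the first three considerations is to discount followers before counting them. The sketch below is an illustration under loud assumptions, not the method behind the slides: the record fields (`suspected_spam`, `days_since_last_tweet`, `following_count`) and both thresholds are hypothetical, and the retweet network effect is not modeled at all.

```python
def can_be_influenced(account, max_inactive_days=90, max_following=2000):
    """Heuristic filter; the thresholds are arbitrary assumptions for illustration."""
    if account["suspected_spam"]:
        return False  # zombie or bot account
    if account["days_since_last_tweet"] > max_inactive_days:
        return False  # inactive or abandoned account
    if account["following_count"] > max_following:
        return False  # follows too many accounts to notice any single one
    return True

# Made-up follower records, one per consideration.
FOLLOWERS = [
    {"name": "active_fan", "suspected_spam": False,
     "days_since_last_tweet": 2, "following_count": 150},
    {"name": "spam_bot", "suspected_spam": True,
     "days_since_last_tweet": 1, "following_count": 50},
    {"name": "abandoned", "suspected_spam": False,
     "days_since_last_tweet": 400, "following_count": 80},
    {"name": "follows_everyone", "suspected_spam": False,
     "days_since_last_tweet": 1, "following_count": 50000},
]

reachable = [a["name"] for a in FOLLOWERS if can_be_influenced(a)]
print(f"{len(reachable)} of {len(FOLLOWERS)} followers count toward potential influence")
```

A raw follower count would report four here; the filtered count is a more cautious estimate of how many accounts can actually be reached.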
27. 27
Social Media Popularity: Araya vs Hernández
[Figure: two charts, "Twitter Popularity" and "Facebook Popularity", each showing Araya % vs. Hernández %]
28. 28
Realtime Analysis: #Syria
Monitor Twitter's firehose for realtime data using filters such as #Syria
Keep in mind the sheer volume of data can be considerable
Analysis at MiningTheSocialWeb.com
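The filtering step can be sketched without touching the network at all. The code below assumes tweets arrive as dicts carrying v1.1-style `entities`, and uses a small made-up list (`FAKE_STREAM`) in place of a live connection; all function names are hypothetical. Generators keep memory use flat, which matters given the volume noted above.

```python
from collections import Counter

def hashtags_of(tweet):
    """Lower-cased set of hashtags attached to a tweet dict."""
    return {h["text"].lower() for h in tweet.get("entities", {}).get("hashtags", [])}

def filter_by_hashtag(stream, hashtag):
    """Lazily yield only the tweets tagged with `hashtag` (case-insensitive)."""
    wanted = hashtag.lstrip("#").lower()
    for tweet in stream:
        if wanted in hashtags_of(tweet):
            yield tweet

def co_occurring_hashtags(stream, hashtag):
    """Count the other hashtags that appear alongside `hashtag`."""
    wanted = hashtag.lstrip("#").lower()
    counts = Counter()
    for tweet in filter_by_hashtag(stream, hashtag):
        counts.update(hashtags_of(tweet) - {wanted})
    return counts

# Made-up stand-in for a live stream; a real one would never end.
FAKE_STREAM = [
    {"entities": {"hashtags": [{"text": "Syria"}, {"text": "news"}]}},
    {"entities": {"hashtags": [{"text": "python"}]}},
    {"entities": {"hashtags": [{"text": "syria"}, {"text": "News"}, {"text": "UN"}]}},
]

counts = co_occurring_hashtags(FAKE_STREAM, "#Syria")
print(counts.most_common())
```

Swapping `FAKE_STREAM` for an iterator over a real streaming connection would leave the filtering and counting logic unchanged.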
36. 36
#Syria: Why?
That's for you (as the data scientist) to decide
Quantitative automation can amplify human intelligence
Qualitative analysis still requires human intelligence
38. 38
MTSW Virtual Machine Experience
Goal: Make it easy to transform curiosity into insight
Vagrant-based virtual machine
VirtualBox or AWS
IPython Notebook User Experience
Point-and-click GUI
100+ turn-key examples and templates
Social web mining for the masses
39. 39
Social Media Analysis Framework
A memorable four-step process to guide data science experiments:
Aspire
Acquire
Analyze
Summarize