1

Why Twitter Is All The Rage:
A Data Miner's Perspective
Matthew A. Russell - @ptwobrussell - http://MiningTheSocialWeb.com
PyTN - 23 February 2014
2

Overview

Intro
Twitter as a Platform for Data Science
Applications of Firehose Analysis (#Syria circa last)
Understanding the Amazon Prime Air Reaction (IPython Notebook Walk Through)
Q&A
3

Intro
4

Hello, My Name Is ... Matthew
Background in Computer Science
Data mining & machine learning
CTO @ Digital Reasoning Systems
Data mining; machine learning
Author @ O'Reilly Media
5 published books on technology
Principal @ Zaffra
Selective boutique consulting
5

Transforming Curiosity Into Insight
An open source software (OSS) project
http://bit.ly/MiningTheSocialWeb2E
A book
http://bit.ly/135dHfs
Accessible to (virtually) everyone
Virtual machine with turn-key coding templates for data science experiments
Think of the book as "premium" support for the OSS project
6

Mining the Social Web ToC
Chapter 1 - Mining Twitter
Chapter 2 - Mining Facebook
Chapter 3 - Mining LinkedIn
Chapter 4 - Mining Google+
Chapter 5 - Mining Web Pages
Chapter 6 - Mining Mailboxes
Chapter 7 - Mining GitHub
Chapter 8 - Mining the Semantically Marked-Up Web
Chapter 9 - Twitter Cookbook
7

Anatomy of Each Chapter
Brief Intro
Objectives
API Primer
Analysis Technique(s)
Data Visualization
Recap
Suggested Exercises
Recommended Resources
8

Opportunities for Data Alchemy

A model for the world: signal and sinks
Growth in data exhaust is accelerating
Digital fingerprints of the "real world" are accumulating
Lots of opportunities for motivated Python hackers
"Software is eating the world"
9

Social Media Is All the Rage
World population: 7B people
Facebook: 1B+ users
Twitter: 650M users
Google+: 500M users
LinkedIn: 260M users
250M+ blogs (conservatively?)
10

But what does it all mean, Basil?
It's a platform for data science and the frontier for predictive analytics
Understanding world events
Swaying political elections
Modeling human behavior
Analyzing sentiment
Making intelligent recommendations
11

Twitter & Data Science
12

Data Science

Data => Actionable information
Highly interdisciplinary
Nascent
Necessary

http://wikipedia.org/wiki/Data_science
13

Another View of Data Science
14
15

Twitter Is All the Rage
It satisfies fundamental human desires
We want to be heard
We want to satisfy our curiosity
We want it easy
We want it now
Accessible, rich, and (mostly) "open" data
RESTful APIs and JSON responses
Great proving ground for predictive analytics about the real world
16

Twitter's Network Dynamics
~650M curious users
A collective consciousness
Real-time communication
Short, sweet, ... and fast
Asymmetric Following Model
An interest graph
17

Twitter Primitives
Account Types: "Anything"
"Following" Relationships
Favorites
Retweets
Replies
(Almost) No Privacy Controls
18

Twitter and Facebook Compared
Twitter                       | Facebook
Account Types: "Anything"     | Account Types: People & Pages
"Following" Relationships     | Mutual Connections
Favorites                     | "Likes"
Retweets                      | "Shares"
Replies                       | "Comments"
(Almost) No Privacy Controls  | Extensive Privacy Controls
19

What's in a Tweet?
140 Characters ...
... Plus ~5KB of metadata!
Authorship
Time & location
Tweet "entities"
Replying, retweeting, favoriting, etc.
20

What are Tweet Entities?
Essentially, the "easy to get at" data in the 140 characters
@usermentions
#hashtags
URLs (multiple variations)
(financial) symbols: stock tickers
media
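
As a minimal sketch of how these entities can be read from a tweet, the example below pulls each entity type out of a hand-made dictionary shaped like the `entities` field of a v1.1 API response. The sample tweet and the helper name `extract_entities` are illustrative assumptions, not material from the talk.

```python
# Hand-made example shaped like the "entities" field of a v1.1 tweet.
# The values are illustrative, not real data.
sample_tweet = {
    "text": "Reading about #DataScience with @SocialWebMining $TWTR http://t.co/abc",
    "entities": {
        "user_mentions": [{"screen_name": "SocialWebMining"}],
        "hashtags": [{"text": "DataScience"}],
        "urls": [{"url": "http://t.co/abc",
                  "expanded_url": "http://MiningTheSocialWeb.com"}],
        "symbols": [{"text": "TWTR"}],
        # A "media" key only appears when the tweet carries media.
    },
}


def extract_entities(tweet):
    """Return the entity values of a tweet as plain lists of strings."""
    entities = tweet.get("entities", {})
    return {
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "symbols": [s["text"] for s in entities.get("symbols", [])],
        "media": [m["media_url"] for m in entities.get("media", [])],
    }


print(extract_entities(sample_tweet))
```

Because the entities arrive already parsed, no regular expressions over the 140 characters are needed to find mentions, hashtags, or links.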
21

API Requests
RESTful requests
Everything is a "resource"
You GET, PUT, POST, and DELETE resources
Standard HTTP "verbs"

Example: GET https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=SocialWebMining

Streaming API filters
JSON responses
Cursors (not quite pagination)
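
The sketch below illustrates two of the ideas on this slide: building the URL for a GET request, and walking a cursored resource. It makes no network calls. `fetch_page` is a hypothetical callable standing in for an authenticated HTTP request, and `FAKE_PAGES` is made-up data. The cursor convention assumed here (start at -1, stop when `next_cursor` is 0) is the one used by cursored v1.1 resources such as `followers/ids`; real requests also need OAuth credentials, which are omitted.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.twitter.com/1.1"


def build_url(resource, **params):
    """Build the URL for a GET request against a REST resource."""
    return "{}/{}.json?{}".format(API_ROOT, resource, urlencode(params))


def iterate_cursor(fetch_page, key):
    """Yield items from every page of a cursored resource.

    fetch_page is a hypothetical callable that takes a cursor value and
    returns the decoded JSON response as a dict.
    """
    cursor = -1  # -1 asks for the first page
    while cursor != 0:  # a next_cursor of 0 means there are no more pages
        page = fetch_page(cursor)
        for item in page[key]:
            yield item
        cursor = page["next_cursor"]


# Stand-in for real HTTP calls: three canned pages keyed by cursor value.
FAKE_PAGES = {
    -1: {"ids": [1, 2, 3], "next_cursor": 10},
    10: {"ids": [4, 5], "next_cursor": 20},
    20: {"ids": [6], "next_cursor": 0},
}

print(build_url("statuses/user_timeline", screen_name="SocialWebMining"))
print(list(iterate_cursor(FAKE_PAGES.__getitem__, "ids")))
```

Unlike page numbers, a cursor is an opaque token handed back by the previous response, so the client cannot jump to an arbitrary page.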
22

Data Mining: Low Hanging Fruit
"Know thy data..."
Start with simple stats:
Count
Compare
Filter
Rank
Then, apply more complex analyses
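
The four simple operations above can be sketched with the standard library's `collections.Counter`. The hashtag list is made-up data for illustration only.

```python
from collections import Counter

# Illustrative data: hashtags pulled from a handful of made-up tweets.
hashtags = ["Syria", "python", "Syria", "data", "python", "Syria", "news"]

counts = Counter(hashtags)                                   # Count
syria_vs_python = counts["Syria"] - counts["python"]         # Compare
frequent = {tag: n for tag, n in counts.items() if n >= 2}   # Filter
ranked = counts.most_common(2)                               # Rank

print(counts)
print(syria_vs_python)
print(frequent)
print(ranked)
```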
23

A Starting Point: Histograms

A chart that is handy for frequency analysis
They look like bar charts...except they're not bar charts
Each value on the x-axis is a range (or "bin") of values
Not categorical data
Each value on the y-axis is the combined frequency of values in each range
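
A minimal sketch of the binning idea, assuming integer retweet counts and fixed-width bins. The data and the bin width of 10 are made up for illustration, and the helper `histogram` is not from the talk.

```python
from collections import Counter


def histogram(values, bin_width):
    """Return sorted (bin_start, frequency) pairs for integer values."""
    bins = Counter((value // bin_width) * bin_width for value in values)
    return sorted(bins.items())


# Made-up retweet counts for a small batch of tweets.
retweet_counts = [0, 1, 3, 4, 7, 12, 15, 18, 22, 40]
BIN_WIDTH = 10

for start, frequency in histogram(retweet_counts, BIN_WIDTH):
    # Each row covers the range start..start + BIN_WIDTH - 1.
    print("{:>3}-{:<3} {}".format(start, start + BIN_WIDTH - 1, "#" * frequency))
```

Note that this sketch omits empty bins (here, 30 to 39); a plotting library would normally draw them as zero-height bars.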
24

Example: Histogram of Retweets
25

Social Network Mechanics

[Diagram: nodes labeled Roberto, Mercedes, Jorge, Nina, and Ana]
26

Interest Graph Mechanics
[Diagram: nodes labeled U2, Roberto, Mercedes, Juan Luis Guerra, Ana, Jorge, and Nina]
27

A (Social) Interest Graph
[Diagram: nodes labeled U2, Roberto, Mercedes, Juan Luis Guerra, Ana, Jorge, and Nina]
28

A (Political) Interest Graph
[Diagram: nodes labeled Johnny Araya, Roberto, Mercedes, Rodolfo Hernández, Ana, Jorge, and Nina]
29

Measuring Influence Is Trickier Than It Looks

Spam bot accounts that effectively are zombies and can’t be harnessed for any utility at all
Inactive or abandoned accounts that can’t influence or be influenced since they are not in use
Accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero
The network effects of retweets by accounts that are active and can be influenced to spread a message
See also http://wp.me/p3QiJd-2a
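
One way to picture the first three caveats is as a filter over a follower list, as in the sketch below. Everything here is an assumption made for illustration: the dictionary keys, the thresholds, and the sample followers are hypothetical, and the sketch does not model the retweet network effects in the fourth point.

```python
def reachable_followers(followers, max_following=2000, max_idle_days=30):
    """Keep followers who plausibly see, and could act on, a tweet.

    The keys and thresholds are hypothetical, chosen only to illustrate
    why a raw follower count overstates influence.
    """
    return [
        follower for follower in followers
        if not follower["is_spam"]                             # spam bots
        and follower["days_since_last_tweet"] <= max_idle_days  # abandoned
        and follower["following_count"] <= max_following        # follows too many
    ]


# Made-up follower records.
followers = [
    {"name": "active", "is_spam": False, "days_since_last_tweet": 1, "following_count": 150},
    {"name": "bot", "is_spam": True, "days_since_last_tweet": 0, "following_count": 50},
    {"name": "idle", "is_spam": False, "days_since_last_tweet": 400, "following_count": 80},
    {"name": "hoarder", "is_spam": False, "days_since_last_tweet": 2, "following_count": 90000},
]

reachable = reachable_followers(followers)
print(len(followers), "followers,", len(reachable), "plausibly reachable")
```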
30

Justin Bieber vs Tea Party
31

Realtime Analysis: #Syria

Monitor Twitter's firehose for realtime data using filters such as #Syria
Keep in mind the sheer volume of data can be considerable
Fuller analysis at http://wp.me/p3QiJd-1I
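
The sketch below shows the shape of hashtag filtering over a stream, using a local generator as a stand-in for a live connection. In practice the filter term is sent to the streaming API and applied on Twitter's side; `fake_stream` and `filter_stream` are hypothetical names, and the tweets are made up.

```python
def filter_stream(tweets, hashtag):
    """Yield tweets whose hashtag entities include `hashtag` (case-insensitive)."""
    wanted = hashtag.lstrip("#").lower()
    for tweet in tweets:
        tags = tweet.get("entities", {}).get("hashtags", [])
        if any(tag["text"].lower() == wanted for tag in tags):
            yield tweet


def fake_stream():
    """Stand-in for a live stream: yields made-up tweets one at a time."""
    yield {"text": "Update from the region #Syria",
           "entities": {"hashtags": [{"text": "Syria"}]}}
    yield {"text": "Lunch time",
           "entities": {"hashtags": []}}
    yield {"text": "#syria #news latest",
           "entities": {"hashtags": [{"text": "syria"}, {"text": "news"}]}}


matched = [tweet["text"] for tweet in filter_stream(fake_stream(), "#Syria")]
print(matched)
```

Because both functions are generators, tweets are handled one at a time rather than held in memory together, which matters given the volume mentioned above.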
32

#Syria: Who?

See http://wp.me/p3QiJd-1I
33

#Syria: Who?

See http://wp.me/p3QiJd-1I
34

#Syria: Who?

See http://wp.me/p3QiJd-1I
35

#Syria: What?

See http://wp.me/p3QiJd-1I
36

#Syria: What?

See http://wp.me/p3QiJd-1I
37

#Syria: Where?

See http://wp.me/p3QiJd-1I
38

#Syria: When?

See http://wp.me/p3QiJd-1I
39

#Syria: Why?

That's for you (as the data scientist) to decide
Quantitative automation can amplify human intelligence
Qualitative analysis still requires human intelligence
40

Twitter Firehose Analysis with pandas
41

MTSW Virtual Machine Experience
Goal: Make it easy to transform curiosity into insight
Vagrant-based virtual machine
Virtualbox or AWS
IPython Notebook User Experience
Point-and-click GUI
100+ turn-key examples and templates
Social web mining for the masses
42

Social Media Analysis Framework

A memorable four step process to guide data science experiments:
Aspire
Acquire
Analyze
Summarize
43

Goals
To understand how to capture data from Twitter's firehose
To understand basic pandas usage for tweets
To work through a data science experiment with a systematic 4-step process
To better understand the emotional reaction to the Amazon Prime Air announcement
To introduce some tools for data science
44

Useful Links
Website
http://MiningTheSocialWeb.com
Twitter Data Mining Round Up
http://wp.me/p3QiJd-5H

All Source Code in IPython Notebook format (GitHub)
http://bit.ly/MiningTheSocialWeb2E
45

Q&A

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

  • 1. 1 Why Twitter Is All The Rage: A Data Miner's Perspective Matthew A. Russell - @ptwobrussell - http://MiningTheSocialWeb.com PyTN - 23 February 2014
  • 2. 2 Overview Intro Twitter as a Platform for Data Science Applications of Firehose Analysis (#Syria circa last) Understanding the Amazon Prime Air Reaction (IPython Notebook Walk Through) Q&A
  • 4. 4 Hello, My Name Is ... Matthew Background in Computer Science Data mining & machine learning CTO @ Digital Reasoning Systems Data mining; machine learning Author @ O'Reilly Media 5 published books on technology Principal @ Zaffra Selective boutique consulting
  • 5. 5 Transforming Curiosity Into Insight An open source software (OSS) project http://bit.ly/MiningTheSocialWeb2E A book http://bit.ly/135dHfs Accessible to (virtually) everyone Virtual machine with turn-key coding templates for data science experiments Think of the book as "premium" support for the OSS project
  • 6. 6 Mining the Social Web ToC Chapter 1 - Mining Twitter Chapter 2 - Mining Facebook Chapter 3 - Mining LinkedIn Chapter 4 - Mining Google+ Chapter 5 - Mining Web Pages Chapter 6 - Mining Mailboxes Chapter 7 - Mining GitHub Chapter 8 - Mining the Semantically Marked-Up Web Chapter 9 - Twitter Cookbook
  • 7. 7 Anatomy of Each Chapter Brief Intro Objectives API Primer Analysis Technique(s) Data Visualization Recap Suggested Exercises Recommended Resources
  • 8. 8 Opportunities for Data Alchemy A model for the world: signal and sinks Growth in data exhaust is accelerating Digital fingerprints of the "real world" are accumulating Lots of opportunities for motivated Python hackers "Software is eating the world"
  • 9. 9 Social Media Is All the Rage World population: 7B people Facebook: 1B+ users Twitter: 650M users Google+: 500M users LinkedIn: 260M users 250M+ blogs (conservatively?)
  • 10. 10 But what does it all mean, Basil? It's a platform for data science and the frontier for predictive analytics Understanding world events Swaying political elections Modeling human behavior Analyzing sentiment Making intelligent recommendations
  • 11. 11 Twitter & Data Science
  • 12. 12 Data Science Data => Actionable information Highly interdisciplinary Nascent Necessary http://wikipedia.org/wiki/Data_science
  • 13. 13 Another View of Data Science
  • 14. 14
  • 15. 15 Twitter Is All the Rage It satisfies fundamental human desires We want to be heard We want to satisfy our curiosity We want it easy We want it now Accessible, rich, and (mostly) "open" data RESTful APIs and JSON responses Great proving ground for predictive analytics about the real world
  • 16. 16 Twitter's Network Dynamics ~650M curious users A collective consciousness Real-time communication Short, sweet, ... and fast Asymmetric Following Model An interest graph
  • 17. 17 Twitter Primitives Account Types: "Anything" "Following" Relationships Favorites Retweets Replies (Almost) No Privacy Controls
  • 18. 18 Twitter and Facebook Compared (Twitter vs. Facebook) Account Types: "Anything" vs. People & Pages; "Following" Relationships vs. Mutual Connections; Favorites vs. "Likes"; Retweets vs. "Shares"; Replies vs. "Comments"; (Almost) No Privacy Controls vs. Extensive Privacy Controls
  • 19. 19 What's in a Tweet? 140 Characters ... ... Plus ~5KB of metadata! Authorship Time & location Tweet "entities" Replying, retweeting, favoriting, etc.
  • 20. 20 What are Tweet Entities? Essentially, the "easy to get at" data in the 140 characters @usermentions #hashtags URLs multiple variations (financial) symbols stock tickers media
  • 21. 21 API Requests RESTful requests Everything is a "resource" You GET, PUT, POST, and DELETE resources Standard HTTP "verbs" Example: GET https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=SocialWebMining Streaming API filters JSON responses Cursors (not quite pagination)
  • 22. 22 Data Mining: Low Hanging Fruit "Know thy data..." Start with simple stats: Count Compare Filter Rank Then, apply more complex analyses
  • 23. 23 A Starting Point: Histograms A chart that is handy for frequency analysis They look like bar charts...except they're not bar charts Each value on the x-axis is a range (or "bin") of values Not categorical data Each value on the y-axis is the combined frequency of values in each range
  • 27. 27 A (Social) Interest Graph U2 Roberto Mercedes Juan Luis Luís Guerra Ana Jorge Nina
  • 28. 28 A (Political) Interest Graph Johnny Araya Roberto Mercedes Rodolfo Hernández Ana Jorge Nina
  • 29. 29 Measuring Influence Is Trickier Than It Looks Spam bot accounts that effectively are zombies and can’t be harnessed for any utility at all Inactive or abandoned accounts that can’t influence or be influenced since they are not in use Accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero The network effects of retweets by accounts that are active and can be influenced to spread a message See also http://wp.me/p3QiJd-2a
  • 30. 30 Justin Bieber vs Tea Party
  • 31. 31 Realtime Analysis: #Syria Monitor Twitter's firehose for realtime data using filters such as #Syria Keep in mind the sheer volume of data can be considerable Fuller analysis at http://wp.me/p3QiJd-1I
  • 39. 39 #Syria: Why? That's for you (as the data scientist) to decide Quantitative automation can amplify human intelligence Qualitative analysis still requires human intelligence
  • 41. 41 MTSW Virtual Machine Experience Goal: Make it easy to transform curiosity into insight Vagrant-based virtual machine Virtualbox or AWS IPython Notebook User Experience Point-and-click GUI 100+ turn-key examples and templates Social web mining for the masses
  • 42. 42 Social Media Analysis Framework A memorable four step process to guide data science experiments: Aspire Acquire Analyze Summarize
  • 43. 43 Goals To understand how to capture data from Twitter's firehose To understand basic pandas usage for tweets To work through a data science experiment with a systematic 4-step process To better understand the emotional reaction to the Amazon Prime Air announcement To introduce some tools for data science
  • 44. 44 Useful Links Website http://MiningTheSocialWeb.com Twitter Data Mining Round Up http://wp.me/p3QiJd-5H All Source Code in IPython Notebook format (GitHub) http://bit.ly/MiningTheSocialWeb2E
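
The "What are Tweet Entities?", "Data Mining: Low Hanging Fruit", and "A Starting Point: Histograms" slides can be tied together with a small sketch. This is not code from the talk or the book. It is a minimal illustration that assumes tweets are available as Python dicts shaped like Twitter's v1.1 JSON (an "entities" field holding "hashtags" and "user_mentions", plus a "retweet_count"). The sample tweets, the function names, and the bin width are all made up for the example.

```python
from collections import Counter

# Hypothetical, hand-written tweets that mimic the v1.1 JSON layout.
# They are illustrative only, not real data.
SAMPLE_TWEETS = [
    {"retweet_count": 3,
     "entities": {"hashtags": [{"text": "Syria"}],
                  "user_mentions": [{"screen_name": "SocialWebMining"}]}},
    {"retweet_count": 12,
     "entities": {"hashtags": [{"text": "syria"}, {"text": "UN"}],
                  "user_mentions": []}},
    {"retweet_count": 0,
     "entities": {"hashtags": [], "user_mentions": []}},
    {"retweet_count": 47,
     "entities": {"hashtags": [{"text": "Syria"}],
                  "user_mentions": [{"screen_name": "SocialWebMining"}]}},
    {"retweet_count": 15,
     "entities": {"hashtags": [{"text": "PrimeAir"}],
                  "user_mentions": []}},
]


def extract_entities(tweets):
    """Collect lowercased hashtags and user mentions from each tweet."""
    hashtags, mentions = [], []
    for tweet in tweets:
        entities = tweet.get("entities", {})
        hashtags += [h["text"].lower() for h in entities.get("hashtags", [])]
        mentions += [m["screen_name"] for m in entities.get("user_mentions", [])]
    return hashtags, mentions


def rank(items, n=3):
    """Count the items and return the n most frequent as (item, count) pairs."""
    return Counter(items).most_common(n)


def histogram(values, bin_width):
    """Group numeric values into fixed-width bins: {bin_start: frequency}."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    tags, users = extract_entities(SAMPLE_TWEETS)
    print(rank(tags))
    print(rank(users))
    # Each key is the lower edge of a range of retweet counts (a "bin").
    print(histogram([t["retweet_count"] for t in SAMPLE_TWEETS], bin_width=10))
```

Counting and ranking come first because they are the "simple stats" the slides recommend before any more complex analysis. The histogram helper bins a numeric field (retweet counts here) into ranges, which is the property that separates a histogram from a bar chart over categories.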