
Slicing Big Data: Gambling, Twitter & Time Sensitive Information

Presented at the Internet Researchers conference (IR14) in Denver, CO, on 26 October 2013. The talk discusses gambling, reality TV, and world events in the context of Twitter data, and how to select usable data from big data.

Gambling, Twitter & Time Sensitive Information
IR14 - Denver, CO
dp.woodford@qut.edu.au
@dpwoodford
Wednesday, 23 October 13

FORMAT
• Not going to simply repeat the paper.
• I will get to the gambling (& fantasy sports) examples, but want to discuss our wider work with large datasets.
• Happy to answer more specific questions about how Twitter is used in the gambling industry.
• Examples from Sport, TV, Gambling & Fantasy Sports: a tour de force of current research projects.

DEALING WITH THE TITLE: TWITTER
• Twitter => large data sets, but specific research questions often require a small data set:
  – Australian users
  – Users registering on the platform during natural disasters
  – ‘Experts’ on fantasy sports
  – Sporting participants: golf, tennis, NFL, college football, etc.
  – Reality TV ‘fanatics’
  – Almost infinite examples
• The goal is to get from “Big Data” to what I’ve been calling “useful data”.

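Getting from “Big Data” to “useful data” usually starts with a streaming filter that never loads the whole collection into memory. The sketch below is a minimal illustration, assuming tweets are stored one JSON object per line (JSON Lines) in the Twitter v1.1 tweet format, where the author’s free-text profile location sits under `user.location`. The keyword list and the `looks_australian` heuristic are hypothetical stand-ins for whatever selection rule a research question needs; profile locations are self-reported and noisy, so a real study of Australian users would need a more careful rule.

```python
import json
from typing import Callable, Iterable, Iterator

# Hypothetical keyword list for a crude "Australian users" heuristic.
AU_KEYWORDS = ("australia", "sydney", "melbourne", "brisbane", "perth", "adelaide")


def slice_records(lines: Iterable[str], keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Stream JSON Lines and yield only the records for which keep() is True."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if keep(record):
            yield record


def looks_australian(tweet: dict) -> bool:
    """Heuristic: match the self-reported profile location against AU_KEYWORDS."""
    location = (tweet.get("user") or {}).get("location") or ""
    return any(keyword in location.lower() for keyword in AU_KEYWORDS)


if __name__ == "__main__":
    sample = [
        '{"id_str": "1", "user": {"location": "Brisbane, QLD"}}',
        '{"id_str": "2", "user": {"location": "Denver, CO"}}',
        "",
        '{"id_str": "3", "user": {"location": null}}',
    ]
    kept = [t["id_str"] for t in slice_records(sample, looks_australian)]
    print(kept)  # ['1']
```

Because `slice_records` is a generator, the same call can be pointed at an open file handle (for a hypothetical file, `open("tweets.jsonl", encoding="utf-8")`) and will work through a multi-gigabyte collection one line at a time.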
DEALING WITH THE TITLE: GAMBLING
• Long-term interest in the gambling industry (one case study in my prior work on games).
• Many parallels between gambling and fantasy sports (another current research project).
• When I was an ‘active participant’, Twitter was just becoming popular (2006-2010).
• It quickly became a crucial source of information, and websites started aggregating it.

DEALING WITH THE TITLE: TIME SENSITIVE INFORMATION
• Betting lines move incredibly fast: betting is just as much a market as day-trading on the stock exchange.

WHY IS DATA SLICED?
• The Streaming API is limited to ~1% of total tweets per second, and Firehose access is expensive.
• Large data sets are not easily malleable or visually analyzed (e.g. with Tableau):
  – Our database of Twitter users is ~3.7TB, and growing.
  – A week’s worth of selected TV data (current US shows) is 750MB in JSON format and 600MB in TSV (selected fields), and runs to millions of rows.
• Analyzing large data sets is slow, if it’s even possible => “usable data”.

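The JSON-versus-TSV figures above come from keeping only a handful of fields per tweet. A minimal sketch of that conversion follows, assuming JSON Lines input in the Twitter v1.1 tweet format; the slide does not say which fields were kept, so `FIELDS` is a hypothetical choice. Whitespace inside tweet text is collapsed so that each tweet stays on a single tab-separated line when the file is loaded into a spreadsheet or Tableau.

```python
import csv
import io
import json
from typing import Iterable, TextIO

# Hypothetical column selection; the slide only says "selected fields".
FIELDS = ("id_str", "created_at", "screen_name", "text")


def to_row(tweet: dict) -> list:
    """Flatten one v1.1-style tweet object into the selected columns."""
    return [
        tweet.get("id_str", ""),
        tweet.get("created_at", ""),
        (tweet.get("user") or {}).get("screen_name", ""),
        # Collapse tabs/newlines so each tweet occupies exactly one TSV line.
        " ".join((tweet.get("text") or "").split()),
    ]


def json_lines_to_tsv(src: Iterable[str], dst: TextIO) -> int:
    """Write a header plus one TSV row per non-blank JSON line; return the row count."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    rows = 0
    for line in src:
        if line.strip():
            writer.writerow(to_row(json.loads(line)))
            rows += 1
    return rows


if __name__ == "__main__":
    src = io.StringIO(
        '{"id_str": "42", "created_at": "Wed Oct 23 10:00:00 +0000 2013", '
        '"user": {"screen_name": "example_user"}, "text": "line one\\nline\\ttwo"}\n'
    )
    out = io.StringIO()
    print(json_lines_to_tsv(src, out))  # 1
    print(out.getvalue(), end="")
```

Dropping the nested JSON structure is what makes the TSV smaller and easier to load; the trade-off is that any field not listed in `FIELDS` has to be re-extracted from the original JSON later, which is one reason to keep the raw files.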
HOW IS DATA SLICED: COMPULSORY

HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- WTA

HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS
Clip from Yahoo Fantasy Football re: Calvin Johnson injury & Twitter reports

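Selecting for authenticity, as in the WTA and fantasy-sports examples, amounts to restricting a collection to accounts known to be genuine, such as players’ own accounts or established fantasy analysts. A minimal sketch follows, assuming the researcher has already compiled a whitelist by hand; the handles are placeholders, and the optional `user.verified` check assumes v1.1-style tweet objects. In practice the numeric user ID (`user.id_str`) is a sturdier key than the screen name, since screen names can be changed.

```python
from typing import Iterable, Iterator, Set

# Placeholder whitelist; in practice compiled by the researcher, stored lowercase.
CURATED: Set[str] = {"player_one", "player_two", "fantasy_expert"}


def is_authentic(tweet: dict, curated: Set[str] = CURATED, require_verified: bool = False) -> bool:
    """Keep tweets whose author is on the curated list (and verified, if required)."""
    user = tweet.get("user") or {}
    handle = (user.get("screen_name") or "").lower()  # screen names are case-insensitive
    if handle not in curated:
        return False
    return bool(user.get("verified")) or not require_verified


def authentic_only(tweets: Iterable[dict], **kwargs) -> Iterator[dict]:
    """Lazily filter a tweet stream down to whitelisted authors."""
    return (t for t in tweets if is_authentic(t, **kwargs))


if __name__ == "__main__":
    tweets = [
        {"id_str": "1", "user": {"screen_name": "Player_One", "verified": True}},
        {"id_str": "2", "user": {"screen_name": "rumour_mill", "verified": False}},
        {"id_str": "3", "user": {"screen_name": "fantasy_expert", "verified": False}},
    ]
    print([t["id_str"] for t in authentic_only(tweets)])  # ['1', '3']
    print([t["id_str"] for t in authentic_only(tweets, require_verified=True)])  # ['1']
```

A whitelist only narrows the data to trusted sources; it does not confirm that a given report (such as an injury rumour) is accurate, which is why a separate sanity check is still needed.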
BUT YOU STILL NEED A SANITY CHECK

HOW IS DATA SLICED: RANDOM SAMPLING
Source: Tony Hirst (Open University UK)

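The random-sampling slide credits Tony Hirst, but the transcript does not record his method, so the following swaps in one standard technique: reservoir sampling (Algorithm R). It draws a uniform random sample of fixed size `k` from a stream whose length is not known in advance, while holding only `k` items in memory. This is a sketch of the general idea, not the approach shown on the slide.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items drawn uniformly at random from stream (all items if fewer than k)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Item i ends up in the reservoir with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    tweet_ids = (str(n) for n in range(100_000))  # stand-in for a live tweet stream
    sample = reservoir_sample(tweet_ids, k=5, seed=2013)
    print(len(sample))  # 5
```

Passing a `seed` makes the sample reproducible, which matters if a sliced data set has to be regenerated later from the same source.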
BUT SOMETIMES YOU NEED THE FULL SAMPLE & REPEATED CAPTURE
Source: Bruns / Woodford [Mapping Online Publics]

HOW IS DATA SLICED: ONLY A SMALL SAMPLE MATTERS
Floods, Earthquake, Tsunami
Media Coverage

HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
Impact of Live Feed
Delayed TV sucks

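Slicing TV data by episode rather than by season means keeping only tweets posted while a given episode is on air. A minimal sketch follows, assuming timestamps have already been parsed into `datetime` objects in a single time zone (e.g. UTC); the broadcast start times and the one-hour window are hypothetical. Regions that receive a delayed broadcast would each need their own list of start times.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def episode_of(ts: datetime, starts: List[datetime], window: timedelta) -> Optional[int]:
    """Index of the episode whose [start, start + window) interval contains ts, else None.

    `starts` must be sorted in ascending order.
    """
    i = bisect_right(starts, ts) - 1  # last episode starting at or before ts
    if i >= 0 and ts < starts[i] + window:
        return i
    return None


def per_episode_counts(
    timestamps: Iterable[datetime],
    starts: List[datetime],
    window: timedelta = timedelta(hours=1),
) -> Counter:
    """Count tweets per episode; tweets outside every broadcast window are dropped."""
    ordered = sorted(starts)
    counts: Counter = Counter()
    for ts in timestamps:
        ep = episode_of(ts, ordered, window)
        if ep is not None:
            counts[ep] += 1
    return counts


if __name__ == "__main__":
    starts = [datetime(2013, 9, 1, 20), datetime(2013, 9, 8, 20)]  # hypothetical air times
    tweets = [
        datetime(2013, 9, 1, 20, 30),
        datetime(2013, 9, 1, 22, 0),  # after the first episode ended: dropped
        datetime(2013, 9, 8, 20, 0),
        datetime(2013, 9, 8, 20, 59),
    ]
    print(per_episode_counts(tweets, starts))  # Counter({1: 2, 0: 1})
```

Summing the per-episode counts gives the in-broadcast total for the season, while the individual values expose the episodic pattern that a season-wide aggregate hides.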
HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
• The most active users (#BB15, #BBLF) often defend an HM to the death (akin to sporting tribalism), but most users are attackers (forthcoming paper w/ Katie Prowd).
Disclaimer: Scale changed to fit on slide
Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]

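Whether the most active accounts are representative can be checked by comparing the stance mix among the top posters with the mix across all users. A minimal sketch follows, assuming each tweet has already been coded with a stance label (the labels `defend` and `attack` echo the slide; how the coding is done, manually or otherwise, is outside this example) and that each user is assigned their most frequent label. The data are a toy example, not results from the Big Brother 15 study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Tweet = Tuple[str, str]  # (author, stance label)


def majority_label(tweets: Iterable[Tweet]) -> Dict[str, str]:
    """Map each author to the stance label they use most often."""
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for author, label in tweets:
        per_user[author][label] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


def top_vs_all(tweets: Iterable[Tweet], n: int) -> Tuple[float, Counter, Counter]:
    """Return (share of tweets from the n most active users, their stance mix, everyone's)."""
    tweets = list(tweets)
    if not tweets:
        raise ValueError("no tweets to compare")
    activity = Counter(author for author, _ in tweets)
    top = [user for user, _ in activity.most_common(n)]
    share = sum(activity[u] for u in top) / len(tweets)
    stance = majority_label(tweets)
    return share, Counter(stance[u] for u in top), Counter(stance.values())


if __name__ == "__main__":
    toy = [("superfan", "defend")] * 6 + [
        ("user_b", "attack"), ("user_c", "attack"), ("user_d", "attack"), ("user_e", "defend"),
    ]
    share, top_mix, all_mix = top_vs_all(toy, n=1)
    print(share)    # 0.6
    print(top_mix)  # Counter({'defend': 1})
    print(all_mix)  # Counter({'attack': 3, 'defend': 2})
```

In the toy data a single account writes 60% of the tweets, so a tweet-level tally would suggest that defenders dominate, while counting users shows the opposite; that is the gap the slide warns about.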
TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING
Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]

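Slicing tweet content by time, for instance by calendar day, shows how the conversation shifts as events unfold. A minimal sketch follows, assuming `(timestamp, text)` pairs with already-parsed `datetime` values; the tokenizer regex and the stop-word list are deliberately crude, hypothetical choices.

```python
import re
from collections import Counter, defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List, Tuple

TOKEN = re.compile(r"[#@]?\w+")  # words, #hashtags and @mentions
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "rt"}  # hypothetical list


def top_terms_by_day(
    tweets: Iterable[Tuple[datetime, str]], k: int = 3
) -> Dict[date, List[Tuple[str, int]]]:
    """Return the k most frequent terms for each calendar day, in date order."""
    buckets: Dict[date, Counter] = defaultdict(Counter)
    for ts, text in tweets:
        terms = [t for t in TOKEN.findall(text.lower()) if t not in STOPWORDS]
        buckets[ts.date()].update(terms)
    return {day: counts.most_common(k) for day, counts in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        (datetime(2013, 7, 1, 9), "The #BB15 house is wild"),
        (datetime(2013, 7, 1, 21), "#BB15 eviction tonight"),
        (datetime(2013, 7, 2, 8), "Eviction fallout #bb15"),
    ]
    for day, terms in top_terms_by_day(sample, k=1).items():
        print(day, terms)
```

Swapping `ts.date()` for an hour or an episode index changes the slice width without touching the rest of the function.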
HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
• Twitter closed these quickly, yet the BB15 accounts remained active for much of the season...

AND A QUICK NOTE ON NON-TWITTER ANALYTICS
• There’s lots of data out there, but it needs to be sliced to be usable.
• You can work with large, original data sets, but this often adds complexity that isn’t necessary to answer your research questions.
• But don’t delete the data you don’t need!

ACKNOWLEDGEMENTS
• ARC Centre of Excellence for Creative Industries and Innovation (CCI) - http://www.cci.edu.au & http://www.mappingonlinepublics.net
• Social Media Research Group -- http://socialmedia.qut.edu.au
• Queensland University of Technology
