Slicing Big Data: Gambling, Twitter & Time Sensitive Information

Presented at the Internet Researchers conference in Denver, CO -- 26 October 2013. Discusses gambling, reality TV, and world events in the context of Twitter data, and how to select usable data from big data.


  1. Gambling, Twitter & Time Sensitive Information
     IR14 - Denver, CO
     dp.woodford@qut.edu.au
     @dpwoodford
     Wednesday, 23 October 13
  2. FORMAT
     • Not going to simply repeat the paper.
     • I will get to the gambling (& fantasy sports) examples, but want to discuss our wider work with large datasets.
     • Happy to answer more specific questions about the use in the gambling industry.
     • Examples from Sport, TV, Gambling & Fantasy Sports. A tour de force of current research projects.
  3. DEALING WITH THE TITLE: TWITTER
     • Twitter => Large Data Sets, but specific research questions often require a small data set:
       – Australian users
       – Users registering on the platform during natural disasters
       – ‘Experts’ on Fantasy Sports
       – Sporting Participants: Golf, Tennis, NFL, College Football, etc.
       – Reality TV ‘fanatics’
       – Almost infinite examples
     • Goal is to get from “Big Data” to what I’ve been calling “useful data”.
  4. DEALING WITH THE TITLE: GAMBLING
     • Long term interest in the gambling industry (one case study in my prior work on games).
     • Many parallels between Gambling and Fantasy Sports (another current research project).
     • When I was an ‘active participant’, Twitter was just becoming popular (2006-2010).
     • It quickly became a crucial source of information, and websites started aggregating it.
  5. DEALING WITH THE TITLE: GAMBLING
  6. DEALING WITH THE TITLE: GAMBLING
  7. DEALING WITH THE TITLE: TIME SENSITIVE INFORMATION
     • Lines move incredibly fast: just as much a market as day-trading on the stock exchange.
  8. WHY IS DATA SLICED?
     • Streaming API is limited to ~1% of total tweets per second & Firehose access is expensive.
     • Large data sets are not easily malleable, or visually analyzed (e.g. with Tableau):
       – Our database of Twitter users is ~3.7TB, and growing.
       – A week's worth of selected TV data (current US shows) in JSON format is 750MB, and 600MB in TSV (selected fields). And millions of rows.
     • Analyzing large data sets is slow, if it’s even possible => “Usable Data”
  9. HOW IS DATA SLICED: COMPULSORY
  10. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- WTA
  11. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS
  12. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS
      CLIP FROM YAHOO FANTASY FOOTBALL RE: CALVIN JOHNSON INJURY & TWITTER REPORTS
  13. BUT YOU STILL NEED A SANITY CHECK
  14. BUT YOU STILL NEED A SANITY CHECK
  15. HOW IS DATA SLICED: RANDOM SAMPLING
      Source: Tony Hirst (Open University UK)
  16. BUT SOMETIMES YOU NEED THE FULL SAMPLE & REPEATED CAPTURE
      Source: Bruns / Woodford [Mapping Online Publics]
  17. HOW IS DATA SLICED: ONLY A SMALL SAMPLE MATTERS
      Floods, Earthquake, Tsunami
      Media Coverage
  18. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
      Impact of Live Feed
  19. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
  20. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
      Delayed TV sucks
  21. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
      • Most active (#BB15, #BBLF) users often defend a HM to the death (akin to sporting tribalism), but most users are attackers (forthcoming paper w/ Katie Prowd)
      Disclaimer: Scale changed to fit on slide
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  22. TIME SLICES OF TWEET CONTENT IS ENLIGHTENING
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  23. TIME SLICES OF TWEET CONTENT IS ENLIGHTENING
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  24. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  25. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
      • Twitter closed these quickly, yet the BB15 accounts remained active for much of the season...
  26. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  27. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
      • There’s lots of data out there, but it needs to be sliced to be usable.
      • You can work with large, original data sets, but often this adds extra complexity that isn’t necessary to answer your research questions.
      • But don’t delete the data you don’t need!
  28. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  29. ACKNOWLEDGEMENTS
      • ARC Centre of Excellence for Creative Industries and Innovation (CCI) - http://www.cci.edu.au & http://www.mappingonlinepublics.net
      • Social Media Research Group -- http://socialmedia.qut.edu.au
      • Queensland University of Technology
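
The "WHY IS DATA SLICED?" slide mentions keeping a week of TV data both as JSON and as a smaller TSV with selected fields. The slides do not show how that conversion was done, so the following is only a minimal sketch of the general idea: read one JSON tweet per line and write out a handful of columns. The field list, the function names, and the one-tweet-per-line input layout are all assumptions made for illustration, not the presenter's actual pipeline.

```python
import csv
import io
import json

# Hypothetical selection: the slides do not say which fields were kept.
FIELDS = ["id_str", "created_at", "user.screen_name", "text"]


def pluck(record, dotted_key):
    """Follow a dotted path such as 'user.screen_name' through nested dicts."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict):
            return ""
        value = value.get(part, "")
    return "" if value is None else value


def json_lines_to_tsv(lines, out, fields=FIELDS):
    """Write one TSV row per JSON line, keeping only the selected fields.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    rows = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        record = json.loads(line)
        row = []
        for field in fields:
            cell = str(pluck(record, field))
            # Tabs and newlines inside tweet text would break the row layout.
            for ch in ("\t", "\r", "\n"):
                cell = cell.replace(ch, " ")
            row.append(cell)
        writer.writerow(row)
        rows += 1
    return rows


if __name__ == "__main__":
    sample = ['{"id_str": "1", "user": {"screen_name": "example"}, "text": "hello"}']
    buffer = io.StringIO()
    json_lines_to_tsv(sample, buffer)
    print(buffer.getvalue())
```

In line with the slide 27 advice not to delete data you don't need, a sketch like this writes the slice to a new file and leaves the original JSON untouched.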

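The "TIME SLICES OF TWEET CONTENT" slides show charts only, with no method given. As a rough sketch of what time slicing can look like, the code below floors each tweet's timestamp to a fixed window and counts tweets containing a keyword per window. The function names, the five-minute default, and the simple substring match are assumptions for illustration. The timestamp layout is the `created_at` format used in Twitter's classic JSON, and parsing the day and month names assumes an English locale.

```python
from collections import Counter
from datetime import datetime, timezone

# Layout of created_at in classic Twitter JSON,
# e.g. "Wed Oct 23 18:05:12 +0000 2013".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def slice_start(created_at, minutes=5):
    """Return the UTC start of the time slice that contains the timestamp.

    Assumes `minutes` divides 60 evenly (5, 10, 15, ...).
    """
    ts = datetime.strptime(created_at, CREATED_AT_FORMAT).astimezone(timezone.utc)
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def count_by_slice(tweets, keyword, minutes=5):
    """Count tweets whose text contains `keyword`, grouped by time slice.

    Matching is a case-insensitive substring test; results are ordered by time.
    """
    needle = keyword.lower()
    counts = Counter()
    for tweet in tweets:
        if needle in tweet["text"].lower():
            counts[slice_start(tweet["created_at"], minutes)] += 1
    return dict(sorted(counts.items()))
```

Calling `count_by_slice(tweets, "#BB15")` on a list of tweet dicts gives one count per five-minute window, which can then be plotted against broadcast or live-feed events.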