
Slicing Big Data: Gambling, Twitter & Time Sensitive Information


Presented at the Internet Researchers conference in Denver, CO -- 26 October 2013. Discusses Gambling, Reality TV, and World Events in the Context of Twitter Data, and selecting usable data from big data.


  1. Gambling, Twitter & Time Sensitive Information
     IR14 - Denver, CO
     dp.woodford@qut.edu.au / @dpwoodford
     Wednesday, 23 October 13
  2. FORMAT
     • Not going to simply repeat the paper.
     • I will get to the gambling (& fantasy sports) examples, but want to discuss our wider work with large datasets.
     • Happy to answer more specific questions about the use in the gambling industry.
     • Examples from Sport, TV, Gambling & Fantasy Sports: a tour de force of current research projects.
  3. DEALING WITH THE TITLE: TWITTER
     • Twitter => large data sets, but specific research questions often require a small data set:
       – Australian users
       – Users registering on the platform during natural disasters
       – ‘Experts’ on Fantasy Sports
       – Sporting participants: Golf, Tennis, NFL, College Football, etc.
       – Reality TV ‘fanatics’
       – Almost infinite examples
     • Goal is to get from “Big Data” to what I’ve been calling “useful data”.
  4. DEALING WITH THE TITLE: GAMBLING
     • Long-term interest in the gambling industry (one case study in my prior work on games).
     • Many parallels between Gambling and Fantasy Sports (another current research project).
     • When I was an ‘active participant’ (2006-2010), Twitter was just becoming popular.
     • It quickly became a crucial source of information, and websites started aggregating it.
  5. DEALING WITH THE TITLE: GAMBLING
  6. DEALING WITH THE TITLE: GAMBLING
  7. DEALING WITH THE TITLE: TIME SENSITIVE INFORMATION
     • Lines move incredibly fast: just as much a market as day-trading on the stock exchange.
  8. WHY IS DATA SLICED?
     • Streaming API is limited to ~1% of total tweets per second & Firehose access is expensive.
     • Large data sets are not easily malleable or visually analyzed (e.g. with Tableau):
       – Our database of Twitter users is ~3.7TB, and growing.
       – A week's worth of selected TV data (current US shows) is 750MB in JSON format and 600MB in TSV (selected fields), with millions of rows.
     • Analyzing large data sets is slow, if it’s even possible => “Usable Data”
  9. HOW IS DATA SLICED: COMPULSORY
  10. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- WTA
  11. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS
  12. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS
      CLIP FROM YAHOO FANTASY FOOTBALL RE: CALVIN JOHNSON INJURY & TWITTER REPORTS
  13. BUT YOU STILL NEED A SANITY CHECK
  14. BUT YOU STILL NEED A SANITY CHECK
  15. HOW IS DATA SLICED: RANDOM SAMPLING
      Source: Tony Hirst (Open University UK)
  16. BUT SOMETIMES YOU NEED THE FULL SAMPLE & REPEATED CAPTURE
      Source: Bruns / Woodford [Mapping Online Publics]
  17. HOW IS DATA SLICED: ONLY A SMALL SAMPLE MATTERS
      Floods, Earthquake, Tsunami
      Media Coverage
  18. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
      Impact of Live Feed
  19. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
  20. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
      Delayed TV sucks
  21. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
      • Most active (#BB15, #BBLF) users often defend a HM to the death (akin to sporting tribalism), but most users are attackers (forthcoming paper w/ Katie Prowd).
      Disclaimer: Scale changed to fit on slide
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  22. TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  23. TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  24. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
      Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  25. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE
      • Twitter closed these quickly, yet the BB15 accounts remained active for much of the season...
  26. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  27. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
      • There’s lots of data out there, but it needs to be sliced to be usable.
      • You can work with large, original data sets, but often this adds extra complexity that isn’t necessary to answer your research questions.
      • But don’t delete the data you don’t need!
  28. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  29. ACKNOWLEDGEMENTS
      • ARC Centre of Excellence for Creative Industries and Innovation (CCI) - http://www.cci.edu.au & http://www.mappingonlinepublics.net
      • Social Media Research Group -- http://socialmedia.qut.edu.au
      • Queensland University of Technology
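
The slides describe three ways of cutting a large capture down to usable data: keeping only selected fields when converting JSON to TSV (slide 8), random sampling (slide 15), and time slices of tweet content (slides 22-23). The sketch below illustrates each idea in Python. It is not the presenter's pipeline: the field names (id, created_at, user, text), the ISO 8601 timestamps, the five-minute slice width, and all function names are assumptions chosen for illustration, and reservoir sampling is one possible technique, not necessarily the one used in the talk.

```python
import csv
import io
import json
import random
from collections import Counter
from datetime import datetime, timezone

# Hypothetical field names, loosely modeled on tweet JSON; adjust to your data.
FIELDS = ["id", "created_at", "user", "text"]


def json_lines_to_tsv(json_lines, out, fields=FIELDS):
    """Write only the selected fields of each JSON record as one TSV row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    rows = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the capture
        record = json.loads(line)
        # Missing fields become empty cells rather than raising an error.
        writer.writerow([record.get(f, "") for f in fields])
        rows += 1
    return rows


def reservoir_sample(items, k, seed=None):
    """Uniform random sample of up to k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


def count_per_slice(timestamps, minutes=5):
    """Count records per fixed-width time slice, keyed by slice start (UTC).

    Expects timezone-aware datetime objects.
    """
    width = minutes * 60
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        start = epoch - epoch % width  # round down to the slice boundary
        counts[datetime.fromtimestamp(start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up records, only to show the calls; not data from the talk.
    raw = [
        '{"id": 1, "created_at": "2013-10-23T20:01:10+00:00", "user": "a", "text": "x", "lang": "en"}',
        '{"id": 2, "created_at": "2013-10-23T20:06:30+00:00", "user": "b", "text": "y", "lang": "en"}',
    ]
    buf = io.StringIO()
    json_lines_to_tsv(raw, buf)
    print(buf.getvalue())
    print(reservoir_sample(raw, 1, seed=0))
    stamps = [datetime.fromisoformat(json.loads(r)["created_at"]) for r in raw]
    print(count_per_slice(stamps))
```

Each function reads its input and writes a new, smaller output without touching the original capture, in keeping with the advice on slide 27 not to delete the data you don't need.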
