Slicing Big Data: Gambling, Twitter & Time Sensitive Information

  1. Gambling, Twitter & Time Sensitive Information IR14 - Denver, CO dp.woodford@qut.edu.au @dpwoodford Wednesday, 23 October 13
  2. FORMAT • Not going to simply repeat the paper. • I will get to the gambling (& fantasy sports) examples, but want to discuss our wider work with large datasets. • Happy to answer more specific questions about the use in the gambling industry. • Examples from Sport, TV, Gambling & Fantasy Sports. A tour de force of current research projects.
  3. DEALING WITH THE TITLE: TWITTER • Twitter => Large Data Sets, but specific research questions often require a small data set: – Australian users – Users registering on the platform during natural disasters – ‘Experts’ on Fantasy Sports – Sporting Participants: Golf, Tennis, NFL, College Football, etc. – Reality TV ‘fanatics’ – Almost infinite examples • Goal is to get from “Big Data” to what I’ve been calling “useful data”
  4. DEALING WITH THE TITLE: GAMBLING • Long-term interest in the gambling industry (one case study in my prior work on games). • Many parallels between Gambling and Fantasy Sports (another current research project). • When I was an ‘active participant’, Twitter was just becoming popular (2006-2010). • It quickly became a crucial source of information, and websites started aggregating it.
  5. DEALING WITH THE TITLE: GAMBLING
  6. DEALING WITH THE TITLE: GAMBLING
  7. DEALING WITH THE TITLE: TIME SENSITIVE INFORMATION • Betting lines move incredibly fast: this is just as much a market as day trading on the stock exchange.
  8. WHY IS DATA SLICED? • The Streaming API is limited to ~1% of total tweets per second, and Firehose access is expensive. • Large data sets are not easily malleable or visually analyzed (e.g. with Tableau): – Our database of Twitter users is ~3.7TB and growing. – A week's worth of selected TV data (current US shows) is 750MB in JSON format and 600MB in TSV (selected fields), with millions of rows. • Analyzing large data sets is slow, if it’s even possible => “Usable Data”
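
The JSON-to-TSV reduction mentioned on this slide can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the project: it assumes one JSON tweet per line, and the selected fields (`id_str`, `created_at`, `user.screen_name`, `text`) are a hypothetical choice, since the slides do not say which fields were kept. The function name `json_lines_to_tsv` is likewise made up for this example.

```python
import csv
import io
import json

# Hypothetical field selection; the slides do not list the fields actually kept.
FIELDS = ["id_str", "created_at", "screen_name", "text"]


def tweet_to_row(tweet):
    """Flatten one tweet dict into the selected fields."""
    user = tweet.get("user") or {}
    text = tweet.get("text") or ""
    return [
        tweet.get("id_str", ""),
        tweet.get("created_at", ""),
        user.get("screen_name", ""),
        # Collapse tabs and newlines so each tweet stays on one TSV row.
        " ".join(text.split()),
    ]


def json_lines_to_tsv(src, dst):
    """Read one JSON tweet per line from src and write selected fields to dst."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        writer.writerow(tweet_to_row(json.loads(line)))
        count += 1
    return count


# Tiny in-memory demonstration with one made-up tweet.
demo_tweet = {
    "id_str": "1",
    "created_at": "Wed Oct 23 20:01:10 +0000 2013",
    "user": {"screen_name": "example_user"},
    "text": "line one\nline two\twith a tab",
    "unused_field": 123,
}
demo_src = io.StringIO(json.dumps(demo_tweet) + "\n\n")
demo_dst = io.StringIO()
rows_written = json_lines_to_tsv(demo_src, demo_dst)
```

Because the function only reads from `src` and writes to a separate `dst`, the original JSON capture is left untouched and a different slice can be cut from it later.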
  9. HOW IS DATA SLICED: COMPULSORY
  10. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- WTA
  11. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS
  12. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS Clip from Yahoo Fantasy Football re: Calvin Johnson injury & Twitter reports
  13. BUT YOU STILL NEED A SANITY CHECK
  14. BUT YOU STILL NEED A SANITY CHECK
  15. HOW IS DATA SLICED: RANDOM SAMPLING Source: Tony Hirst (Open University UK)
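
The slide does not say how the random sample was drawn. As one possible approach when the full data set is too large to hold in memory, the sketch below uses reservoir sampling (Algorithm R), which keeps a uniform random sample of `k` items from a stream of unknown length. This technique is an assumption chosen for illustration, not the method behind the figure on the slide, and `reservoir_sample` is a hypothetical helper name.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from an iterable."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Demonstration on stand-in user IDs (made-up data, not a real user table).
demo_sample = reservoir_sample(range(100_000), k=5, seed=42)
```

Only `k` items are held at any time, so the same function works whether the input is a list, a file iterator, or a database cursor.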
  16. BUT SOMETIMES YOU NEED THE FULL SAMPLE & REPEATED CAPTURE Source: Bruns / Woodford [Mapping Online Publics]
  17. HOW IS DATA SLICED: ONLY A SMALL SAMPLE MATTERS Floods, Earthquake, Tsunami Media Coverage
  18. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC Impact of Live Feed
  19. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
  20. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC Delayed TV sucks
  21. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE • Most active (#BB15, #BBLF) users often defend a HM to the death (akin to sporting tribalism), but most users are attackers (forthcoming paper w/ Katie Prowd) Disclaimer: Scale changed to fit on slide Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  22. TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  23. TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
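
Cutting tweet content into time slices, as on the two slides above, could look something like the sketch below. It assumes each tweet is a `(created_at, text)` pair with the timestamp in the classic Twitter REST API style (for example `Wed Oct 23 20:01:10 +0000 2013`), and that the slice width divides 60 evenly. The function name, the five-minute default, and the sample tweets are illustrative assumptions, not taken from the study.

```python
from collections import defaultdict
from datetime import datetime

# Timestamp layout assumed for the "created_at" field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def slice_tweets(tweets, width_minutes=5):
    """Group tweet texts into fixed-width time slices keyed by slice start."""
    slices = defaultdict(list)
    for created_at, text in tweets:
        ts = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
        # Floor the timestamp to the start of its slice within the hour.
        start = ts.replace(
            minute=ts.minute - ts.minute % width_minutes,
            second=0,
            microsecond=0,
        )
        slices[start].append(text)
    # Return the slices in chronological order.
    return dict(sorted(slices.items()))


# Made-up tweets for demonstration only.
demo_tweets = [
    ("Wed Oct 23 20:05:00 +0000 2013", "third"),
    ("Wed Oct 23 20:01:10 +0000 2013", "first"),
    ("Wed Oct 23 20:04:59 +0000 2013", "second"),
]
demo_slices = slice_tweets(demo_tweets, width_minutes=5)
```

Each slice can then be inspected on its own, for instance by counting terms or hashtags per slice, rather than treating a whole episode or season as one block.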
  24. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  25. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE • Twitter closed these quickly, yet the BB15 accounts remained active for much of the season...
  26. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  27. AND A QUICK NOTE ON NON-TWITTER ANALYTICS • There’s lots of data out there, but it needs to be sliced to be usable. • You can work with large, original data sets, but often this adds extra complexity that isn’t necessary to answer your research questions. • But don’t delete the data you don’t need!
  28. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  29. ACKNOWLEDGEMENTS • ARC Centre of Excellence for Creative Industries and Innovation (CCI) - http://www.cci.edu.au & http://www.mappingonlinepublics.net • Social Media Research Group -- http://socialmedia.qut.edu.au • Queensland University of Technology