Slicing Big Data: Gambling, Twitter & Time Sensitive Information

Presented at the Internet Researchers conference in Denver, CO -- 26 October 2013. Discusses Gambling, Reality TV, and World Events in the Context of Twitter Data, and selecting usable data from big data.

Transcript

  • 1. Gambling, Twitter & Time Sensitive Information IR14 - Denver, CO dp.woodford@qut.edu.au @dpwoodford Wednesday, 23 October 13
  • 2. FORMAT • Not going to simply repeat the paper. • I will get to the gambling (& fantasy sports) examples, but want to discuss our wider work with large datasets. • Happy to answer more specific questions about the use in the gambling industry. • Examples from Sport, TV, Gambling & Fantasy Sports. A tour de force of current research projects
  • 3. DEALING WITH THE TITLE: TWITTER • Twitter => Large Data Sets, but specific research questions often require a small data set: – Australian users – Users registering on the platform during natural disasters – ‘Experts’ on Fantasy Sports – Sporting Participants: Golf, Tennis, NFL, College Football, etc. – Reality TV ‘fanatics’ – Almost infinite examples • Goal is to get from “Big Data” to what I’ve been calling “useful data”
  • 4. DEALING WITH THE TITLE: GAMBLING • Long term interest in the gambling industry (one case study in my prior work on games). • Many parallels between Gambling and Fantasy Sports (another current research project). • When I was an ‘active participant’, Twitter was just becoming popular (2006-2010). • It quickly became a crucial source of information, and websites started aggregating it.
  • 5. DEALING WITH THE TITLE: GAMBLING
  • 6. DEALING WITH THE TITLE: GAMBLING
  • 7. DEALING WITH THE TITLE: TIME SENSITIVE INFORMATION • Lines move incredibly fast: it is just as much a market as day-trading on the stock exchange
  • 8. WHY IS DATA SLICED? • Streaming API is limited to ~1% of total tweets per second & Firehose access is expensive. • Large data sets are not easily malleable or visually analyzed (e.g. with Tableau): – Our database of Twitter users is ~3.7TB, and growing. – A week’s worth of selected TV data (current US shows) in JSON format is 750MB, and 600MB in TSV (selected fields), with millions of rows. • Analyzing large data sets is slow, if it’s even possible => “Usable Data”
  • 9. HOW IS DATA SLICED: COMPULSORY
  • 10. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- WTA
  • 11. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS
  • 12. HOW IS DATA SLICED: SELECTING FOR AUTHENTICITY -- FANTASY SPORTS CLIP FROM YAHOO FANTASY FOOTBALL RE: CALVIN JOHNSON INJURY & TWITTER REPORTS
  • 13. BUT YOU STILL NEED A SANITY CHECK
  • 14. BUT YOU STILL NEED A SANITY CHECK
  • 15. HOW IS DATA SLICED: RANDOM SAMPLING Source: Tony Hirst (Open University UK)
  • 16. BUT SOMETIMES YOU NEED THE FULL SAMPLE & REPEATED CAPTURE Source: Bruns / Woodford [Mapping Online Publics]
  • 17. HOW IS DATA SLICED: ONLY A SMALL SAMPLE MATTERS Floods, Earthquake, Tsunami Media Coverage
  • 18. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC Impact of Live Feed
  • 19. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC
  • 20. HOW IS DATA SLICED: TV -- SEASONAL DATA VS EPISODIC Delayed TV sucks
  • 21. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE • Most active (#BB15, #BBLF) users often defend a HM to the death (akin to sporting tribalism), but most users are attackers (forthcoming paper w/ Katie Prowd) Disclaimer: Scale changed to fit on slide. Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  • 22. TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  • 23. TIME SLICES OF TWEET CONTENT ARE ENLIGHTENING Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  • 24. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE Source: Woodford / Prowd [Fan Cultures and Hatred in Big Brother 15: Race Rows, Elitism & Sporting Tribalism -- Forthcoming]
  • 25. HOW IS DATA SLICED: MOST ACTIVE ≠ REPRESENTATIVE • Twitter closed these quickly, yet the BB15 accounts remained active for much of the season...
  • 26. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  • 27. AND A QUICK NOTE ON NON-TWITTER ANALYTICS • There’s lots of data out there, but it needs to be sliced to be usable. • You can work with large, original data sets, but often this adds extra complexity that isn’t necessary to answer your research questions. • But don’t delete the data you don’t need!
  • 28. AND A QUICK NOTE ON NON-TWITTER ANALYTICS
  • 29. ACKNOWLEDGEMENTS • ARC Centre of Excellence for Creative Industries and Innovation (CCI) - http://www.cci.edu.au & http://www.mappingonlinepublics.net • Social Media Research Group -- http://socialmedia.qut.edu.au • Queensland University of Technology
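
As a rough illustration of the slicing described on slides 8 and 22 (cutting captured JSON down to a TSV of selected fields, then looking at time slices), the following Python sketch may help. It is not the presenter's pipeline. It assumes one tweet per line in the Twitter API v1.1 JSON layout (`id_str`, `created_at`, `text`, `user.screen_name`, `entities.hashtags`). The function names, the `SAMPLE` records, and the one-minute slice width are hypothetical choices made for the example.

```python
import csv
import io
import json
from collections import Counter
from datetime import datetime

# Assumption: timestamps use the Twitter API v1.1 "created_at" format.
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def slice_tweets(json_lines, hashtag):
    """Yield (id, created, screen_name, text) for tweets carrying `hashtag`."""
    wanted = hashtag.lower().lstrip("#")
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines in a captured stream
        tweet = json.loads(line)
        hashtags = tweet.get("entities", {}).get("hashtags", [])
        if wanted in {h["text"].lower() for h in hashtags}:
            created = datetime.strptime(tweet["created_at"], TWITTER_TIME)
            yield tweet["id_str"], created, tweet["user"]["screen_name"], tweet["text"]


def write_tsv(rows, out):
    """Write only the selected fields as TSV and return the number of rows."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "created_at", "screen_name", "text"])
    count = 0
    for tweet_id, created, name, text in rows:
        writer.writerow([tweet_id, created.isoformat(), name, text])
        count += 1
    return count


def tweets_per_minute(rows):
    """Count tweets in one-minute time slices."""
    return Counter(created.replace(second=0, microsecond=0) for _, created, _, _ in rows)


# Hypothetical sample records, invented purely for this example.
SAMPLE = [
    json.dumps({"id_str": "1", "created_at": "Wed Oct 23 20:19:24 +0000 2013",
                "text": "Live feeds are back #BB15", "user": {"screen_name": "fan_a"},
                "entities": {"hashtags": [{"text": "BB15"}]}}),
    json.dumps({"id_str": "2", "created_at": "Wed Oct 23 20:19:51 +0000 2013",
                "text": "Eviction night #bb15", "user": {"screen_name": "fan_b"},
                "entities": {"hashtags": [{"text": "bb15"}]}}),
    json.dumps({"id_str": "3", "created_at": "Wed Oct 23 20:21:02 +0000 2013",
                "text": "Injury report #NFL", "user": {"screen_name": "fan_c"},
                "entities": {"hashtags": [{"text": "NFL"}]}}),
]
```

Calling `write_tsv(slice_tweets(open_file, "#BB15"), out_file)` on a captured file would write the reduced TSV. Because the slice is written to a separate file, the original JSON stays untouched, in line with the advice on slide 27 not to delete the data you don't need.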
