Automatic Extraction of Soccer Game Event Data from Twitter

Presentation at DeRiVE 2012. Paper at: http://ceur-ws.org/Vol-902/paper_3.pdf

Slides by Guido van Oorschot


Transcript

  • 1. Automatic extraction of soccer game event data from Twitter
      Guido van Oorschot, Marieke van Erp and Chris Dijkshoorn
      Monday, November 12, 12
  • 2. Soccer data
  • 3. Theory
      1. There is a fair body of research on automated sports highlight extraction
      2. Twitter data can offer interesting insights into real-world phenomena
  • 4. Automated highlight detection
      Let's use Twitter data!
  • 5. 3 tasks
      1. Detecting events: in which minutes did events occur?
      2. Classifying events: is the event a goal, a card or a substitution?
      3. Assigning events to teams: is the event for the home team or the away team?
  • 6. 5 types of events
      - Goal
      - Own goal
      - Red card
      - Yellow card
      - Substitution
  • 7. Methodology
      1. Gathering the data
      2. Exploring and cleaning the data
      3. Classifying interesting data points
  • 8. Gathering data
      - Collect all tweets with game hashtags, e.g. #ajafey, #nacgro, #psvutr
      - Collect official data for each match: goals, cards, substitutions
  • 9. Our data
      - 6 months
      - 61 games
      - 661 events
      - 10,643 tweets
  • 10. Three experiments
      1. Detecting events
      2. Classifying events
      3. Assigning events to teams
  • 11. 1. Detecting events
  • 12. 1. Detecting events
  • 13. 1. Experimental setup
      - Goal: detect peaks in the tweets-per-minute signal to extract events
      - Setup: test three peak detection methods:
        1. LocMaxNoBaseLineCorr
        2. IntThresNoBaseLineCorr
        3. IntThresWithBaseLineCorr
  • 14. 1. Results
  • 15. 1. Findings
      - Goals and red cards are detected better than yellow cards and substitutions
      - None of the three peak selection methods works well
      - Highlights can be extracted, but not precisely enough
  • 16. Three experiments
      1. Detecting events
      2. Classifying events
      3. Assigning events to teams
  • 17. 2. Classifying events
      - Goal: classify minutes into event classes
        minute  "goal"  "1"  "red"  "card"  "boring"  class
        34      0       2    0      1       20        nothing
        35      23      34   0      0       0         goal
        12      1       2    0      0       5         nothing
        13      1       0    22     11      0         red card
  • 18. Issues
      Problem: huge, sparse matrix
      1. Reduce features: choose words/features smartly
      2. Reduce instances: choose minutes smartly
  • 19. 2. Experimental setup
      - 3 instance selection settings:
        1. AllMinutes
        2. PeakMinutes
        3. EventMinutes
  • 20. 2. Experimental setup
      - 7 feature selection settings:
        1. AllMoreThanOnce
        2. Top500TotalFreq
        3. Top10MinuteFreq
        4. Top500TotalTfIdf
        5. Top10MinuteTfIdf
        6. Top50Infogain
        7. Top50GainRatio
  • 21. 2. Experimental setup
      - 6 types of classifiers:
        1. C4.5
        2. RandomForest
        3. NaiveBayes
        4. NaiveBayesMultinomial
        5. libSVM
        6. IB1
  • 22. 2. Results
  • 23. 2. Discussion
      - Top50GainRatio is the best feature selection setting
      - libSVM is the best classifier
      - EventMinutes results:
        Class          F-measure
        OVERALL        0.822
        Goal           0.841
        Own goal       0.000
        Red card       0.848
        Yellow card    0.785
        Substitution   0.839
  • 24. Three experiments
      1. Detecting events
      2. Classifying events
      3. Assigning events to teams
  • 25. 3. Experimental setup
      - Goal: assign events to teams
      - Based on the ratio between tweets from fans of the home team and fans of the away team
      - But first: extract fans
  • 26. 3. Extracting fans
      - Hypothesis: people who tweet about the same team every week are probably fans of that team
  • 27. 3. Extracting fans
      - Extracted 38,527 fans from 146,326 users (26%)
      - This method of extracting fans works well:
        Right team   Not clear   Wrong team
        88%          10%         2%
  • 28. 3. Results
  • 29. 3. Results
      - Performance of assigning events to teams is above the baseline overall and for every class except yellow cards, where it equals the baseline:
        Class          Baseline   Performance
        OVERALL        52%        58%
        Goal           58%        69%
        Red card       50%        62%
        Yellow card    63%        63%
        Substitution   52%        57%
  • 30. Conclusion
      1. Detecting events => difficult
      2. Classifying events => good results
      3. Assigning events to teams => promising results
  • 31. Future work
      - Use sentiment in tweets (for detecting events and assigning events to teams)
      - Player detection
      - Other sports
  • 32. Questions?
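Slide 13 compares three peak detection methods on the tweets-per-minute signal, but it gives only their names. The Python sketch below assumes one plausible reading of those names:

- LocMax marks every local maximum.
- IntThres marks minutes whose count exceeds the mean plus k standard deviations.
- Baseline correction subtracts a centred moving average before thresholding.

The helper names, the window size and k are all hypothetical choices, not values from the paper.

```python
from statistics import mean, pstdev


def moving_average(counts, window):
    """Centred moving average, used here as a hypothetical baseline."""
    half = window // 2
    baseline = []
    for i in range(len(counts)):
        lo, hi = max(0, i - half), min(len(counts), i + half + 1)
        baseline.append(mean(counts[lo:hi]))
    return baseline


def baseline_corrected(counts, window=11):
    """Subtract the moving-average baseline and clip negative values to zero."""
    baseline = moving_average(counts, window)
    return [max(0.0, c - b) for c, b in zip(counts, baseline)]


def local_maxima(signal):
    """LocMax reading: minutes strictly higher than both neighbouring minutes."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]


def intensity_threshold(signal, k=2.0):
    """IntThres reading: minutes above mean + k * population std of the signal."""
    threshold = mean(signal) + k * pstdev(signal)
    return [i for i, value in enumerate(signal) if value > threshold]


# Hypothetical tweets-per-minute signal with one clear spike at index 5.
counts = [2, 3, 2, 3, 2, 25, 8, 3, 2, 3, 2, 3, 2]
loc_max = local_maxima(counts)                   # no baseline correction
int_thres = intensity_threshold(counts)          # no baseline correction
int_thres_corr = intensity_threshold(baseline_corrected(counts, window=5))
```

On this toy signal, `local_maxima` returns [1, 3, 5, 9, 11] because every small fluctuation counts as a peak. Both threshold variants return only [5].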
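Slide 17 treats each match minute as an instance whose features are word counts, labelled with an event class or "nothing". Below is a minimal sketch of building that minute-by-word matrix. It rests on two assumptions: tweets arrive as (minute, text) pairs, and tokenizing uses a simple hypothetical rule (lower-cased runs of letters, digits and '#'). The function name is illustrative only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lower-cased runs of word characters and '#'.
TOKEN = re.compile(r"[#\w]+")


def minute_word_matrix(tweets, vocabulary, events):
    """Build (minute, word counts, class) rows from (minute, text) tweet pairs.

    events maps a minute to its event class; every other minute is 'nothing'.
    Minutes without any tweet get no row in this sketch.
    """
    per_minute = defaultdict(Counter)
    for minute, text in tweets:
        per_minute[minute].update(TOKEN.findall(text.lower()))
    rows = []
    for minute in sorted(per_minute):
        counts = [per_minute[minute][word] for word in vocabulary]
        rows.append((minute, counts, events.get(minute, "nothing")))
    return rows


tweets = [(35, "GOAL! 1-0 for Ajax"), (35, "What a goal"),
          (34, "Boring game"), (13, "Red card!")]
vocabulary = ["goal", "1", "red", "card", "boring"]
rows = minute_word_matrix(tweets, vocabulary, {35: "goal", 13: "red card"})
```

With a realistic vocabulary, most entries in each row are zero. This is the huge, sparse matrix that slide 18 proposes to shrink by selecting features and instances.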
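Slide 20 lists Top50Infogain and Top50GainRatio among the feature selection settings. The sketch below computes both scores for each word. It assumes every word is binarized as present or absent in a minute; the slides do not say how counts were discretized.

- Information gain is H(C) - H(C|F).
- Gain ratio divides that gain by H(F), the entropy of the feature itself.

The function names and the k parameter are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def info_gain_and_gain_ratio(feature, labels):
    """Information gain H(C) - H(C|F) and gain ratio IG / H(F) of one discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / total) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def top_k_words(matrix, labels, vocabulary, k=50, use_ratio=True):
    """Rank words by gain ratio (or information gain) and keep the best k."""
    scored = []
    for j, word in enumerate(vocabulary):
        present = [row[j] > 0 for row in matrix]  # assumption: presence, not raw counts
        gain, ratio = info_gain_and_gain_ratio(present, labels)
        scored.append((ratio if use_ratio else gain, word))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [word for _, word in scored[:k]]
```

For example, with labels ["goal", "goal", "nothing", "nothing"], a word present exactly in the two goal minutes separates the classes perfectly. It therefore scores an information gain and gain ratio of 1.0.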
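Slide 23 reports per-class F-measures. The sketch below shows how such numbers arise: it computes per-class F1 from gold and predicted labels, plus a support-weighted average. The slides do not state how OVERALL was aggregated. Using a support-weighted mean, like the "Weighted Avg." row in Weka's per-class output, is an assumption here, and the function names are hypothetical.

```python
from collections import Counter


def per_class_f_measure(gold, predicted):
    """F-measure (F1) for every class seen in the gold or predicted labels."""
    scores = {}
    for c in sorted(set(gold) | set(predicted)):
        tp = sum(1 for g, p in zip(gold, predicted) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, predicted) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, predicted) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def weighted_f_measure(gold, scores):
    """Average of per-class F-measures weighted by each class's gold support."""
    support = Counter(gold)
    return sum(scores.get(c, 0.0) * n for c, n in support.items()) / len(gold)


gold = ["goal", "goal", "own goal", "red card"]
predicted = ["goal", "goal", "goal", "red card"]
scores = per_class_f_measure(gold, predicted)
```

A class with no true positives, such as "own goal" in this toy example, gets an F-measure of exactly 0. That is also the only way a class can score 0.000, as own goals do on slide 23.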
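Slides 25 to 27 assign events to teams using the tweets of each team's fans. Fans are found with the hypothesis that people who tweet about the same team every week probably support it. The sketch below is one hypothetical reading of both steps:

- A user is a fan of team T if T is the only team common to every match they tweeted about, and they tweeted about at least `min_games` matches. One match per week is assumed.
- An event is assigned to the side whose fans tweet relatively more in the event minute.

The threshold, the function names and the normalization by each side's total tweets in the match are all assumptions, not details from the slides.

```python
def extract_fans(user_games, min_games=3):
    """Map user -> team for users whose tweeted matches share exactly one team.

    user_games maps a user to a list of (home, away) matches they tweeted about.
    min_games is a hypothetical threshold on how many matches a user must cover.
    """
    fans = {}
    for user, games in user_games.items():
        if len(games) < min_games:
            continue
        common = set(games[0])
        for game in games[1:]:
            common &= set(game)
        if len(common) == 1:
            fans[user] = common.pop()
    return fans


def assign_event(minute_tweets, game_tweets, fans, home, away):
    """Return the team whose fans tweet relatively more in the event minute.

    minute_tweets / game_tweets: user ids of tweets in the event minute / whole match.
    Each side's count is divided by its fans' tweets in the whole match
    (an assumed normalization); ties go to the home team.
    """
    def share(team):
        in_minute = sum(1 for user in minute_tweets if fans.get(user) == team)
        in_game = sum(1 for user in game_tweets if fans.get(user) == team)
        return in_minute / in_game if in_game else 0.0

    return home if share(home) >= share(away) else away


user_games = {
    "a": [("ajax", "feyenoord"), ("az", "ajax"), ("ajax", "psv")],
    "b": [("ajax", "feyenoord"), ("feyenoord", "psv"), ("twente", "feyenoord")],
    "c": [("ajax", "feyenoord")],
    "d": [("ajax", "feyenoord"), ("ajax", "feyenoord"), ("feyenoord", "ajax")],
}
fans = extract_fans(user_games)
```

User "c" tweeted about too few matches to be classified. User "d" only ever tweeted about the same fixture, so no single team can be singled out. The per-match normalization is one way to keep the club with the larger fan base from winning every event by default.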