16. Sandjai Bhulai (s.bhulai@vu.nl)
Challenge #1
• How do you get (all) Dutch tweets?
• Twitter has a streaming API
• The fair use policy delivers a random 1% of the Twitter stream
• Following keywords is allowed
• How much data do you need?
• How much data can you get?
• How much data can you deal with?
• How much data can you store?
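The slide notes that following keywords is allowed and raises the question of how much data you can handle. The sketch below shows the idea of keyword following applied locally to a stream of tweet texts. It is not the Twitter API: the function name `follow_keywords` and the whole-word, case-insensitive matching rule are assumptions for illustration. Because it is a generator, it never holds the full stream in memory.

```python
import re
from typing import Iterable, Iterator


def follow_keywords(tweets: Iterable[str], keywords: Iterable[str]) -> Iterator[str]:
    """Yield only tweets containing at least one tracked keyword.

    Hypothetical local filter: matching is case-insensitive and on whole
    words, so "brand" does not match "brandweer".
    """
    keywords = list(keywords)
    if not keywords:
        # An empty alternation would match every tweet, so track nothing instead.
        return
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    # Stream tweets one by one; nothing is buffered beyond the current tweet.
    for text in tweets:
        if pattern.search(text):
            yield text
```

For example, `list(follow_keywords(["Brand in Amsterdam!", "Mooi weer vandaag", "Grote brandweer oefening"], ["brand"]))` keeps only the first tweet.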
Challenge #2
• How do you detect trends on Twitter?
• Absolute frequencies of tweets
• Relative frequencies of tweets
• Speed of tweets
• Acceleration of tweets
• Seasonal patterns
• We need a real-time algorithm
• We need to handle memory efficiently
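The signals listed above (absolute and relative frequency, speed, acceleration) can be computed in real time with constant memory per term. The sketch below uses exponential smoothing, which is our choice and not a method named on the slide: per time interval it smooths each term's count, then smooths the first and second differences of that level as "speed" and "acceleration". The class name, `alpha`, and the pruning threshold `min_level` are hypothetical. Seasonal patterns are not handled here; they would need, for example, a separate baseline per time of day.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class TermState:
    level: float = 0.0  # smoothed count per interval
    speed: float = 0.0  # smoothed first difference of the level
    accel: float = 0.0  # smoothed second difference (change in speed)


class TrendTracker:
    """Hypothetical real-time trend statistics with constant memory per term."""

    def __init__(self, alpha: float = 0.3, min_level: float = 0.01):
        self.alpha = alpha          # weight of the newest interval (assumed value)
        self.min_level = min_level  # terms that fade below this are forgotten
        self.states = defaultdict(TermState)

    def update(self, interval_tokens: list) -> dict:
        """Process all terms seen in one time interval; return per-term signals."""
        counts = Counter(interval_tokens)
        total = sum(counts.values()) or 1  # avoid division by zero on empty intervals
        a = self.alpha
        report = {}
        # Known terms absent from this interval are updated with a count of 0.
        for term in set(self.states) | set(counts):
            s = self.states[term]
            new_level = (1 - a) * s.level + a * counts[term]
            new_speed = (1 - a) * s.speed + a * (new_level - s.level)
            s.accel = (1 - a) * s.accel + a * (new_speed - s.speed)
            s.level, s.speed = new_level, new_speed
            report[term] = {
                "absolute": counts[term],
                "relative": counts[term] / total,
                "speed": s.speed,
                "acceleration": s.accel,
            }
        # Bound memory: drop terms whose smoothed level has decayed away.
        for term in [t for t, s in self.states.items() if s.level < self.min_level]:
            del self.states[term]
        return report
```

A term whose volume rises faster and faster shows positive speed and acceleration, which is one way to flag it as trending before its absolute count is large.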
Trending topics
1. #PrayforMexico
2. #SocialMovies
3. #temblor (Spanish: tremor)
4. Sismo de 7.8 (Spanish: magnitude 7.8 earthquake)
5. Earthquake in Mexico
6. John Elway
7. Pat Bowlen
8. Marcelo Lagos
9. Azcapotzalco
10. Niñas de 13 y 14 (Spanish: girls aged 13 and 14)
20 March 2012, Twitter.com
Challenge #3
• How do you deal with the following tweets?
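• “Brand in Amsterdam” (fire in Amsterdam)
• “Vuur in 020” (fire in 020, the Amsterdam area code)
• “Fikkie in A’dam” (slang: a little fire in Amsterdam)
• “Ik heb brand gezien” (I have seen a fire)
• “Ik zag brand” (I saw a fire)
• “Ik zie brand” (I see a fire)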
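These tweets describe the same kind of event with synonyms ("brand", "vuur", "fikkie"), place aliases ("Amsterdam", "020", "A’dam"), and different tenses of "zien" (to see). A minimal sketch of one way to handle them is lookup-based normalization. The tables `SYNONYMS`, `PLACES`, and `SEE_FORMS` and the function `normalize` are hypothetical and cover only the examples on this slide; a real system would need much larger curated or learned vocabularies.

```python
import re

# Hypothetical normalization tables covering only the example tweets.
SYNONYMS = {"brand": "brand", "vuur": "brand", "fikkie": "brand"}
PLACES = {"amsterdam": "amsterdam", "020": "amsterdam", "a'dam": "amsterdam"}
# Forms of "zien" (to see) mapped to tense, which hints at when the fire was seen.
SEE_FORMS = {"zie": "present", "zag": "past", "gezien": "past"}


def normalize(tweet: str) -> dict:
    """Map a Dutch tweet to a canonical (event, place, tense) description."""
    # Unify the typographic apostrophe so "A’dam" and "A'dam" look the same.
    text = tweet.lower().replace("\u2019", "'")
    tokens = re.findall(r"[\w']+", text)
    return {
        "event": next((SYNONYMS[t] for t in tokens if t in SYNONYMS), None),
        "place": next((PLACES[t] for t in tokens if t in PLACES), None),
        "tense": next((SEE_FORMS[t] for t in tokens if t in SEE_FORMS), None),
    }
```

With this mapping, "Brand in Amsterdam", "Vuur in 020", and "Fikkie in A’dam" all normalize to the same event and place, so they can be counted together by a trend detector.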
The future
• Many challenges ahead:
• How to deal with retweets?
• Integration of reputation scores?
• Use of profile information?
• Advantages of semantic research?
• Add feeds of other social media?
• Generalize to other languages?
• Use of GPS information?
• …