Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Twitter: Listening to a Kitchen TableConversation Involving 17 Million PeopleErik Tjong Kim Sang (Netherlands eScience Cen...
28 February 1942: De Ruyter sinks                      Ship: HNLMS De Ruyter                      Commander: Karel Doorman...
Social Media
How do we deal with the data volumes?Twitter:• Maximum of 140 characters per messageBut:• 200 million users (1-2 million N...
Netherlands eScience Center (NLeSC)by SURF and NWOSupporting data-intensive researchDistributing project calls offering bo...
Engineers/Scientists @ NLeSCIn every NLeSC project, one eScience engineer isinvolved: a scientist (often with PhD) with st...
The project TwiNLwith Radboud University NijmegenTwitter: a social network on whichparticipants communicate by exchangingm...
Collecting the dataWe continuously search for common Dutch words on Twitter (keywordtracking)Additionally we collect all t...
Searching the dataA web interface has been built for searching the data: http://twiqs.nl
Challenges and solutionsChallenge 1: we have a lot of data (five terabytes)Challenge 2: the collection is growing every se...
Views on search resultsWe offer five different views on search results:•   Tweets (restricted)•   Keyword frequency graph•...
Examples of search results
“eten” (to eat/food)
“rechtbank” (courthouse)
“griep” (flu)
“griep” (flu)
“vast en zeker” in January 2013
“zeker en vast” in January 2013
Words co-occurring with “hedde’’ (Brabant dialect)
All tweets of Tuesday 4 December 2012
“bonus” in February 2013
Concluding remarksSocial media conversations make up interesting research dataThe website http://twiqs.nl can be used for ...
THE END
SRIE13 on Twitter (February 2013)
Twitter: listening to a kitchen table conversation involving 17 million people
Upcoming SlideShare
Loading in …5
×

Twitter: listening to a kitchen table conversation involving 17 million people

3,192 views

Published on

Presentation SURF Research and Innovation Event 2013. February 28, The Hague University of Applied Sciences.

Erik Tjong Kim Sang is eScience Engineer at the Netherlands eScience Center.

  • Be the first to comment

  • Be the first to like this

Twitter: listening to a kitchen table conversation involving 17 million people

  1. 1. Twitter: Listening to a Kitchen TableConversation Involving 17 Million PeopleErik Tjong Kim Sang (Netherlands eScience Center, Amsterdam)
  2. 2. 28 February 1942: De Ruyter sinks Ship: HNLMS De Ruyter Commander: Karel Doorman Date: 28 February 1942 Location: Java Sea Lives lost: 345
  3. 3. Social Media
  4. 4. How do we deal with the data volumes?Twitter:• Maximum of 140 characters per messageBut:• 200 million users (1-2 million NL)• 300 million tweets per day (3-4 million NL)
  5. 5. Netherlands eScience Center (NLeSC)by SURF and NWOSupporting data-intensive researchDistributing project calls offering bothmoney and expertiseInvolved in 21 projects dealing withvarious areas in scienceSee: http://esciencecenter.nl/
  6. 6. Engineers/Scientists @ NLeSCIn every NLeSC project, one eScience engineer isinvolved: a scientist (often with PhD) with strongcomputational skills
  7. 7. The project TwiNLwith Radboud University NijmegenTwitter: a social network on whichparticipants communicate by exchangingmessages of up to 140 charactersWith about 4 million messages in Dutch per day covering differenttopics, this is an interesting data source for researchersProject task: collect Dutch tweets, offer a search facility for the data andprovide different views on search results
  8. 8. Collecting the dataWe continuously search for common Dutch words on Twitter (keywordtracking)Additionally we collect all the tweets of most productive Dutch-speakingusers (user tracking)This generates about 2 milliontweets per day
  9. 9. Searching the dataA web interface has been built for searching the data: http://twiqs.nl
  10. 10. Challenges and solutionsChallenge 1: we have a lot of data (five terabytes)Challenge 2: the collection is growing every secondChallenge 3: we want to search both text and metadataSolution: parallel processing with Hadoop, make possible by :
  11. 11. Views on search resultsWe offer five different views on search results:• Tweets (restricted)• Keyword frequency graph• Keyword usage map• Vocabulary overview, including sentiment analysis• User analysisAll numbers used for creating the views are available for download
  12. 12. Examples of search results
  13. 13. “eten” (to eat/food)
  14. 14. “rechtbank” (courthouse)
  15. 15. “griep” (flu)
  16. 16. “griep” (flu)
  17. 17. “vast en zeker” in January 2013
  18. 18. “zeker en vast” in January 2013
  19. 19. Words co-occurring with “hedde’’ (Brabant dialect)
  20. 20. All tweets of Tuesday 4 December 2012
  21. 21. “bonus” in February 2013
  22. 22. Concluding remarksSocial media conversations make up interesting research dataThe website http://twiqs.nl can be used for accessing this dataThe interface can be used by university staff and students for exploringTwitter conversationsExamples of applications are early news detection, tracking diseasespread, studying dialects and obtaining local weather information
  23. 23. THE END
  24. 24. SRIE13 on Twitter (February 2013)

×