Twitter as a data mining source

Presentation Transcript

  • Czech Twitter as a data mining source Josef Šlerka, WebExpo 2009
  • Twitter.com: Twitter is a free social networking and microblogging service that enables its users to send and read messages known as tweets. Tweets are text-based posts of up to 140 characters displayed on the author's profile page and delivered to the author's subscribers, who are known as followers (Wikipedia)
  • What is data mining and how is it connected with Twitter?
  • Data mining is the process of extracting patterns from data. As more data are gathered, data mining is becoming an increasingly important tool to transform these data into information (Wikipedia). Variations include text mining and web mining, including semantic analysis
  • Twitter data mining - makes it easy to use all data mining methods - adds "time" & "space" - provides a real-time picture - connects easily with other social media (about 30% of users have one unique nickname across all platforms)
  • Data mining - different methods - various measures of semantic distance or similarity (e.g. the Jaccard index) - frequency analysis based on time (are people happier in the morning or in the evening?) - frequency analysis based on location - one of the results -> identification of opinion makers in social networks
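The Jaccard index named on this slide can be illustrated with a minimal sketch. It treats each tweet as a set of lowercase tokens and divides the size of the intersection by the size of the union. The tokenizer and the function names `tokens` and `jaccard` are assumptions made for illustration; the slides do not say how the similarity was actually computed.

```python
import re


def tokens(text):
    """Return the set of lowercase tokens in a tweet.

    Hashtags and @mentions are kept as single tokens (an assumption).
    """
    return set(re.findall(r"[#@]?\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        # Two empty texts: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

The corresponding distance is `1 - jaccard(a, b)`, so identical token sets have distance 0 and disjoint ones have distance 1.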
  • Transmission News using different APIs to get more information
  • Transmission News = 5 APIs in one www.transnews.tw • 5x Twitter News Service accounts • 1x Yahoo Geo • 1x Google Search AJAX • 1x Google Maps • 1x Open Calais • and a little bit of Wikipedia
  • www.transnews.tw
  • This brings us to the downside of the Twitter API
  • API searches are limited in the number of queries. Even worse, the data doesn't go back further than about 1.5 weeks
  • Hence the development of Sparrow 1.0
  • Czech Twitter by the numbers
  • Sparrow 1.0 application methodology - archives all tweets located in the Czech Republic at hourly intervals via the Twitter API (starting June 2009) - automatically detects language - identifies Czech tweets with a word-count dictionary - compares Czech Twitter statistics with foreign countries' statistics
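The "word-count dictionary" step could look roughly like the sketch below: count how many words of a tweet appear in a per-language word list and pick the language with more hits. The word lists, the function name `guess_language` and the tie handling are all illustrative assumptions, not Sparrow's actual dictionary or logic.

```python
import re

# Tiny illustrative word lists (assumption); a real dictionary would be far larger.
CZECH_WORDS = {"a", "je", "se", "na", "že", "to", "v", "s", "do",
               "jsem", "ale", "jak", "tak", "už", "pro"}
ENGLISH_WORDS = {"the", "is", "and", "to", "of", "in", "it", "that",
                 "for", "on", "with", "this", "i", "you"}


def guess_language(text):
    """Return "cs", "en" or "unknown" based on dictionary word counts."""
    words = re.findall(r"\w+", text.lower())
    cs_hits = sum(w in CZECH_WORDS for w in words)
    en_hits = sum(w in ENGLISH_WORDS for w in words)
    if cs_hits == en_hits:
        # No evidence, or a tie: do not guess.
        return "unknown"
    return "cs" if cs_hits > en_hits else "en"
```

Short tweets often contain no dictionary words at all, which is one reason to keep an explicit "unknown" outcome rather than forcing a label.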
  • Sparrow 1.0 - June 2009 stats - about 700,000 tweets - created by 10,628 unique users who enabled their geo-location (CZ) or tweeted in Czech - 5,880 users tweeted at least once in Czech - 2,424 Czech-writing users revealed their geo-location (usually about 30% of users do that)
  • How many Twitter users are in the Czech Republic? Between 6,000 and 8,000 users write in Czech. 1,000 to 2,000 users prefer English. There are about 10,000 active Twitter users in the Czech Republic
  • What are the dynamics of Czech Twitter? Every four weeks the number of users with at least one tweet rises by about 25%. The number of active users rises by 3-5% each week. The absolute number of tweets rises by about 25% too
  • What characteristics do Czech tweets have? 2% are RTs, 4% use a "#", 21.5% are replies and conversation, 34.6% include a link
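Shares like these can be computed with simple text heuristics. The sketch below is one possible approach and rests on assumptions: a retweet is any text containing `RT @user`, a reply is a tweet that starts with `@`, and a link is anything starting with `http://` or `https://`. The slides do not state which rules were used, and the function names are hypothetical.

```python
import re


def tweet_features(text):
    """Flag retweet, hashtag, reply and link using simple heuristics (assumed rules)."""
    return {
        "retweet": bool(re.search(r"\bRT\s+@\w+", text)),
        "hashtag": bool(re.search(r"#\w+", text)),
        "reply": text.lstrip().startswith("@"),
        "link": bool(re.search(r"https?://\S+", text)),
    }


def feature_shares(tweets):
    """Return the percentage of tweets showing each feature."""
    counts = {"retweet": 0, "hashtag": 0, "reply": 0, "link": 0}
    for tweet in tweets:
        for name, present in tweet_features(tweet).items():
            counts[name] += present  # True counts as 1, False as 0
    n = len(tweets)
    return {name: (100.0 * c / n if n else 0.0) for name, c in counts.items()}
```

A tweet can carry several features at once, so the percentages are independent of each other and need not sum to 100.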
  • What languages do people in the CR use for tweeting?
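  • Let's see that graph [pie chart; labels: English, Czech, Slovak, German, others; slice values: 13%, 4%, 7%, 44%, 33%; the pairing of labels and values is not preserved in the transcript]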
  • Geo-location breakdown of tweets among big cities in the Czech Republic (July-August 2009)
    Share of tweets (pie chart): Prague 45%, the 9 other cities 25%, others 30%
    1. Praha 247685x: en - 116580x ~ 47.07%, cs - 79957x ~ 32.28%, sk - 16449x ~ 6.64%
    2. Brno 37021x: en - 16104x ~ 43.50%, cs - 14753x ~ 39.85%, sk - 3360x ~ 9.08%
    3. Ostrava 23836x: en - 13885x ~ 58.25%, cs - 5306x ~ 22.26%, pl - 1638x ~ 6.87%
    4. Plzeň 13681x: en - 9160x ~ 66.95%, cs - 2206x ~ 16.12%, fr - 417x ~ 3.05%
    5. Olomouc 10754x: en - 4619x ~ 42.95%, cs - 3062x ~ 28.47%, pt - 999x ~ 9.29%
    6. Liberec 14178x: en - 9561x ~ 67.44%, cs - 2864x ~ 20.20%, sk - 462x ~ 3.26%
    7. České Budějovice 6219x: cs - 2589x ~ 41.63%, en - 1386x ~ 22.29%, es - 551x ~ 8.86%
    8. Hradec Králové 11888x: cs - 4696x ~ 39.50%, en - 4400x ~ 37.01%, de - 1113x ~ 9.36%
    9. Ústí nad Labem 12016x: en - 4266x ~ 35.50%, de - 2882x ~ 23.98%, cs - 2570x ~ 21.39%
    10. Pardubice 5576x: cs - 2718x ~ 48.74%, en - 1831x ~ 32.84%, sk - 414x ~ 7.42%
  • And what about "when"? And why does it matter?
  • This is what we've learned in a few months: - Czechs tweet most often on Tuesday or Thursday and least on Saturday. Around the world, the most popular day is Tuesday and the least popular is Sunday. - The number of tweets rises steadily from the beginning to the end of the month, then falls and begins rising again. That means people tweet more at the end of the month than at the beginning.
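A day-of-week frequency analysis of this kind can be sketched as follows, assuming each archived tweet carries a Python `datetime` timestamp. The function names and the input format are assumptions for illustration, not the method used in the talk.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def tweets_per_weekday(timestamps):
    """Count tweets per weekday; days without tweets are reported as 0."""
    counts = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    return {day: counts.get(day, 0) for day in WEEKDAYS}


def busiest_day(timestamps):
    """Return the weekday with the most tweets (earliest weekday wins a tie)."""
    per_day = tweets_per_weekday(timestamps)
    return max(per_day, key=per_day.get)


# Hypothetical sample timestamps from August 2009.
sample = [
    datetime(2009, 8, 11, 9, 0),
    datetime(2009, 8, 11, 18, 30),
    datetime(2009, 8, 13, 12, 0),
    datetime(2009, 8, 15, 10, 15),
]
print(tweets_per_weekday(sample))
print(busiest_day(sample))
```

Grouping by `ts.day` or `ts.hour` instead of `ts.weekday()` gives the within-month and time-of-day views in the same way.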
  • Predicting the present: Google vs. Twitter
  • MADONNA IN PRAGUE 13. 8. 2009
  • Madonna - August 2009 - Google search
  • Madonna - August 2009 - Czech Twitter
  • Sometimes Twitter is quicker & can predict future searches
  • September 17th, Ostrava
  • Rammstein - August 2009 - Google search
  • Rammstein - August 2009 - Czech Twitter 17.9.2009
  • Thanks for your attention. Questions? Ideas? slerka@ataxo.com