Discovering Hot Topics using Twitter Streaming Data

ASONAM 2013 conference presentation materials

  • Published article can be found at http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6785858&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6785858

  1. Discovering Hot Topics using Twitter Streaming Data: “Social Topics Detection and Geographic Clustering”
     Hwi-Gang Kim, Seongjoo Lee, and Sunghyon Kyeong† (†: corresponding author)
     Mathematical Analytics Team, National Institute for Mathematical Sciences
     2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), Niagara Falls, Canada, August 25-28, 2013
  2. Outline • Introduction • Dataset • Analysis Methods and Results • Conclusion
  3. Introduction
  4. Role of SNSs
     • Informing breaking news (Twitter journalism)
     • Expressing one’s feelings and emotions
     • Communication tool in daily life
     • Research tool for studying social behaviors, human communication, detection of flu epidemics, and text mining
  5. In this study
     • The Twitter streaming API and MongoDB were used for data collection.
     • We proposed a measure for detecting the social hot topics of the day.
     • Geographic communities were detected for the weather-related keywords and visualized using Google Fusion Tables.
  6. Related Works
     • Mei et al. (2006) proposed probabilistic latent semantic indexing (PLSI) to discover spatiotemporal theme patterns on weblogs.
     • Wang et al. (2007) proposed the location aware topic model (LATM) to incorporate the relationship between locations and words.
     • Yin et al. (2011) proposed Latent Geographical Topic Analysis (LGTA), a novel location-text joint model.
     • In general, the EM (expectation maximization) algorithm takes a huge amount of computing time, and the previous studies did not directly classify locations by topics.
  7. Dataset
  8. Data collection
     • Geo-tagged public statuses tweeted in the United States.
     • A total of ~19 million geo-tagged Twitter statuses were obtained from March 23 to April 1, 2013.
     • This period includes events such as snowfall in spring, the same-sex marriage issue before the US court, a World Cup qualifier match between the US and Mexico, basketball games, and Easter.
     [Figure: Twitter streaming data in the US]
  9. MongoDB Sharding
     [Diagram: a client application, a mongos router, three config servers (C1-C3, each a mongod), and three shards (Shard1-Shard3), each a replica set of three mongod instances]
  10. Analysis Methods and Results
  11. Word frequency
      $$wf^{\omega} = \sum_{t \in T} \sum_{s \in S} f^{\omega}_{ts}$$
      where $f^{\omega}_{ts}$ is the frequency function for a word ($\omega$) in a US state ($s$) at time ($t$).
      • The most frequently tweeted words are not social topics but emotional words expressing one’s feelings.
      [Figure: top 5 words and Easter]
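The word frequency on this slide can be sketched in Python as follows. The nested dictionary layout `f[word][(day, state)]` and the toy counts are assumptions made for illustration; they are not the authors' data or code.

```python
def word_frequency(f):
    """Sum f[word][(t, s)] over all times t and states s for every word."""
    return {word: sum(counts.values()) for word, counts in f.items()}

# Hypothetical toy counts: f[word][(day, state)] = number of occurrences.
f = {
    "lol": {("Mar/30", "NY"): 50, ("Mar/31", "NY"): 52, ("Mar/31", "CA"): 48},
    "easter": {("Mar/30", "NY"): 3, ("Mar/31", "NY"): 40, ("Mar/31", "CA"): 35},
}

wf = word_frequency(f)
# Rank words by total frequency, most frequent first.
ranking = sorted(wf, key=wf.get, reverse=True)
```

In this toy input the emotional word outranks the topical one on raw counts, which is the effect the slide describes.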
  12. Distribution of Word Freq.
      [Figure: log10(counts) versus log10(word frequency), with lol, like, love, and Easter marked] ※ scale-free distribution
  13. The ratio of word frequency, $R^{\omega}_t$: a measure of social topics
  14. Ratio of Word Freq.
      The time series function for a word ($\omega$), integrated over the spatial index ($s$):
      $$F^{\omega}_t = \sum_{s \in S} f^{\omega}_{ts}$$
      The definition of the ratio of word frequency used to measure social topics:
      $$R^{\omega}_t = \frac{F^{\omega}_t - F^{\omega}_{t-1}}{F^{\omega}_t + F^{\omega}_{t-1}}$$
      [Figure: $R^{\omega}_t$ from Mar/24 to Apr/1 for Easter, lol, like, and love; vertical axis from -1.0 to 1.0]
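A minimal sketch of the ratio of word frequency, assuming daily counts stored as `{(day, state): count}` per word. The function names, the toy counts, and the choice to return 0 when a word is absent on both days are illustrative assumptions, not part of the original slides.

```python
def daily_totals(f_word, days):
    """F_t: counts for one word summed over all states, per day."""
    totals = {day: 0 for day in days}
    for (day, _state), count in f_word.items():
        if day in totals:
            totals[day] += count
    return totals

def ratio_of_word_frequency(totals, days):
    """R_t = (F_t - F_{t-1}) / (F_t + F_{t-1}) for each day after the first."""
    ratios = {}
    for prev, cur in zip(days, days[1:]):
        denom = totals[cur] + totals[prev]
        # Assumption: R_t is set to 0 when the word is absent on both days.
        ratios[cur] = (totals[cur] - totals[prev]) / denom if denom else 0.0
    return ratios

days = ["Mar/30", "Mar/31"]
# Hypothetical toy counts.
easter = {("Mar/30", "NY"): 3, ("Mar/30", "CA"): 2,
          ("Mar/31", "NY"): 40, ("Mar/31", "CA"): 35}
lol = {("Mar/30", "NY"): 50, ("Mar/30", "CA"): 50,
       ("Mar/31", "NY"): 52, ("Mar/31", "CA"): 48}

r_easter = ratio_of_word_frequency(daily_totals(easter, days), days)
r_lol = ratio_of_word_frequency(daily_totals(lol, days), days)
```

Because the ratio is normalized by the sum of the two days, it stays within [-1, 1]: a word that is always frequent but steady stays near 0, while a word that surges approaches 1.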
  15. Social Topics by $R^{\omega}_t$
      Topics and their top words in terms of frequency:
      • Weather: H1 = {weather, snow, winter, cold, sick}
      • Daily life: H2 = {class, school, gym, lunch, job, jobs, tweetmyjobs}
      • Weekend: H3 = {bar, party, drinking, beer, movies, drunk, club}
      • US law: H4 = {gay, marriage}
      • Sports 1: H5 = {soccer, usa, mexico}
      • Sports 2: H6 = {basketball, chicago, bulls, lebron, miami, heat, kevin, leg, injury, michigan}
      • TV show: H7 = {thewalkingdead, walking, dead}
      • Easter: H8 = {easter, church, blessed, bunny, jesus, happy, happyeaster, basket, candy, egg, eggs, god, lord}
      • April Fools’ Day: H9 = {april, joke, fool}
      • Emotions: H10 = {lol, like, love, shit, fuck, haha, oh, ass}
  16. Topic - Weather, H1
      • According to US newspapers, there was heavy snowfall in about six states from the Midwest to the East, from Missouri to Pennsylvania, on March 24, 2013.
      • The snowfall stopped on March 25. Interestingly, $R^{\omega}_t$ decreased dramatically for the word set H1 on March 26.
      [Figure: $R^{\omega}_t$ from Mar/24 to Apr/1 for weather, snow, winter, cold, and sick; vertical axis from -0.6 to 0.6]
  17. Topic - Weekend, H3
      • Topic words during the weekend include entertainment words such as movies and party, but these are also used steadily during the week, albeit less frequently.
      [Figure: $R^{\omega}_t$ from Mar/24 to Apr/1 for bar, party, drinking, beer, movies, drunk, and club; vertical axis from -0.4 to 0.4]
  18. Topic - US Law, H4
      • On March 26, the hot topic was the same-sex marriage issue before the US court, and we can see the corresponding rapid increase on March 26.
      [Figure: $R^{\omega}_t$ from Mar/24 to Apr/1 for gay and marriage; vertical axis from -0.8 to 0.8]
  19. Topic - Sports, H5
      • As the US and Mexico played a World Cup qualifying match in Mexico on March 26, we found that $R^{\omega}_t$ for the topic ‘Sports 1’ showed a corresponding peak.
      [Figure: $R^{\omega}_t$ from Mar/24 to Apr/1 for soccer, usa, and mexico; vertical axis from -0.8 to 0.8]
  20. Topic - Easter, H8
      • On March 31, we can see that $R^{\omega}_t$ increases for words about Easter such as easter, happy, bunny, egg(s), god, and jesus.
      • This is expected, as Easter is one of the most celebrated Christian festivals in the US.
      [Figure: $R^{\omega}_t$ from Mar/24 to Apr/1 for easter, blessed, bunny, jesus, happy, happyeaster, basket, candy, egg, eggs, god, and lord; vertical axis from -1.0 to 1.0]
  21. Topic - Emotions, H10
      • $R^{\omega}_t$ for emotional words showed only a small fluctuation ($|R^{\omega}_t| < 0.1$), even though these words rank among the highest in word frequency.
      • This result suggests that the frequency of expressions of feelings and emotions is relatively constant over time.
      [Figure: $R^{\omega}_t$ from Mar/24 to Apr/1 for lol, like, love, shit, fuck, haha, oh, and ass; vertical axis from -0.1 to 0.1]
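  22. Geographic Clustering
      • For each set of hot topic words $H_k$, we computed the spatiotemporal matrix for the $k$-th hot topic, whose entry for time $t$ and US state $s$ is
        $$\sum_{\omega \in H_k} f^{\omega}_{ts}$$
      • Then we obtained the adjacency matrix $A^k_{ij}$ as Pearson’s correlation coefficient between the columns of this matrix for US states $i$ and $j$, that is, between the two states’ time series.
      • Modularity ($Q$) was computed from the weighted graph using the Louvain community detection algorithm, which maximizes $Q$:
        $$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{s_i s_j}{2m} \right] \delta(C_i, C_j)$$
        where $s_i = \sum_j A_{ij}$ is the strength of node $i$, $m = \frac{1}{2} \sum_{i,j} A_{ij}$ is the total edge weight, and $\delta(C_i, C_j)$ is 1 if states $i$ and $j$ belong to the same community and 0 otherwise.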
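The correlation-based adjacency matrix and the modularity formula can be sketched in pure Python as below. This is not the Louvain algorithm itself: it only evaluates Q for a hand-picked partition. The toy time series, the removal of self-loops, and the clipping of negative correlations to zero are assumptions made for illustration; the slides do not state how these cases were handled.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(series):
    """series[state] holds one state's topic counts over time."""
    states = sorted(series)
    A = {}
    for i in states:
        for j in states:
            # Assumptions: no self-loops, negative correlations clipped to 0.
            A[i, j] = 0.0 if i == j else max(0.0, pearson(series[i], series[j]))
    return states, A

def modularity(states, A, community):
    """Q = (1/2m) * sum_ij [A_ij - s_i*s_j/(2m)] * delta(C_i, C_j)."""
    two_m = sum(A.values())  # sum over all ordered pairs equals 2m
    strength = {i: sum(A[i, j] for j in states) for i in states}
    q = 0.0
    for i in states:
        for j in states:
            if community[i] == community[j]:
                q += A[i, j] - strength[i] * strength[j] / two_m
    return q / two_m

# Hypothetical daily counts of weather words in four states.
series = {"MO": [3, 2, 1], "PA": [6, 4, 2], "CA": [1, 2, 3], "FL": [2, 4, 6]}
states, A = adjacency(series)
q_split = modularity(states, A, {"MO": 0, "PA": 0, "CA": 1, "FL": 1})
q_single = modularity(states, A, {s: 0 for s in states})
```

In this toy input, grouping the two states with matching time series gives a higher Q than placing all four states in one community, which is the quantity a community detection algorithm such as Louvain tries to maximize.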
  23. Graph Theory
      [Figure: example graph with nodes A, B, C, and D]
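  24. Types of Graph
      • Undirected binary graph
      • Directed binary graph
      • Directed weighted graph
      Questions: 1. What is degree? 2. Betweenness centrality? 3. Global/local network efficiency? 4. Modular structure?
      [Figure: example graph with six nodes labeled 1-6]
      Adjacency matrix:
      $$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$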
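As a small illustration of the first question on this slide, degree can be read directly off an adjacency matrix. The sketch below uses the 36 values shown on the slide. Arranging them row by row into a 6 × 6 matrix, numbering the rows as nodes 1 to 6, and the convention that `A[i][j] == 1` means an edge from node i+1 to node j+1 are all assumptions.

```python
# Values copied from the slide; the row-by-row arrangement is an assumption.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]

def out_degree(A):
    """Number of outgoing edges per node (row sums)."""
    return [sum(row) for row in A]

def in_degree(A):
    """Number of incoming edges per node (column sums)."""
    return [sum(col) for col in zip(*A)]

def is_symmetric(A):
    """True when the matrix describes an undirected graph."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

out_deg = out_degree(A)
in_deg = in_degree(A)
undirected = is_symmetric(A)
```

For an undirected binary graph the matrix is symmetric and the two degree lists coincide; for a weighted graph the same sums give node strength instead of degree.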
  25. Network Analysis Ex.
      • Co-authorship network formed by author lists (Newman, PNAS 101 (2004) 5200-5205)
      • Semantic network formed by free association (Steyvers, Cognitive Science 29 (2005) 41-78)
  26. Geographic Clustering
      [Figure: adjacency matrix and the resulting geographic clustering]
  27. Conclusion
      • The ratio of word frequency properly detected the social hot topics of the day by identifying increasing or decreasing frequency of keywords in Twitter messages, while suppressing non-topic keywords such as frequently tweeted emotional words (e.g., lol, like, and love).
      • The social topic detection method may be applied on a different time scale, e.g., hourly, monthly, or yearly.
      • The geographic clustering based on a social topic appropriately reflected not only the pathway of the spring storm but also the properties of US geography.
  28. Thank you for your attention
