Scaling API-first â The story of a global engineering organization
Â
Analyzing customer sentiments in microblogs
1. Analyzingcustomersentiments in microblogs:A topic-model-basedapproachforTwitterdatasets Amercian Conference on Information Systems, Detroit, 2011 Stefan Sommer, Andreas Schieber Twitterbird: http://www.flickr.com/photos/bertop/3193626407/
2. Imagineyourare a productmanagerof Sony TVs:Whatistheconversationabout? Buy a 3D TV 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 2 Sentiments, opinions Sony TV: http://www.sony.de/hub/lcd-fernseher/produktpalette/3d Communication plattformsofthe Web 2.0
3. The communication plattform Twitter:What am I doing? Microblog - Ordinaryblogwithsocialnetworkingfeatures (followingotherusers) Eachentryis limited to 140 characters Mark wordswithhashtags (#), Adressuserswith@, link informationwithshort URLs Twitteristhemostpopularmicrobloggingservices:1.8 mill. users in Germany (Pettey & Stevens, 2009) 190 mill. usersworldwideï valuablesourceforcompanies (Pak & Paroubekl, 2010) Free accesstothedata via Twitter API 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 3
4. Sentiment analysisisthegoal:Why do weneedtopicmodels? Analytics Sentimentanalysis Hugeammountofdata, only a smallsetofthedatais relevant. Knowledge Discovery in Databases Twitterdata Data source 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 4
5. Research methodology. Research goals Identification of microblog entries containing opinions in a specific context. How can we automatically identify the topics of the entries? Research approach Design-science-based approach (Hevner et al., 2004) Topic detection with generative topic models (Blei& Lafferty, 2009) Twitter as a textual data source 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 5
6. Topic Modelling. âProbabilistic models for uncovering the underlying semantic structure of a document collectionâ (Blei& Lafferty, 2009) Exploratory approach:Latent Dirichlet Allocation (LDA) Lack of knowledge about underlying correlations between topics LDA is allowing the documents to have a mixture of topics LDA is used for exploring and representing topics in microblog entries 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 6
7. Knowledge-Discovery-in-Databasesfor Twitter data sets. Topic modelling Topic identification by implementing LDA Lexicalization and co-occurrences Transformation Preprocessing Removal of unwanted characters Sentiment analysis based on specifictopicclusters Selection by keywords Twitter search by using Twitter API Twitterdata 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 7
8. Data set 1: November 2010, 1.500tweets.Data set 2: January 2011, 1.200 tweets. Target twitter data without search terms Search items 3D Preprocessed targettwitterdata Sony 3D Sony 3D KDL 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 8
9. Results of the first period.November 2010. Identification of 10 topic clusters Most frequent topic âX7â Top 8 words which are representing topic âX7â Sample Twitter documents corresponding to topic âX7â Reduction of raw Twitter data to tweets containing words of a specific context Sentiment analysis of these tweets 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 9
10. Results of the second period.January 2011. Identification of 10 topic clusters Most frequent topic âX9â Top 5 words which are representing topic âX9â Tweets correspond to one topic or at least to one dominant topic We separated tweets with specific topics by using generative topic models We obtained topics referencing current events in our short-time datasets The algorithm automatically detected several topics (manual setting of topic ammount) The topic distribution becomes more specific by specifying the search string 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 10
11. Summary and Q&A. Topic modellingworks in thefieldoftwitterdataanalysis. Wecandistinguishbetween relevant and non-relevant tweetsforspecificquestions(bythecompanies). A firststep in ordertoknowwhattheconversationofcustomersisreallyabout. Goal: Identification of microblog entries containing sentiments in a specific context. Limited datasets, noevaluationmetricsandtheresearchisatthebeginning. Developing a crawlerforlong-term analysis (large scaleevaluation). Evaluation ofothertopicmodelapproaches (correlated, dynamic). Literaturereviewanddiscussion on non topicmodelapproaches. 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 11
12. Contactdetails. Stefan SommerT-Systems Multimedia Solutions GmbH E-Mail: s.sommer@t-systems.comPhone: +49 170 2236 469Twitter: @somerus Andreas SchieberDresden University of Technology E-Mail: andreas.schieber@tu-dresden.deTelefon: +49 351 463 32735Twitter: @reakahont Slides: http://www.slideshare.net/somerus 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 12
13. References. Barnes, S.J. and Böhringer, M. (2009) 'Continuance Usage Intention in Microblogging Services: The Case of Twitter', Proceedings of the 17th European Conference on Information Systems, 1-13. Bermingham, A. and Smeaton, A. (2010) 'Classifying Sentiment in Microblogs - Is Brevity an Advantage?', Proceedings of the 19th ACM international conference on Information and knowledge management, 1833-1836. Blei, D. and Lafferty, J. (2009) Topic Models, [Online], Available: http://www.cs.princeton.edu/~blei/papers/BleiLafferty2009.pdf [30 Nov 2010]. Blei, D., Ng, A. and Jordan, M. (2003) 'Latent Dirichlet Allocation', Journal of Machine Learning Research, pp. 933-1022. Böhringer, M. andGluchowski, P. (2009) 'Microblogging', Informatik-Spektrum, pp. 505-510. Fayyad, U. (1996) Advances in Knowledge Discovery and Data Mining, Menlo Park: AAAI Press. Hevner, A., March, S., Park, J. and Ram, S. (2004) 'Design Science in Information Systems', MIS Quarterly, 28, pp. 75-105. Liu, B. (2007) Web Data Mining, Berlin: Springer. O'Connor, B., Balasubramanyan, R., Routledge, B. and Smith, N. (2010) 'From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series', Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, 122-129. Oulasvirta, A., Lehtonen, E., Kurvinen, E. and Raento, M. (2010) 'Making the ordinary visible in microblogs', Personal and ubiquitous computing, Vol. 14 (3), pp. 237-249. Pak, A. and Paroubek, P. (2010) 'Twitter as a Corpus for Sentiment Analysis and Opinion Mining', Proceedings of the International Conference on Language Resources and Evaluation, 1320-1326. Pettey, C. and Stevens, H. (2009) Gartner's Hype Cycle Special Report for 2009, [Online], Available: http://www.gartner.com/it/page.jsp?id=1124212 [7 Dec 2010]. Ramage, D., Dumais, S. and Liebling, D. (2010) 'Characterizing Microblogs with Topic Models', Fourth International AAAI Conference on Weblogs and Social Media. Richter, A., Koch, M. and Krisch, J. (2007) 'Social Commerce - Eine Analyse des Wandels im E-Commerce', Bericht 2007/03, FakultĂ€t Informaitk, UniversitĂ€t der Bundeswehr MĂŒnchen. Stephen, A.T. and Toubia, O. (2010) 'Deriving Value from Social Commerce Networks', Journal of Marketing Research, Nr. 2 Vol. 67, pp. 215-228. Tumasjan, A., Sprenger, T., Sandner, P. and Welpe, I. (2010) 'Predicting Elections with Twitter - What 140 Characters Reveal about Political Sentiment', Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, 178-185. Twitter.com (2011) 'Your world, more connected', [Online], Available: http://blog.twitter.com/2011/08/your-world-more-connected.html [2 Aug 2011] 08/06/2011 Sommer, Schieber / Analyzingcustomersentiments in microblogs 13