Classifying Twitter Content


Published in: Technology, Business

  1. 1. Classifying Twitter Content Dr Stephen Dann Australian National University @stephendann Presented at Marketing Science, Houston, June 11, 2011
  2. 2. If you’re on Twitter Questions can be sent to @stephendann or Hashtag #mktsci2011
  3. 3. Why here, why now? <ul><li>Why this presentation? </li></ul><ul><ul><li>MSI interest in role of social media in branding </li></ul></ul><ul><ul><li>Attitudinal metrics from web can predict transactions </li></ul></ul><ul><li>Why this method? </li></ul><ul><ul><li>Try to further avoid the criteriaflation issue </li></ul></ul><ul><ul><ul><li>Hence announcing a coding structure exists </li></ul></ul></ul><ul><li>What outcome? </li></ul><ul><ul><li>I could use a good set of equations </li></ul></ul>
  4. 4. Series of Projects <ul><li>Blog Post reacting to Pear Analytics 2009 </li></ul><ul><li>First Monday Paper (Dann, 2010) </li></ul><ul><li>Marketing Science, Method <-You are here. </li></ul><ul><li>USF Social Marketing, Social Media in Social Marketing (next week) </li></ul><ul><li>AMSRS Conference, Crisis Communication Analysis (September) </li></ul><ul><li>ANZMAC, Categories in Detail (December) </li></ul>
  5. 5. Twitter. <ul><li>Twitter matters because of what it is: at its heart, a platform that offers an exchange of ideas and information on an unprecedented scale. </li></ul><ul><li>Why Twitter Matters : Marketing : Idea Hub :: American Express OPEN Forum Fri Oct 02 2009 21:16:49 GMT+1000 (AUS Eastern Standard Time) </li></ul>Twitter in Plain English
  6. 6. How to analyze a living medium? Hawthorne Effect * Uncertainty Principle Sample Size / Twitter Volume [ ]
  7. 7. Why do any coding? <ul><li>Twitter is not about the aggregate firehose </li></ul><ul><ul><li>There are those who disagree, and I have cited many of them. However, few, if any, actually read the impossibly fast-updating full timeline </li></ul></ul><ul><li>Twitter is about how you use it. </li></ul><ul><ul><li>Twitter becomes something in co-creation </li></ul></ul><ul><ul><li>Twitter timeline as documented history </li></ul></ul><ul><ul><li>Tracking Near-Past Behaviour </li></ul></ul>
  8. 8. Raw Counts Tweetstats –
  9. 9. Text Analysis Tweetstats – Wordle –
  10. 10. Prior Analysis Dann (2010) based on: <ul><li>Boyd et al 2010 </li></ul><ul><li>Crawford 2009 </li></ul><ul><li>DiMicco et al 2008 </li></ul><ul><li>Fahmi 2009 </li></ul><ul><li>Gay et al 2009 </li></ul><ul><li>Heany and McClurg 2009 </li></ul><ul><li>Hohl 2009 </li></ul><ul><li>Honeycutt and Herring 2009 </li></ul><ul><li>Jansen et al 2009 </li></ul><ul><li>Java et al 2007 </li></ul><ul><li>Lariscy et al 2009 </li></ul><ul><li>Makice 2009 </li></ul><ul><li>Miller 2008 </li></ul><ul><li>Naaman et al 2010 </li></ul><ul><li>Pear Analytics 2009 </li></ul><ul><li>Steiner 2009 </li></ul><ul><li>Zhao and Rosson 2009 </li></ul>
  11. 11. Schema <ul><li>Developed from a grounded theory approach </li></ul><ul><ul><li>60+ Twitter articles </li></ul></ul><ul><ul><ul><li>Use behaviours, content analysis, sentiment analysis </li></ul></ul></ul><ul><ul><li>10,000+ tweets </li></ul></ul><ul><ul><ul><li>Manual coding </li></ul></ul></ul><ul><li>Supporting analysis </li></ul><ul><ul><li>Linguistic Analysis (LIEC) </li></ul></ul><ul><ul><ul><li>Automated analysis </li></ul></ul></ul><ul><ul><li>Leximancer Analysis </li></ul></ul>
  12. 12. Framework <ul><li>Six categories. </li></ul><ul><li>1. Conversational </li></ul><ul><li>2. News Events </li></ul><ul><li>3. Pass along </li></ul><ul><li>4. Phatic </li></ul><ul><li>5. Status </li></ul><ul><li>6. Spam </li></ul>
  13. 13. Conversational <ul><li>core of the interpersonal exchange on Twitter, and the binding activity that links different users together into a sense of community, companionship and conversation </li></ul><ul><ul><li>(Cahill 2009, Cranefield and Yoong 2009, Honeycutt and Herring 2009, Java et al 2009, Perlmutter 2009, Steiner 2009, Ratkiewicz 2010). </li></ul></ul><ul><li>four identifiable subcomponents </li></ul><ul><ul><li>action, query, referral and response </li></ul></ul>
  14. 14. News Events <ul><li>broad selection of media releases, citizen journalism, professional journalism, PR and publicity </li></ul><ul><ul><li>(Mäkinen and Wangu Kuira 2008, Power and Forte 2008, Java et al 2009, Phelan et al 2009, Chu et al 2010, Petrovic et al 2010, Zhou et al 2010, Phuvipadawat and Murata 2011, Cheong and Lee 2011). </li></ul></ul><ul><ul><li>Seven categories: </li></ul></ul><ul><ul><ul><li>announcements, hashtagged events, headlines, sport, natural disasters, transport and weather. </li></ul></ul></ul>
  15. 15. Pass along <ul><li>where Twitter is used as a short form publishing outlet for recommended links, other Twitter remarks, or links to the author’s own content </li></ul><ul><ul><li>(Java et al 2007, Mischaud 2007, Heany and McClurg 2009, Java et al 2009, Pear Analytics 2009, Naaman et al 2010, Zhang et al 2010, Bakshy et al 2011). </li></ul></ul><ul><li>Five categories </li></ul><ul><ul><li>automated endorsement, endorsements, retweet, secondary social media and user generated content </li></ul></ul>
  16. 16. Phatic <ul><li>Use of Twitter as a means to maintain a presence within a community, and connections to other users of the service without direct conversation </li></ul><ul><ul><li>Java et al 2007, Miller, 2008, Henneburg et al 2009, Keenan and Shiri 2009, Makice 2009, Pear Analytics, 2009, Fernando 2010, Marwick and boyd 2010, Zhang et al 2010 </li></ul></ul><ul><li>Four categories </li></ul><ul><ul><li>undirected broadcast statements, fourth wall breaking meta commentary, greetings and the unclassifiable content </li></ul></ul>
  17. 17. Status <ul><li>Use of the service to answer the original Twitter question of “What are you doing?” in terms of reporting the user’s sense of “Me-Now”, or statements of immediately transpired activity </li></ul><ul><ul><li>Gaonkar et al 2008, Bollen et al 2009, Java et al 2009, Chu et al 2010, Dodds et al 2011, Naaman et al 2010, Zhang et al 2010 </li></ul></ul><ul><li>eight categories </li></ul><ul><ul><li>activity, automated status, location, mechanical, personal statements, physical, temporal and work </li></ul></ul>
  18. 18. Sub categories <ul><li>Conversational </li></ul><ul><ul><li>Response </li></ul></ul><ul><ul><li>Referral </li></ul></ul><ul><ul><li>Query </li></ul></ul><ul><ul><li>Action </li></ul></ul><ul><li>News Events </li></ul><ul><li>Pass along </li></ul><ul><li>Phatic </li></ul><ul><li>Status </li></ul>
  19. 19. Sub categories <ul><li>Conversational </li></ul><ul><li>News Events </li></ul><ul><ul><li>Headlines </li></ul></ul><ul><ul><li>Hashtagged Event </li></ul></ul><ul><ul><li>Natural disasters </li></ul></ul><ul><ul><li>Transport </li></ul></ul><ul><ul><li>Weather </li></ul></ul><ul><ul><li>Sport </li></ul></ul><ul><ul><li>Announcement </li></ul></ul><ul><li>Pass along </li></ul><ul><li>Phatic </li></ul>
  20. 20. Sub categories <ul><li>Conversational </li></ul><ul><li>News Events </li></ul><ul><li>Pass along </li></ul><ul><ul><li>Retweet </li></ul></ul><ul><ul><li>Endorsement </li></ul></ul><ul><ul><li>Secondary Social Media </li></ul></ul><ul><ul><li>User generated content </li></ul></ul><ul><ul><li>Automated Endorsement </li></ul></ul><ul><li>Phatic </li></ul><ul><li>Status </li></ul>
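The framework on slides 12 to 20 can be written down as a simple lookup table. The following is only a sketch of one possible representation, not the author's coding instrument: the names `TAXONOMY` and `parent_of` are hypothetical, the subcategory labels are transcribed from slides 13 to 17, and Spam is left empty because the slides do not name its subcategories.

```python
# Category -> subcategory names, transcribed from slides 12-17 of this deck.
# TAXONOMY and parent_of are illustrative names, not part of the original work.
TAXONOMY = {
    "Conversational": ["action", "query", "referral", "response"],
    "News Events": [
        "announcements", "hashtagged events", "headlines", "sport",
        "natural disasters", "transport", "weather",
    ],
    "Pass along": [
        "automated endorsement", "endorsements", "retweet",
        "secondary social media", "user generated content",
    ],
    "Phatic": [
        "undirected broadcast statements",
        "fourth wall breaking meta commentary",
        "greetings",
        "unclassifiable content",
    ],
    "Status": [
        "activity", "automated status", "location", "mechanical",
        "personal statements", "physical", "temporal", "work",
    ],
    # The slides list Spam as the sixth category but do not name its subcategories.
    "Spam": [],
}


def parent_of(subcategory):
    """Return the top-level category that owns a subcategory name, or None."""
    for category, subcategories in TAXONOMY.items():
        if subcategory in subcategories:
            return category
    return None


if __name__ == "__main__":
    for category, subcategories in TAXONOMY.items():
        print(f"{category}: {len(subcategories)} subcategories")
```

Keeping the labels in one structure makes it easy to check that the counts match the slides (four, seven, five, four and eight subcategories).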
  21. 21. Marketing Science Style <ul><li>N = 11672 </li></ul><ul><ul><li>Three public sector organisation timelines </li></ul></ul><ul><ul><ul><li>Local government, police force, energy company </li></ul></ul></ul><ul><ul><li>Two hashtags </li></ul></ul><ul><ul><ul><li>natural disaster </li></ul></ul></ul><ul><ul><ul><li>conference </li></ul></ul></ul><ul><ul><li>One personal timeline data set </li></ul></ul>
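  22. 22. <table>
<tr><th>Data</th><th>%</th><th>n</th><th>Dann</th><th>#Dis.</th><th>#Conf</th><th>Police</th><th>Counc.</th><th>Ener.</th></tr>
<tr><td>Conversational</td><td>29%</td><td>3415</td><td>1473</td><td>30</td><td>427</td><td>585</td><td>785</td><td>115</td></tr>
<tr><td>News Events</td><td>8%</td><td>884</td><td>13</td><td>17</td><td>29</td><td>784</td><td>31</td><td>10</td></tr>
<tr><td>Pass Along</td><td>50%</td><td>5787</td><td>278</td><td>533</td><td>351</td><td>2780</td><td>949</td><td>896</td></tr>
<tr><td>Phatic</td><td>3%</td><td>398</td><td>213</td><td>12</td><td>60</td><td>69</td><td>24</td><td>20</td></tr>
<tr><td>Status</td><td>10%</td><td>1188</td><td>834</td><td>10</td><td>153</td><td>126</td><td>34</td><td>31</td></tr>
<tr><td>Total</td><td></td><td>11672</td><td>2811</td><td>602</td><td>1020</td><td>4344</td><td>1823</td><td>1072</td></tr>
</table>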
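The totals and percentage shares on slide 22 follow directly from the per-data-set counts. As a minimal sketch (the function and variable names are illustrative, and rounding to whole percentages is an assumption about how the slide figures were produced), the arithmetic can be reproduced like this:

```python
# Counts per category (rows) and data set (columns), copied from slide 22.
# Column order: Dann, #Dis., #Conf, Police, Counc., Ener.
DATASETS = ["Dann", "#Dis.", "#Conf", "Police", "Counc.", "Ener."]
COUNTS = {
    "Conversational": [1473, 30, 427, 585, 785, 115],
    "News Events": [13, 17, 29, 784, 31, 10],
    "Pass Along": [278, 533, 351, 2780, 949, 896],
    "Phatic": [213, 12, 60, 69, 24, 20],
    "Status": [834, 10, 153, 126, 34, 31],
}


def row_totals(counts):
    """Total number of coded tweets per category (the n column)."""
    return {category: sum(values) for category, values in counts.items()}


def column_totals(counts):
    """Total number of coded tweets per data set (the Total row)."""
    return [sum(column) for column in zip(*counts.values())]


def shares(counts):
    """Each category's share of all coded tweets, as a whole percentage."""
    totals = row_totals(counts)
    grand_total = sum(totals.values())
    return {c: round(100 * t / grand_total) for c, t in totals.items()}


if __name__ == "__main__":
    print(row_totals(COUNTS))
    print(dict(zip(DATASETS, column_totals(COUNTS))))
    print(shares(COUNTS))
```

Summing the rows gives N = 11672, matching slide 21, with Pass Along accounting for roughly half of all coded tweets.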
  23. 23. Uses of the Data
  24. 24. Here’s where you come in…
  25. 25. The Challenge Time Day Month Year 140 characters of text [C] [S] [PA] [N] [P] [X]* [C1] [C2] [C3] [C4] [S1] [S2] [S3] [S4] [S5] [S6] [S7] [S8] [PA1] [PA2] [PA3] [PA4] [PA5] [N1] [N2] [N3] [N4] [N5] [N6] [N7] [P1] [P2] [P3] [P4] [X1] [X2] [X3] [X4] * Spam gets a category indicated as “Delete”
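One way to picture the record sketched on slide 25 is a timestamp, up to 140 characters of text, a category code and a numbered subcategory. The class below is a hypothetical illustration under those assumptions, not the author's data format: `CodedTweet` and `SUBCATEGORY_COUNT` are invented names, and the Status range assumes eight codes, as stated on slide 17.

```python
from dataclasses import dataclass
from datetime import datetime

# Number of subcategory codes per category code, read off slide 25.
# Status is assumed to run S1-S8, matching the eight categories on slide 17.
SUBCATEGORY_COUNT = {"C": 4, "S": 8, "PA": 5, "N": 7, "P": 4, "X": 4}


@dataclass
class CodedTweet:
    """A single manually coded tweet (illustrative structure only)."""

    posted: datetime   # time, day, month and year of the tweet
    text: str          # the tweet itself, at most 140 characters
    category: str      # one of C, S, PA, N, P, X
    subcategory: int   # 1-based index within the category

    def __post_init__(self):
        if len(self.text) > 140:
            raise ValueError("tweet text exceeds 140 characters")
        if self.category not in SUBCATEGORY_COUNT:
            raise ValueError(f"unknown category code: {self.category}")
        if not 1 <= self.subcategory <= SUBCATEGORY_COUNT[self.category]:
            raise ValueError(f"subcategory out of range for {self.category}")

    @property
    def label(self):
        """Combined code such as 'PA1' or 'S8'."""
        return f"{self.category}{self.subcategory}"

    @property
    def is_spam(self):
        """Spam (X) is the category the slide marks as 'Delete'."""
        return self.category == "X"


if __name__ == "__main__":
    tweet = CodedTweet(datetime(2011, 6, 11, 9, 30), "RT @stephendann: coding tweets", "PA", 1)
    print(tweet.label, tweet.is_spam)
```

Validating at construction time means a mis-keyed code is caught when the tweet is entered rather than when the counts are tallied.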
  26. 26. Future plans <ul><li>Segments and Use-Case Scenarios </li></ul><ul><li>Forward facing strategic guidelines </li></ul><ul><li>Predictive Models </li></ul><ul><li>Certain level of automation </li></ul><ul><li>But not autonomous coding. </li></ul>
  27. 27. References <ul><li>Bakshy, E, Hofman, J, Mason, W and Watts, D (2011) Everyone's an influencer: Quantifying Influence on Twitter, WSDM’11, February 9–12, 2011, Hong Kong, China </li></ul><ul><li>Berger, E (2009) This Sentence Easily Would Fit on Twitter: Emergency Physicians Are Learning to “Tweet”, Annals of Emergency Medicine, 54 (2) 23A-25A </li></ul><ul><li>Bollen, J, Mao, H and Zeng, X (2011) Twitter mood predicts the stock market, Journal of Computational Science 2 (1) 1-8 </li></ul><ul><li>Bollen, J, Pepe, A, and Mao, H (2009) Modeling public mood and emotion: Twitter sentiment and socioeconomic phenomena, WWW2010, April 26–30, 2010, Raleigh, North Carolina </li></ul><ul><li>boyd, d, Golder, S and Lotan, G (2010) Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, Proceedings of HICSS-43 in January, 2010 </li></ul><ul><li>Bryce, T and Pieper, C (2010) Using Twitter to Receive Storm Reports, 38th Conference on Broadcast Meteorology, June 2010 </li></ul><ul><li>Butcher, L (2010) Using Twitter to Advance Cancer Knowledge, Oncology Times, 32 (1) 8-10 </li></ul><ul><li>Cahill, K (2009) Building a virtual branch at Vancouver Public Library using Web 2.0 tools, Program: electronic library and information systems 43 (2) 140-155 </li></ul><ul><li>Cheong, M and Lee, V C S (2011) A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter, Information Systems Frontiers, 13, p 45-59 </li></ul><ul><li>Chu, Z, Gianvecchio, S, Wang, H and Jajodia, S (2010) Who is Tweeting on Twitter: Human, Bot, or Cyborg?, ACSAC '10 Proceedings of the 26th Annual Computer Security Applications Conference </li></ul><ul><li>Cranefield, J and Yoong, P (2009) Crossings: Embedding personal professional knowledge in a complex online community environment, Online Information Review 33 (2) 257-275 </li></ul><ul><li>Crawford, K (2009) 'Following you: Disciplines of listening in social 
media', Continuum, 23:4, 525–535 </li></ul><ul><li>Cuddy, C (2009) 'Twittering in Health Sciences Libraries', Journal of Electronic Resources in Medical Libraries, 6:2, 169–173 </li></ul><ul><li>Dann, S (2010) Twitter content classification, First Monday, 15 (12), 6 December 2010 </li></ul><ul><li>DiMicco, J, Millen, D, Geyer, W, Dugan, C, Brownholtz, B and Muller, M (2008) Motivations for Social Networking at Work, CSCW’08, November 8–12, 711-720 </li></ul><ul><li>Dodds, P, Harris, K, Kloumann, I, Bliss, C and Danforth, C (2011) Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter, arXiv:1101.5120v3 11 Feb 2011 </li></ul><ul><li>Doherty, R (2010) Getting social with recruitment, Strategic HR review, 9 (6) 11-15 </li></ul><ul><li>Dong, A, Zhang, R, Kolari, P, Bai, J, Diaz, F, Chang, Y, Zheng, Z (2010) Time is of the Essence: Improving Recency Ranking Using Twitter Data, WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA </li></ul><ul><li>Efron, M (2011) Information Search and Retrieval in Microblogs, Journal of the American Society for Information Science and Technology, 62 (6) 996–1008 </li></ul><ul><li>Fahmi, W S (2009) Bloggers' street movement and the right to the city. (Re)claiming Cairo's real and virtual &quot;spaces of freedom&quot;, Environment and Urbanization 2009; 21; 89-107 </li></ul>
  28. 28. References <ul><li>Fernando, I (2010) Community creation by means of a social media paradigm, The Learning Organisation, 17 (6) 500-514 </li></ul><ul><li>Fields, E, (2010) A unique Twitter use for reference services, Library Hi Tech News, 6/7 14-15 </li></ul><ul><li>Gaonkar, S., Li, J., Choudhury, R.R., Cox, L., and Schmidt, A (2008) Micro-Blog: Sharing and Querying Content Through Mobile Phones and Social Participation, MobiSys’08, June 17–20, 2008, Breckenridge, Colorado, USA. </li></ul><ul><li>Gay, P Plait, P, Raddick, J, Cain, F and Lakdawalla, E (2009) &quot;Live Casting: Bringing Astronomy to the Masses in Real Time&quot;, CAP Journal, June 26-29 </li></ul><ul><li>Grier, C, Thomas, K., Paxson, V and Zhang, M (2010) @spam: The Underground on 140 Characters or Less, CCS’10, October 4–8, 2010, Chicago, Illinois, USA </li></ul><ul><li>Heany, M and McClurg, S 2009, Social Networks and American Politics: Introduction to the Special Issue, American Politics Research 37, 727-741 </li></ul><ul><li>Henneburg, S. Scammell, M and O'Shaughnessy, N (2009) Political marketing management and theories of democracy, Marketing Theory 2009; 9; 165-188 </li></ul><ul><li>Hohl, M (2009) Beyond the screen: visualizing visits to a website as an experience in physical space, Visual Communication, 8 (3) 273-284 </li></ul><ul><li>Honeycutt, C and Herring, S C (2009) Beyond Microblogging: Conversation and Collaboration via Twitter, (2009). Proceedings of the Forty-Second Hawai’i International Conference on System Sciences (HICSS-42). Los Alamitos, CA: IEEE Press. 
1-10, </li></ul><ul><li>Jackson, N and Lilleker, D (2011) 'Microblogging, Constituency Service and Impression Management: UK MPs and the Use of Twitter', The Journal of Legislative Studies, 17: 1, 86–105 </li></ul><ul><li>Jansen, B, Zhang, M, Sobel, K and Chowdury, A (2009) Twitter power: Tweets as electronic word of mouth, Journal of the American Society for Information Science and Technology, 60(11):2169–2188, 2009 </li></ul><ul><li>Java, A, Song, X, Finin, T and Tseng, B (2007) Why We Twitter: Understanding Microblogging Usage and Communities, Joint 9th WEBKDD and 1st SNA-KDD Workshop ’07, August 12, 2007, p 56-65 </li></ul><ul><li>Java, A, Song, X, Finin, T and Tseng, B (2009) Why We Twitter: An Analysis of a Microblogging Community in H. Zhang et al. (Eds.): WebKDD/SNA-KDD 2007, LNCS 5439, pp. 118–138, 2009. </li></ul><ul><li>Keenan, A and Shiri, A (2009) Sociability and social interaction on social networking websites, Library Review 58 (6) 438-450 </li></ul><ul><li>Krums (2009) “There's a plane in the Hudson. I'm on the ferry going to pick up the people”, January 16, 2009 </li></ul><ul><li>Lariscy, R, Avery, E J, Sweetser, K and Howes, P (2009) An examination of the role of online social media in journalists’ source mix, Public Relations Review 35 (2009) 314–316 </li></ul><ul><li>Lauw, H., Ntoulas, A and Kenthapadi, K (2010) Estimating the Quality of Postings in the Real-time Web, WSDM 2010 Workshop on Search in Social Media. </li></ul><ul><li>Lerman, K and Ghosh, R (2010) Information Contagion: an Empirical Study of the Spread of News on Digg and Twitter Social Networks, In Proceedings of the 4th International Conference on Weblogs and Social Media, 2010. 
</li></ul><ul><li>Longueville, B, Smith, R., and Luraschi, G., “OMG, from here, I can see the flames!”: a use case of mining Location Based Social Networks to acquire spatiotemporal data on forest fires&quot; ACM LBSN '09, November 3, 2009 </li></ul><ul><li>Makice, K, 2009 Phatics and the Design of Community, CHI 2009, April 4-9, 2009, Boston, Massachusetts </li></ul><ul><li>Mäkinen, M and Wangu Kuira, M 2008, Social Media and Postelection Crisis in Kenya, The International Journal of Press/Politics 2008; 13; 328 </li></ul><ul><li>Marwick A E and boyd, d, (2011) I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience, New Media Society, 13: 114-133 </li></ul>
  29. 29. References <ul><li>Miller, K, 2008, New Media, Networking and Phatic Culture, Convergence: The International Journal of Research into New Media Technologies, 14 (4) 387-400 </li></ul><ul><li>Miller, V (2009) New Media, networking and Phatic Culture, Convergence: The International Journal of Research into New Media Technologies, 14 (4) 387-400 </li></ul><ul><li>Mischaud, E 2007, Twitter: Expressions of the Whole Self An investigation into user appropriation of a web-based communications platform, MSc Dissertation, London School of Economics </li></ul><ul><li>Naaman, M, Boase, J and Lai, C-H (2010) Is it Really About Me? Message Content in Social Awareness Streams, CSCW 2010, February 6–10 </li></ul><ul><li>Okazaki, M and Matsuo, Y 2010 Semantic Twitter: Analyzing Tweets for Real-Time Event Notification, in Breslin, J, Burg, T, Kim, H and Schmidt, J-H (2011) Recent Trends and Developments in Social Software, Springer Berlin / Heidelberg </li></ul><ul><li>Pak, A, Paroubek, P , Twitter as a corpus for sentiment analysis and opinion mining, in N.Calzolari, K.Choukri, B.Maegaard, J.Mariani,J .Odijk, S.Piperidis, M. Rosner, D.Tapias(Eds.), Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), European Language Resources Association ,Valletta, Malta, May 2010, pp.19–21. </li></ul><ul><li>Parslow, G, 2009, Commentary: Twitter for Educational Networking, Biochemistry and Molecular Biology Education 37 (4) 255–256, 2009 </li></ul><ul><li>Pear Analytics (2009) Twitter Study – August 2009, </li></ul><ul><li>Perlmutter, D 2009, Political Blogging and Campaign 2008: A Roundtable, The International Journal of Press/Politics 2008; 13; 160 </li></ul><ul><li>Petrovic S, Osborne, M and Lavrenko, V (2010) Streaming First Story Detection with application to Twitter, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10). 
Association for Computational Linguistics, Stroudsburg, PA, USA, 181-189. </li></ul><ul><li>Phelan, O, McCarthy, K and Smyth, B (2009) Using Twitter to Recommend Real-Time Topical News, RecSys’09, October 23–25, 2009, New York, New York, USA </li></ul><ul><li>Phuvipadawat, S and Murata, T (2011) Detecting a Multi-Level Content Similarity from Microblogs Based on Community Structures and Named Entities, Journal of Emerging Technologies in Web Intelligence, 3 (1), 11-19 </li></ul><ul><li>Power, R and Forte, D 2008, War & Peace in Cyberspace: Don’t twitter away your organisation’s secrets, Computer Fraud and Security, August, 18-20 </li></ul><ul><li>Rath, L (2011) The Effects of Twitter in an Online Learning Environment, eLearn Magazine, </li></ul><ul><li>Ratkiewicz, J, Conover, M, Meiss, M, Gonçalves, B, Patil, S, Flammini, A, and Menczer, F (2010) Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams, Technical Report arXiv:1011.3768 {cs.SI}, CoRR, 2010. </li></ul><ul><li>Sakaki, T, Okazaki, M and Matsuo, Y (2010) Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors, Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 851-860. </li></ul><ul><li>Steiner H, 2009 Reference utility of social networking sites: options and functionality, Library Hi Tech News 5/6, 4-6 </li></ul><ul><li>Sullivan SJ, Schneiders AG, Cheang CW, Kitto E, Lee H, Redhead J, Ward S, Ahmed OH, McCrory PR. (2011) ‘What’s happening?’ A content analysis of concussion-related traffic on Twitter, British Journal of Sports Medicine Mar 15. [Epub ahead of print] </li></ul><ul><li>Thelwall, M, Buckley, K, and Paltoglou, G (2011) Sentiment in Twitter events, Journal of the American Society for Information Science and Technology, 62 (2) 406-418 </li></ul>
  30. 30. References <ul><li>Welch, M., Schonfeld, U., He., D and Cho, J., Topical Semantics of Twitter Links, WSDM’11, February 9–12, 2011, Hong Kong, China </li></ul><ul><li>Wilson, D (2008) Monitoring technology trends with podcasts, RSS and Twitter, Library Hi Tech News, 10, 8-12 </li></ul><ul><li>Zhang, J., Qu, Y., Cody., J and Wu, Y (2010) A Case Study of Micro-blogging in the Enterprise: Use, Value, and Related Issues, CHI 2010, April 10-15, 2010, Atlanta, Georgia, USA </li></ul><ul><li>Zhao, D and Rosson, M B, How and Why People Twitter: The Role that Micro-blogging Plays in Informal Communication at Work, GROUP’04, May 10–13, 2009, 243-252 </li></ul><ul><li>Zhou, Z., Bandari, R., Kong, J., Qian, H., and Roychowdhury, V., (2010) Information Resonance on Twitter: Watching Iran, 1st Workshop on Social Media Analytics (SOMA ’10), July 25, 2010, Washington, DC, USA </li></ul>
  31. 31. Questions [email_address] Or @stephendann
  32. 32. <ul><li>This work is licensed under the Creative Commons Attribution-Share Alike 2.5 Australia License. To view a copy of this license, visit </li></ul>