Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
User-Generated Content
              on Social Media
               Challenges, Opportunities
            Meena Nagarajan,...
The Shift.. in the rules of the game
                    • Online Media: Packaged Goods Media
                      to a C...
Social Media Investigations
                    • Network: Social structure
                            emerges from the a...
Effects of Networked Publics

                    • Certain social phenomenon admittedly
                      more comple...
People-Content-Network - Possibilities
                    • Emerging social order in online
                      convers...
Stand on the shoulder of micro-giants

                    • The point is that we need a strong grasp
                    ...
Mapping User-generated
             Content to Context


                            7
Tuesday, October 27, 2009          ...
Dimensions Of Analysis
                                                 • Named Entity Identification
                     ...
Dimensions Of Analysis

                                                                     WHAT


                      ...
Dimensions Of Analysis

                            WHY

          •       What are the diverse intentions
               ...
Dimensions Of Analysis

             • Mapping User Intentions                     WHY
                   • Information Se...
Dimensions Of Analysis
                                        • Self-presentation in Online
                            H...
Dimensions Of Analysis

                                                                             HOW


               ...
The Social Media Content
            Landscape..


                            14
Tuesday, October 27, 2009        14
+,'((*-./&0)*
                            !"#$%$#&'()*                             9"$:/.,*7'.83*-./&0)*




             ...
Variety & Formality
     Formality Score = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun...
Variety & Formality
     Formality Score = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun...
Making up for lack of context..

                    • Supplement what the data is showing
                      you with ...
Representative Efforts

                            WHAT   WHY   HOW




                                    18
Tuesday, O...
Cultural NER
     WHAT




                                 19
Tuesday, October 27, 2009                  19
Cultural NER
     WHAT



                It was THE HANGOVER of the
                year..lasted forever.. so I went
    ...
Cultural NER
     WHAT



                It was THE HANGOVER of the
                year..lasted forever.. so I went
    ...
Cultural NER
     WHAT

                                                    I decided to check out the Wanted demo today
 ...
Cultural NER
     WHAT

                                                     I decided to check out the Wanted demo today
...
Intuitions..
                    • Spotting and Sense Identification
                                                      ...
Two flavors..
                    • Artist and tracks spotting in MySpace
                      music forums
              ...
Cultural NER in Weblogs
                    • Goal: Supplement classifiers with
                      information that will...
Measure of Extraction Complexity
                    • Feature extraction: Graph-based
                      spreading act...
Feature as a Prior
                             Decision Tree and Boosting Classifiers




                                ...
As a Prior in Binary Classification
               Average F-measure over 1000 decision tree, boosting models




         ...
To chew on..


                    • The concept of ‘Extraction Complexity’
                      as an additional prior i...
User Intention Mapping
      WHY
                    • Unlike Web search intent, entity alone is
                      in-...
Action Patterns
      • Resorted to ‘action patterns’ surrounding named
        entities
            • “where can i find a ...
Information Seeking, Transactional
      • Patterns learned using 8000 uncategorized posts
        on MySpace forums
     ...
Impact on Online Advertising?

                    • Generate ads from user profile
                      (interests, hobbi...
Targeted Content Delivery Platform




                • Of all the ads generated using profile (hobbies,
                 ...
Self-Presentation
      HOW
                    • On Online-dating profiles [ICWSM09] (with Prof.
                      Mar...
Imitate to Impress !?
                    • More similarities than differences

                    • Men displaying a hig...
Science, Fun and Profit



                            34
Tuesday, October 27, 2009                34
BBC SoundIndex, IBM




       “A pioneering project to tap into the online buzz surrounding
                             ...
Twitris: Kno.e.sis
         Real-time user perceptions as the fulcrum for
         browsing the Web [ISWC09b]             ...
Iran elections:       Discussions in the US and Iran on the same day




            The mystery of Soylent Green: informa...
every day; legalize                                                       is no more the




                             ...
Thank You!


                        Google, Bing, Yahoo: Meena Nagarajan

                        meena@knoesis.org

    ...
References
                              http://knoesis.wright.edu/researchers/meena/pubs.php
                [WISE09] Mee...
Upcoming SlideShare
Loading in …5
×

User-Generated Content on Social Media

4,616 views

Published on

Keynote talk at Social Data on the Web workshop, ISWC 2009

  • Be the first to comment

User-Generated Content on Social Media

  1. 1. User-Generated Content on Social Media Challenges, Opportunities Meena Nagarajan, KNO.E.SIS, Wright State meena@knoesis.org, http://knoesis.wright.edu/researchers/meena/ 1 Tuesday, October 27, 2009 1
  2. 2. The Shift.. in the rules of the game • Online Media: Packaged Goods Media to a Conversational Media • Variety of networked interactions, many in near real-time • Information economy: from dearth of signals to plenty much! 2 http://gregverdino.typepad.com/greg_verdinos_blog/images/2007/07/09/web2_logos.jpg Tuesday, October 27, 2009 2
  3. 3. Social Media Investigations • Network: Social structure emerges from the aggregate of relationships (ties) • People: poster identities, the !" !" active effort of accomplishing interaction • Content : studying the content of communication."Who says what, to whom, why, to what extent and with what effect?" [Laswell] 3 Tuesday, October 27, 2009 3
  4. 4. Effects of Networked Publics • Certain social phenomenon admittedly more complex • begs for a people-content-network confluence • Micro-level variations of Content- people-network on macro-level features • “How do the topic of discussion, emotional charge of a conversation, poster characteristics & network connections affect ....?” 4 Tuesday, October 27, 2009 4
  5. 5. People-Content-Network - Possibilities • Emerging social order in online conversations • How are the people-content-network dynamics shaping online conversations? • Can we understand the Influentials theory, information diffusion properties in networks (etc.) while taking people and content into account? 5 Tuesday, October 27, 2009 5
  6. 6. Stand on the shoulder of micro-giants • The point is that we need a strong grasp on the micro-level variables of the content, people and network dimensions to begin explaining what they are doing to any social phenomenon... • My focus is on the micro-level variables in the content dimension. 6 Tuesday, October 27, 2009 6
  7. 7. Mapping User-generated Content to Context 7 Tuesday, October 27, 2009 7
  8. 8. Dimensions Of Analysis • Named Entity Identification WHAT and Disambiguation • Cultural Named Entities • Music artist, track named • What are the Named Entities entities (IBM) [ISWC09a, and topics that people are VLDB09], Movie named making references to? entities (MSR) [WWW2010] • How are they interpreting any • Summaries of user situation in local contexts and perceptions behind real-time supporting them in their events from Twitter variable observations? 8 Tuesday, October 27, 2009 8
  9. 9. Dimensions Of Analysis WHAT • What are the Named Entities www.evri.com and topics that people are http://memetracker.org/ making references to? • How are they interpreting any situation in local contexts and supporting them in their variable observations? 9 Tuesday, October 27, 2009 9
  10. 10. Dimensions Of Analysis WHY • What are the diverse intentions that produce the diverse content on social media? • Why we share by looking at what we predominantly do with the medium. Value derived, repurposing.. • Emotion, sentiment expressions.. 10 Tuesday, October 27, 2009 10
  11. 11. Dimensions Of Analysis • Mapping User Intentions WHY • Information Seeking, Sharing, Transactional intents [WI09] • What is the intention landscape of social media • where is the monetization potential 11 Tuesday, October 27, 2009 11
  12. 12. Dimensions Of Analysis • Self-presentation in Online HOW Dating Profiles (with Prof. Marti Hearst, UC Berkeley) [ICWSM09] • What do word usages tell us about an active population, or about the medium? • Dynamics of a conversation - snubs, flaming words, coordination.. or lack thereof! 12 Tuesday, October 27, 2009 12
  13. 13. Dimensions Of Analysis HOW • What do word usages tell us about an active population? • Self-presentation • Dynamics of a conversation - snubs, flaming words, coordination.. or lack thereof! http://wordwatchers.wordpress.com/2008/10/06/language-in-speeches-and-interviews-summary-comparisons/ 13 Tuesday, October 27, 2009 13
  14. 14. The Social Media Content Landscape.. 14 Tuesday, October 27, 2009 14
  15. 15. +,'((*-./&0)* !"#$%$#&'()* 9"$:/.,*7'.83*-./&0)* ;353./83"3/&)** 1'.$'2(3*4/"5.$2&6/"*7'.83*-./&0)* 1'.$'2(3*4/"5.$2&6/"** 7'.83*-./&0)* Population, Medium Diversity Some mediation Rate of exchange (asynchronous, synchronous) Many-to-many reach Shared Contexts Slangs, abbreviations, grammar, spelling, media-specific vocabulary Interpersonal interactions 15 !"#$%%&&&'()*+,'*-.%#!-/-0%123,45--/%676879:8;8%0)<40%-%== Tuesday, October 27, 2009 15
  16. 16. Variety & Formality Formality Score = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 * TYPE OF DATA FORMALITY Nat Broadcast Reportage 62.2 Informational writing 61 Academic Social Science 60.6 Writing 58 Professional Letters 57.5 Non Acad Social Science 56.9 Broadcasts 55 Blog corpus 53.3 Scripted Speech 53 TYPE OF DATA FORMALITY Email Corpus 50.8 Prepared speeches 50 Personal Letters 49.7 Imaginative writing 47 Fiction Prose 46.3 Interviews 46 Unscripted Speeches 44.4 Spontaneous speech 44 Conversations 38 Phone Conversations 36 * Heylighen, F. & Dewaele, J. Variation in the contextuality of language: An empirical measure Foundations of Science, 2002, 293-340 16 Weblogs, Genres and Individual Differences: How bloggers write for who they write for; Scott Nowson Tuesday, October 27, 2009 16
  17. 17. Variety & Formality Formality Score = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 * TYPE OF DATA FORMALITY Nat Broadcast Reportage 62.2 TYPE OF DATA FORMALITY Informational writing 61 Broadcasts 55 Academic Social Science 60.6 Blog corpus 53.3 Writing 58 Scripted Speech 53 Professional Letters 57.5 Email Corpus 50.8 Non Acad Social Science 56.9 Critic Music reviews from 50.13 www.metacritic.com/music/ Broadcasts 55 Yahoo Personals AboutMe 50.10 Blog corpus 53.3 MySpace About Me 50.07 Scripted Speech 53 TYPE OF DATA FORMALITY MySpace - comments on Artist Pages 50.06 Email Corpus 50.8 Prepared speeches 50 Prepared speeches 50 Personal Letters 49.7 Personal Letters 49.7 Imaginative writing 47 Twitter 49.46 Fiction Prose 46.3 Facebook posts 48.20 Interviews 46 Imaginative writing 47 Unscripted Speeches 44.4 Fiction Prose 46.3 Spontaneous speech 44 Conversations 38 Phone Conversations 36 * Heylighen, F. & Dewaele, J. Variation in the contextuality of language: An empirical measure Foundations of Science, 2002, 293-340 16 Weblogs, Genres and Individual Differences: How bloggers write for who they write for; Scott Nowson Tuesday, October 27, 2009 16
  18. 18. Making up for lack of context.. • Supplement what the data is showing you with what you already know.. • Statistical NLP + Contextual Knowledge • Ontologies, Taxonomies, Dictionaries, social medium, shared spatio- temporal contexts.. 17 Tuesday, October 27, 2009 17
  19. 19. Representative Efforts WHAT WHY HOW 18 Tuesday, October 27, 2009 18
  20. 20. Cultural NER WHAT 19 Tuesday, October 27, 2009 19
  21. 21. Cultural NER WHAT It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now 19 Tuesday, October 27, 2009 19
  22. 22. Cultural NER WHAT It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now LOVED UR MUSIC YESTERDAY! 19 Tuesday, October 27, 2009 19
  23. 23. Cultural NER WHAT I decided to check out the Wanted demo today even though I really did not like the movie It was THE HANGOVER of the minus Mrs Jolie a.k.a Fox of course!  year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now LOVED UR MUSIC YESTERDAY! 19 Tuesday, October 27, 2009 19
  24. 24. Cultural NER WHAT I decided to check out the Wanted demo today even though I really did not like the movie It was THE HANGOVER of the minus Mrs Jolie a.k.a Fox of course!  year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now LOVED UR MUSIC YESTERDAY! Obama the Dark Knight of socialism.. the man is not as impressive as Ledger yea 19 Tuesday, October 27, 2009 19
  25. 25. Intuitions.. • Spotting and Sense Identification It was THE HANGOVER of the year..lasted forever.. so I went to the • Open vs. Closed world movies..bad choice picking “GI Jane” worse now • unlike person, location, named entities, contexts and senses change fairly rapidly • We assume an open-world wrt senses • No comprehensive sense knowledge base • Reduce it to a spotting and binary sense classification problem 20 Tuesday, October 27, 2009 20
  26. 26. Two flavors.. • Artist and tracks spotting in MySpace music forums • using the MusicBrainz Taxonomy • with Daniel Gruhl, Jan Pieper, Christine Robson, IBM Almaden, Amit Sheth, Knoesis [ISWC09a] • on Thursday Oct 29, Session: Discovering Semantics • Movie names from Weblogs • with Amir Padovitz, Social Streams MSR, [WWW2010] 21 Tuesday, October 27, 2009 21
  27. 27. Cultural NER in Weblogs • Goal: Supplement classifiers with information that will help them disambiguate the reference of a term better! • A Complexity of Extraction measure associated with an entity in target sense in a corpus • with all cues equal, systems that are ‘complexity aware’ will treat cues differently 22 Tuesday, October 27, 2009 22
  28. 28. Measure of Extraction Complexity • Feature extraction: Graph-based spreading activation and Extracted Complexity clustering (general weblogs) Time Travellerʼs Wife Angels and Demons • entity sense definition from .. Wikipedia + evidence a corpus The Hangover .. presents for the target sense of the Star Trek .. entity Wanted Up Twilight • Ranked list speaks for itself ... • More varied senses and contexts, implies higher extraction complexity 23 Tuesday, October 27, 2009 23
  29. 29. Feature as a Prior Decision Tree and Boosting Classifiers X axis: precision 1500+ hand-labeled data points Y axis: recall Blue: basic features Red: with Entropy baseline Green: with our Complexity of Extraction feature 24 Tuesday, October 27, 2009 24
  30. 30. As a Prior in Binary Classification Average F-measure over 1000 decision tree, boosting models Average Accuracy over 1000 decision tree, boosting models 1500+ hand-labeled data points Blue: basic features Red: with Entropy baseline Green: with our Complexity of 25 Extraction feature Tuesday, October 27, 2009 25
  31. 31. To chew on.. • The concept of ‘Extraction Complexity’ as an additional prior is very promising • applies to general NER 26 Tuesday, October 27, 2009 26
  32. 32. User Intention Mapping WHY • Unlike Web search intent, entity alone is in-sufficient to characterize intent here.. • Three broad intentions: information seeking, sharing, transactional, combinations thereof. • ‘i am thinking of getting X’ (transactional) ‘i like my new X’ (information sharing) ‘what do you think about X’ (information seeking) 27 Tuesday, October 27, 2009 27
  33. 33. Action Patterns • Resorted to ‘action patterns’ surrounding named entities • “where can i find a psp cam..” • A minimally supervised bootstrapping algorithm • 10 seed action patterns, learn new ones from unannotated corpus, relying on a empirical and semantic similarity with seed patterns • semantic similarity from communicative functions of words Linguistic Inquiry Word Count (www.LIWC.net) 28 Tuesday, October 27, 2009 28
  34. 34. Information Seeking, Transactional • Patterns learned using 8000 uncategorized posts on MySpace forums Sample learned patterns does anyone know how know where i can was wondering if someone Im not sure how someone tell me how • Intent recognition recall using pre-classified user posts from Facebook Marketplace (to buy): 81% 29 Tuesday, October 27, 2009 29
  35. 35. Impact on Online Advertising? • Generate ads from user profile (interests, hobbies) or from posts with monetizable intents? 30 Tuesday, October 27, 2009 30
  36. 36. Targeted Content Delivery Platform • Of all the ads generated using profile (hobbies, interests) information, 7% received attention • Ads generated using authored, monetizable posts, 59% received attention What More at [WI09], Beyond Search and Internet Economics Workshop, Why MSR, Redmond, WA http://research.microsoft.com/en-us/um/redmond/about/collaboration/awards/beyondsearchawards.aspx 31 Tuesday, October 27, 2009 31
  37. 37. Self-Presentation HOW • On Online-dating profiles [ICWSM09] (with Prof. Marti Hearst, UCB) • quantifying usages of words from linguistic, personal and psychological categories in LIWC • Exploratory Factor Analysis to identify systematic co-occurrence patterns among LIWC variables • grouping user profiles on the basis of their shared multi-dimensional features to compare and contrast self-presentation 32 Tuesday, October 27, 2009 32
  38. 38. Imitate to Impress !? • More similarities than differences • Men displaying a higher usage of tentative words (maybe, perhaps..) • typically attributed to feminine discourse • Many similarities in word combinations and words used! • Perhaps, self-expression tends towards attempting homophily in online dating.. 33 Tuesday, October 27, 2009 33
  39. 39. Science, Fun and Profit 34 Tuesday, October 27, 2009 34
  40. 40. BBC SoundIndex, IBM “A pioneering project to tap into the online buzz surrounding What artists and songs, by leveraging several popular online sources” Why De-spam, slang transliterations, entity identification, voting theory When, to combine multi-modal online data sources [ICSC08a,VLDB09] Where, http://www.almaden.ibm.com/cs/projects/iis/sound/ Who 35 Tuesday, October 27, 2009 35
  41. 41. Twitris: Kno.e.sis Real-time user perceptions as the fulcrum for browsing the Web [ISWC09b] What When, Where 36 Tuesday, October 27, 2009 36
  42. 42. Iran elections: Discussions in the US and Iran on the same day The mystery of Soylent Green: information where you can use it 37 Tuesday, October 27, 2009 37
  43. 43. every day; legalize is no more the th lk tio n o cr o ro op eo s ab n f illegal immigrants news for Obama! ta lec io m es cy t de pr a u E pin in the healthcare captured October on si , n ra , st on O tio context on 12. September 18. Find resources related to social perceptions Twitris: Twitter through The fourth estate perspective space,time,theme News and Wikipedia articles to put extracted descriptors in context Integrate user observations with news on a particular day; Correlate citizen journalism with the fourth estate; On September 18, Obama was talking about Illegal immigrants in the context of health care; semantics for thematic aggregation Little statistics from Tiwtris (unit: tweets) a tweet ck and checkl new events on twitris #twitris" Healthcare ( Aug 19 - Oct 20) : 721 K (US Only) of a tweet; # (hashtags) user generated meta; @- refer to Obama (Oct 8 - 20): 312 K (US Only) es (Twitter, news services, Wikipedia, and other Web Come see & play with Twitris @ the H1N1 (Oct 5 - 20) : 232 K (US Only) Iran Election (June 5 - Oct 20) : 2.8 m (Worldwide) Concept Cloud, News and related International Semantic Web Challenge ` twi tris inte articles 140 rnals at ISWC ’09 cha in le Twitris rac s Parallel crawling to scale ters s tha Context Data processing pipeline to streamline n + Selected Twitter, geocode services, data analytics, Term to handle heterogeneity Live resource aggregation Near real time: Processing upto a day ogle DBpedia before widget widget Spatio-temporally weighted text analytics Data Processing TFIDF Spatio, Temporal, Extracting Twitris DB based Thematic storylines Cavetas and Future work descriptor descriptor around extraction extraction descriptors 1. Handle Twitter constructs such as hashtags, retweets, mentions and replies better 2. Different viz widgets such as time series to http://twitris.knoesis.org show changing perceptions from a place for an Data Collection event and demographic based visualizations. S S h Geocode Lookup . h Data Dumper . 3. Sentiment analysis a . . a . . 4. Robust computing approaches (Cloud, Hadoop) r . r . e Geocode Lookup . e Data Dumper . 5. FB Connect for sharing and personalization d . d . . . . . M Geocode Lookup M Data Dumper e e m m o o r r y y knoesis.org A tetris like approach to twitter to gather Twtitris with everyone aggregated social signals is defined as 38 Tuesday, October 27, 2009 38
  44. 44. Thank You! Google, Bing, Yahoo: Meena Nagarajan meena@knoesis.org http://knoesis.wright.edu/researchers/meena 39 Tuesday, October 27, 2009 39
  45. 45. References http://knoesis.wright.edu/researchers/meena/pubs.php [WISE09] Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009. [ISWC09a] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009. [ISWC09b] Twitris, Submission to the International Semantic Web Challenge, collocated with the International Semantic Web Conference 2009. [VLDB09] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Multimodal Social Intelligence in a Realtime Dashboard System, Pending Review, VLDB Journal, Special Issue on "Data Management and Mining on Social Networks and Social Media", 2009. [WWW2010] Meenakshi Nagarajan, Amir Padovitz, A Measure of Extraction Complexity: a Novel Prior for Improving Recognition of Cultural Entities, Manuscript in Preparation, for The Nineteenth International World Wide Web Conference, 2010. [ICSC08a] Alfredo Alba, Varun Bhagwan, Julia Grace, Daniel Gruhl, Kevin Haas, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Nachiketa Sahoo. Applications of Voting Theory to Information Mashups, Second IEEE International Conference on Semantic Computing, ICSC 2008. [ICSC08b] Meenakshi Nagarajan, Cartic Ramakrishnan, Amit Sheth, “Text Analytics for Semantic Computing - the good, the bad and the ugly”, Second IEEE International Conference on Semantic Computing Santa Clara, CA, USA, 2008. [WI09] Meenakshi Nagarajan, Kamal Baid, Amit P. Sheth, and Shaojun Wang, Monetizing User Activity on Social Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009. [ICWSM09] Meenakshi Nagarajan, Marti A. Hearst. An Examination of Language Use in Online Dating Personals, 3rd Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2009 [IC09] Amit Sheth, Meenakshi Nagarajan. Semantics-Empowered Social Computing IEEE Internet Computing 13(1), 2009. 40 Tuesday, October 27, 2009 40

×