User-Generated Content on Social Media


    User-Generated Content on Social Media - Presentation Transcript

    1. User-Generated Content on Social Media: Challenges, Opportunities. Meena Nagarajan, KNO.E.SIS, Wright State. meena@knoesis.org, http://knoesis.wright.edu/researchers/meena/ Tuesday, October 27, 2009
    2. The Shift.. in the rules of the game • Online media: from packaged-goods media to conversational media • Variety of networked interactions, many in near real-time • Information economy: from a dearth of signals to plenty! http://gregverdino.typepad.com/greg_verdinos_blog/images/2007/07/09/web2_logos.jpg
    3. Social Media Investigations • Network: Social structure emerges from the aggregate of relationships (ties) • People: poster identities, the active effort of accomplishing interaction • Content: studying the content of communication. "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]
    4. Effects of Networked Publics • Certain social phenomena are admittedly more complex • begs for a people-content-network confluence • Micro-level variations of content-people-network on macro-level features • “How do the topic of discussion, emotional charge of a conversation, poster characteristics & network connections affect ....?”
    5. People-Content-Network - Possibilities • Emerging social order in online conversations • How are the people-content-network dynamics shaping online conversations? • Can we understand the Influentials theory, information diffusion properties in networks (etc.) while taking people and content into account?
    6. Stand on the shoulders of micro-giants • The point is that we need a strong grasp on the micro-level variables of the content, people and network dimensions to begin explaining what they are doing to any social phenomenon... • My focus is on the micro-level variables in the content dimension.
    7. Mapping User-generated Content to Context
    8. Dimensions Of Analysis WHAT • What are the Named Entities and topics that people are making references to? • How are they interpreting any situation in local contexts and supporting them in their variable observations? • Named Entity Identification and Disambiguation • Cultural Named Entities • Music artist, track named entities (IBM) [ISWC09a, VLDB09], Movie named entities (MSR) [WWW2010] • Summaries of user perceptions behind real-time events from Twitter
    9. Dimensions Of Analysis WHAT • What are the Named Entities and topics that people are making references to? • How are they interpreting any situation in local contexts and supporting them in their variable observations? • www.evri.com • http://memetracker.org/
    10. Dimensions Of Analysis WHY • What are the diverse intentions that produce the diverse content on social media? • Why we share by looking at what we predominantly do with the medium. Value derived, repurposing.. • Emotion, sentiment expressions..
    11. Dimensions Of Analysis WHY • Mapping User Intentions • Information Seeking, Sharing, Transactional intents [WI09] • What is the intention landscape of social media? • Where is the monetization potential?
    12. Dimensions Of Analysis HOW • Self-presentation in Online Dating Profiles (with Prof. Marti Hearst, UC Berkeley) [ICWSM09] • What do word usages tell us about an active population, or about the medium? • Dynamics of a conversation - snubs, flaming words, coordination.. or lack thereof!
    13. Dimensions Of Analysis HOW • What do word usages tell us about an active population? • Self-presentation • Dynamics of a conversation - snubs, flaming words, coordination.. or lack thereof! http://wordwatchers.wordpress.com/2008/10/06/language-in-speeches-and-interviews-summary-comparisons/
    14. The Social Media Content Landscape..
    15. Small Groups • Individuals • Uniform Large Groups • Heterogeneous • Variable Contribution Large Groups (×2) • Population, Medium Diversity • Some mediation • Rate of exchange (asynchronous, synchronous) • Many-to-many reach • Shared Contexts • Slangs, abbreviations, grammar, spelling, media-specific vocabulary • Interpersonal interactions
    16. Variety & Formality
        Formality Score = (noun freq. + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 *

        TYPE OF DATA                 FORMALITY
        Nat Broadcast Reportage      62.2
        Informational writing        61
        Academic Social Science      60.6
        Writing                      58
        Professional Letters         57.5
        Non Acad Social Science      56.9
        Broadcasts                   55
        Blog corpus                  53.3
        Scripted Speech              53
        Email Corpus                 50.8
        Prepared speeches            50
        Personal Letters             49.7
        Imaginative writing          47
        Fiction Prose                46.3
        Interviews                   46
        Unscripted Speeches          44.4
        Spontaneous speech           44
        Conversations                38
        Phone Conversations          36

        * Heylighen, F. & Dewaele, J. Variation in the contextuality of language: An empirical measure. Foundations of Science, 2002, 293-340
        Weblogs, Genres and Individual Differences: How bloggers write for who they write for; Scott Nowson
    17. Variety & Formality
        Formality Score = (noun freq. + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 *

        TYPE OF DATA                                              FORMALITY
        Nat Broadcast Reportage                                   62.2
        Informational writing                                     61
        Academic Social Science                                   60.6
        Writing                                                   58
        Professional Letters                                      57.5
        Non Acad Social Science                                   56.9
        Broadcasts                                                55
        Blog corpus                                               53.3
        Scripted Speech                                           53
        Email Corpus                                              50.8
        Critic Music reviews from www.metacritic.com/music/       50.13
        Yahoo Personals AboutMe                                   50.10
        MySpace About Me                                          50.07
        MySpace - comments on Artist Pages                        50.06
        Prepared speeches                                         50
        Personal Letters                                          49.7
        Twitter                                                   49.46
        Facebook posts                                            48.20
        Imaginative writing                                       47
        Fiction Prose                                             46.3
        Interviews                                                46
        Unscripted Speeches                                       44.4
        Spontaneous speech                                        44
        Conversations                                             38
        Phone Conversations                                       36

        * Heylighen, F. & Dewaele, J. Variation in the contextuality of language: An empirical measure. Foundations of Science, 2002, 293-340
        Weblogs, Genres and Individual Differences: How bloggers write for who they write for; Scott Nowson
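The formality score on the two slides above can be computed directly from part-of-speech counts. The following is a minimal sketch, assuming each "frequency" is the percentage of tokens falling in that category and that the text has already been tagged with the coarse category names used here. The function name and tag labels are illustrative and do not come from the talk.

```python
from collections import Counter

# Categories that raise the score and categories that lower it,
# following the formula on the slide.
FORMAL_TAGS = ("noun", "adjective", "preposition", "article")
CONTEXTUAL_TAGS = ("pronoun", "verb", "adverb", "interjection")


def formality_score(pos_tags):
    """Return the formality score for a sequence of coarse POS tags.

    Assumption: each frequency is a percentage of all tokens.
    Tags outside the eight categories count toward the total only.
    """
    if not pos_tags:
        raise ValueError("need at least one tagged token")
    counts = Counter(pos_tags)
    total = len(pos_tags)

    def pct(tag):
        return 100.0 * counts[tag] / total

    formal = sum(pct(tag) for tag in FORMAL_TAGS)
    contextual = sum(pct(tag) for tag in CONTEXTUAL_TAGS)
    return (formal - contextual + 100.0) / 2.0
```

Under this reading the score is bounded between 0 (only pronouns, verbs, adverbs, interjections) and 100 (only nouns, adjectives, prepositions, articles), which is consistent with the range of values in the table.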
    18. Making up for lack of context.. • Supplement what the data is showing you with what you already know.. • Statistical NLP + Contextual Knowledge • Ontologies, Taxonomies, Dictionaries, social medium, shared spatio-temporal contexts..
    19. Representative Efforts WHAT WHY HOW
    20. Cultural NER WHAT
    21. Cultural NER WHAT • It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now
    22. Cultural NER WHAT • It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now • LOVED UR MUSIC YESTERDAY!
    23. Cultural NER WHAT • I decided to check out the Wanted demo today even though I really did not like the movie minus Mrs Jolie a.k.a Fox of course! • It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now • LOVED UR MUSIC YESTERDAY!
    24. Cultural NER WHAT • I decided to check out the Wanted demo today even though I really did not like the movie minus Mrs Jolie a.k.a Fox of course! • It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now • LOVED UR MUSIC YESTERDAY! • Obama the Dark Knight of socialism.. the man is not as impressive as Ledger yea
    25. Intuitions.. • Spotting and Sense Identification • Open vs. Closed world • unlike person, location named entities, contexts and senses change fairly rapidly • We assume an open world wrt senses • No comprehensive sense knowledge base • Reduce it to a spotting and binary sense classification problem • Example: It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now
    26. Two flavors.. • Artist and tracks spotting in MySpace music forums • using the MusicBrainz Taxonomy • with Daniel Gruhl, Jan Pieper, Christine Robson, IBM Almaden, Amit Sheth, Knoesis [ISWC09a] • on Thursday Oct 29, Session: Discovering Semantics • Movie names from Weblogs • with Amir Padovitz, Social Streams MSR, [WWW2010]
    27. Cultural NER in Weblogs • Goal: Supplement classifiers with information that will help them disambiguate the reference of a term better! • A Complexity of Extraction measure associated with an entity in target sense in a corpus • with all cues equal, systems that are ‘complexity aware’ will treat cues differently
    28. Measure of Extraction Complexity • Feature extraction: Graph-based spreading activation and clustering • entity sense definition from Wikipedia + evidence a corpus presents for the target sense of the entity • Ranked list speaks for itself • More varied senses and contexts, implies higher extraction complexity • Extracted Complexity (general weblogs), ranked: Time Traveller's Wife, Angels and Demons, .., The Hangover, .., Star Trek, .., Wanted, Up, Twilight, ...
    29. Feature as a Prior • Decision Tree and Boosting Classifiers • 1500+ hand-labeled data points • [Plot: X axis: precision; Y axis: recall. Blue: basic features; Red: with Entropy baseline; Green: with our Complexity of Extraction feature]
    30. As a Prior in Binary Classification • 1500+ hand-labeled data points • [Plots: average F-measure and average accuracy over 1000 decision tree, boosting models. Blue: basic features; Red: with Entropy baseline; Green: with our Complexity of Extraction feature]
    31. To chew on.. • The concept of ‘Extraction Complexity’ as an additional prior is very promising • applies to general NER
    32. User Intention Mapping WHY • Unlike Web search intent, entity alone is insufficient to characterize intent here.. • Three broad intentions: information seeking, sharing, transactional, and combinations thereof. • ‘i am thinking of getting X’ (transactional) • ‘i like my new X’ (information sharing) • ‘what do you think about X’ (information seeking)
    33. Action Patterns • Resorted to ‘action patterns’ surrounding named entities • “where can i find a psp cam..” • A minimally supervised bootstrapping algorithm • 10 seed action patterns, learn new ones from an unannotated corpus, relying on empirical and semantic similarity with seed patterns • semantic similarity from communicative functions of words: Linguistic Inquiry and Word Count (www.LIWC.net)
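The bootstrapping idea on this slide (start from seed action patterns, then learn similar ones from unannotated posts) can be sketched as follows. This is only an illustration under stated assumptions: candidates are the fixed-size word windows that directly precede a known entity, "empirical" evidence is reduced to a minimum occurrence count, and word-level Jaccard overlap stands in for the LIWC-based semantic similarity, which the slides do not specify. All names, thresholds, and defaults are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Word-level Jaccard overlap between two patterns (a stand-in similarity)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def bootstrap_patterns(posts, entities, seeds, window=4,
                       min_count=2, min_sim=0.5, max_rounds=3):
    """Grow a set of action patterns from seeds using unannotated posts."""
    # Count every `window`-word context that directly precedes an entity mention.
    counts = Counter()
    for post in posts:
        words = post.lower().split()
        for i, word in enumerate(words):
            if word in entities and i >= window:
                counts[" ".join(words[i - window:i])] += 1

    patterns = set(seeds)
    for _ in range(max_rounds):
        # Accept frequent candidates that resemble any pattern accepted so far.
        learned = {
            cand for cand, n in counts.items()
            if n >= min_count
            and cand not in patterns
            and any(jaccard(cand, p) >= min_sim for p in patterns)
        }
        if not learned:
            break
        patterns |= learned
    return patterns
```

Because newly accepted patterns join the comparison set, later rounds can admit candidates that resemble learned patterns rather than the original seeds, which is the usual source of both coverage and drift in bootstrapping.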
    34. Information Seeking, Transactional • Patterns learned using 8000 uncategorized posts on MySpace forums • Sample learned patterns: “does anyone know how”; “know where i can”; “was wondering if someone”; “Im not sure how”; “someone tell me how” • Intent recognition recall using pre-classified user posts from Facebook Marketplace (to buy): 81%
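Once patterns like the samples above are available, flagging a post comes down to normalizing the text and checking for the patterns as whole-word phrases. The sketch below assumes simple lowercasing, apostrophe removal, and phrase matching. The function names are illustrative, and the talk does not describe the matching step in this detail.

```python
import re

# Sample learned patterns quoted on the slide, lowercased.
LEARNED_PATTERNS = (
    "does anyone know how",
    "know where i can",
    "was wondering if someone",
    "im not sure how",
    "someone tell me how",
)


def normalize(text):
    """Lowercase, drop apostrophes, and keep only alphanumeric words."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return " ".join(re.findall(r"[a-z0-9]+", text))


def matched_patterns(post, patterns=LEARNED_PATTERNS):
    """Return the patterns that occur in the post as whole-word phrases."""
    padded = f" {normalize(post)} "
    return [p for p in patterns if f" {p} " in padded]
```

Padding with spaces keeps a pattern from matching inside a longer word, for example "know" inside "unknown".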
    35. Impact on Online Advertising? • Generate ads from user profile (interests, hobbies) or from posts with monetizable intents?
    36. Targeted Content Delivery Platform • Of all the ads generated using profile (hobbies, interests) information, 7% received attention • Of the ads generated using authored, monetizable posts, 59% received attention • More at [WI09]; Beyond Search and Internet Economics Workshop, MSR, Redmond, WA http://research.microsoft.com/en-us/um/redmond/about/collaboration/awards/beyondsearchawards.aspx
    37. Self-Presentation HOW • On online dating profiles [ICWSM09] (with Prof. Marti Hearst, UCB) • quantifying usages of words from linguistic, personal and psychological categories in LIWC • Exploratory Factor Analysis to identify systematic co-occurrence patterns among LIWC variables • grouping user profiles on the basis of their shared multi-dimensional features to compare and contrast self-presentation
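The first step on this slide, quantifying word usage per category, amounts to counting how many tokens of a profile fall in each category's word list. The sketch below uses a tiny hypothetical lexicon in place of the actual LIWC dictionaries, which are not reproduced here. The category names and the percentage normalization are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in lexicon; the real LIWC categories are far larger.
TOY_LEXICON = {
    "tentative": {"maybe", "perhaps"},
    "first_person": {"i", "me", "my"},
}


def category_rates(text, lexicon):
    """Return, per category, the percentage of tokens that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in lexicon}
    hits = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                hits[category] += 1
    return {category: 100.0 * hits[category] / len(tokens) for category in lexicon}
```

Each profile then becomes a vector of category rates, which is the kind of input a factor analysis or a grouping step would work on.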
    38. Imitate to Impress !? • More similarities than differences • Men displaying a higher usage of tentative words (maybe, perhaps..) • typically attributed to feminine discourse • Many similarities in word combinations and words used! • Perhaps, self-expression tends towards attempting homophily in online dating..
    39. Science, Fun and Profit
    40. BBC SoundIndex, IBM (What, Why, When, Where, Who) • “A pioneering project to tap into the online buzz surrounding artists and songs, by leveraging several popular online sources” • De-spam, slang transliterations, entity identification, voting theory to combine multi-modal online data sources [ICSC08a, VLDB09] • http://www.almaden.ibm.com/cs/projects/iis/sound/
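As an illustration of how voting theory can combine rankings from several online sources, the sketch below merges per-source artist rankings with a Borda count. The slide does not say which voting rule SoundIndex uses, so Borda is chosen here only as a familiar example. The function name and the alphabetical tie-break are assumptions.

```python
def borda_merge(rankings):
    """Merge several best-first rankings into one using a Borda count.

    Each source awards (n - 1 - position) points to the items it ranks,
    where n is the number of distinct items across all sources.
    Items a source does not rank receive no points from that source.
    """
    candidates = set()
    for ranking in rankings:
        candidates.update(ranking)
    n = len(candidates)
    scores = dict.fromkeys(candidates, 0)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda item: (-scores[item], item))
```

A positional rule like this needs only the order of items within each source, so sources with very different raw counts (plays, comments, views) can be combined without rescaling.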
    41. Twitris, Kno.e.sis (What, When, Where) • Real-time user perceptions as the fulcrum for browsing the Web [ISWC09b]
    42. Iran elections: Discussions in the US and Iran on the same day • The mystery of Soylent Green: information where you can use it
    43. Twitris: Twitter through space, time, theme (http://twitris.knoesis.org) • A tetris like approach to twitter to gather aggregated social signals • Find resources related to social perceptions • The fourth estate perspective: News and Wikipedia articles to put extracted descriptors in context • Integrate user observations with news on a particular day; correlate citizen journalism with the fourth estate • On September 18, Obama was talking about illegal immigrants in the context of health care • Semantics for thematic aggregation • Concept Cloud, News and related articles • Little statistics from Twitris (unit: tweets): Healthcare (Aug 19 - Oct 20): 721 K (US only); Obama (Oct 8 - 20): 312 K (US only); H1N1 (Oct 5 - 20): 232 K (US only); Iran Election (June 5 - Oct 20): 2.8 m (worldwide) • Come see & play with Twitris @ the International Semantic Web Challenge at ISWC ’09 • Twitris internals: parallel crawling to scale; data processing pipeline to streamline data analytics; live resource aggregation; near real time: processing up to a day before • Data Collection: Geocode Lookup, Data Dumper, Shared Memory • Data Processing: spatio-temporally weighted text analytics; TFIDF based descriptor extraction; spatio, temporal, thematic descriptor extraction; extracting storylines around descriptors; Twitris DB • Caveats and Future work: 1. Handle Twitter constructs such as hashtags, retweets, mentions and replies better 2. Different viz widgets such as time series to show changing perceptions from a place for an event and demographic based visualizations 3. Sentiment analysis 4. Robust computing approaches (Cloud, Hadoop) 5. FB Connect for sharing and personalization
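The TFIDF based descriptor extraction mentioned on the Twitris slide can be sketched by treating all tweets from one place on one day as a single document and ranking terms by plain TF-IDF. The slides do not give the actual spatio-temporal weighting, so nothing beyond standard TF-IDF is assumed here. The grouping key, function name, and `top_k` parameter are illustrative.

```python
import math
from collections import Counter


def tfidf_descriptors(groups, top_k=3):
    """Rank terms per (place, day) group by plain TF-IDF.

    `groups` maps a key such as (place, day) to the list of tokens
    from all tweets in that group.
    """
    n_groups = len(groups)
    # Document frequency: in how many groups does each term occur?
    doc_freq = Counter()
    for tokens in groups.values():
        doc_freq.update(set(tokens))

    descriptors = {}
    for key, tokens in groups.items():
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_groups / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        descriptors[key] = ranked[:top_k]
    return descriptors
```

Terms that appear in every group score zero, so the descriptors that surface are the ones that distinguish one place and day from the others.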
    44. Thank You! Google, Bing, Yahoo: Meena Nagarajan • meena@knoesis.org • http://knoesis.wright.edu/researchers/meena
    45. References http://knoesis.wright.edu/researchers/meena/pubs.php
        [WISE09] Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav. Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences. Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009.
        [ISWC09a] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth. Context and Domain Knowledge Enhanced Entity Spotting in Informal Text. The 8th International Semantic Web Conference, 2009.
        [ISWC09b] Twitris. Submission to the International Semantic Web Challenge, collocated with the International Semantic Web Conference 2009.
        [VLDB09] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth. Multimodal Social Intelligence in a Realtime Dashboard System. Pending Review, VLDB Journal, Special Issue on "Data Management and Mining on Social Networks and Social Media", 2009.
        [WWW2010] Meenakshi Nagarajan, Amir Padovitz. A Measure of Extraction Complexity: a Novel Prior for Improving Recognition of Cultural Entities. Manuscript in Preparation, for The Nineteenth International World Wide Web Conference, 2010.
        [ICSC08a] Alfredo Alba, Varun Bhagwan, Julia Grace, Daniel Gruhl, Kevin Haas, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Nachiketa Sahoo. Applications of Voting Theory to Information Mashups. Second IEEE International Conference on Semantic Computing, ICSC 2008.
        [ICSC08b] Meenakshi Nagarajan, Cartic Ramakrishnan, Amit Sheth. Text Analytics for Semantic Computing - the good, the bad and the ugly. Second IEEE International Conference on Semantic Computing, Santa Clara, CA, USA, 2008.
        [WI09] Meenakshi Nagarajan, Kamal Baid, Amit P. Sheth, and Shaojun Wang. Monetizing User Activity on Social Networks - Challenges and Experiences. 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18, 2009.
        [ICWSM09] Meenakshi Nagarajan, Marti A. Hearst. An Examination of Language Use in Online Dating Personals. 3rd Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2009.
        [IC09] Amit Sheth, Meenakshi Nagarajan. Semantics-Empowered Social Computing. IEEE Internet Computing 13(1), 2009.

    Meena Nagarajan, 3 weeks ago


    Keynote talk at Social Data on the Web workshop, IS more


    © All Rights Reserved
