Citizen  Sensor  Data  Mining,                    Social  Media  Analytics  and             Development  Centric  Web  App...
Meena Nagarajan                                Selvam Velmurugan  (Content Analysis)                            (Kiirti, e...
A  Quick  Word       Much  of  the  work  discussed  in  this  tutorial  is         primarily  the  doctoral  research  by...
Outline        Citizen  Sensing:  Role,  Enablers,  Apps            Systematic  Study  Social  Media        Citizen  Sensi...
Citizen  Sensing    Everyday users of Web2.0 and social networks:    Citizens of an Internet- or Web-enabled social    com...
Social  Signals       The activity of observing, reporting, disseminating       information via text, audio, video and bui...
Enablers:  Mobile  Devices  &                   Ubiquitous  Connectivity       Mobile device fast emerging as our primary ...
Enablers:  Mobile  Devices  &                   Ubiquitous  Connectivity       Global, Ubiquitous, always available       ...
Enablers:  Mobile  Devices  &                   Ubiquitous  Connectivity       Sense where you are, how you are, …Monday, ...
Enablers: Mobile Devices & Ubiquitous Connectivity
Enablers:  Mobile  Devices  &                   Ubiquitous  Connectivity       Mobile Platforms Hit Critical Mass       ‣ ...
Enablers:  Web  2.0  &  Social  Media       500M+ Facebook Users       100M+ Twitter users, 85M+ tweets/day       Internet...
Enablers:  Web  2.0  &  Social  Media       100M+ Twitter users, 85M+ tweets/day       Internet Users: 1.8 Bln       Conte...
Enablers:  Web  2.0  &  Social  Media       Internet Users: 1.8 Bln       Content dissemination medium      ‣ Even for tra...
Enablers:  Web  2.0  &  Social  Media       Content dissemination medium      ‣ Even for traditional media (@cnn, @nytimes...
Enablers: Web 2.0 & Social Media
Enablers:  Web  2.0  &  Social  Media        Types of UGC: Twitter(text/microblogs), Facebook       (multimedia),YouTube(v...
Enablers:  Web  2.0  &  Social  Media        Flicker(images), Blogs(text),         Ping: (Social network for music) Monday...
Enablers: Web 2.0 & Social Media    Ping (social network for music)
Citizen  Sensors  in  Action                                     Iran election                                     Haiti E...
Revolution  2.0                           Political/Social  Activism       “If you want to liberate a government, give the...
Revolution  2.0                           Political/Social  Activism       When Blitzer asked “Tunisia, then Egypt, what’s...
Revolution 2.0    Political/Social Activism
Citizen Journalism    Twitter Journalism
Social  Media  Influence:              Intelligence,  News  &  Analysis        Many media companies use Facebook and Twitte...
Business  Intelligence  Trend             SpoTing,  Forecasting,  Brand          Tracking    and  Crisis  Management     S...
Development                     (Education,  Health,  eGov)       LiveMocha  (http://www.livemocha.com/)      ‣ Online Lan...
Development                     (Education,  Health,  eGov)       Soliya (http://www.soliya.net/)      ‣ Dialogue between ...
Development                     (Education,  Health,  eGov)       Project Einstein (http://digital-democracy.org/what-we-d...
Development (Education, Health, eGov)
Development                     (Education,  Health,  eGov)       PatientsLikeMe (http://mashable.com/2010/07/13/social-me...
Why People-Content-Network metadata?
Dimensions  of  Systematic  Study                  of  Social  Media             Spatio - Temporal -Thematic              ...
Social  Information                           Processing       "Who says what, to whom, why, to what extent       and with...
Studying  Online  Human  Social                        Dynamics        How  does  the  (semantics  or  style  of)  content...
Studying Online Human Social Dynamics
Studying  Online  Human  Social                        Dynamics        Example:  how  does  the  topic  of  discussion,   ...
Studying Online Human Social Dynamics
Metadata/Annotations       Metadata: an organized way to study      ‣ types      ‣ creation/extraction and storage      ‣ ...
The Anatomy of a Tweet
People  Metadata:  Variety  of           Self-­‐‑expression  Modes  on    Multiple                    Social  Media  Platf...
People  Metadata:  Various  Levels                                    Demographic                                      Int...
People  Metadata:  Continued     User Demographic Metadata Interest Level Metadata     •User-id                  •Author t...
People  Metadata:  Continued     Activity  Level  Metadata                      Influence  Level  Metadata                 ...
Content  Metadata          Content Independent metadata     ‣"     date, location, author etc       Content Dependent meta...
Content  Metadata       Content Dependent metadata      ‣        Direct content-based metadata      ‣        Explicit/Ment...
Content Metadata
Content  Independent  Metadata           For Tweets           ‣ Published date and time           ‣ Location (where tweet ...
Content Independent Metadata
Content  Independent  Metadata           For Text messages           ‣     Published date and time           ‣     Origin ...
Content Independent Metadata
Content  Dependent  Metadata  (Tweet)           Direct  Content-­‐‑based  Metadata                       Direct Content-ba...
Content Dependent Metadata    Direct Content-based Metadata
Network  Metadata     Connections/Relationships (foundation for the network)     matter!        Structure  Level  Metadata...
Metadata: Creation, Extraction and Storage
Metadata  Creation  &  Extraction      Extracted Metadata     ‣ Directly visible information from the user profile, tweet  ...
An  Example     Length: 144 characters; General topic: Egypt protest      This poor {sentiment_expression: {target:”Lara  ...
Why  Semantic  Web  is  a  standard                  for  social  metadata?        Rich  Snippet,  RDFa,  open  graph,  se...
Semantic Web: A Very Short Primer
Semantic  Web:  A  Very  Short                           Primer      Representation     ‣ RDF       ‣ relationships as firs...
Semantic Web: A Very Short Primer
Semantic  Web:  A  Very  Short                           Primer      Annotation     ‣ RDFa, Xlink, model referenceMonday, ...
Semantic  Web:  A  Very  Short                           Primer      Annotation     ‣ RDFa, Xlink, model reference      We...
Semantic  Web:  A  Very  Short                           Primer      Annotation     ‣ RDFa, Xlink, model reference      We...
How  to  save  and  use  metadata?      Store metadata as data and use standard database techniques      Use filtering and ...
How  to  save  and  use  metadata?      Use filtering and clustering, summarization, statistics - implicit semanticsMonday,...
How to save and use metadata?
How  to  save  and  use  metadata?    Use explicit semantics and Semantic Web standards and technologies       ‣semantics ...
Metadata  Extraction  from                              Informal  Text   Meena Nagarajan, Understanding User-Generated Con...
Characteristics of Text on Social Media
The Formality of Text
Content  Analysis-­‐‑Typical  Sub-­‐‑tasks       Recognize key entities mentioned in content      ‣ Information Extraction...
Content  Analysis-­‐‑Typical  Sub-­‐‑tasks       Topic Classification, Aboutness of content       ‣ What is the content abo...
Content  Analysis-­‐‑Typical  Sub-­‐‑tasks       Intention Analysis       ‣ Why did they share this content?Monday, June 6...
Content Analysis - Typical Sub-tasks
Content  Analysis-­‐‑Typical  Sub-­‐‑tasks      Sentiment Analysis       ‣What opinions are people conveying via the conte...
Research  Efforts,  Contributions  in                     this  space..       Examining usefulness of multiple context cues...
Part  1.  NER,                                                              Key                      Phrase  Extraction   ...
Multiple Context Cues Utilized for NER in Blogs and MySpace
Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace
Focus,  Impact       Techniques focus on      ‣ relatively less explored content aspects on social        media platforms ...
NAMED ENTITY RECOGNITION
NAMED  ENTITY                         RECOGNITION      I loved your music Yesterday!      “It was THE HANGOVER of the year...
NAMED ENTITY RECOGNITION    Identifying and classifying tokens
NER in prior work vs. NER for Informal Text
Cultural  Named  Entities          NER  focus  in  this  work:  Cultural  Named       Entities        Artifacts  of  Cultu...
Characteristics  of  Cultural  Entities       Varied senses, several poorly documented      ‣ Merry Christmas covered by 6...
NER in prior work vs. NER for Informal Text
A  Spot  and  Disambiguate                               Paradigm       NER generally a sequential prediction problem     ...
A  Spot  and  Disambiguate                               Paradigm       Spot, then disambiguate in context (natural       ...
NER in prior work vs. NER for Informal Text
Algorithmic Contributions    Supervised Algorithms
Algorithmic  Contributions                    Supervised  Algorithms Examples: “I am watching Pattinson scenes in <movie  ...
Multiple Senses in the Same Domain
Algorithm  Preliminaries       Problem Defn       ‣ Cultural Entity Identification : Music album, tracks       ‣ Smile (Lil...
Algorithm  Preliminaries      Corpus: MySpace comments       ‣ Context-poor utterances " “Happy 25th Lilly, Alfieis funny”M...
Algorithm Preliminaries    “Happy 25th Lilly, Alfie is funny”
Algorithm  Preliminaries  Goal:  Semantic  Annotation  of      music  named  entities  (w.r.t             MusicBrainz)Mond...
Using a Knowledge Resource for NER is not straightforward..
Approach  Overview        Scoped Relationship graphs       ‣Using context cues from the         content, webpage title, ur...
Sample  Real-­‐‑world  Constraints      Career Restrictions       ‣“release your third album already..”      Recent Album ...
Non-­‐‑Music  Mentions       Challenge 1: Several senses in the same domain      ‣ Scoping relationship graphs narrows pos...
Non-­‐‑Music  Mentions       Challenge 1: Several senses in the same domain      ‣ Scoping relationship graphs narrows pos...
Using  Language  Features  to          eliminate  incorrect  mentions..       Syntactic features      ‣ POS Tags, Typed de...
Supervised Learners
Hand  Labeling  -­‐‑  Fairly  Subjective       1800+  spots  in  MySpace  user  comments  from        artist  pages       ...
Dictionary  SpoTer  +  NLP  Step         Daniel  Gruhl,  Meena  Nagarajan,  Jan  Pieper,  Christine  Robson,  Amit  Sheth,...
NER  on  Social  Media  Text  using                Domain  Knowledge       Highlights issues with using a domain       kno...
BBC  SoundIndex  (IBM  Almaden):           Pulse  of  the  Online  Music            " "                  Daniel  Gruhl,  M...
The Vision    http://www.almaden.ibm.com/cs/projects/iis/sound/
Several  Insights                       Trending  popularity  of  artists            Trending  topics  in  artist  pages O...
Predictive  Power  of  Data    Billboards Top 50 Singles chart during the week of  Sept 22-28 ’07 vs. MySpace popularity c...
Key Phrase Extraction
Key  Phrase  Extraction:  Example     Key phrases extracted from prominent discussions     on Twitter around the 2009 Heal...
Key  Phrase  Extraction  from  SM                          Text          Different from Information Extraction          Ex...
Key  Phrase  Extraction  on  SM                         Content       Focus: Summarizing Social Perceptions via key       ...
Key  Phrase  Extraction  on  SM                         Content     ‣ Accounting for redundancy, variability, off-topic   ...
Social  and  Cultural  Logic  in  SMC       Thematic components      ‣ similar messages convey similar ideas       Space, ...
Feature  Space  (common  to  several                      efforts)           Focus: n-grams, spatio-temporal metadata (soci...
Feature  Space  (common  to  several                      efforts)           Document and Structural Cues: Two word        ...
Key  Phrase  Extraction:  Overview“President Obama in trying to regain control of the  health-care debate will likely shif...
A descriptor is an n-gram weighted by:     ‣ Thematic Importance       ‣ TFIDF, stop words, noun phrases          ‣ Redund...
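The thematic-importance part of the descriptor weighting can be illustrated with a small TF-IDF scorer over n-grams. This is only a minimal sketch, assuming whitespace tokenization, a tiny hypothetical stop-word list and one message per document; the redundancy and spatio-temporal cues from the slides are not modeled, and the function names are illustrative rather than taken from the original work.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption of this sketch).
STOP_WORDS = {"the", "a", "of", "in", "to", "is", "on"}

def ngrams(text, n_max=2):
    """Return all 1..n_max-grams of a message that contain no stop words."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOP_WORDS for t in gram):
                grams.append(" ".join(gram))
    return grams

def tfidf_scores(messages, n_max=2):
    """Score each n-gram by total frequency * log(N / document frequency)."""
    per_message = [ngrams(m, n_max) for m in messages]
    tf = Counter(g for grams in per_message for g in grams)
    df = Counter(g for grams in per_message for g in set(grams))
    n_docs = len(messages)
    return {g: tf[g] * math.log(n_docs / df[g]) for g in tf}

messages = [
    "health care reform debate",
    "health care town hall",
    "obama speech on health care reform",
    "iran election protest",
]
scores = tfidf_scores(messages)
top_descriptors = sorted(scores, key=scores.get, reverse=True)[:5]
```

N-grams that occur in most messages score low, which is why frequency alone is a weak signal and the slides combine it with additional contextual cues.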
Eliminating Off-topic Content [WISE2009]           Frequency based heuristics will not eliminate           off-topic conte...
Approach  Overview      “Yeah i know this a bit off topic but the other     electronics forum is dead right now. im lookin...
Approach  Overview      Assume one or more seed words (from domain     knowledge base) C1 -[camcorder]      Extracted Key ...
Key  Phrases  and  Aboutness                         Evaluations       Are the key phrases we extracted topical and       ...
Targeted  Content                         Delivery  -­‐‑Evaluations       12K posts from MySpace and Facebook       Electr...
Targeted Content for all content vs. extracted key phrases
User Studies and Results
Impact  and  Contributions       TFIDF + social contextual cues yield more useful       phrases that preserve social perce...
Intention Mining
Targeted  Content  Delivery  via                                                                                  Intentio...
Circa 2009: Content-based Ads
Circa 2009: Ads on Profiles
What  is  going  on  here      Interests do not translate to purchase intents     ‣"    Interests are often outdated..    ...
Intents in User
Content Ads Outside Profiles
Targeted  Content-­‐‑based                               Advertising         Non-trivial      ‣ Non-policed content       ...
Targeted Content-based Advertising
Targeted  Content-­‐‑based                               Advertising      I NEED HELP WITHSONY VEGAS PRO 8!! Ugh and ihave...
Preliminary  Results  in…         Identifying intents behind user posts on social       networks      ‣ Identify Content w...
Investigations       User studies     ‣ Hard to compare activity based ads to s.o.t.a       ‣ Impressions to Clickthroughs...
Identifying  Monetizable  Intents       Scribe Intent not same as Web Search Intent 1B.       People write sentences, not ...
From  X  to  Action  PaTerns       Action patterns surrounding an entity      ‣ How questions are asked and not topic word...
Conceptual  Overview         Bootstrapping  to  learn  IS  paTerns       Set of user posts from SNSs       Not annotated f...
Bootstrapping  to                            learn  IS  paTerns     Generate  a  universal  set  of  n-­‐‑  gram  paMerns;...
Bootstrapping  to                         learn  IS  paTerns     ! !     Generate  set  of  candidate  paMerns  from  seed...
Bootstrapping  to                         learn  IS  paTerns     ! !     User  picks  10  seed  paMerns  from  Sc     Sis=...
Bootstrapping  to                          learn  IS  paTerns     ! !     ! !               Gradually  expand  Sis  by  ad...
Bootstrapping  to                          learn  IS  paTerns ! ! ! !           For  every  pis  in  Sis  generate  set  o...
Bootstrapping  to                              learn  IS  paTerns     ‘.*  anyone  know  how’‘	          does  .*  know  h...
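The wildcard examples above ('.* anyone know how', 'does .* know how') can be produced by single-word substitution on a seed pattern. A minimal sketch, assuming one word is replaced at a time; the function name is hypothetical, and the scoring and filtering of the resulting candidates described in the slides is not reproduced here.

```python
def wildcard_variants(pattern, wildcard=".*"):
    """Replace each word of the pattern, one at a time, with a wildcard."""
    words = pattern.split()
    variants = []
    for i in range(len(words)):
        variants.append(" ".join(words[:i] + [wildcard] + words[i + 1:]))
    return variants

seed = "does anyone know how"
candidates = wildcard_variants(seed)
# candidates == ['.* anyone know how', 'does .* know how',
#                'does anyone .* how', 'does anyone know .*']
```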
Extracting and Scoring Patterns
Extracting  and  Scoring  PaTerns                                   •‘does  *  know  how’                               –‘...
Extracting  and  Scoring  PaTerns        Sc=  {‘does  anyone  know  how’,  ‘where  do  I  find’,           ‘someone  tell  ...
Extracting and Scoring Patterns    pis = ‘does anyone know how’
Extracting and Scoring Patterns
Expanding  the  PaTern  Pool         Functional  properties  /  communicative  functions           of  words         From ...
Details  in  [WISE2009]  for..            Over  iterations,  single-­‐‑word  substitutions,              functional  usage...
Sample Extracted Patterns
Identifying  Monetizable  Posts        Information  Seeking  paMerns  generated  offline        Information  seeking  intent...
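Once a pool of information-seeking patterns exists, flagging posts can be as simple as matching each post against the pool. The following is a rough sketch under stated assumptions: each '.*' wildcard is taken to stand for exactly one word, matching is case-insensitive, and the helper names are hypothetical. It illustrates the matching step only, not the intent scoring used in the original work.

```python
import re

def compile_pattern(pattern):
    """Turn a word pattern with '.*' wildcards into a case-insensitive regex.

    Assumption of this sketch: each '.*' stands for exactly one word.
    """
    parts = [r"\S+" if w == ".*" else re.escape(w) for w in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def has_information_seeking_intent(post, patterns):
    """Return True if any compiled pattern occurs in the post."""
    return any(p.search(post) for p in patterns)

patterns = [compile_pattern(p) for p in ["does .* know how", "where do i find"]]
posts = [
    "Does anyone know how to fix Sony Vegas Pro 8?",
    "I love my new camcorder",
]
flags = [has_information_seeking_intent(post, patterns) for post in posts]
# flags == [True, False]
```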
Keywords  for  Advertizing       Identifying keywords in monetizable posts       " –Plethora of work in this space       O...
Keywords  for  Advertising       Identifying keywords in monetizable posts      ‣ Plethora of work in this space       Off...
Conceptual  Overview                         (also  see  slides  88,89)         Topical hints      ‣ C1 -[camcorder]      ...
Off-­‐‑topic  ChaTer       C1 -[camcorder]       C2 -[electronics forum, hd, camcorder, somethin,       ive, canon, little ...
Evaluations  -­‐‑User  Study       Keywords from 60 monetizable user posts      ‣ Monetizable intent, at least 3 keywords ...
1.  Effectiveness  of  using                              topical  keywords       Google AdSenseads for user post vs. extra...
Instructions - User Study
Result  -­‐‑2X  Relevant  Impressions       Users picked ads relevant to the post      ‣ At least 50% inter-evaluator agre...
2.  Profile  Ads  vs.  Activity  Ads       User’s profile information      ‣ Interests, hobbies, TV shows..      ‣ Non-demog...
Result  -­‐‑8X  Generated  Interest       Using profile ads      ‣ Total of 56 ad impressions      ‣ 7% of ads generated in...
To  note…       User studies small and preliminary, clearly suggest      ‣ Monetization potential in user activity      ‣ ...
To  note…       A world between relevant impressions and click       throughs      ‣ Objectionable content, vocabulary imp...
SENTIMENT / OPINION MINING
Content  Analysis:  Sentiment                 Analysis/Opinion  Mining       Two main types of information we can learn fr...
Sentiment  Analysis  Motivation                                                Why do           Which movie     What custo...
Sentiment  Analysis:  Tasks       Example:      ‣ How awful that many #Egyptian artifacts are in danger of            bein...
Sentiment Analysis: Tasks
Sentiment  Analysis:  Tasks   Classification: overall sentiment polarity: positive/ neutral/negative      ‣Example: “How aw...
Sentiment Analysis: Tasks
Sentiment  Analysis:  Tasks   Identification & Extraction: opinion, opinion holder, opinion target   Example: opinion="awfu...
Sentiment  Analysis:  Approaches       Classification:      ‣ Supervised:         ‣ labeled training data           ‣ featu...
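As a point of comparison with supervised classifiers, polarity classification can also be approximated without labeled data by counting lexicon hits. This is a minimal sketch of that general idea, not the method of any paper cited in the slides; both word lists are hypothetical toy lexicons, and negation and sarcasm are ignored.

```python
# Hypothetical toy lexicons (assumptions of this sketch, not a real resource).
POSITIVE = {"good", "great", "love", "loved", "awesome"}
NEGATIVE = {"bad", "awful", "hate", "poor", "terrible"}

def polarity(text):
    """Classify text as positive, negative or neutral by counting lexicon hits."""
    tokens = [t.strip(".,!?#\"'").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [
    polarity("How awful that artifacts are in danger"),
    polarity("I loved your music!"),
    polarity("Tweets about Cairo"),
]
# labels == ['negative', 'positive', 'neutral']
```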
Sentiment Analysis: Approaches
Sentiment  Analysis:  Approaches      Identification & Extraction:      ‣utilizing the relations between opinion and opinio...
Sentiment  Analysis:                         From  Tweets  to  polls                                                      ...
Sentiment  Analysis:                         From  Tweets  to  polls                                                      ...
Sentiment  Analysis:                         From  Tweets  to  polls                                                      ...
Sentiment  Analysis:                         From  Tweets  to  polls                                                      ...
Sentiment  Analysis:  Predicting              the  Future  With  Social  Media     Corpus: 2.89 million tweets referring t...
Sentiment  Analysis:  Predicting              the  Future  With  Social  Media     Sentiment Analysis Classifier:          ...
Sentiment  Analysis:  Predicting              the  Future  With  Social  Media            DynamicLMClassifier provided by L...
Sentiment  Analysis:  Predicting              the  Future  With  Social  Media           thousands of workers from the Ama...
Sentiment  Analysis:  Predicting              the  Future  With  Social  Media            train the classifier using an n-g...
Sentiment  Analysis:  Predicting              the  Future  With  Social  Media        S.  Asur  and  B.Huberman.  Predicti...
Sentiment  Analysis:  Target-­‐‑specific  opinion            identification  &  Classification  of            Tweets-­‐‑Unsup...
Sentiment  Analysis:  Target-­‐‑specific  opinion             identification  &  Classification  of             Tweets-­‐‑Uns...
Sentiment  Analysis:  Target-­‐‑specific  opinion             identification  &  Classification  of             Tweets-­‐‑Uns...
Sentiment  Analysis:  Target-­‐‑specific  opinion             identification  &  Classification  of             Tweets-­‐‑Uns...
Sentiment  Analysis:  Target-­‐‑specific  opinion             identification  &  Classification  of             Tweets-­‐‑Uns...
Sentiment  Analysis:  Target-­‐‑specific  opinion            identification  &  Classification  of            Tweets-­‐‑Unsup...
Sentiment  Analysis:  Target-­‐‑         specific  opinion  identification  &               Classification  of  Tweets-­‐‑   ...
Sentiment  Analysis:  Target-­‐‑         specific  opinion  identification  &               Classification  of  Tweets-­‐‑   ...
Content  Analysis:  Context                     Extraction,  Utilization        URL  Extraction  is  for  Tweets        Fo...
Content Analysis: URL extraction    Resolution    Semantic Context Relevance
Author  Categorization:  Using               Content  to  derive  additional                    People  metadata       Per...
People  Analysis:  Using  Network               to  derive  People  metadata       Interesting questions to ask:      ‣   ...
People  Analysis:  Influence       By Link Analysis Algorithms       Hits [K-99] & variants         PageRank [BP-97] & vari...
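To make the link-analysis view of influence concrete, here is a small power-iteration PageRank over a follower graph. It is a sketch under assumptions: an edge u -> v means "u follows v", the damping factor is the conventional 0.85, dangling nodes spread their rank evenly, and the user names are made up. It is not the specific variant used in any work referenced above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each node to the nodes it points to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical follower graph: alice and bob follow carol, carol follows alice.
follows = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
ranks = pagerank(follows)
most_influential = max(ranks, key=ranks.get)
```

In this toy graph the most-followed account ends up with the highest rank, while an account nobody follows keeps only the baseline share.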
People Analysis: Influence
People  Analysis:  Influence          Flavor of Context Analysis (activity level)          Popularity NOT = Influence!      ...
People  Analysis:  User  types                          &  Affiliation       Blogger, Scientist, Journalist, Artist, Trustee...
People Analysis: User types & Affiliation
People  Analysis:  User  types                          &  Affiliation      Semantic analysis of profile description       ‣ ...
People  Analysis:                               Social  Engagement             Source: http://www.syscomminternational.com...
Network  Analysis             Foundation  of  network:            •Nodes          •Connections/Relationships    Interestin...
Network  Analysis:  Methods                                     Source: http://www.kudos-                                 ...
Network  Analysis:  Methods           Network  Structure  metrics      Centrality,  Connected  Component,  Avg.    Degree,...
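A few of the structure metrics named above (average degree, degree centrality, connected components) can be computed directly from an edge list. A minimal sketch, assuming an undirected graph with at least two nodes; the function names and the sample edges are illustrative only.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)

def average_degree(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def degree_centrality(graph):
    """Degree divided by the maximum possible degree (n - 1); assumes n >= 2."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def connected_components(graph):
    """Find connected components with breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components

graph = build_graph([("a", "b"), ("b", "c"), ("d", "e")])
components = connected_components(graph)
centrality = degree_centrality(graph)
```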
Network  Analysis:  Algorithms         Community Discovery, growth, evolution      ‣ Based on relationship types (e.g., si...
Network  Analysis:  Algorithms         Graph Partitioning & Traversal       Best time-complexity & reachability       Foll...
Network  Analysis:  Methods       Network Modeling Approaches       ‣     Random graph model (Erdos-Renyi model)      ‣   ...
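The random graph (Erdos-Renyi) model listed above is simple enough to sketch: every possible edge between n nodes is included independently with probability p. The function name and the optional seed parameter are assumptions made for this illustration.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph on nodes 0..n-1."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

edges = erdos_renyi(10, 0.3, seed=42)
```

Comparing metrics of an observed social network against graphs generated this way is one common way to judge whether an observed structure is more than chance.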
Network  Analysis:                         Diffusion  &  Homophily       Information Flow: Diffusion      ‣ Maximizing Spre...
Querying
Analysis  &  Visualization  Tools       (Network WorkBench)NWB       Truthy       Graph-tool       Orange       Pajek     ...
Event Detection
Citizen Sensing in Real-time
Real-­‐‑Time  Motivation       People cant wait for Information       500 years ago      ‣     Single life time       20 y...
Real-­‐‑Time  Social  Media       Is Real-Time the future of Web?       Social Media for Real-Time Web      ‣ Disaster Man...
Scenario    The Guardian, Feb 2010
           Scenario        The	  Guardian          Feb	  2010                                      JournalistMonday, June ...
Challenges       Information Overload      ‣ Can we aggregate, organize and collectively analyze data       Real Time     ...
A  Semantic  Web  Approach       Expressive description of Information need      ‣ Using SPARQL (Instead of traditional ke...
Concept Feed
Architecture
Social Sensor Server
Metadata  Extractions                           (Social  Sensor  Server)       Named Entity Recognition      ‣ 2 Million E...
Metadata  Extractions                           (Social  Sensor  Server)       URL, HashTag Extraction      ‣ Regex extrac...
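Regex-based URL and hashtag extraction of the kind mentioned above might look like the following. This is a minimal sketch: the two expressions are simplified assumptions rather than the patterns used in the Social Sensor Server, and URLs with trailing punctuation would need extra cleanup.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")

def extract_metadata(tweet):
    """Extract URLs and hashtags from the text of a tweet."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that fragments such as "#live" inside a URL
    # are not mistaken for hashtags.
    text_without_urls = URL_RE.sub(" ", tweet)
    hashtags = HASHTAG_RE.findall(text_without_urls)
    return {"urls": urls, "hashtags": hashtags}

tweet = "Protests continue in #Cairo #Egypt http://example.com/news#live"
metadata = extract_metadata(tweet)
# metadata == {'urls': ['http://example.com/news#live'],
#              'hashtags': ['Cairo', 'Egypt']}
```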
Metadata Extractions (Social Sensor Server)
Metadata  Extractions                           (Social  Sensor  Server)    Other Metadata provided by Twitter     ‣ User ...
Structured  Data                       (Social  Sensor  Server)       RDF Annotation      ‣ Common RDF/OWL Vocabularies   ...
Structured Data (Social Sensor Server)
Structured  Data                       (Social  Sensor  Server)                                  A snippet of the annotati...
Semantic Publisher
Semantic  Publisher       Virtuoso to store triples       Queries formulated by the users are stored       SPARQL protocol...
Application Server & Distribution Hub
Application  Server  &  Distribution                       Hub          Distribution  Hub      ‣       PUSH  Model  -­‐‑  ...
Brand  Tracking  -­‐‑  Example                                                                  Background  Knowledge  (e....
Brand  Tracking  -­‐‑  Example                                                          Background  Knowledge  (e.g.  DBpe...
Brand  Tracking  -­‐‑  Example                                                          Background  Knowledge  (e.g.  DBpe...
Brand  Tracking  -­‐‑  Example                                                          Background  Knowledge  (e.g.  DBpe...
1242 articles from NYTimes    Around 800,000 tweets
President  Obama          1242  Articles  from  Nytimes     lays  out  plan  for          Around  800,000  tweets         ...
President  Obama          1242  Articles  from  Nytimes     lays  out  plan  for          Around  800,000  tweets         ...
Twarql on Linked Open Data
Emerging Research Areas
Spam  in  Social  Networks       Reasons for spamming include:      ‣ Gaining Popularity        ‣ Use of popular topic rel...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Gaining popularity using trending keywords:       This tweet uses #Cairo but refers to a ...
Spam  in  Social  Networks       Spam detection      ‣ Content-based features        ‣ Content Size, URL type, spam words ...
Trust  in  Social  Networks       Reputation, Policy, Evidence, and Provenance used       to derive trustworthiness.      ...
Trust  in  Social  Networks       We propose trust ontology[AHTS-10] that      ‣ Captures semantics of trust.      ‣ Enabl...
Trust  in  Social  Networks       Gleaning primitive (edge) trust      ‣ Trust value between two nodes is quantified using ...
Integrating  Social  And                            Sensor  Networks       Machine sensor observations are quantitative in...
Integrating  Social  And                            Sensor  Networks       Applications of integrating heterogeneous senso...
Citizen Sensing, Social Media Analytics, and Applications
Description: http://semtech2011.semanticweb.com/sessionPop.cfm?confid=62&proposalid=3845

Original version: http://slidesha.re/social-WWW

Published in: Education
Citizen Sensing, Social Media Analytics, and Applications

  1. Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications. Tutorial at Semantic Technology Conference, San Francisco, CA. Karthik Gomadam (Accenture Technology Labs, San Jose), Amit Sheth (Kno.e.sis @ Wright State University), Selvam Velmurugan (eMoksha, Kiirti). Monday, June 6, 2011
  2. Meena Nagarajan (Content Analysis), Selvam Velmurugan (Kiirti, eMoksha NGOs), Hemant Purohit (People & Network analysis), Amit Sheth (Semantic Web), Ashutosh Jadhav (Event Analysis), Lu Chen (Sentiment Analysis), Pramod Anantharam (Social & Sensor web), Pavan Kapanipathi (Real Time Web)
  3. A Quick Word: Much of the work discussed in this tutorial is primarily the doctoral research of Dr. Meena Nagarajan, currently at IBM Almaden. It also includes current work done at the Kno.e.sis center at Wright State University.
  4. Outline: Citizen Sensing: Role, Enablers, Apps; Systematic Study of Social Media; Citizen Sensing @ Real-time; Emerging Research Areas ‣ Spam and Trust in Social Media, Mobile Social Computing; Research Application: Twitris; Tutorial part 2
  5. Citizen Sensing: Everyday users of Web 2.0 and social networks: citizens of an Internet- or Web-enabled social community. Observation and information reported by citizens => Citizen Sensing. Human-in-the-loop (participatory) sensing + Web 2.0 + mobile computing = emergence of citizen-sensor networks
  6. Social Signals: The activity of observing, reporting, disseminating information via text, audio, video and built-in device sensors (and smart devices) ‣ Creating social signals through aggregation, enhancement, analysis, visualization, and interpretation. Immense potential to disseminate information quickly and in real time
  7. Enablers: Mobile Devices & Ubiquitous Connectivity: Mobile device fast emerging as our primary tool ‣ Redefines the way we engage with people, information, etc. Global, ubiquitous, always available. Sense where you are, how you are, …
  8. Enablers: Mobile Devices & Ubiquitous Connectivity: Global, ubiquitous, always available. Sense where you are, how you are, …
  9. Enablers: Mobile Devices & Ubiquitous Connectivity: Sense where you are, how you are, …
  10. Enablers: Mobile Devices & Ubiquitous Connectivity
  11. Enablers: Mobile Devices & Ubiquitous Connectivity: Mobile Platforms Hit Critical Mass ‣ Over 5 billion users ‣ 1B+ with internet-connected mobile devices (2010) ‣ Smartphones > Notebooks + Netbooks (2010E) ‣ 500K+ mobile phone applications ‣ 74% of mobile phone users (2.4B) worldwide texted (2007)
  12. Enablers: Web 2.0 & Social Media: 500M+ Facebook users. 100M+ Twitter users, 85M+ tweets/day. Internet users: 1.8 billion. Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  13. Enablers: Web 2.0 & Social Media: 100M+ Twitter users, 85M+ tweets/day. Internet users: 1.8 billion. Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  14. Enablers: Web 2.0 & Social Media: Internet users: 1.8 billion. Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  15. Enablers: Web 2.0 & Social Media: Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  16. Enablers: Web 2.0 & Social Media
  17. Enablers: Web 2.0 & Social Media: Types of UGC: Twitter (text/microblogs), Facebook (multimedia), YouTube (videos), Flickr (images), Blogs (text), Ping (social network for music)
  18. Enablers: Web 2.0 & Social Media: Flickr (images), Blogs (text), Ping (social network for music)
  19. Enablers: Web 2.0 & Social Media: Ping (social network for music)
  20. Enablers: Web 2.0 & Social Media
  21. Citizen Sensors in Action: Iran election, Haiti earthquake, US healthcare debate
  22. Revolution 2.0, Political/Social Activism: “If you want to liberate a government, give them the internet.” - Wael Ghonim (Egyptian social activist). When Blitzer asked “Tunisia, then Egypt, what’s next?”, Ghonim replied succinctly “Ask Facebook.”
  23. Revolution 2.0, Political/Social Activism: When Blitzer asked “Tunisia, then Egypt, what’s next?”, Ghonim replied succinctly “Ask Facebook.”
  24. Revolution 2.0, Political/Social Activism
  25. Citizen Journalism: Twitter Journalism
  26. Social Media Influence: Intelligence, News & Analysis: Many media companies use Facebook and Twitter as a news-delivery platform. Many individuals rely on them as a news source. News is increasingly social.
  27. Business Intelligence: Trend Spotting, Forecasting, Brand Tracking and Crisis Management: Sysomos: http://www.sysomos.com/ Trendspotting: http://trendspotting.com Simplify: http://simplify360.com/ Shoutlet: http://www.shoutlet.com/ Reputation (Defender): http://www.reputationdefender.com/
  28. Development (Education, Health, eGov): LiveMocha (http://www.livemocha.com/) ‣ Online language learning tool with social engagement ‣ bridging the gap!! Soliya (http://www.soliya.net/) ‣ Dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies. Project Einstein (http://digital-democracy.org/what-we-do/programs/) ‣ A photography-based digital penpal program connecting youths in refugee camps to the world
  29. Development (Education, Health, eGov): Soliya (http://www.soliya.net/) ‣ Dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies. Project Einstein (http://digital-democracy.org/what-we-do/programs/) ‣ A photography-based digital penpal program connecting youths in refugee camps to the world
  30. Development (Education, Health, eGov): Project Einstein (http://digital-democracy.org/what-we-do/programs/) ‣ A photography-based digital penpal program connecting youths in refugee camps to the world
  31. Development (Education, Health, eGov)
  32. Development (Education, Health, eGov): PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/), TrialX (http://trialx.com). Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/
  33. Why People-Content-Network metadata?
  34. Dimensions of Systematic Study of Social Media: Spatio-Temporal-Thematic + People-Content-Network
  35. Social Information Processing: "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]. Network: social structure emerges from the aggregate of relationships (ties). People: poster identities, the active effort of accomplishing interaction. Content: studying the content of communication.
  36. Studying Online Human Social Dynamics: How does the (semantics or style of) content fit into the observations made about the network? ‣ Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic.
  37. Studying Online Human Social Dynamics
  38. Studying Online Human Social Dynamics: Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert and the connections between participants together explain information propagation in a social network?
  39. Studying Online Human Social Dynamics
  40. Metadata/Annotations: Metadata: an organized way to study ‣ types ‣ creation/extraction and storage ‣ use
  41. The Anatomy of a Tweet
  42. People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms: Explicit information from user profiles ‣ User names, pictures, videos, links, demographic information, group memberships... ‣ Often is not updated. Implicit information from user attention metadata ‣ Page views, Facebook 'Likes', Comments; Twitter 'Follows', Retweets, Replies..
  43. People Metadata: Various Levels: Demographic, Interests, Activity, Network
  44. People Metadata: Continued: User Demographic Metadata: • User-id • Screen/Display-name of user • Real name of user • Location • Profile creation date • User description • User bio • URL. Interest Level Metadata: • Author type (trustee/donor, journalist, blogger, scientist etc.) • Favorite tweets • Types of lists subscribed • Style of writing (personality indicator) • No. of Followees • Author type trend of Followees
  45. People Metadata: Continued: Activity Level Metadata: • Age of the profile • Frequency of posts • Timestamp of last status • No. of Posts • No. of Lists/groups created • No. of Lists/groups subscribed. Influence Level Metadata (inferring People Metadata from Network level information): • No. of Followers (normal, influential) • No. of Mentions • No. of Retweets/Forwards • No. of Replies • No. of Lists/groups following • No. of people following back • Authority & Hub Scores. Web Presence: • User affiliations • KLOUT Score, an influence measure (www.klout.com)
  46. Content Metadata: Content Independent metadata ‣ date, location, author etc. Content Dependent metadata ‣ Direct content-based metadata ‣ Explicit/Mentioned content metadata ‣ named entities in content ‣ Implicit/Inferred content metadata ‣ related named entities from knowledge sources ‣ Indirect content-based metadata (external metadata) ‣ context inferred from URLs in content (images, links to articles, FourSquare checkins etc.)
  47. Content Metadata: Content Dependent metadata ‣ Direct content-based metadata ‣ Explicit/Mentioned content metadata ‣ named entities in content ‣ Implicit/Inferred content metadata ‣ related named entities from knowledge sources ‣ Indirect content-based metadata (external metadata) ‣ context inferred from URLs in content (images, links to articles, FourSquare checkins etc.)
  48. Content Metadata
  49. Content Independent Metadata: For Tweets ‣ Published date and time ‣ Location (where the tweet was generated from) ‣ Tweet posting method (smartphone, twitter.com, clients for Twitter) ‣ Author information
  50. Content Independent Metadata
  51. Content Independent Metadata: For Text messages ‣ Published date and time ‣ Origin location ‣ Recipient ‣ Carrier information
  52. Content Independent Metadata
  53. Content Independent Metadata
  54. Content Dependent Metadata (Tweet), Direct Content-based Metadata: Direct content-based metadata; Indirect content-based metadata (external metadata)
  55. Content Dependent Metadata, Direct Content-based Metadata
  56. Network Metadata: Connections/Relationships (foundation for the network) matter! Structure Level Metadata: • Community size • Community growth rate • Largest Strongly Connected Component size • Weakly Connected Components & max. size • Average degree of separation • Clustering coefficient. Relationship Level Metadata: • Type of relationship • Relationship strength • User homophily based on a certain characteristic (e.g., location, interest etc.) • Reciprocity: mutual relationship • Active community/ties
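Two of the network measures named on slide 56, reciprocity and clustering coefficient, can be illustrated on a small follower graph. This is only a sketch under stated assumptions: the edge list `follows` and both function names are hypothetical and do not come from the tutorial, and the clustering coefficient here treats follower edges as undirected.

```python
from itertools import combinations


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def local_clustering(edges, node):
    """Local clustering coefficient of `node`, treating edges as undirected."""
    neighbors = {}
    for u, v in edges:
        if u != v:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    nbrs = neighbors.get(node, set())
    if len(nbrs) < 2:
        return 0.0
    # Count how many pairs of this node's neighbors are themselves connected.
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)


# Hypothetical "who follows whom" edges.
follows = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("bob", "cat")]
print(reciprocity(follows))
print(local_clustering(follows, "ann"))
```

Only the ann/bob pair is mutual here, so half of the four edges are reciprocated, and both of ann's neighbors are connected to each other.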
  57. Metadata: Creation, Extraction and Storage
  58. Metadata Creation & Extraction: Extracted Metadata ‣ Directly visible information from the user profile, tweet content & community structure. Created Metadata ‣ Obtained after processing information in the user profile, content and/or network structure
  59. An Example: Length: 144 characters; General topic: Egypt protest. This poor {sentiment_expression: {target:"Lara Logan", polarity:"negative"}} woman! RT @THR CBS News {entity:{type="News Agency"}} Lara Logan {entity:{type="Person"}} Released From Hospital {entity:{type="Location"}} After Egypt {entity:{type="Country"}} Assault {type="topic"} http://bit.ly/dKWTY0 {external_URL}
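The inline annotations on slide 59 could also be kept as standoff records next to the raw tweet. The sketch below is one possible layout, not the authors' storage format: the field names `span`, `kind`, `start` and `end` and the helper `with_offsets` are assumptions made for illustration, and the choice of which words each annotation covers (for example "Hospital" for the Location entity) is a guess from the slide.

```python
# Raw tweet text with the inline annotations removed.
tweet_text = ("This poor woman! RT @THR CBS News Lara Logan Released From "
              "Hospital After Egypt Assault http://bit.ly/dKWTY0")

# One record per annotation; the labels mirror those shown on the slide.
annotations = [
    {"span": "This poor", "kind": "sentiment_expression",
     "target": "Lara Logan", "polarity": "negative"},
    {"span": "CBS News", "kind": "entity", "type": "News Agency"},
    {"span": "Lara Logan", "kind": "entity", "type": "Person"},
    {"span": "Hospital", "kind": "entity", "type": "Location"},
    {"span": "Egypt", "kind": "entity", "type": "Country"},
    {"span": "Assault", "kind": "topic"},
    {"span": "http://bit.ly/dKWTY0", "kind": "external_URL"},
]


def with_offsets(text, records):
    """Attach character offsets of the first occurrence of each span."""
    located = []
    for record in records:
        start = text.find(record["span"])
        if start == -1:
            raise ValueError("span not found: " + record["span"])
        located.append({**record, "start": start,
                        "end": start + len(record["span"])})
    return located


for record in with_offsets(tweet_text, annotations):
    print(record["kind"], record["start"], record["end"])
```

Keeping offsets separate from the text leaves the original tweet untouched, so several annotators (entity, sentiment, topic) can add records independently.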
  60. Why is the Semantic Web a standard for social metadata? Rich Snippets, RDFa, Open Graph, Semantic Web based social data standards. Relationships/connections play a central role ‣ Relationships as first-class objects are important
  61. Semantic Web: A Very Short Primer
  62. Semantic Web: A Very Short Primer: Representation ‣ RDF ‣ relationships as first-class objects <subject, predicate, object> ‣ OWL ‣ Representing knowledge and agreements: nomenclature, taxonomy, folksonomy, ontology
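The <subject, predicate, object> form on slide 62 can be mimicked with plain tuples to show why treating relationships as first-class objects makes them directly queryable. This is a toy sketch, not an RDF library: the `ex:` identifiers, the sample triples and the `match` helper are all hypothetical.

```python
# Each statement is a (subject, predicate, object) tuple.
triples = {
    ("ex:tweet1", "ex:mentions", "ex:LaraLogan"),
    ("ex:tweet1", "ex:mentions", "ex:Egypt"),
    ("ex:tweet1", "ex:hasTopic", "ex:EgyptProtest"),
    ("ex:LaraLogan", "rdf:type", "ex:Person"),
    ("ex:Egypt", "rdf:type", "ex:Country"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything the tweet mentions.
print(match(triples, s="ex:tweet1", p="ex:mentions"))
# Everything typed as a country.
print(match(triples, p="rdf:type", o="ex:Country"))
```

The wildcard pattern plays roughly the role that a basic triple pattern plays in SPARQL, which a later slide lists as the query language.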
  63. Semantic Web: A Very Short Primer
  64. Semantic Web: A Very Short Primer: Annotation ‣ RDFa, XLink, model reference
  65. Semantic Web: A Very Short Primer: Annotation ‣ RDFa, XLink, model reference. Web of Data ‣ Linked Open Data
  66. Semantic Web: A Very Short Primer: Annotation ‣ RDFa, XLink, model reference. Web of Data ‣ Linked Open Data. Querying ‣ SPARQL; Rules: SWRL, RIF
  67. How to save and use metadata? Store metadata as data and use standard database techniques. Use filtering and clustering, summarization, statistics: implicit semantics
  68. How to save and use metadata? Use filtering and clustering, summarization, statistics: implicit semantics
  69. How to save and use metadata?
  70. How to save and use metadata?
  71. How to save and use metadata? Use explicit semantics and Semantic Web standards and technologies ‣ semantics = meaning ‣ richer representation, support for relationships, context ‣ supports use of background knowledge ‣ better integration, powerful analysis. Semantics: the implicit, the formal and the powerful. Social metadata on the Web
  72. Metadata Extraction from Informal Text: Meena Nagarajan, Understanding User-Generated Content on Social Media, Ph.D. Dissertation, Wright State University, 2010
  73. Characteristics of Text on Social Media
  74. The Formality of Text
  75. Content Analysis: Typical Sub-tasks: Recognize key entities mentioned in content ‣ Information Extraction (entity recognition, anaphora resolution, entity classification..) ‣ Discovery of semantic associations between entities. Topic classification, aboutness of content ‣ What is the content about? Intention analysis ‣ Why did they share this content?
  76. Content Analysis: Typical Sub-tasks: Topic classification, aboutness of content ‣ What is the content about? Intention analysis ‣ Why did they share this content?
  77. Content Analysis: Typical Sub-tasks: Intention analysis ‣ Why did they share this content?
  78. Content Analysis: Typical Sub-tasks
  79. Content Analysis: Typical Sub-tasks
  80. Content Analysis: Typical Sub-tasks: Sentiment analysis ‣ What opinions are people conveying via the content? Author profiling ‣ What can we infer about the author from the content he posts? Context (external to content) extraction ‣ URL extraction, analyzing external content
  81. Research Efforts, Contributions in this Space: Examining usefulness of multiple context cues for text mining algorithms ‣ Compensating for informal, highly variable language, lack of context ‣ Using context cues: document corpus, syntactic and structural cues, social medium, external domain knowledge… In this talk, highlighting sample metadata creation tasks: NER, Key Phrase Extraction, Intention, Sentiment/Opinion Mining
  82. Part 1. NER, Key Phrase Extraction: Named Entity Recognition ‣ I loved <movie> the hangover </movie>! Key Phrase Extraction
  83. Multiple Context Cues Utilized for NER in Blogs and MySpace
  84. Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace
  85. Focus, Impact: Techniques focus on ‣ relatively less explored content aspects on social media platforms. Combination of top-down, bottom-up analysis for informal text ‣ Statistical NLP, ML algorithms over large corpora ‣ Models and rich knowledge bases in a domain
  86. NAMED ENTITY RECOGNITION
  87. NAMED ENTITY RECOGNITION: I loved your music Yesterday! “It was THE HANGOVER of the year..lasted forever.. So I went to the movies..bad choice picking “GI Jane” worse now”
  88. NAMED ENTITY RECOGNITION: Identifying and classifying tokens
  89. NER in prior work vs. NER for Informal Text
  90. Cultural Named Entities: NER focus in this work: Cultural Named Entities. Artifacts of culture ‣ Names of books, music albums, films, video games, etc. Common words in a language ‣ The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight…
  91. Characteristics of Cultural Entities: Varied senses, several poorly documented ‣ Merry Christmas covered by 60+ artists ‣ Star Trek: movies, TV series, media franchise.. and cuisines!! Changing contexts with recent events ‣ The Dark Knight reference to Obama, health care reform. Unrealistic expectations ‣ Comprehensive sense definitions, enumeration of contexts, labeled corpora for all senses.. ‣ NER: relaxing the closed-world sense assumptions
  92. NER in prior work vs. NER for Informal Text
  93. A Spot and Disambiguate Paradigm: NER is generally a sequential prediction problem ‣ NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORGN entities) [Lev Ratinov, Dan Roth]. Focus of approach: Spot and Disambiguate paradigm. Starting off with a dictionary or list of entities we want to spot
  94. A Spot and Disambiguate Paradigm: Spot, then disambiguate in context (natural language, domain knowledge cues). Binary classification: is this mention of “the hangover” in a sentence referring to a movie?
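The two steps on slides 93 and 94 (spot dictionary entries, then make a yes/no decision per spot) can be sketched as follows. This is a toy stand-in, not the authors' system: the title dictionary, the cue-word list and the window size are hypothetical, and a simple cue-word rule takes the place of the trained binary classifier described in the tutorial.

```python
import re

# Hypothetical dictionary of entities to spot and hypothetical context cues.
MOVIE_TITLES = ["the hangover", "twilight"]
MOVIE_CUES = {"watch", "watched", "watching", "movie", "movies",
              "film", "scene", "scenes"}


def spot(text, dictionary):
    """Return (title, start, end) for each case-insensitive dictionary match."""
    spots = []
    for title in dictionary:
        pattern = r"\b" + re.escape(title) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spots.append((title, m.start(), m.end()))
    return sorted(spots, key=lambda s: s[1])


def is_movie_mention(text, start, end, window=5):
    """Toy binary decision: is a cue word within `window` tokens of the spot?"""
    left = re.findall(r"\w+", text[:start].lower())[-window:]
    right = re.findall(r"\w+", text[end:].lower())[:window]
    return any(token in MOVIE_CUES for token in left + right)


for sentence in ["I loved the hangover! best movie ever",
                 "It was THE HANGOVER of the year..lasted forever"]:
    for title, start, end in spot(sentence, MOVIE_TITLES):
        print(title, is_movie_mention(sentence, start, end))
```

Both sentences are spotted, but only the first has a movie cue nearby, which is the distinction the binary classification step is meant to draw.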
  95. NER in prior work vs. NER for Informal Text
  96. Algorithmic Contributions, Supervised Algorithms
  97. Algorithmic Contributions, Supervised Algorithms: Examples: “I am watching Pattinson scenes in <movie id=2341>Twilight</movie> for the nth time.” “I spent a romantic evening watching the Twilight by the bay..” “I love <artist id=357688>Lily’s</artist> song
  98. Multiple Senses in the Same Domain
  99. Algorithm Preliminaries: Problem definition ‣ Cultural entity identification: music albums, tracks ‣ Smile (Lily Allen), Celebration (Madonna). Corpus: MySpace comments ‣ Context-poor utterances: “Happy 25th Lilly, Alfie is funny”
  100. Algorithm Preliminaries: Corpus: MySpace comments ‣ Context-poor utterances: “Happy 25th Lilly, Alfie is funny”
  101. Algorithm Preliminaries: “Happy 25th Lilly, Alfie is funny”
  102. Algorithm Preliminaries: Goal: semantic annotation of music named entities (w.r.t. MusicBrainz)
  103. Using a Knowledge Resource for NER is not straightforward..
  104. Approach Overview: Scoped relationship graphs ‣ Using context cues from the content, webpage title, URL… (e.g., "new Merry Christmas tune", "new albums/songs") ‣ Reduce potential entity spot size ‣ Generate candidate entities ‣ Spot and disambiguate
  105. Sample Real-world Constraints: Career restrictions ‣ “release your third album already..” Recent album restrictions ‣ “I loved your new album..” Artist age restrictions ‣ “happy 25th rihanna, loved alfie btw..” etc.
  106. Non-Music Mentions: Challenge 1: Several senses in the same domain ‣ Scoping relationship graphs narrows possible senses ‣ Solves the named entity identification problem partially. Challenge 2: Non-music mentions ‣ Got your new album Smile. Loved it! ‣ Keep your SMILE on!
  107. Non-Music Mentions: Challenge 1: Several senses in the same domain ‣ Scoping relationship graphs narrows possible senses ‣ Solves the named entity identification problem partially. Challenge 2: Non-music mentions ‣ Got your new album Smile. Loved it! ‣ Keep your SMILE on!
  108. Using Language Features to eliminate incorrect mentions: Syntactic features ‣ POS tags, typed dependencies.. ‣ Example here. Word-level features ‣ Capitalization, quotes. Domain-level features
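The word-level features named on slide 108 (capitalization, quotes) are simple to compute for a spotted mention. The sketch below is illustrative only: the feature names and the function `word_level_features` are hypothetical, and the syntactic and domain-level features from the slide are not covered.

```python
QUOTE_CHARS = "\"'“”‘’"


def word_level_features(text, start, end):
    """Capitalization and quotation features for the mention text[start:end]."""
    mention = text[start:end]
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    return {
        "all_caps": mention.isupper(),
        "title_case": mention.istitle(),
        "first_letter_cap": mention[:1].isupper(),
        # Guard against empty strings, which are "in" every Python string.
        "in_quotes": (before != "" and before in QUOTE_CHARS
                      and after != "" and after in QUOTE_CHARS),
    }


chatter = "Keep your SMILE on!"
album = 'Got your new album "Smile". Loved it!'
i = chatter.index("SMILE")
j = album.index("Smile")
print(word_level_features(chatter, i, i + 5))
print(word_level_features(album, j, j + 5))
```

Features like these would feed a supervised learner as inputs rather than decide on their own whether a spot is a music mention.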
  109. Supervised Learners
  110. Hand Labeling - Fairly Subjective: 1800+ spots in MySpace user comments from artist pages. Keep your SMILE on! - good spot, bad spot, inconclusive? 4-way annotator agreements - Madonna 90% agreement - Rihanna 84% agreement - Lily Allen 53% agreement
  111. Dictionary Spotter + NLP Step: Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009: 260-276
  112. NER on Social Media Text using Domain Knowledge: Highlights issues with using domain knowledge for an IE task. Two-stage approach: chaining NL learners over results of domain model based spotters. Improves accuracy up to a further 50% ‣ allows the more time-intensive NLP analytics to run on less than the full set of input data
  113. BBC SoundIndex (IBM Almaden): Pulse of the Online Music: Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: “Multimodal Social Intelligence in a Real-Time Dashboard System,” special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010. http://www.almaden.ibm.com/cs/projects/iis/sound/
  114. The Vision: http://www.almaden.ibm.com/cs/projects/iis/sound/
  115.
  116. Several Insights: Trending popularity of artists. Trending topics in artist pages. Only 4% -ve sentiments, perhaps ignore the Sentiment Annotator on this data source? Ignoring spam can change ordering of popular artists
  117. Predictive Power of Data: Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07 vs. MySpace popularity charts. User study indicated 2:1 and up to 7:1 (younger age groups) preference for MySpace list. Challenging traditional polling methods!
  118. Key Phrase Extraction
  119. Key Phrase Extraction: Example: Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and the 2008 Mumbai Terror Attack on one day
  120. Key Phrase Extraction from SM Text: Different from Information Extraction. Extracting vs. assigning key phrases; focus: key phrase extraction. Prior work focus: extracting phrases that summarize a document (a news article, a web page, a journal article, a book..). Focus here: summarize multiple documents (UGC) around the same event/topic of interest
  121. Key Phrase Extraction on SM Content: Focus: summarizing social perceptions via key phrase extraction. Preserving/isolating the social behind the social data ‣ What is said in Egypt vs. the USA should be viewed in isolation
  122. Key Phrase Extraction on SM Content: ‣ Accounting for redundancy, variability, off-topic content: “Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators ..smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Buca is the place to go.”
  123. Social and Cultural Logic in SMC: Thematic components ‣ similar messages convey similar ideas. Space, time metadata ‣ role of community and geography in communication. Poster attributes ‣ age, gender, socio-economic status reflect similar perceptions
  124. Feature Space (common to several efforts): Focus: n-grams, spatio-temporal metadata (social components). Syntactic cues: in quotes, italics, bold; in document headers; phrases collocated with acronyms
  125. Feature Space (common to several efforts): Document and structural cues: two-word phrases, appearing in the beginning of a document, frequency, presence in multiple similar documents etc. Linguistic cues: stemmed form of a phrase, phrases that are simple and compound nouns in sentences etc.
  126. Key Phrase Extraction: Overview: “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September” 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying …
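The 1-gram and 2-gram lists on slide 126 come from sliding a window over the whitespace-separated tokens of the sentence. A minimal sketch, assuming plain whitespace tokenization (the tutorial does not say which tokenizer was used) and a hypothetical function name `ngrams`:

```python
def ngrams(text, n):
    """Return all contiguous n-token phrases from whitespace-split text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")

print(ngrams(sentence, 1)[:6])
print(ngrams(sentence, 2)[:3])
```

Each n-gram is only a candidate descriptor at this point; the weighting step decides which candidates are kept.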
  127. A descriptor is an n-gram weighted by: ‣ Thematic importance ‣ TFIDF, stop words, noun phrases ‣ Redundancy: statistically discriminatory in nature ‣ Variability: contextually important ‣ Spatial importance (local vs. global popularity) ‣ Temporal importance (always popular vs. currently trending)
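The thematic part of the weighting on slide 127 names TFIDF and stop words. The sketch below shows a textbook TF-IDF computation with stop-word filtering over tokenized posts; it is an assumption-laden illustration, not the authors' formula, and the stop-word list, the sample posts and the function name `tfidf` are hypothetical. The spatial and temporal importance components are not modelled here.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "in", "to", "a", "is"}  # tiny illustrative list


def tfidf(docs):
    """Return one {term: tf-idf score} dict per document (docs are token lists)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(t for t in doc if t not in STOP_WORDS)
        total = sum(tf.values()) or 1
        scores.append({t: (count / total) * math.log(n_docs / df[t])
                       for t, count in tf.items()})
    return scores


posts = [["health", "care", "reform", "debate"],
         ["health", "care", "bill"],
         ["mumbai", "attack"]]
for post_scores in tfidf(posts):
    print(sorted(post_scores, key=post_scores.get, reverse=True))
```

Terms that occur in fewer posts get a higher weight, so "reform" outranks "health" in the first post; a term present in every post scores zero.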
  128.
  129. Eliminating Off-topic Content [WISE2009]: Frequency-based heuristics will not eliminate off-topic content that is ALSO POPULAR
  130. Approach Overview: “Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys” “Canon HV20. Great little cameras under $1000.”
  131. Approach Overview: Assume one or more seed words (from domain knowledge base): C1 - [camcorder]. Extracted key words/phrases: C2 - [electronics forum, hd, camcorder, somethin, ive, canon, little camera, canon hv20, cameras, offtopic]. Gradually expand C1 by adding phrases from C2 that are strongly associated with C1. Mutual Information based algorithm [WISE2009]
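The expansion step on slide 131 (grow the seed set C1 with phrases from C2 that are strongly associated with it) can be sketched with pointwise mutual information over post co-occurrence. The details of the [WISE2009] algorithm are not given in the slides, so this is a stand-in under stated assumptions: the PMI estimate, the threshold of 0, the sample posts and the names `pmi` and `expand_seeds` are all hypothetical.

```python
import math


def pmi(term_a, term_b, posts):
    """Pointwise mutual information of two phrases; posts are sets of phrases."""
    n = len(posts)
    count_a = sum(1 for post in posts if term_a in post)
    count_b = sum(1 for post in posts if term_b in post)
    count_ab = sum(1 for post in posts if term_a in post and term_b in post)
    if count_ab == 0:
        return float("-inf")
    return math.log((count_ab / n) / ((count_a / n) * (count_b / n)))


def expand_seeds(seeds, candidates, posts, threshold=0.0):
    """Repeatedly add candidates whose best PMI with the topic set exceeds threshold."""
    topic = list(seeds)
    changed = True
    while changed:
        changed = False
        for cand in candidates:
            if cand in topic:
                continue
            if max(pmi(cand, t, posts) for t in topic) > threshold:
                topic.append(cand)
                changed = True
    return topic


# Hypothetical posts, each reduced to its set of extracted phrases.
posts = [
    {"camcorder", "hd", "canon hv20"},
    {"camcorder", "canon hv20", "cameras"},
    {"electronics forum", "offtopic"},
    {"offtopic", "somethin"},
]
c1 = ["camcorder"]
c2 = ["hd", "canon hv20", "cameras", "offtopic", "electronics forum"]
print(expand_seeds(c1, c2, posts))
```

Phrases that never co-occur with the growing topic set, such as "offtopic", are left out even when they are frequent, which is the behaviour frequency-based heuristics alone cannot provide.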
  132. 132. Key  Phrases  and  Aboutness   Evaluations Are the key phrases we extracted topical and good indicators of what the content is about? ‣ If it is, it should act as an effective index/search phrase and return relevant content Evaluation Application: Targeted Content DeliveryMonday, June 6, 2011
  133. 133. Targeted  Content   Delivery  -­‐‑Evaluations 12K posts from MySpace and Facebook Electronics forums ‣ Baseline phrases: Yahoo Term Extractor ‣ Our method phrases: Key phrase extraction, elimination Targeted Content from Google AdSenseMonday, June 6, 2011
  134. 134. Targeted  Content  for  all  content   vs.  extracted  key  phrasesMonday, June 6, 2011
  135. 135. User  Studies  and  ResultsMonday, June 6, 2011
  136. 136. Impact  and  Contributions TFIDF + social contextual cues yield more useful phrases that preserve social perceptions Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectivelyMonday, June 6, 2011
  137. 137. Intention  MiningMonday, June 6, 2011
  138. 138. Targeted Content Delivery via Intention Mining On social networks Use case for this talk ‣ Targeted content = content-based advertisements ‣ Target = user profiles Content-based advertisements (CBAs) ‣ Well-known monetization model for online content
  139. 139. Circa 2009: Content-based Ads
  140. 140. Circa 2009: Ads on Profiles
  141. 141. What is going on here Interests do not translate to purchase intents ‣ Interests are often outdated ‣ Intents are rarely stated on a profile Cases that do seem to work ‣ New store openings, sales ‣ Highly demographic-targeted ads
  142. 142. Intents  in  User  Monday, June 6, 2011
  143. 143. Content  Ads  Outside  ProfilesMonday, June 6, 2011
  144. 144. Targeted Content-based Advertising Non-trivial ‣ Non-policed content: brand image, unfavorable sentiments ‣ People are there to network: user attention to ads is not guaranteed ‣ Informal, casual nature of content ‣ People are sharing experiences and events: main message overloaded with off-topic content
  145. 145. Targeted Content-based Advertising
  146. 146. Targeted Content-based Advertising “I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(” Y. Zhang, A. C. Surendran, J. C. Platt, and M. Narasimhan, “Learning from Multi-topic Web Documents for Contextual Advertisement,” KDD 2008.
  147. 147. Preliminary  Results  in…   Identifying intents behind user posts on social networks ‣ Identify Content with monetization potential Identifying keywords for advertising in user- generated content ‣ Considering interpersonal communication & off-topic chatterMonday, June 6, 2011
  148. 148. Investigations User studies ‣ Hard to compare activity-based ads to the state of the art ‣ Impressions to clickthroughs ‣ How well are we able to identify monetizable posts? ‣ How targeted are ads generated using our keywords vs. the entire user-generated content?
  149. 149. Identifying Monetizable Intents Scribe intent is not the same as web search intent [1] People write sentences, not keywords or phrases Presence of a keyword does not imply navigational / transactional intents ‣ ‘am thinking of getting X’ (transactional) ‣ ‘I like my new X’ (information sharing) ‣ ‘what do you think about X’ (information seeking) [1] B. J. Jansen, D. L. Booth, and A. Spink, “Determining the informational, navigational, and transactional intent of web queries,” Inf. Process. Manage., vol. 44, no. 3, 2008.
  150. 150. From X to Action Patterns Action patterns surrounding an entity ‣ How questions are asked, not the topic words that indicate what the question is about ‣ “where can I find a chottopspcam” ‣ User post also has an entity
  151. 151. Conceptual Overview: Bootstrapping to learn IS patterns Set of user posts from SNSs, not annotated for presence or absence of any intent
  152. 152. Bootstrapping to learn IS patterns Generate a universal set of n-gram patterns with freq > f: S = set of all 4-grams with freq > 3
  153. 153. Bootstrapping to learn IS patterns Generate a set of candidate patterns from seed words (why, when, where, how, what): Sc = all 4-grams in S that extract seed words
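The first two bootstrapping steps can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and reading "extract seed words" as "contain at least one seed word"; the function names are hypothetical and not taken from [WISE2009].

```python
from collections import Counter

SEED_WORDS = {"why", "when", "where", "how", "what"}

def ngrams(tokens, n=4):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def universal_patterns(posts, n=4, min_freq=3):
    """S: every n-gram whose corpus frequency is strictly greater than min_freq."""
    counts = Counter()
    for post in posts:
        counts.update(ngrams(post.lower().split(), n))
    return {gram for gram, count in counts.items() if count > min_freq}

def candidate_patterns(patterns, seeds=SEED_WORDS):
    """Sc: the patterns in S that contain at least one seed word (assumed reading)."""
    return {gram for gram in patterns if seeds.intersection(gram)}
```

For example, a 4-gram such as ('does', 'anyone', 'know', 'how') that occurs four times in the posts lands in both S and Sc, while a frequent 4-gram without a seed word stays in S only.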
  154. 154. Bootstrapping to learn IS patterns User picks 10 seed patterns from Sc: Sis = ‘does anyone know how’, ‘where do I find’, ‘someone tell me where’…
  155. 155. Bootstrapping to learn IS patterns Gradually expand Sis by adding Information Seeking patterns from Sc
  156. 156. Bootstrapping to learn IS patterns For every pis in Sis, generate a set of filler patterns
  157. 157. Bootstrapping to learn IS patterns ‘.* anyone know how’, ‘does .* know how’, ‘does anyone .* how’, ‘does anyone know .*’
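Generating the filler patterns for a seed pattern amounts to replacing one word at a time with a wildcard. The sketch below reproduces the four fillers listed for ‘does anyone know how’; the helper `matches`, which treats `.*` as exactly one word, is an assumption added for illustration.

```python
import re

def filler_patterns(pattern):
    """Replace each word of the pattern in turn with the wildcard '.*'."""
    words = pattern.split()
    return [" ".join(words[:i] + [".*"] + words[i + 1:]) for i in range(len(words))]

def matches(filler, candidate):
    """True if candidate fits the filler, with '.*' standing for a single word."""
    parts = [r"\S+" if word == ".*" else re.escape(word) for word in filler.split()]
    return re.fullmatch(" ".join(parts), candidate) is not None
```

Calling `filler_patterns("does anyone know how")` yields the four patterns on the slide, and `matches("does .* know how", "does someone know how")` is true, which is how single-word substitutions such as ‘someone’ become candidates for scoring.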
  158. 158. Extracting and Scoring Patterns
  159. 159. Extracting and Scoring Patterns Filler pattern ‘does * know how’ ‣ ‘does someone know how’: Functional Compatibility: impersonal pronouns; Empirical Support: 1/3 ‣ ‘does somebody know how’: Functional Compatibility: impersonal pronouns; Empirical Support: 0; pattern retained ‣ ‘does john know how’: pattern discarded
  160. 160. Extracting and Scoring Patterns Sc = {‘does anyone know how’, ‘where do I find’, ‘someone tell me where’} pis = ‘does anyone know how’
  161. 161. Extracting and Scoring Patterns pis = ‘does anyone know how’
  162. 162. Extracting and Scoring Patterns
  163. 163. Expanding the Pattern Pool Functional properties / communicative functions of words, from a subset of LIWC [1] ‣ cognitive mechanical (e.g., if, whether, wondering, find): ‘I am thinking about getting X’ ‣ adverbs (e.g., how, somehow, where) ‣ impersonal pronouns (e.g., someone, anybody, whichever): ‘Someone tell me where can I find X’ [1] Linguistic Inquiry and Word Count (LIWC), http://liwc.net
  164. 164. Details in [WISE2009] for: ‣ How, over iterations, single-word substitutions, functional usage and empirical support conservatively expand Sis ‣ Infusing new patterns and seed words ‣ Stopping conditions
  165. 165. Sample Extracted Patterns
  166. 166. Identifying Monetizable Posts Information Seeking patterns generated offline Information seeking intent score of a post ‣ Extract and compare patterns in posts with extracted patterns Transactional intent score of a post ‣ LIWC ‘Money’ dictionary: 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price
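A rough sketch of the two scores follows. The pattern list and the six-word lexicon are stand-ins taken from the examples on these slides (the real LIWC ‘Money’ dictionary has 173 entries and is not reproduced here), and the rule that a post is monetizable when both scores are positive is an assumption made for illustration.

```python
import re

# Stand-ins: three seed patterns and the six example words from the slides.
IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def information_seeking_score(post, patterns=IS_PATTERNS):
    """Number of known information-seeking patterns that occur in the post."""
    text = " " + " ".join(tokenize(post)) + " "
    return sum(1 for pattern in patterns if f" {pattern} " in text)

def transactional_score(post, lexicon=MONEY_WORDS):
    """Number of tokens that appear in the money lexicon."""
    return sum(1 for token in tokenize(post) if token in lexicon)

def is_monetizable(post):
    # Assumed combination rule: both kinds of intent must be present.
    return information_seeking_score(post) > 0 and transactional_score(post) > 0
```

A post such as "Does anyone know how much a used Canon HV20 is worth? Looking to buy." scores on both dimensions, whereas "I like my new camera" scores on neither.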
  168. 168. Keywords for Advertising Identifying keywords in monetizable posts ‣ Plethora of work in this space Off-topic noise removal is our focus ‣ “I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(”
  169. 169. Conceptual  Overview   (also  see  slides  88,89)   Topical hints ‣ C1 -[camcorder] Keywords in post ‣ C2 -[electronics forum, hd, camcorder, somethin, ive, canon, little camera, canon hv20, cameras, offtopic] Move strongly related keywords from C2 to C1 one-by-one ‣ Relatedness determined using information gain ‣ Using the Web as a corpus, domain independentMonday, June 6, 2011
  170. 170. Off-topic Chatter C1 = [camcorder] C2 = [electronics forum, hd, camcorder, somethin, ive, canon, little camera, canon hv20, cameras, offtopic] Informative words ‣ [camcorder, canon hv20, little camera, hd, cameras, canon]
  171. 171. Evaluations - User Study Keywords from 60 monetizable user posts ‣ Monetizable intent, at least 3 keywords in content 45 MySpace Forums, 15 Facebook Marketplace, 30 graduate students ‣ 10 sets of 6 posts each ‣ Each set evaluated by 3 randomly selected users Monetizable intents? ‣ All 60 posts voted as unambiguously information seeking in intent
  172. 172. 1. Effectiveness of using topical keywords Google AdSense ads for user post vs. extracted topical keywords
  173. 173. Instructions - User Study
  174. 174. Result - 2X Relevant Impressions Users picked ads relevant to the post ‣ At least 50% inter-evaluator agreement For the 60 posts ‣ Total of 144 ad impressions ‣ 17% of ads picked as relevant For the topical keywords ‣ Total of 162 ad impressions ‣ 40% of ads picked as relevant
  175. 175. 2.  Profile  Ads  vs.  Activity  Ads User’s profile information ‣ Interests, hobbies, TV shows.. ‣ Non-demographic information Submit a post Looking to buy and why (induced noise) Ads that generate interest, captured attentionMonday, June 6, 2011
  176. 176. Result - 8X Generated Interest Using profile ads ‣ Total of 56 ad impressions ‣ 7% of ads generated interest Using authored posts ‣ Total of 56 ad impressions ‣ 43% of ads generated interest Using topical keywords from authored posts ‣ Total of 59 ad impressions ‣ 59% of ads generated interest
  177. 177. To note… User studies are small and preliminary, but clearly suggest ‣ Monetization potential in user activity ‣ Improvement for ad programs in terms of relevant impressions Evaluations based on forum, marketplace ‣ Verbose content ‣ Status updates, notes, community and event memberships… ‣ One size may not fit all
  178. 178. To  note… A world between relevant impressions and click throughs ‣ Objectionable content, vocabulary impedance, Ad placement, network behavior In a pipeline of other community efforts No profile information taken into account Cannot custom send information to Google AdSenseMonday, June 6, 2011
  179. 179. SENTIMENT  /  OPINION   MININGMonday, June 6, 2011
  180. 180. Content  Analysis:  Sentiment   Analysis/Opinion  Mining Two main types of information we can learn from user-generated content: fact vs. opinion Much of what we read in social media (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions.   For example, " Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact}... looks like they "learned" from #Egypt {opinion}"Monday, June 6, 2011
  181. 181. Sentiment Analysis Motivation Why do people oppose health care reform? Which movie should I see? What do customers complain about?
  182. 182. Sentiment Analysis: Tasks Example: ‣ How awful that many #Egyptian artifacts are in danger of being destroyed. ‣ What Zahi Hawass must be thinking #jan25 (read in the tone of “what were YOU thinking”)
  183. 183. Sentiment  Analysis:  TasksMonday, June 6, 2011
  184. 184. Sentiment Analysis: Tasks Classification ‣ Overall sentiment polarity: positive/neutral/negative ‣ Example: “How awful that many #Egyptian artifacts are in danger of being destroyed.” The overall polarity is negative ‣ Target-specific sentiment polarity: positive/neutral/negative ‣ Example: for target "egyptian artifacts", polarity is "negative"; for target "Zahi Hawass", polarity is "neutral"
  185. 185. Sentiment  Analysis:  TasksMonday, June 6, 2011
  186. 186. Sentiment  Analysis:  Tasks Identification & Extraction: opinion, opinion holder, opinion target Example: opinion="awful", opinion holder="the author", target="egyptian artifacts are in danger" Opinion="must be thinking", opinion holder="the author", target="Zahi Hawass"Monday, June 6, 2011
  187. 187. Sentiment  Analysis:  Approaches Classification: ‣ Supervised:  ‣ labeled training data ‣ features, differ from traditional topic classification tasks ‣ learning strategies ‣ Unsupervised: ‣ lexicon-based approach ‣ BootstrappingMonday, June 6, 2011
  188. 188. Sentiment  Analysis:  ApproachesMonday, June 6, 2011
  189. 189. Sentiment  Analysis:  Approaches Identification & Extraction: ‣utilizing the relations between opinion and opinion target, ‣proximity, ‣syntactic dependency, ‣co-occurrence and ‣prepared patterns/rulesMonday, June 6, 2011
  190. 190. Sentiment Analysis: From Tweets to polls Corpus: ‣ 0.7 billion tweets, Jan 2008 – Oct 2009 ‣ 1.5 billion tweets, Jan 2008 – May 2010 Lexicon-based approach for sentiment analysis of tweets: subjectivity lexicon from OpinionFinder (Wilson et al., 2005) Within topic tweets, count messages containing the positive and negative words defined by the lexicon
  193. 193. Sentiment Analysis: From Tweets to polls Corpus: ‣ 0.7 billion tweets, Jan 2008 – Oct 2009 ‣ 1.5 billion tweets, Jan 2008 – May 2010 B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In Intl. AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.
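The counting step of the lexicon-based approach can be sketched as below. The two small word sets are hypothetical stand-ins for the OpinionFinder lexicon, topic selection is reduced to a single keyword test, and summarizing the counts as a positive-to-negative ratio is one plausible way to turn them into a score; treat all of these as assumptions rather than the paper's exact procedure.

```python
import re

# Hypothetical stand-ins for the positive and negative lexicon entries.
POSITIVE = {"good", "great", "hope", "love"}
NEGATIVE = {"bad", "awful", "fear", "hate"}

def sentiment_ratio(messages, topic):
    """Ratio of topic messages with a positive word to those with a negative word.

    A message containing both kinds of word is counted on both sides.
    Returns None when no topic message contains a negative word.
    """
    positive_count = negative_count = 0
    for message in messages:
        tokens = set(re.findall(r"[a-z']+", message.lower()))
        if topic not in tokens:
            continue  # not a topic tweet
        if tokens & POSITIVE:
            positive_count += 1
        if tokens & NEGATIVE:
            negative_count += 1
    if negative_count == 0:
        return None
    return positive_count / negative_count
```

Computing this ratio per day and smoothing it over a window gives a time series that can be set beside poll numbers.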
  194. 194. Sentiment Analysis: Predicting the Future With Social Media Corpus: 2.89 million tweets referring to 24 movies released over a period of three months Sentiment analysis classifier: ‣ DynamicLMClassifier provided by the LingPipe linguistic analysis package ‣ Thousands of workers from Amazon Mechanical Turk assigned sentiments (positive, negative, neutral) to a large random sample of tweets ‣ The classifier was trained using an n-gram model S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
  200. 200. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach Simple lexicon-based method doesn't work. Observations: ‣ The opinions may not contribute toward the given target (1,2,3,6) ‣ The subjectivity and polarity of opinion clues are domain-dependent (5,7) ‣ Single words are not enough (4,7,8)
  201. 201. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach General subjective lexicon ‣ Commonly used subjective lexicon + popular slang learned from Urban Dictionary Domain-dependent sentiment lexicon ‣ Learned from a domain-specific corpus ‣ Bootstrapping ‣ More than words (word/phrase/pattern) ‣ n-gram + statistical model
  204. 204. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
  205. 205. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
  206. 206. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
  207. 207. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach Target-specific opinion identification/extraction ‣ Shallow syntactic analysis ‣ Rules + Proximity
  208. 208. Content Analysis: Context Extraction, Utilization URL Extraction is for Tweets FourSquare in Facebook, Twitter What is it in other mediums/SMS?
  209. 209. Content  Analysis:   URL  extraction Resolution Semantic Context RelevanceMonday, June 6, 2011
  210. 210. Author  Categorization:  Using   Content  to  derive  additional   People  metadata Personality Signals Blogs, Style of Writing Psychometric analysis of content Sample study: Gendered writing styles onlineMonday, June 6, 2011
  211. 211. People  Analysis:  Using  Network   to  derive  People  metadata Interesting questions to ask: ‣ Who are the most popular people* in the network ‣ Who are the most influential people in the network ‣ Who are the most active people in the network ‣ What are the types of people in communities of the network ‣ Who are the bridges between communities in the networkMonday, June 6, 2011
  212. 212. People Analysis: Influence By Link Analysis Algorithms ‣ HITS [K-99] & variants ‣ PageRank [BP-97] & variants, etc. Links not sufficient! ‣ Million Follower Fallacy [C-10] Source: informing-arts
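As a concrete instance of link analysis, here is a small power-iteration sketch of PageRank over a follower graph. The damping factor of 0.85, the fixed iteration count and the even redistribution of rank from nodes without outgoing links are conventional choices assumed for this example; an edge u -> v is read as "u follows v".

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank. graph maps each node to the nodes it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank
```

A score like this rewards being linked to by well-linked accounts, which is exactly the kind of purely structural signal the "links not sufficient" caveat on this slide warns about.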
  213. 213. People  Analysis:  InfluenceMonday, June 6, 2011
  214. 214. People  Analysis:  Influence Flavor of Context Analysis (activity level) Popularity NOT = Influence! ‣ Influence & Passivity [RGAH-10] Interest Similarity ‣ TwitterRank: Reciprocity & Homophily [WLJH-10] Klout Score - True Reach, Amplification [Klout]Monday, June 6, 2011
  215. 215. People  Analysis:  User  types   &  Affiliation Blogger, Scientist, Journalist, Artist, Trustee, Company X in  Domain Y.. ‣ Multiple types and affiliations! User interest mining ‣ Key Phrase Extraction followed by semantic association on user bio, tweets, lists, favorite posts Source: kahunainstitute.com ‣ Twitter Study [BCDMJNRM-09]Monday, June 6, 2011
  216. 216. People  Analysis:  User  types   &  AffiliationMonday, June 6, 2011
  217. 217. People  Analysis:  User  types   &  Affiliation Semantic analysis of profile description ‣ Web Presence: Use of Web & Knowledge bases (Wikipedia, Blogs) to build context for user types ‣ Entity Spotting & Extraction, followed by Semantic Association and Similarity with user-type contextMonday, June 6, 2011
  218. 218. People Analysis: Social Engagement Source: http://www.syscomminternational.com/ Frequency Distribution Analysis of user activity ‣ posting, retweet, reply, mentions, lists etc.
  219. 219. Network Analysis Foundation of network: ‣ Nodes ‣ Connections/Relationships Interesting questions to ask: ‣ How communities form around topics: growth & evolution ‣ What are the effects of the presence of influential participants in the communities ‣ What are the effects of the nature of content (or sentiment, opinions) flowing in the network on community life ‣ What is the community structure: degree of separation and sub-communities
  220. 220. Network Analysis: Methods Source: http://www.kudos-dynamics.com/
  221. 221. Network Analysis: Methods Network Structure metrics ‣ Centrality, Connected Component, Avg. Degree, Clustering Coefficient, Avg. Path Length, Bridge, Cohesion, Prestige, Reciprocity Important Literature: [AB-02, WS-98, BW-00, NW-06, WF-92, MW-10] Source: http://www.kudos-dynamics.com/
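Two of the listed metrics, average degree and clustering coefficient, are simple enough to compute directly. The sketch below assumes an undirected graph given as a list of edges; the function names are illustrative.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def average_degree(adjacency):
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)

def local_clustering(adjacency, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adjacency):
    return sum(local_clustering(adjacency, node) for node in adjacency) / len(adjacency)
```

On a triangle a-b-c with one extra node d attached to c, the average degree is 2 and the average clustering coefficient is 7/12, since c's three neighbors share only one of three possible links.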
  222. 222. Network  Analysis:  Algorithms   Community Discovery, growth, evolution ‣ Based on relationship types (e.g., signed network), geography/location based etc. Hierarchical clustering algorithms – Top-down, bottom-up Modularity Maximization [NW-06] Algorithms comparison survey [B-06]Monday, June 6, 2011
  223. 223. Network Analysis: Algorithms Graph Partitioning & Traversal Best time-complexity & reachability Follow Greedy paths ‣ K-way multilevel Partitioning ‣ Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST “We dream in Graph and We analyze in Matrix” - Barry Wellman, INSNA
  224. 224. Network Analysis: Methods Network Modeling Approaches ‣ Random graph model (Erdős-Rényi model) ‣ Small-world model (Small World Phenomenon) ‣ Scale-free model (led to Power-Law degree distribution) Social Network Analysis methods ‣ Centrality (Degree, Eigenvector, Betweenness, Closeness) ‣ Clusters (Cliques and extensions, Communities) Source: http://www.kudos-dynamics.com/
  225. 225. Network  Analysis:   Diffusion  &  Homophily Information Flow: Diffusion ‣ Maximizing Spread (Opinion, Innovation, Recommendation) ‣ Outbreak Detection (e.g., disease) Social Network: No info about user action– Understanding dynamics is challenging! Power Law distribution [LAH-07] Factors impacting flow: ‣ Sampling strategy, user Homophily, content nature [CLSCK-10, NPS-10]Monday, June 6, 2011
  226. 226. QueryingMonday, June 6, 2011
  227. 227. Analysis & Visualization Tools NWB (Network WorkBench), Truthy, Graph-tool, Orange, Pajek, Tulip Sources: http://truthy.indiana.edu/, http://en.wikipedia.org/wiki/social_network_analysis_software
  228. 228. Event  DetectionMonday, June 6, 2011
  229. 229. Citizen Sensing in Real-time
  230. 230. Real-Time Motivation People can't wait for information ‣ 500 years ago: a single lifetime ‣ 20 years ago: next day or two (television, newspapers) ‣ Presently: minutes are not considered fast enough (digital media, social media)
  231. 231. Real-Time Social Media Is Real-Time the future of Web? Social Media for Real-Time Web ‣ Disaster Management ‣ Ushahidi ‣ Real-Time Markets ‣ Examples ‣ Brand Tracking ‣ Twarql ‣ Movie reviews
  232. 232.            Scenario The  Guardian Feb  2010Monday, June 6, 2011
  234. 234.            Scenario The  Guardian Feb  2010 JournalistMonday, June 6, 2011
  235. 235. Challenges Information Overload ‣ Can we aggregate, organize and collectively analyze data Real Time ‣ Can we deliver the data as it is generatedMonday, June 6, 2011
  236. 236. A  Semantic  Web  Approach Expressive description of Information need ‣ Using SPARQL (Instead of traditional keyword search)  Flexibility on the point of view ‣ Ability to "slice and dice" the data in several dimensions: thematic, spatial, temporal, sentiment etc.. Streaming data with Background Knowledge ‣ Enables automatic evolution and serendipity Scalable Real-Time delivery  ‣ Using sparqlPuSH (SFSW10)Monday, June 6, 2011
  237. 237. Concept  FeedMonday, June 6, 2011
  238. 238. ArchitectureMonday, June 6, 2011
  239. 239. Social  Sensor  ServerMonday, June 6, 2011
  240. 240. Metadata  Extractions     (Social  Sensor  Server) Named Entity Recognition ‣ 2 Million Entities from DBPedia ‣ Load as Trie for efficiency ‣ N-grams matched ‣ Example: Obama, Barack ObamaMonday, June 6, 2011
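Loading the entity names into a trie keyed by tokens lets every n-gram of a tweet be checked in a single left-to-right pass. The following is a minimal sketch, assuming lowercase whitespace tokenization and a longest-match preference; the three-entity dictionary stands in for the 2 million DBpedia entities.

```python
END = None  # dict key marking the end of an entity; never collides with a token string

def build_trie(entities):
    """Token-level trie: each path of tokens spells one entity name."""
    root = {}
    for name in entities:
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[END] = name
    return root

def spot_entities(text, trie):
    """Scan the text once, preferring the longest entity starting at each position."""
    tokens = text.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        node, match, j = trie, None, i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                match = (node[END], j)  # remember the longest match so far
        if match:
            found.append(match[0])
            i = match[1]
        else:
            i += 1
    return found
```

With the entities "Obama", "Barack Obama" and "Ohio", the text "barack obama visits ohio" yields "Barack Obama" and "Ohio" rather than the shorter "Obama".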
  241. 241. Metadata  Extractions     (Social  Sensor  Server) URL, HashTag Extraction ‣ Regex extraction ‣ Resolution ‣ URL Resolution: Follows http redirects for resolution ‣ HashTag Resolution: Tagdef, Tagal,WTHashTag.comMonday, June 6, 2011
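Regex extraction and redirect-following resolution could look roughly like this. The two patterns are deliberately simple assumptions rather than the expressions used in the Social Sensor Server, and `resolve_url` needs network access, so it is shown but not exercised here. Hashtag resolution through services such as Tagdef is not sketched.

```python
import re
from urllib.request import urlopen

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")

def extract(tweet):
    """Pull URLs and hashtags out of a tweet's text."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so a '#fragment' inside a URL is not read as a hashtag.
    hashtags = HASHTAG_RE.findall(URL_RE.sub(" ", tweet))
    return {"urls": urls, "hashtags": hashtags}

def resolve_url(url, timeout=5):
    """Follow HTTP redirects and return the final URL (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

For the tweet "Fingers crossed for the upcoming #hcrvote http://bit.ly/abc", `extract` returns the shortened URL and the hashtag `hcrvote`; the URL would then be passed to `resolve_url`.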
  242. 242. Metadata  Extractions     (Social  Sensor  Server)Monday, June 6, 2011
  243. 243. Metadata  Extractions     (Social  Sensor  Server) Other Metadata provided by Twitter ‣ User profile: User Name, Location, Time etc.. ‣ Tweet: RT, reply etc..Monday, June 6, 2011
  244. 244. Structured  Data (Social  Sensor  Server) RDF Annotation ‣ Common RDF/OWL Vocabularies ‣ FOAF - (foaf-project.org) Friend of a Friend ‣ SIOC - (sioc-project.org) Semantically Interlinked Online Communities ‣ OPO - (online-presence.net) Online Presence Ontology ‣ MOAT - (moat-project.org) — Meaning Of A TagMonday, June 6, 2011
  245. 245. Structured  Data (Social  Sensor  Server)Monday, June 6, 2011
  246. 246. Structured Data (Social Sensor Server) A snippet of the annotation:
      <http://twitter.com/bob/statuses/123456789>
          rdf:type sioct:MicroblogPost ;
          sioc:content "Fingers crossed for the upcoming #hcrvote" ;
          sioc:has_creator <http://twitter.com/bob> ;
          foaf:maker <http://example.org/bob> ;
          moat:taggedWith dbpedia:Healthcare_reform .
      <http://twitter.com/bob> geonames:locatedIn dbpedia:Ohio .
  247. 247. Semantic  PublisherMonday, June 6, 2011
  248. 248. Semantic  Publisher Virtuoso to store triples Queries formulated by the users are stored SPARQL protocol over the HTTP to access rdf from the store Combine data from tweet with the background knowledge in the rdf store Monday, June 6, 2011
  249. 249. Application  Server  &  Distribution   HubMonday, June 6, 2011
  250. 250. Application Server & Distribution Hub Distribution Hub ‣ PUSH model: PubSubHubbub protocol ‣ Pushes the tweets to the Application Server Application Server ‣ Delivers data to the clients ‣ RSS-enabled concept feeds
  254. 254. Brand Tracking - Example Background Knowledge (e.g. DBpedia) [Figure: query graph] ?tweet moat:taggedWith ?competitor . ?competitor skos:subject ?category . dbpedia:IPad skos:subject ?category . Example bindings: ?category = category:Wi-Fi, category:Touchscreen; ?competitor = IPhone, HPTabletPC; ?tweet = “@anonymized Lorem ipsum bla bla this is an example tweet”
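The brand-tracking query joins streaming tweet annotations with background knowledge: find entities that share a category with the iPad, then find tweets tagged with those entities. The sketch below evaluates that join over an in-memory list of triples instead of a SPARQL endpoint; the triples, the tweet identifiers and the reading of the diagram are all assumptions made for illustration.

```python
# Hypothetical triples mixing background knowledge and tweet annotations.
TRIPLES = [
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HPTabletPC", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:Ohio", "skos:subject", "category:States"),
    ("tweet:1", "moat:taggedWith", "dbpedia:IPhone"),
    ("tweet:2", "moat:taggedWith", "dbpedia:Ohio"),
]

def objects(triples, subject, predicate):
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def subjects(triples, predicate, obj):
    return {s for (s, p, o) in triples if p == predicate and o == obj}

def competitor_tweets(triples, brand):
    """Entities sharing a category with the brand, and the tweets tagged with them."""
    competitors = set()
    for category in objects(triples, brand, "skos:subject"):
        competitors |= subjects(triples, "skos:subject", category)
    competitors.discard(brand)
    tweets = set()
    for competitor in competitors:
        tweets |= subjects(triples, "moat:taggedWith", competitor)
    return competitors, tweets
```

Here `competitor_tweets(TRIPLES, "dbpedia:IPad")` finds the iPhone and the HP tablet as competitors and returns only the tweet tagged with one of them.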
  257. 257. 1242 articles from NYTimes, around 800,000 tweets Timeline annotations: ‣ President Obama lays out plan for health care reform in speech to joint session of Congress (10th Sept, Timeline.com) ‣ Obama taking an active role in health talks in pursuing his proposed overhaul of health care system (13th Aug
  258. 258. Twarql  on  Linked  Open  DataMonday, June 6, 2011
  259. 259. Twarql  on  Linked  Open  DataMonday, June 6, 2011
  260. 260. Emerging  Research  Areas  Monday, June 6, 2011
  261. 261. Spam in Social Networks Reasons for spamming include: Gaining popularity ‣ Use of popular topic-related keywords (e.g. hashtags of trending topics) to propagate something off topic Launching malicious attacks ‣ Phishing attacks, viruses, malware etc. Misleading the masses ‣ Propagating false information [MM-10]
  262. 262. Spam  in  Social  Networks Gaining popularity using trending keywords: This tweet uses #Cairo but refers to a fashion website.Monday, June 6, 2011
  265. 265. Spam  in  Social  Networks Gaining popularity using trending keywords: This tweet uses #Cairo but refers to a fashion website. Egypt ProtestsMonday, June 6, 2011
  271. 271. Spam  in  Social  Networks Spam detection ‣ Content-based features ‣ Content Size, URL type, spam words ‣ Metadata-based features ‣ Account information, behavior. ‣ Network-based features ‣ Provenance. (e.g. content from a reliable source)Monday, June 6, 2011
  272. 272. Trust  in  Social  Networks Reputation, Policy, Evidence, and Provenance used to derive trustworthiness. Illustrative examples of online cues used for trust assessment. ‣ Wikipedia: article size, number of references, author, edit history, age of the article, edit frequency etc. ‣ Product Reviews: number of helpful, very helpful ratings, author expertise, sentiments in comments received for a review etc.Monday, June 6, 2011
  273. 273. Trust in Social Networks We propose a trust ontology [AHTS-10] that ‣ Captures the semantics of trust ‣ Enables representation of and reasoning with trust Semantics of trust specifies, for a given trustor and trustee, the following features ‣ Type: type of trust relationship ‣ Scope: context of the trust relationship ‣ Value: quantifies the trust relationship
  274. 274. Trust in Social Networks Gleaning primitive (edge) trust ‣ Trust value between two nodes is quantified using numbers, e.g., [0,1] or [-1,1], or a partial ordering [TAHS-09] Gleaning composite (path) trust ‣ Propagation via chaining and aggregation (transitivity) Some popular algorithms for trust computation ‣ Eigentrust, Spreading Activation, SUNNY etc.
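Chaining and aggregation can be illustrated with a tiny example. The sketch assumes edge trust values in [0, 1], multiplies them along a path (chaining) and keeps the strongest path between two nodes (aggregation). Both operators are common choices picked here for illustration; they are not the definitions used by Eigentrust, Spreading Activation or SUNNY.

```python
def path_trust(edge_trust, path):
    """Chaining: multiply the edge trust values along a path of nodes."""
    value = 1.0
    for u, v in zip(path, path[1:]):
        value *= edge_trust[(u, v)]
    return value

def aggregate_trust(edge_trust, paths):
    """Aggregation: take the most trusted of several alternative paths."""
    return max(path_trust(edge_trust, path) for path in paths)
```

With edges a->b (0.9), b->d (0.5), a->c (0.6) and c->d (0.8), the path through b gives 0.45 and the path through c gives 0.48, so the aggregated trust of a in d is 0.48 under these assumptions.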
  275. 275. Integrating  Social  And   Sensor  Networks Machine sensor observations are quantitative in nature, while human observations can be both qualitative and quantitative. Benefits of combining observations from humans and machine sensors ‣ Complementary evidence. ‣ Corroborative evidenceMonday, June 6, 2011
  276. 276. Integrating  Social  And   Sensor  Networks Applications of integrating heterogeneous sensor observations ‣ Situation Awareness by using  human observations to interpret machine sensor observations. ‣ Enhancing trustworthiness using corroborative evidence.Monday, June 6, 2011