Citizen Sensing, Social Media Analytics, and Applications

Description: http://semtech2011.semanticweb.com/sessionPop.cfm?confid=62&proposalid=3845

Original version: http://slidesha.re/social-WWW


Transcript

  • 1. Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications. Tutorial at Semantic Technology Conference, San Francisco, CA. Karthik Gomadam (Accenture Technology Labs, San Jose), Amit Sheth (Kno.e.sis @ Wright State University), Selvam Velmurugan (eMoksha, Kiirti). Monday, June 6, 2011
  • 2. Meena Nagarajan (Content Analysis), Hemant Purohit (People & Network analysis), Ashutosh Jadhav (Event Analysis), Lu Chen (Sentiment Analysis), Pavan Kapanipathi (Real Time Web), Selvam Velmurugan (Kiirti, eMoksha NGOs), Amit Sheth (Semantic Web), Pramod Anantharam (Social & Sensor web)
  • 3. A Quick Word Much of the work discussed in this tutorial is primarily the doctoral research of Dr. Meena Nagarajan, currently at IBM Almaden. It also includes current work done at the Kno.e.sis center at Wright State University.
  • 4. Outline Citizen Sensing: Role, Enablers, Apps; Systematic Study of Social Media; Citizen Sensing @ Real-time; Emerging Research Areas ‣ Spam and Trust in Social Media, Mobile Social Computing; Research Application: Twitris; Tutorial part 2
  • 5. Citizen Sensing Everyday users of Web 2.0 and social networks: citizens of an Internet- or Web-enabled social community. Observations and information reported by citizens => Citizen Sensing. Human-in-the-loop (participatory) sensing + Web 2.0 + mobile computing = emergence of citizen-sensor networks
  • 6. Social Signals The activity of observing, reporting, and disseminating information via text, audio, video and built-in device sensors (and smart devices) ‣ Creating social signals through aggregation, enhancement, analysis, visualization, and interpretation. Immense potential to disseminate information quickly and in real time
  • 7. Enablers: Mobile Devices & Ubiquitous Connectivity Mobile device fast emerging as our primary tool ‣ Redefines the way we engage with people, information, etc. Global, ubiquitous, always available. Sense where you are, how you are, …
  • 8. Enablers: Mobile Devices & Ubiquitous Connectivity Global, ubiquitous, always available. Sense where you are, how you are, …
  • 9. Enablers: Mobile Devices & Ubiquitous Connectivity Sense where you are, how you are, …
  • 10. Enablers: Mobile Devices & Ubiquitous Connectivity
  • 11. Enablers: Mobile Devices & Ubiquitous Connectivity Mobile Platforms Hit Critical Mass ‣ Over 5 billion users ‣ 1+B with internet-connected mobile devices (2010) ‣ Smartphones > Notebooks + Netbooks (2010E) ‣ 500K+ mobile phone applications ‣ 74% of mobile phone users (2.4B) worldwide texted (2007)
  • 12. Enablers: Web 2.0 & Social Media 500M+ Facebook users; 100M+ Twitter users, 85M+ tweets/day; Internet users: 1.8 Bln; Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  • 13. Enablers: Web 2.0 & Social Media 100M+ Twitter users, 85M+ tweets/day; Internet users: 1.8 Bln; Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  • 14. Enablers: Web 2.0 & Social Media Internet users: 1.8 Bln; Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  • 15. Enablers: Web 2.0 & Social Media Content dissemination medium ‣ Even for traditional media (@cnn, @nytimes)
  • 16. Enablers: Web 2.0 & Social Media
  • 17. Enablers: Web 2.0 & Social Media Types of UGC: Twitter (text/microblogs), Facebook (multimedia), YouTube (videos), Flickr (images), Blogs (text), Ping (social network for music)
  • 18. Enablers: Web 2.0 & Social Media Flickr (images), Blogs (text), Ping (social network for music)
  • 19. Enablers: Web 2.0 & Social Media Ping (social network for music)
  • 20. Enablers: Web 2.0 & Social Media
  • 21. Citizen Sensors in Action Iran election, Haiti earthquake, US healthcare debate
  • 22. Revolution 2.0: Political/Social Activism “If you want to liberate a government, give them the internet.” - Wael Ghonim (Egyptian social activist) When Blitzer asked “Tunisia, then Egypt, what’s next?,” Ghonim replied succinctly “Ask Facebook.”
  • 23. Revolution 2.0: Political/Social Activism When Blitzer asked “Tunisia, then Egypt, what’s next?,” Ghonim replied succinctly “Ask Facebook.”
  • 24. Revolution 2.0: Political/Social Activism
  • 25. Citizen Journalism Twitter Journalism
  • 26. Social Media Influence: Intelligence, News & Analysis Many media companies use Facebook and Twitter as news-delivery platforms. Many individuals rely on them as a news source. News is increasingly social.
  • 27. Business Intelligence: Trend Spotting, Forecasting, Brand Tracking and Crisis Management Sysomos: http://www.sysomos.com/ Trendspotting: http://trendspotting.com Simplify: http://simplify360.com/ Shoutlet: http://www.shoutlet.com/ Reputation (Defender): http://www.reputationdefender.com/
  • 28. Development (Education, Health, eGov) LiveMocha (http://www.livemocha.com/) ‣ Online language learning tool with social engagement ‣ bridging the gap!! Soliya (http://www.soliya.net/) ‣ Dialogue between students from diverse backgrounds across the globe using latest multimedia technologies Project Einstein (http://digital-democracy.org/what-we-do/programs/) ‣ A photography-based digital penpal program connecting youths in refugee camps to the world
  • 29. Development (Education, Health, eGov) Soliya (http://www.soliya.net/) ‣ Dialogue between students from diverse backgrounds across the globe using latest multimedia technologies Project Einstein (http://digital-democracy.org/what-we-do/programs/) ‣ A photography-based digital penpal program connecting youths in refugee camps to the world
  • 30. Development (Education, Health, eGov) Project Einstein (http://digital-democracy.org/what-we-do/programs/) ‣ A photography-based digital penpal program connecting youths in refugee camps to the world
  • 31. Development (Education, Health, eGov)
  • 32. Development (Education, Health, eGov) PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/) TrialX (http://trialx.com) Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/
  • 33. Why People-Content-Network metadata?
  • 34. Dimensions of Systematic Study of Social Media Spatio - Temporal - Thematic + People - Content - Network
  • 35. Social Information Processing "Who says what, to whom, why, to what extent and with what effect?" [Lasswell] Network: social structure emerges from the aggregate of relationships (ties). People: poster identities, the active effort of accomplishing interaction. Content: studying the content of communication.
  • 36. Studying Online Human Social Dynamics How does the (semantics or style of) content fit into the observations made about the network? ‣ Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic.
  • 37. Studying Online Human Social Dynamics
  • 38. Studying Online Human Social Dynamics Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert and the connections between participants together explain information propagation in a social network?
  • 39. Studying Online Human Social Dynamics
  • 40. Metadata/Annotations Metadata: an organized way to study ‣ types ‣ creation/extraction and storage ‣ use
  • 41. The Anatomy of a Tweet
  • 42. People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms Explicit information from user profiles ‣ User names, pictures, videos, links, demographic information, group memberships... ‣ Often is not updated. Implicit information from user attention metadata ‣ Page views, Facebook 'Likes', Comments; Twitter 'Follows', Retweets, Replies..
  • 43. People Metadata: Various Levels Demographic, Interests, Activity, Network
  • 44. People Metadata: Continued User Demographic Metadata: User-id; Screen/Display-name of user; Real name of user; Location; Profile Creation Date; User description; User Bio; URL. Interest Level Metadata: Author type (trustee/donor, journalist, blogger, scientist etc.); Favorite tweets; Types of lists subscribed; Style of Writing - personality indicator; No. of Followees; Author type trend of Followees
  • 45. People Metadata: Continued Activity Level Metadata: Age of the profile; Frequency of posts; Timestamp of last status; No. of Posts; No. of Lists/groups created; No. of Lists/groups subscribed. Influence Level Metadata (Inferring People Metadata from Network level Information): No. of Followers - normal, influential; No. of Mentions; No. of Retweets/Forwards; No. of Replies; No. of Lists/groups following; No. of people following back; Authority & Hub Scores. Web Presence: User affiliations; KLOUT Score - influence measure (www.klout.com)
  • 46. Content Metadata Content Independent metadata ‣ date, location, author etc. Content Dependent metadata ‣ Direct content-based metadata ‣ Explicit/Mentioned Content metadata ‣ named entities in content ‣ Implicit/Inferred Content Metadata ‣ related named entities from knowledge sources ‣ Indirect content-based metadata (External metadata) ‣ context inferred from URLs in content (images, links to articles, FourSquare checkins etc.)
  • 47. Content Metadata Content Dependent metadata ‣ Direct content-based metadata ‣ Explicit/Mentioned Content metadata ‣ named entities in content ‣ Implicit/Inferred Content Metadata ‣ related named entities from knowledge sources ‣ Indirect content-based metadata (External metadata) ‣ context inferred from URLs in content (images, links to articles, FourSquare checkins etc.)
  • 48. Content Metadata
  • 49. Content Independent Metadata For Tweets ‣ Published date and time ‣ Location (where tweet was generated from) ‣ Tweet posting method (smart-phone, twitter.com, clients for twitter) ‣ Author information
  • 50. Content Independent Metadata
  • 51. Content Independent Metadata For Text messages ‣ Published date and time ‣ Origin location ‣ Recipient ‣ Carrier information
  • 52. Content Independent Metadata
  • 53. Content Independent Metadata
  • 54. Content Dependent Metadata (Tweet) Direct Content-based Metadata Direct Content-based Metadata Indirect content-based metadata (External metadata)
  • 55. Content Dependent Metadata Direct Content-based Metadata
  • 56. Network Metadata Connections/Relationships (foundation for the network) matter! Structure Level Metadata: Community Size; Community growth rate; Largest Strongly Connected Component size; Weakly Connected Components & Max. size; Average Degree of Separation; Clustering Coefficient. Relationship Level Metadata: Type of Relationship; Relationship strength; User Homophily based on certain characteristic (e.g., Location, interest etc.); Reciprocity: mutual relationship; Active Community/Ties
  • 57. Metadata: Creation, Extraction and Storage
  • 58. Metadata Creation & Extraction Extracted Metadata ‣ Directly visible information from the user profile, tweet content & community structure Created Metadata ‣ Derived by processing information in the user profile, content and/or network structure
  • 59. An Example Length: 144 characters; General topic: Egypt protest. This poor {sentiment_expression: {target: "Lara Logan", polarity: "negative"}} woman! RT @THR CBS News {entity: {type="News Agency"}} Lara Logan {entity: {type="Person"}} Released From Hospital {entity: {type="Location"}} After Egypt {entity: {type="Country"}} Assault {type="topic"} http://bit.ly/dKWTY0 {external_URL}
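The inline annotations on this slide can also be held as stand-off metadata next to the raw tweet text. The following is a minimal sketch of that idea, not the tutorial's own schema: the helper `annotate` and the field names `kind`, `start`, and `end` are assumptions made for illustration.

```python
# Illustrative sketch: field names and the helper are assumptions, not the tutorial's schema.
tweet = ("This poor woman! RT @THR CBS News Lara Logan Released From Hospital "
         "After Egypt Assault http://bit.ly/dKWTY0")


def annotate(text, phrase, **meta):
    """Return a stand-off annotation for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return {"text": phrase, "start": start, "end": start + len(phrase), **meta}


annotations = [
    annotate(tweet, "poor", kind="sentiment_expression",
             target="Lara Logan", polarity="negative"),
    annotate(tweet, "CBS News", kind="entity", type="News Agency"),
    annotate(tweet, "Lara Logan", kind="entity", type="Person"),
    annotate(tweet, "Hospital", kind="entity", type="Location"),
    annotate(tweet, "Egypt", kind="entity", type="Country"),
    annotate(tweet, "Assault", kind="topic"),
    annotate(tweet, "http://bit.ly/dKWTY0", kind="external_URL"),
]

# Created metadata can then be filtered like any other data.
entities = [a["text"] for a in annotations if a["kind"] == "entity"]
```

Keeping character offsets instead of rewriting the text leaves the original tweet untouched, so several annotators can add metadata independently.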
  • 60. Why is the Semantic Web a standard for social metadata? Rich Snippets, RDFa, Open Graph, semantic web based social data standards. Relationships/connections play a central role ‣ Relationships as first-class objects are important
  • 61. Semantic Web: A Very Short Primer
  • 62. Semantic Web: A Very Short Primer Representation ‣ RDF ‣ relationships as first-class objects <subject, predicate, object> ‣ OWL ‣ Representing Knowledge and Agreements: nomenclature, taxonomy, folksonomy, ontology
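To make the <subject, predicate, object> idea concrete, here is a small sketch that models triples as plain Python tuples and matches them against a pattern. It does not use an RDF library; the `ex:` identifiers and the `match` helper are hypothetical and stand in for real URIs and a real query engine.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers; a real graph would use full URIs.
graph = [
    Triple("ex:alice", "ex:follows", "ex:bob"),
    Triple("ex:bob", "ex:follows", "ex:alice"),
    Triple("ex:alice", "ex:posted", "ex:tweet1"),
    Triple("ex:tweet1", "ex:mentions", "ex:Egypt"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# Whom does ex:alice follow?
followed_by_alice = [t.obj for t in match(graph, s="ex:alice", p="ex:follows")]
```

Because the relationship is itself a value in each triple, it can be queried and filtered in the same way as the things it connects.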
  • 63. Semantic Web: A Very Short Primer
  • 64. Semantic Web: A Very Short Primer Annotation ‣ RDFa, XLink, model reference
  • 65. Semantic Web: A Very Short Primer Annotation ‣ RDFa, XLink, model reference Web of Data ‣ Linked Open Data
  • 66. Semantic Web: A Very Short Primer Annotation ‣ RDFa, XLink, model reference Web of Data ‣ Linked Open Data Querying ‣ SPARQL; Rules: SWRL, RIF
  • 67. How to save and use metadata? Store metadata as data and use standard database techniques. Use filtering and clustering, summarization, statistics - implicit semantics
  • 68. How to save and use metadata? Use filtering and clustering, summarization, statistics - implicit semantics
  • 69. How to save and use metadata?
  • 70. How to save and use metadata?
  • 71. How to save and use metadata? Use explicit semantics and Semantic Web standards and technologies ‣ semantics = meaning ‣ richer representation, support for relationships, context ‣ supports use of background knowledge ‣ better integration, powerful analysis. Semantics: the implicit, the formal and the powerful. Social metadata on the Web
  • 72. Metadata Extraction from Informal Text Meena Nagarajan, Understanding User-Generated Content on Social Media, Ph.D. Dissertation, Wright State University, 2010
  • 73. Characteristics of Text on Social Media
  • 74. The Formality of Text
  • 75. Content Analysis: Typical Sub-tasks Recognize key entities mentioned in content ‣ Information Extraction (entity recognition, anaphora resolution, entity classification..) ‣ Discovery of Semantic Associations between entities Topic Classification, Aboutness of content ‣ What is the content about? Intention Analysis ‣ Why did they share this content?
  • 76. Content Analysis: Typical Sub-tasks Topic Classification, Aboutness of content ‣ What is the content about? Intention Analysis ‣ Why did they share this content?
  • 77. Content Analysis: Typical Sub-tasks Intention Analysis ‣ Why did they share this content?
  • 78. Content Analysis: Typical Sub-tasks
  • 79. Content Analysis: Typical Sub-tasks
  • 80. Content Analysis: Typical Sub-tasks Sentiment Analysis ‣ What opinions are people conveying via the content? Author Profiling ‣ What can we infer about the author from the content they post? Context (external to content) extraction ‣ URL extraction, analyzing external content
  • 81. Research Efforts, Contributions in this space.. Examining usefulness of multiple context cues for text mining algorithms ‣ Compensating for informal, highly variable language, lack of context ‣ Using context cues: document corpus, syntactic, structural cues, social medium, external domain knowledge… In this talk, highlighting sample metadata creation tasks: NER, Key Phrase Extraction, Intention, Sentiment/Opinion Mining
  • 82. Part 1. NER, Key Phrase Extraction Named Entity Recognition ‣ I loved <movie> the hangover </movie>! Key Phrase Extraction
  • 83. Multiple Context Cues Utilized for NER in Blogs and MySpace
  • 84. Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace
  • 85. Focus, Impact Techniques focus on ‣ relatively less explored content aspects on social media platforms Combination of top-down, bottom-up analysis for informal text ‣ Statistical NLP, ML algorithms over large corpora ‣ Models and rich knowledge bases in a domain
  • 86. NAMED ENTITY RECOGNITION
  • 87. NAMED ENTITY RECOGNITION I loved your music Yesterday! “It was THE HANGOVER of the year..lasted forever.. So I went to the movies..bad choice picking “GI Jane” worse now”
  • 88. NAMED ENTITY RECOGNITION Identifying and classifying tokens
  • 89. NER in prior work vs. NER for Informal Text
  • 90. Cultural Named Entities NER focus in this work: Cultural Named Entities Artifacts of Culture ‣ Names of books, music albums, films, video games, etc. Common words in a language ‣ The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight…
  • 91. Characteristics of Cultural Entities Varied senses, several poorly documented ‣ Merry Christmas covered by 60+ artists ‣ Star Trek: movies, TV series, media franchise.. and cuisines!! Changing contexts with recent events ‣ The Dark Knight reference to Obama, health care reform Unrealistic expectations ‣ Comprehensive sense definitions, enumeration of contexts, labeled corpora for all senses.. ‣ NER: relaxing the closed-world sense assumptions
  • 92. NER in prior work vs. NER for Informal Text
  • 93. A Spot and Disambiguate Paradigm NER generally a sequential prediction problem ‣ NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORG entities) [Lev Ratinov, Dan Roth] Focus of approach: Spot and Disambiguate Paradigm. Starting off with a dictionary or list of entities we want to spot
  • 94. A Spot and Disambiguate Paradigm Spot, then disambiguate in context (natural language, domain knowledge cues). Binary Classification: Is this mention of “the hangover” in a sentence referring to a movie?
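The two stages named on this slide can be sketched as a dictionary spotter followed by a yes/no decision per spot. In the sketch below the second stage is a hand-written cue-word rule standing in for the supervised binary classifier discussed in the tutorial; `ENTITY_DICTIONARY`, `MOVIE_CUES`, and the window size are assumptions chosen only for illustration.

```python
import re

# Hypothetical dictionary of entities we want to spot (lower-cased surface forms).
ENTITY_DICTIONARY = {"the hangover": "movie", "twilight": "movie"}
# Hypothetical context cues; a trained classifier would replace this rule.
MOVIE_CUES = {"watch", "watched", "watching", "movie", "movies", "scenes", "saw"}


def spot(text):
    """Stage 1: find every dictionary entry in the text, ignoring case."""
    lowered = text.lower()
    spots = []
    for name in ENTITY_DICTIONARY:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            spots.append((name, m.start(), m.end()))
    return spots


def is_movie_mention(text, start, end, window=5):
    """Stage 2: binary decision from cue words within a few tokens of the spot."""
    before = re.findall(r"[a-z']+", text[:start].lower())[-window:]
    after = re.findall(r"[a-z']+", text[end:].lower())[:window]
    return any(word in MOVIE_CUES for word in before + after)


def spot_and_disambiguate(text):
    return [(name, is_movie_mention(text, s, e)) for name, s, e in spot(text)]
```

For example, `spot_and_disambiguate("I loved the hangover! Best movie this year")` marks the spot as a movie mention, while `"It was THE HANGOVER of the year..lasted forever"` is spotted but rejected because no cue word is nearby.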
  • 95. NER in prior work vs. NER for Informal Text
  • 96. Algorithmic Contributions Supervised Algorithms
  • 97. Algorithmic Contributions Supervised Algorithms Examples: “I am watching Pattinson scenes in <movie id=2341>Twilight</movie> for the nth time.” “I spent a romantic evening watching the Twilight by the bay..” “I love <artist id=357688>Lily’s</artist> song
  • 98. Multiple Senses in the Same Domain
  • 99. Algorithm Preliminaries Problem Defn ‣ Cultural Entity Identification: music albums, tracks ‣ Smile (Lily Allen), Celebration (Madonna) Corpus: MySpace comments ‣ Context-poor utterances “Happy 25th Lilly, Alfie is funny”
  • 100. Algorithm Preliminaries Corpus: MySpace comments ‣ Context-poor utterances “Happy 25th Lilly, Alfie is funny”
  • 101. Algorithm Preliminaries “Happy 25th Lilly, Alfie is funny”
  • 102. Algorithm Preliminaries Goal: Semantic Annotation of music named entities (w.r.t. MusicBrainz)
  • 103. Using a Knowledge Resource for NER is not straightforward..
  • 104. Approach Overview Scoped Relationship graphs ‣ Using context cues from the content, webpage title, url… (e.g., “new Merry Christmas tune”) ‣ Reduce potential entity spot size (e.g., new albums/songs) ‣ Generate candidate entities ‣ Spot and Disambiguate
  • 105. Sample Real-world Constraints Career restrictions ‣ “release your third album already..” Recent album restrictions ‣ “I loved your new album..” Artist age restrictions ‣ “happy 25th rihanna, loved alfie btw..” etc.
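Constraints like these can be read as filters that shrink the set of candidate entities before spotting. The sketch below assumes a tiny in-memory catalog with made-up artists and ages; the `Track` fields, the `CATALOG` contents, and the two cue patterns are all hypothetical and only illustrate the scoping idea.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    artist: str
    artist_age: int        # made-up values, for illustration only
    on_latest_album: bool


# Hypothetical catalog; a real system would draw on a knowledge base such as MusicBrainz.
CATALOG = [
    Track("Smile", "Artist A", 25, True),
    Track("Alfie", "Artist A", 25, False),
    Track("Smile", "Artist B", 40, True),
    Track("Celebration", "Artist C", 50, True),
]


def scope_candidates(comment, catalog):
    """Narrow the candidate entities using cues found in the comment."""
    text = comment.lower()
    candidates = list(catalog)
    age = re.search(r"happy (\d+)(?:st|nd|rd|th)", text)
    if age:  # artist age restriction
        candidates = [t for t in candidates if t.artist_age == int(age.group(1))]
    if "new album" in text:  # recent album restriction
        candidates = [t for t in candidates if t.on_latest_album]
    return candidates
```

With this catalog, `scope_candidates("happy 25th! loved alfie btw", CATALOG)` keeps only the two tracks by the 25-year-old artist, so a later spotter has fewer senses of "Smile" to choose between.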
  • 106. Non-Music Mentions Challenge 1: Several senses in the same domain ‣ Scoping relationship graphs narrows possible senses ‣ Solves the named entity identification problem partially Challenge 2: Non-music mentions ‣ Got your new album Smile. Loved it! ‣ Keep your SMILE on!
  • 107. Non-Music Mentions Challenge 1: Several senses in the same domain ‣ Scoping relationship graphs narrows possible senses ‣ Solves the named entity identification problem partially Challenge 2: Non-music mentions ‣ Got your new album Smile. Loved it! ‣ Keep your SMILE on!
  • 108. Using Language Features to eliminate incorrect mentions.. Syntactic features ‣ POS Tags, Typed dependencies.. ‣ Example here Word-level features ‣ Capitalization, Quotes Domain-level features
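The word-level features on this slide (capitalization, quotes) are cheap to compute from the text around a spot. A minimal sketch follows; the feature names and the `sentence_initial` feature are assumptions, and syntactic features such as POS tags would need an NLP toolkit that is not shown here.

```python
def word_level_features(text, start, end):
    """Compute simple surface features for the mention text[start:end]."""
    mention = text[start:end]
    quote_chars = "\"'“”‘’"
    return {
        "all_caps": mention.isupper(),
        "title_case": mention.istitle(),
        # True only when the characters directly around the mention are quote marks.
        "in_quotes": (start > 0 and end < len(text)
                      and text[start - 1] in quote_chars
                      and text[end] in quote_chars),
        # Capitalization is less informative at the start of a sentence.
        "sentence_initial": start == 0
                            or text[:start].rstrip().endswith((".", "!", "?")),
    }


comment = "Keep your SMILE on!"
begin = comment.index("SMILE")
features = word_level_features(comment, begin, begin + len("SMILE"))
```

Feature dictionaries like this one could be fed, together with syntactic and domain-level features, to a supervised learner that decides whether a spot is a valid music mention.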
  • 109. Supervised Learners
  • 110. Hand Labeling - Fairly Subjective 1800+ spots in MySpace user comments from artist pages. Keep your SMILE on! - good spot, bad spot, inconclusive? 4-way annotator agreements - Madonna 90% agreement - Rihanna 84% agreement - Lily Allen 53% agreement
  • 111. Dictionary Spotter + NLP Step Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009: 260-276
  • 112. NER on Social Media Text using Domain Knowledge Highlights issues with using domain knowledge for an IE task. Two-stage approach: chaining NL learners over results of domain model based spotters. Improves accuracy up to a further 50% ‣ allows the more time-intensive NLP analytics to run on less than the full set of input data
  • 113. BBC SoundIndex (IBM Almaden): Pulse of the Online Music Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: “Multimodal Social Intelligence in a Real-Time Dashboard System,” special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010. http://www.almaden.ibm.com/cs/projects/iis/sound/
  • 114. The Vision http://www.almaden.ibm.com/cs/projects/iis/sound/
  • 115.
  • 116. Several Insights Trending popularity of artists. Trending topics in artist pages. Only 4% negative sentiments; perhaps ignore the Sentiment Annotator on this data source? Ignoring spam can change ordering of popular artists
  • 117. Predictive Power of Data Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07 vs. MySpace popularity charts. User study indicated 2:1 and up to 7:1 (younger age groups) preference for MySpace list. Challenging traditional polling methods!
  • 118. Key Phrase Extraction
  • 119. Key Phrase Extraction: Example Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day
  • 120. Key Phrase Extraction from SM Text Different from Information Extraction. Extracting vs. Assigning Key Phrases. Focus: Key Phrase Extraction. Prior work focus: extracting phrases that summarize a document -- a news article, a web page, a journal article, a book.. Focus: summarize multiple documents (UGC) around same event/topic of interest
  • 121. Key Phrase Extraction on SM Content Focus: Summarizing Social Perceptions via key phrase extraction. Preserving/Isolating the social behind the social data ‣ What is said in Egypt vs. the USA should be viewed in isolation
  • 122. Key Phrase Extraction on SM Content ‣ Accounting for redundancy, variability, off-topic content “Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators ..smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Buca is the place to go.”
  • 123. Social and Cultural Logic in SMC Thematic components ‣ similar messages convey similar ideas Space, time metadata ‣ role of community and geography in communication Poster attributes ‣ age, gender, socio-economic status reflect similar perceptions
  • 124. Feature Space (common to several efforts) Focus: n-grams, spatio-temporal metadata (social components). Syntactic Cues: in quotes, italics, bold; in document headers; phrases collocated with acronyms
  • 125. Feature Space (common to several efforts) Document and Structural Cues: two-word phrases, appearing in the beginning of a document, frequency, presence in multiple similar documents etc. Linguistic Cues: stemmed form of a phrase, phrases that are simple and compound nouns in sentences etc.
  • 126. Key Phrase Extraction: Overview “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September” 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying
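The 1-gram and 2-gram lists on this slide can be generated with a sliding window over the tokens. A minimal sketch, assuming simple whitespace tokenization (the tokenizer actually used in the work is not stated in the slides):

```python
def ngrams(tokens, n):
    """Return all contiguous n-token phrases, in order of appearance."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")
tokens = sentence.split()  # whitespace tokenization is an assumption

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
```

Each n-gram is a candidate descriptor; the weighting steps decide which candidates survive as key phrases.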
  • 127. A descriptor is an n-gram weighted by: ‣ Thematic Importance ‣ TFIDF, stop words, noun phrases ‣ Redundancy: statistically discriminatory in nature ‣ Variability: contextually important ‣ Spatial Importance (local vs. global popularity) ‣ Temporal Importance (always popular vs. currently trending)
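As an illustration of the thematic component only, the sketch below computes a plain TF-IDF weight for unigrams after stop-word removal. The exact formula, the stop-word list, and the way spatial and temporal importance are combined with it are not given in the slides, so everything here is an assumption.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "in", "to", "a", "is"}  # tiny illustrative list


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [[w for w in doc.lower().split() if w not in STOP_WORDS]
                 for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(w for doc in tokenized for w in set(doc))
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        scores.append({w: (count / len(doc)) * math.log(n_docs / doc_freq[w])
                       for w, count in term_freq.items()})
    return scores


docs = ["health care reform debate",
        "health care town hall",
        "mumbai terror attack"]
weights = tfidf(docs)
```

In this toy corpus "reform" outweighs "health" in the first document because "health" also appears in the second one, which is the discriminatory effect the slide refers to.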
  • 128.
  • 129. Eliminating Off-topic Content [WISE2009] Frequency-based heuristics will not eliminate off-topic content that is ALSO POPULAR
  • 130. Approach Overview “Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys” “Canon HV20. Great little cameras under $1000.”
  • 131. Approach Overview Assume one or more seed words (from domain knowledge base) C1 - [camcorder]. Extracted key words / phrases C2 - [electronics forum, hd, camcorder, somethin, ive, canon, little camera, canon hv20, cameras, offtopic]. Gradually expand C1 by adding phrases from C2 that are strongly associated with C1. Mutual Information based algorithm [WISE2009]
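The expansion step can be sketched with pointwise mutual information (PMI) computed from phrase co-occurrence in posts. This is a simplified stand-in, not the algorithm from the cited WISE 2009 paper: the `posts` data, the threshold, and the round-based expansion are assumptions made for illustration.

```python
import math


def pmi(phrase, cluster, posts):
    """PMI between a phrase and a cluster, from post-level co-occurrence counts."""
    n = len(posts)
    with_phrase = sum(1 for p in posts if phrase in p)
    with_cluster = sum(1 for p in posts if cluster & p)
    together = sum(1 for p in posts if phrase in p and cluster & p)
    if together == 0:
        return float("-inf")
    return math.log((together * n) / (with_phrase * with_cluster))


def expand(seeds, candidates, posts, threshold=0.0):
    """Grow the seed cluster with candidates whose PMI exceeds the threshold."""
    cluster = set(seeds)
    remaining = set(candidates) - cluster
    changed = True
    while changed:
        changed = False
        # Score every remaining candidate against the cluster as it stands this round.
        scored = {c: pmi(c, cluster, posts) for c in remaining}
        for c in sorted(remaining):
            if scored[c] > threshold:
                cluster.add(c)
                remaining.discard(c)
                changed = True
    return cluster


# Made-up posts, each reduced to the set of key phrases extracted from it.
posts = [
    {"camcorder", "hd", "canon hv20"},
    {"camcorder", "canon hv20", "cameras"},
    {"electronics forum", "offtopic"},
    {"electronics forum", "somethin"},
    {"hd", "cameras"},
]
C1 = ["camcorder"]
C2 = ["electronics forum", "hd", "camcorder", "somethin",
      "canon hv20", "cameras", "offtopic"]
on_topic = expand(C1, C2, posts)
```

On this toy data the cluster grows to camcorder, hd, canon hv20, and cameras, while "electronics forum", "somethin", and "offtopic" never co-occur with the cluster and are left out even though "electronics forum" is frequent.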
  • 132. Key Phrases and Aboutness Evaluations Are the key phrases we extracted topical and good indicators of what the content is about? ‣ If they are, they should act as effective index/search phrases and return relevant content. Evaluation Application: Targeted Content Delivery
  • 133. Targeted Content Delivery - Evaluations 12K posts from MySpace and Facebook Electronics forums ‣ Baseline phrases: Yahoo Term Extractor ‣ Our method phrases: Key phrase extraction, elimination. Targeted Content from Google AdSense
  • 134. Targeted Content for all content vs. extracted key phrases
  • 135. User Studies and Results
  • 136. Impact and Contributions TFIDF + social contextual cues yield more useful phrases that preserve social perceptions. Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectively
  • 137. Intention Mining
  • 138. Targeted Content Delivery via Intention Mining On social networks. Use case for this talk ‣ Targeted content = content-based advertisements ‣ Target = user profiles. Content-based advertisements (CBAs) ‣ Well-known monetization model for online content
  • 139. Circa 2009 Content-based Ads
  • 140. Circa 2009 - Ads on Profiles
  • 141. What is going on here Interests do not translate to purchase intents ‣ Interests are often outdated.. ‣ Intents are rarely stated on a profile.. Cases that do seem to work ‣ New store openings, sales ‣ Highly demographic-targeted ads
  • 142. Intents in User
  • 143. Content Ads Outside Profiles
  • 144. Targeted Content-based Advertising Non-trivial ‣ Non-policed content: brand image, unfavorable sentiments ‣ People are there to network: user attention to ads is not guaranteed ‣ Informal, casual nature of content ‣ People are sharing experiences and events: main message overloaded with off-topic content
  • 145. Targeted Content-based Advertising
  • 146. Targeted Content-based Advertising I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :( Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
  • 147. Preliminary Results in… Identifying intents behind user posts on social networks ‣ Identify content with monetization potential. Identifying keywords for advertising in user-generated content ‣ Considering interpersonal communication & off-topic chatter
  • 148. Investigations User studies ‣ Hard to compare activity based ads to s.o.t.a ‣ Impressions to Clickthroughs ‣ How well are we able to identify monetizable posts ‣ How targeted are ads generated using our keywords vs. entire user generated content
  • 149. Identifying  Monetizable  Intents Scribe Intent not same as Web Search Intent 1B. People write sentences, not keywords or phrases Presence of a keyword does not imply navigational / transactional intents ‣ ‘am thinking of getting X’ (transactional) ‣ ‘I like my new X’ (information sharing) ‣ ‘what do you think about X’ (information seeking) 1B. J. Jansen, D. L. Booth, and A. Spink, “Determining the informational, navigational, and transactional intent of web queries,”Inf. Process. Manage., vol. 44, no. 3, 2008.Monday, June 6, 2011
  • 150. From  X  to  Action  PaTerns Action patterns surrounding an entity ‣ How questions are asked and not topic words that indicate what the question is about ‣ “where can I find a chottopspcam” ‣ User post also has an entityMonday, June 6, 2011
  • 151. Conceptual Overview: Bootstrapping to learn IS (information seeking) patterns
    Set of user posts from SNSs
    Not annotated for presence or absence of any intent
  • 152. Bootstrapping to learn IS patterns
    Generate a universal set of n-gram patterns; freq > f
    S = set of all 4-grams; freq > 3
  • 153. Bootstrapping to learn IS patterns
    Generate set of candidate patterns from seed words (why, when, where, how, what)
    Sc = all 4-grams in S that extract seed words
  • 154. Bootstrapping to learn IS patterns
    User picks 10 seed patterns from Sc
    Sis = ‘does anyone know how’, ‘where do I find’, ‘someone tell me where’…
  • 155. Bootstrapping to learn IS patterns
    Gradually expand Sis by adding information seeking patterns from Sc
  • 156. Bootstrapping to learn IS patterns
    For every pis in Sis generate set of filler patterns
  • 157. Bootstrapping to learn IS patterns
    ‘.* anyone know how’, ‘does .* know how’, ‘does anyone .* how’, ‘does anyone know .*’
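The bootstrapping steps on slides 152 to 157 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes whitespace tokenization, reads "4-grams that extract seed words" as "4-grams that contain a seed word", and all function names are hypothetical.

```python
from collections import Counter

# Seed words from slide 153.
SEED_WORDS = {"why", "when", "where", "how", "what"}


def ngrams(tokens, n=4):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def universal_patterns(posts, n=4, min_freq=3):
    """S: every n-gram whose corpus frequency is strictly above min_freq."""
    counts = Counter()
    for post in posts:
        counts.update(ngrams(post.lower().split(), n))
    return {gram for gram, count in counts.items() if count > min_freq}


def candidate_patterns(universal):
    """Sc: n-grams in S containing at least one seed word (assumed reading)."""
    return {gram for gram in universal if SEED_WORDS & set(gram)}


def filler_patterns(pattern):
    """Replace each word of a pattern in turn with the wildcard '.*'."""
    words = pattern.split()
    return [" ".join(words[:i] + [".*"] + words[i + 1:])
            for i in range(len(words))]
```

For example, `filler_patterns("does anyone know how")` yields the four filler patterns listed on slide 157. The user-driven step of picking 10 seed patterns from Sc is manual and is not modeled here.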
  • 158. Extracting and Scoring Patterns
  • 159. Extracting and Scoring Patterns
    Filler pattern ‘does * know how’
    – ‘does someone know how’
      • Functional compatibility: impersonal pronouns
      • Empirical support: 1/3
    – ‘does somebody know how’
      • Functional compatibility: impersonal pronouns
      • Empirical support: 0
      • Pattern retained
    – ‘does john know how’
      • Pattern discarded
  • 160. Extracting and Scoring Patterns
    Sc = {‘does anyone know how’, ‘where do I find’, ‘someone tell me where’}
    pis = ‘does anyone know how’
  • 161. Extracting and Scoring Patterns
    pis = ‘does anyone know how’
  • 162. Extracting and Scoring Patterns
  • 163. Expanding the Pattern Pool
    Functional properties / communicative functions of words
    From a subset of LIWC [1]
    – cognitive mechanical (e.g., if, whether, wondering, find)
      • ‘I am thinking about getting X’
    – adverbs (e.g., how, somehow, where)
    – (e.g., someone, anybody, whichever)
      • ‘Someone tell me where can I find X’
    [1] Linguistic Inquiry and Word Count, LIWC, http://liwc.net
  • 164. Details in [WISE2009] for…
    Over iterations, single-word substitutions, functional usage and empirical support conservatively expand Sis
    Infusing new patterns and seed words
    Stopping conditions
  • 165. Sample Extracted Patterns
  • 166. Identifying Monetizable Posts
    Information seeking patterns generated offline
    Information seeking intent score of a post
    ‣ Extract and compare patterns in posts with extracted patterns
    Transactional intent score of a post
    ‣ LIWC ‘Money’ dictionary: 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price etc.
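The two scores on slide 166 could be computed roughly as below. This is a hedged sketch under stated assumptions: the scores are plain match counts (the slides do not give the actual scoring formula), the pattern list holds only the three seed patterns quoted on slide 154, and `MONEY_WORDS` contains only the six example words from the slide, not the 173-entry LIWC 'Money' dictionary. Function names are hypothetical.

```python
import re

# Illustrative stand-ins only; see the assumptions in the lead-in.
IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def information_seeking_score(post, patterns=IS_PATTERNS):
    """Count how many known information seeking patterns occur in the post."""
    normalized = " " + " ".join(tokenize(post)) + " "
    return sum(1 for p in patterns if " " + p + " " in normalized)


def transactional_score(post, money_words=MONEY_WORDS):
    """Count tokens that appear in the money-related word list."""
    return sum(1 for token in tokenize(post) if token in money_words)
```

A post would then be flagged as monetizable when both scores clear some threshold; the threshold choice is left open here because the slides do not state one.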
  • 168. Keywords for Advertising
    Identifying keywords in monetizable posts
    ‣ Plethora of work in this space
    Off-topic noise removal is our focus
    ‣ "I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :("
  • 169. Conceptual Overview (also see slides 88, 89)
    Topical hints
    ‣ C1 - [camcorder]
    Keywords in post
    ‣ C2 - [electronics forum, hd, camcorder, somethin, ive, canon, little camera, canon hv20, cameras, offtopic]
    Move strongly related keywords from C2 to C1 one-by-one
    ‣ Relatedness determined using information gain
    ‣ Using the Web as a corpus, domain independent
  • 170. Off-topic Chatter
    C1 - [camcorder]
    C2 - [electronics forum, hd, camcorder, somethin, ive, canon, little camera, canon hv20, cameras, offtopic]
    Informative words
    ‣ [camcorder, canon hv20, little camera, hd, cameras, canon]
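The "move strongly related keywords from C2 to C1 one-by-one" step on slides 169 and 170 can be illustrated with a greedy loop. This is only a sketch: the slides compute relatedness with information gain over the Web as a corpus, which is not reproduced here. Instead the relatedness function is passed in by the caller, and `TOY_SCORES`, `toy_relatedness`, and the threshold are made-up illustrative values, not results from the study.

```python
def expand_topical_keywords(c1, c2, relatedness, threshold):
    """Greedily move the most related keyword from C2 into C1 until
    no remaining keyword is related strongly enough."""
    topical = list(c1)
    remaining = [k for k in c2 if k not in topical]
    while remaining:
        # Score each candidate by its strongest link to the current topical set.
        best = max(remaining,
                   key=lambda k: max(relatedness(k, t) for t in topical))
        best_score = max(relatedness(best, t) for t in topical)
        if best_score < threshold:
            break
        topical.append(best)
        remaining.remove(best)
    return topical


# Hypothetical relatedness table standing in for the Web-based measure.
TOY_SCORES = {
    frozenset(["camcorder", "canon hv20"]): 0.9,
    frozenset(["camcorder", "cameras"]): 0.8,
    frozenset(["camcorder", "hd"]): 0.7,
    frozenset(["camcorder", "somethin"]): 0.1,
}


def toy_relatedness(a, b):
    """Symmetric lookup in the toy table; unknown pairs score 0.0."""
    return TOY_SCORES.get(frozenset([a, b]), 0.0)
```

With these toy scores, off-topic words such as "somethin" and "ive" stay in C2 while camera-related keywords migrate to C1, which mirrors the split shown on slide 170.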
  • 171. Evaluations - User Study
    Keywords from 60 monetizable user posts
    ‣ Monetizable intent, at least 3 keywords in content
    45 MySpace Forums, 15 Facebook Marketplace, 30 graduate students
    ‣ 10 sets of 6 posts each
    ‣ Each set evaluated by 3 randomly selected users
    Monetizable intents?
    ‣ All 60 posts voted as unambiguously information seeking in intent
  • 172. 1. Effectiveness of using topical keywords
    Google AdSense ads for user post vs. extracted topical keywords
  • 173. Instructions - User Study
  • 174. Result - 2X Relevant Impressions
    Users picked ads relevant to the post
    ‣ At least 50% inter-evaluator agreement
    For the 60 posts
    ‣ Total of 144 ad impressions
    ‣ 17% of ads picked as relevant
    For the topical keywords
    ‣ Total of 162 ad impressions
    ‣ 40% of ads picked as relevant
  • 175. 2. Profile Ads vs. Activity Ads
    User’s profile information
    ‣ Interests, hobbies, TV shows…
    ‣ Non-demographic information
    Submit a post
    ‣ Looking to buy and why (induced noise)
    Ads that generate interest, captured attention
  • 176. Result - 8X Generated Interest
    Using profile ads
    ‣ Total of 56 ad impressions
    ‣ 7% of ads generated interest
    Using authored posts
    ‣ Total of 56 ad impressions
    ‣ 43% of ads generated interest
    Using topical keywords from authored posts
    ‣ Total of 59 ad impressions
    ‣ 59% of ads generated interest
  • 177. To note…
    User studies are small and preliminary, but clearly suggest
    ‣ Monetization potential in user activity
    ‣ Improvement for ad programs in terms of relevant impressions
    Evaluations based on forum, marketplace
    ‣ Verbose content
    ‣ Status updates, notes, community and event memberships…
    ‣ One size may not fit all
  • 178. To note…
    A world between relevant impressions and clickthroughs
    ‣ Objectionable content, vocabulary impedance, ad placement, network behavior
    In a pipeline of other community efforts
    No profile information taken into account
    Cannot custom send information to Google AdSense
  • 179. SENTIMENT / OPINION MINING
  • 180. Content Analysis: Sentiment Analysis/Opinion Mining
    Two main types of information we can learn from user-generated content: fact vs. opinion
    Much of what we read in social media (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions.
    For example: "Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact}... looks like they "learned" from #Egypt {opinion}"
  • 181. Sentiment Analysis Motivation
    ‣ Which movie should I see?
    ‣ What customers complain about?
    ‣ Why do people oppose health care reform?
  • 182. Sentiment Analysis: Tasks
    Example:
    ‣ How awful that many #Egyptian artifacts are in danger of being destroyed.
    ‣ What Zahi Hawass must be thinking #jan25 (read in the tone of “what were YOU thinking”)
  • 183. Sentiment Analysis: Tasks
  • 184. Sentiment Analysis: Tasks
    Classification: overall sentiment polarity: positive/neutral/negative
    ‣ Example: “How awful that many #Egyptian artifacts are in danger of being destroyed.”
    ‣ Overall polarity is negative
    Target-specific sentiment polarity: positive/neutral/negative
    ‣ Example: for target "egyptian artifacts", polarity is "negative"; for target "Zahi Hawass", polarity is "neutral"
  • 185. Sentiment Analysis: Tasks
  • 186. Sentiment Analysis: Tasks
    Identification & Extraction: opinion, opinion holder, opinion target
    ‣ Example: opinion="awful", opinion holder="the author", target="egyptian artifacts are in danger"
    ‣ Opinion="must be thinking", opinion holder="the author", target="Zahi Hawass"
  • 187. Sentiment Analysis: Approaches
    Classification:
    ‣ Supervised:
      – labeled training data
      – features, which differ from traditional topic classification tasks
      – learning strategies
    ‣ Unsupervised:
      – lexicon-based approach
      – bootstrapping
  • 188. Sentiment Analysis: Approaches
  • 189. Sentiment Analysis: Approaches
    Identification & Extraction: utilizing the relations between opinion and opinion target
    ‣ proximity
    ‣ syntactic dependency
    ‣ co-occurrence
    ‣ prepared patterns/rules
  • 190. Sentiment Analysis: From Tweets to Polls
    Corpus:
    ‣ 0.7 billion tweets, Jan 2008 - Oct 2009
    ‣ 1.5 billion tweets, Jan 2008 - May 2010
    Lexicon-based approach for sentiment analysis of tweets:
    ‣ Subjective lexicon from OpinionFinder (Wilson et al., 2005)
    ‣ Within topic tweets, count messages containing the positive and negative words defined by the lexicon
  • 193. Sentiment Analysis: From Tweets to Polls
    B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In Intl. AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.
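The lexicon-based counting described on slide 190 can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and a tiny made-up lexicon in place of the OpinionFinder subjectivity lexicon; the function name is hypothetical. Dividing the positive count by the negative count is shown as one simple way to turn the two counts into a single score.

```python
def sentiment_counts(messages, topic_word, positive, negative):
    """Among messages mentioning topic_word, count those containing at least
    one positive word and those containing at least one negative word.
    A message can contribute to both counts."""
    pos_count = 0
    neg_count = 0
    for message in messages:
        tokens = set(message.lower().split())
        if topic_word not in tokens:
            continue
        if tokens & positive:
            pos_count += 1
        if tokens & negative:
            neg_count += 1
    # Ratio of positive to negative messages; undefined without negatives.
    ratio = pos_count / neg_count if neg_count else None
    return pos_count, neg_count, ratio
```

Computed per day and smoothed over a window, such a ratio gives a time series that can be compared against poll data, which is the kind of comparison the cited paper is about.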
  • 194. Sentiment Analysis: Predicting the Future With Social Media
    Corpus: 2.89 million tweets referring to 24 movies released over a period of three months
    Sentiment analysis classifier:
    ‣ DynamicLMClassifier provided by the LingPipe linguistic analysis package
    ‣ Thousands of workers from Amazon Mechanical Turk assigned sentiments (positive, negative, neutral) to a large random sample of tweets
    ‣ Train the classifier using an n-gram model
    S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
  • 200. Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets - Unsupervised Approach
    Simple lexicon-based method doesn't work.
    Observations:
    ‣ The opinions may not contribute toward the given target (1,2,3,6)
    ‣ The subjectivity and polarity of opinion clues are domain-dependent (5,7)
    ‣ Single words are not enough (4,7,8)
  • 201. Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets - Unsupervised Approach
    General subjective lexicon
    ‣ Commonly used subjective lexicon + popular slang learned from Urban Dictionary
    Domain-dependent sentiment lexicon
    ‣ Learned from domain-specific corpus
    ‣ Bootstrapping
    ‣ More than words (word/phrase/pattern)
    ‣ n-gram + statistical model
  • 204. Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets - Unsupervised Approach
  • 207. Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets - Unsupervised Approach
    Target-specific opinion identification/extraction
    ‣ Shallow syntactic analysis
    ‣ Rules + proximity
  • 208. Content Analysis: Context Extraction, Utilization
    URL extraction is for Tweets
    FourSquare in Facebook, Twitter
    What is it in other mediums/SMS?
  • 209. Content Analysis: URL extraction
    Resolution, Semantic Context, Relevance
  • 210. Author Categorization: Using Content to derive additional People metadata
    Personality signals
    ‣ Blogs, style of writing
    Psychometric analysis of content
    Sample study: gendered writing styles online
  • 211. People Analysis: Using Network to derive People metadata
    Interesting questions to ask:
    ‣ Who are the most popular people* in the network?
    ‣ Who are the most influential people in the network?
    ‣ Who are the most active people in the network?
    ‣ What are the types of people in communities of the network?
    ‣ Who are the bridges between communities in the network?
  • 212. People Analysis: Influence
    By link analysis algorithms
    ‣ HITS [K-99] & variants
    ‣ PageRank [BP-97] & variants, etc.
    Links not sufficient!
    ‣ Million Follower Fallacy [C-10]
    Source: informing-arts
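As a concrete instance of the link analysis family named on slide 212, here is a minimal power-iteration sketch of PageRank over a follower graph. It assumes an edge from each follower to the account followed, the commonly used damping factor of 0.85, and uniform redistribution of rank from accounts that follow nobody; none of these settings come from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it points to
    (here: follower -> accounts followed).
    """
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

As the slide warns, a high score from links alone measures popularity rather than influence, so a ranking like this is at best a starting point.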
  • 213. People Analysis: Influence
  • 214. People Analysis: Influence
    Flavor of context analysis (activity level)
    Popularity is NOT influence!
    ‣ Influence & Passivity [RGAH-10]
    Interest similarity
    ‣ TwitterRank: Reciprocity & Homophily [WLJH-10]
    Klout Score - True Reach, Amplification [Klout]
  • 215. People Analysis: User types & Affiliation
    Blogger, Scientist, Journalist, Artist, Trustee, Company X in Domain Y…
    ‣ Multiple types and affiliations!
    User interest mining
    ‣ Key phrase extraction followed by semantic association on user bio, tweets, lists, favorite posts
    ‣ Twitter Study [BCDMJNRM-09]
    Source: kahunainstitute.com
  • 216. People Analysis: User types & Affiliation
  • 217. People Analysis: User types & Affiliation
    Semantic analysis of profile description
    ‣ Web presence: use of Web & knowledge bases (Wikipedia, blogs) to build context for user types
    ‣ Entity spotting & extraction, followed by semantic association and similarity with user-type context
  • 218. People Analysis: Social Engagement
    Frequency distribution analysis of user activity
    ‣ posting, retweet, reply, mentions, lists etc.
    Source: http://www.syscomminternational.com/
  • 219. Network Analysis
    Foundation of network:
    ‣ Nodes
    ‣ Connections/Relationships
    Interesting questions to ask:
    ‣ How do communities form around topics: growth & evolution?
    ‣ What are the effects of the presence of influential participants in the communities?
    ‣ What are the effects of the nature of content (or sentiment, opinions) flowing in the network on community life?
    ‣ What is the community structure: degree of separation and sub-communities?
  • 220. Network Analysis: Methods
    Source: http://www.kudos-dynamics.com/
  • 221. Network Analysis: Methods
    Network structure metrics
    ‣ Centrality, Connected Component, Avg. Degree, Clustering Coefficient, Avg. Path Length, Bridge, Cohesion, Prestige, Reciprocity
    Important literature: [AB-02, WS-98, BW-00, NW-06, WF-92, MW-10]
    Source: http://www.kudos-dynamics.com/
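Two of the structure metrics listed on slide 221 are easy to compute directly from an adjacency structure. The sketch below uses the standard textbook definitions of normalized degree centrality and the local clustering coefficient for an undirected graph; the dictionary-of-sets representation and the function names are choices made for this illustration.

```python
def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def clustering_coefficient(adj, node):
    """Share of pairs of a node's neighbors that are themselves connected."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))
```

For larger graphs a dedicated library is the practical choice; the point here is only to make the two definitions concrete.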
  • 222. Network Analysis: Algorithms
    Community discovery, growth, evolution
    ‣ Based on relationship types (e.g., signed network), geography/location etc.
    Hierarchical clustering algorithms: top-down, bottom-up
    Modularity maximization [NW-06]
    Algorithms comparison survey [B-06]
  • 223. Network Analysis: Algorithms
    Graph partitioning & traversal
    Best time-complexity & reachability
    Follow greedy paths
    ‣ K-way multilevel partitioning
    ‣ Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST
    "We dream in Graph and We analyze in Matrix” - Barry Wellman, INSNA
  • 224. Network Analysis: Methods
    Network modeling approaches
    ‣ Random graph model (Erdos-Renyi model)
    ‣ Small-world model (Small World Phenomenon)
    ‣ Scale-free model (led to Power-Law degree distribution)
    Social network analysis methods
    ‣ Centrality (Degree, Eigenvector, Betweenness, Closeness)
    ‣ Clusters (Cliques and extensions, Communities)
    Source: http://www.kudos-dynamics.com/
  • 225. Network Analysis: Diffusion & Homophily
    Information flow: diffusion
    ‣ Maximizing spread (opinion, innovation, recommendation)
    ‣ Outbreak detection (e.g., disease)
    Social network: no info about user action, so understanding dynamics is challenging!
    Power Law distribution [LAH-07]
    Factors impacting flow:
    ‣ Sampling strategy, user homophily, content nature [CLSCK-10, NPS-10]
  • 226. Querying
  • 227. Analysis & Visualization Tools
    NWB (Network Workbench), Truthy, Graph-tool, Orange, Pajek, Tulip
    Source: http://truthy.indiana.edu/
    http://en.wikipedia.org/wiki/social_network_analysis_software
  • 228. Event Detection
  • 229. Citizen Sensing in Real-time
  • 230. Real-Time Motivation
    People can't wait for information
    500 years ago
    ‣ Single lifetime
    20 years ago
    ‣ Next day or two
    ‣ Television, newspapers
    Presently
    ‣ Minutes are not considered fast enough
    ‣ Digital media, social media
  • 231. Real-Time Social Media
    Is Real-Time the future of the Web?
    Social Media for Real-Time Web
    ‣ Disaster Management
    ‣ Ushahidi
    ‣ Real-Time Markets
    ‣ Examples
    ‣ Brand Tracking
    ‣ Twarql
    ‣ Movie reviews
  • 232. Scenario
    The Guardian, Feb 2010
  • 234. Scenario
    The Guardian, Feb 2010
    Journalist
  • 235. Challenges
    Information overload
    ‣ Can we aggregate, organize and collectively analyze data?
    Real time
    ‣ Can we deliver the data as it is generated?
  • 236. A Semantic Web Approach
    Expressive description of information need
    ‣ Using SPARQL (instead of traditional keyword search)
    Flexibility on the point of view
    ‣ Ability to "slice and dice" the data in several dimensions: thematic, spatial, temporal, sentiment etc.
    Streaming data with background knowledge
    ‣ Enables automatic evolution and serendipity
    Scalable real-time delivery
    ‣ Using sparqlPuSH (SFSW10)
  • 237. Concept Feed
  • 238. Architecture
  • 239. Social Sensor Server
  • 240. Metadata Extractions (Social Sensor Server)
    Named Entity Recognition
    ‣ 2 million entities from DBpedia
    ‣ Loaded as a trie for efficiency
    ‣ N-grams matched
    ‣ Example: Obama, Barack Obama
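The trie-based entity spotting on slide 240 might look like the sketch below. It is an assumption-laden illustration: the trie is keyed on whole tokens rather than characters, matching is case-insensitive on whitespace tokens with no punctuation handling, the longest match wins, and the three-entity list stands in for the 2 million DBpedia labels. Function names are hypothetical.

```python
# Sentinel key marking the end of an entity label inside the trie.
_END = object()


def build_trie(entities):
    """Build a token-level trie from a list of entity labels."""
    root = {}
    for entity in entities:
        node = root
        for token in entity.lower().split():
            node = node.setdefault(token, {})
        node[_END] = entity
    return root


def spot_entities(text, trie):
    """Scan left to right and return the longest entity match at each position."""
    tokens = text.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        node = trie
        match = None
        match_end = i
        j = i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _END in node:
                match, match_end = node[_END], j
        if match is not None:
            found.append(match)
            i = match_end
        else:
            i += 1
    return found
```

Preferring the longest match is what lets "Barack Obama" be reported as one entity instead of the shorter "Obama", as in the slide's example.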
  • 241. Metadata Extractions (Social Sensor Server)
    URL, hashtag extraction
    ‣ Regex extraction
    ‣ Resolution
      – URL resolution: follows HTTP redirects
      – Hashtag resolution: Tagdef, Tagal, WTHashTag.com
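The regex extraction and redirect-following steps on slide 241 can be sketched with the Python standard library. The regular expressions are deliberately simple assumptions, not the ones used in the system described, and `resolve_url` needs network access, so it is shown but not exercised below. The short URL in the usage note is a placeholder.

```python
import re
import urllib.request

# Simplified patterns; real tweets need more careful handling.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_urls(text):
    """Return all http(s) URLs found in the text."""
    return URL_RE.findall(text)


def extract_hashtags(text):
    """Return hashtag names without the leading '#'."""
    return HASHTAG_RE.findall(text)


def resolve_url(url, timeout=5.0):
    """Follow HTTP redirects and return the final URL (requires network)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

For a tweet such as "Fingers crossed for the upcoming #hcrvote http://bit.ly/abc123", the extractors return the one URL and the hashtag name, and `resolve_url` would then expand the shortened link. Hashtag resolution against services like Tagdef is not sketched because their interfaces are not described in the slides.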
  • 242. Metadata Extractions (Social Sensor Server)
  • 243. Metadata Extractions (Social Sensor Server)
    Other metadata provided by Twitter
    ‣ User profile: user name, location, time etc.
    ‣ Tweet: RT, reply etc.
  • 244. Structured Data (Social Sensor Server)
    RDF annotation
    ‣ Common RDF/OWL vocabularies
      – FOAF (foaf-project.org): Friend of a Friend
      – SIOC (sioc-project.org): Semantically Interlinked Online Communities
      – OPO (online-presence.net): Online Presence Ontology
      – MOAT (moat-project.org): Meaning Of A Tag
  • 245. Structured Data (Social Sensor Server)
  • 246. Structured Data (Social Sensor Server)
    A snippet of the annotation:
    <http://twitter.com/bob/statuses/123456789>
        rdf:type sioct:MicroblogPost ;
        sioc:content "Fingers crossed for the upcoming #hcrvote" ;
        sioc:has_creator <http://twitter.com/bob> ;
        foaf:maker <http://example.org/bob> ;
        moat:taggedWith dbpedia:Healthcare_reform .
    <http://twitter.com/bob> geonames:locatedIn dbpedia:Ohio .
  • 247. Semantic Publisher
  • 248. Semantic Publisher
    Virtuoso to store triples
    Queries formulated by the users are stored
    SPARQL protocol over HTTP to access RDF from the store
    Combine data from tweets with the background knowledge in the RDF store
  • 249. Application Server & Distribution Hub
  • 250. Application Server & Distribution Hub
    Distribution Hub
    ‣ PUSH model - PubSubHubbub protocol
    ‣ Pushes the tweets to the Application Server
    Application Server
    ‣ Delivers data to the clients
    ‣ RSS-enabled concept feeds
  • 254. Brand Tracking - Example
    [Figure: query graph linking a tweet to background knowledge (e.g. DBpedia). Node and edge labels: ?tweet, moat:taggedWith, ?competitor, skos:subject, ?category, dbpedia:IPad. Example categories: category:Wi-Fi, category:Touchscreen. Example competitors: IPhone, HPTabletPC. Sample tweet: "@anonymized Lorem ipsum bla bla this is an example tweet".]
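One plausible reading of the brand tracking figure is a three-way join: find the categories of dbpedia:IPad, find other products sharing a category, then find tweets tagged with those products. The sketch below performs that join over an in-memory list of triples in plain Python. It is an illustration of the idea only, not the Twarql implementation; the triples (including the `tweet:` identifiers) are invented for the example, and in the system described such a query would be expressed in SPARQL.

```python
# Toy triples standing in for DBpedia background knowledge and annotated tweets.
TRIPLES = [
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HPTabletPC", "skos:subject", "category:Wi-Fi"),
    ("tweet:1", "moat:taggedWith", "dbpedia:IPhone"),
    ("tweet:2", "moat:taggedWith", "dbpedia:Ohio"),
]


def competitors(triples, product):
    """Products that share at least one skos:subject category with product."""
    categories = {o for s, p, o in triples
                  if s == product and p == "skos:subject"}
    return {s for s, p, o in triples
            if p == "skos:subject" and o in categories and s != product}


def competitor_tweets(triples, product):
    """Tweets tagged with any competitor of the given product."""
    rivals = competitors(triples, product)
    return sorted(s for s, p, o in triples
                  if p == "moat:taggedWith" and o in rivals)
```

The background knowledge does the work here: the tweet never mentions the tracked brand, yet it is retrieved because its tagged entity shares a category with it.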
  • 257. 1242 articles from NYTimes, around 800,000 tweets
    ‣ President Obama lays out plan for health care reform in speech to Joint Session of Congress (10th Sept, Timeline.com)
    ‣ Obama taking an active role in health talks in pursuing his proposed overhaul of health care system. (13th Aug
  • 258. Twarql on Linked Open Data
  • 260. Emerging Research Areas
  • 261. Spam in Social Networks
    Reasons for spamming include:
    ‣ Gaining popularity
      – Use of popular topic-related keywords (e.g. hashtags of trending topics) to propagate something off topic
    ‣ Launching malicious attacks
      – Phishing attacks, viruses, malware etc.
    ‣ Misleading the masses
      – Propagating false information [MM-10]
  • 262. Spam in Social Networks
    Gaining popularity using trending keywords: this tweet uses #Cairo but refers to a fashion website.
  • 265. Spam in Social Networks
    Gaining popularity using trending keywords: this tweet uses #Cairo but refers to a fashion website.
    Egypt Protests
  • 271. Spam in Social Networks
    Spam detection
    ‣ Content-based features
      – Content size, URL type, spam words
    ‣ Metadata-based features
      – Account information, behavior
    ‣ Network-based features
      – Provenance (e.g. content from a reliable source)
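The content-based features on slide 271 can be turned into a small feature extractor that a spam classifier could consume. This is a rough sketch: the feature names, the regular expressions, and the four-word `SPAM_WORDS` list are illustrative assumptions, and metadata-based and network-based features are left out because they need account and graph data.

```python
import re

# Tiny illustrative list; a real system would use a curated vocabulary.
SPAM_WORDS = {"free", "win", "click", "sale"}


def content_features(text):
    """Extract simple content-based features from a single post."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {
        "length": len(text),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_spam_words": sum(1 for token in tokens if token in SPAM_WORDS),
    }
```

A post that pairs a trending hashtag with a promotional link, like the #Cairo example on the preceding slides, would show up as one hashtag, one URL, and a non-zero spam word count.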
  • 272. Trust in Social Networks
    Reputation, policy, evidence, and provenance are used to derive trustworthiness.
    Illustrative examples of online cues used for trust assessment:
    ‣ Wikipedia: article size, number of references, author, edit history, age of the article, edit frequency etc.
    ‣ Product reviews: number of helpful and very helpful ratings, author expertise, sentiments in comments received for a review etc.
  • 273. Trust in Social Networks
    We propose a trust ontology [AHTS-10] that
    ‣ Captures semantics of trust
    ‣ Enables representation and reasoning with trust
    Semantics of trust specifies, for a given trustor and trustee, the following features:
    ‣ Type - type of trust relationship
    ‣ Scope - context of the trust relationship
    ‣ Value - quantifies the trust relationship
  • 274. Trust in Social Networks
    Gleaning primitive (edge) trust
    ‣ Trust value between two nodes is quantified using numbers, e.g., [0,1] or [-1,1], or a partial ordering [TAHS-09]
    Gleaning composite (path) trust
    ‣ Propagation via chaining and aggregation (transitivity)
    Some popular algorithms for trust computation
    ‣ Eigentrust, Spreading Activation, SUNNY etc.
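Chaining and aggregation of trust, as named on slide 274, can be illustrated with one simple choice of operators. The sketch assumes edge trust values in [0,1], chains along a path by multiplication, and aggregates over alternative paths by taking the maximum. These operators are one common textbook option, not the scheme of the trust ontology or of the algorithms named on the slide; the function names are hypothetical.

```python
def path_trust(edge_trust, path):
    """Chain trust along a path by multiplying the edge values."""
    value = 1.0
    for source, target in zip(path, path[1:]):
        value *= edge_trust[(source, target)]
    return value


def aggregate_trust(edge_trust, paths):
    """Aggregate over alternative paths by keeping the most trusted one."""
    return max(path_trust(edge_trust, path) for path in paths)
```

With multiplication, trust can only decay as paths get longer, which matches the intuition that a friend of a friend is trusted less than a direct friend.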
  • 275. Integrating Social and Sensor Networks
    Machine sensor observations are quantitative in nature, while human observations can be both qualitative and quantitative.
    Benefits of combining observations from humans and machine sensors
    ‣ Complementary evidence
    ‣ Corroborative evidence
  • 276. Integrating Social and Sensor Networks
    Applications of integrating heterogeneous sensor observations
    ‣ Situation awareness by using human observations to interpret machine sensor observations
    ‣ Enhancing trustworthiness using corroborative evidence
  • 277. Mobile Social Computing
    Instant Discovery: Geo-tagging and location-aware services, in combination with search, have made discovery a two-way street.
    Compressed Expression: Mobile makes social networking even more compelling
    Outsourced Memory: Cloud-based servers to store all of their mobile applications and databases
  • 282. Mobile Social Computing Automated Decisions: Smart apps help us make faster decisions, or even make decisions for us. Peer Power: Mobiles can create social movements based on peer influence.
  • 283. Mobile Social Computing (Cont.) Personalized Branding: Advertising is rapidly becoming personalized, based on individuals' needs and preferences. Mobiles are becoming an integral part of social development ‣ Coordination in disaster situations ‣ Health care delivery, especially in developing countries ‣ Elections and other forms of political expression
  • 284. Research Application: Twitris
  • 285. Twitris - Motivation 1. Information overload ‣ Multiple events around us ‣ WHAT to be aware of ‣ Multiple storylines about the same event!
  • 286. Twitris - Motivation 2. Evolution of citizen observation ‣ with location and time
  • 287. Twitris - Motivation 3. Semantics of social perceptions ‣ What is being said about an event (theme) ‣ Where (spatial) ‣ When (temporal) Twitris lets you browse citizen reports using social perceptions as the fulcrum.
  • 288. Twitris: Semantic Social Web Mash-up Facilitates understanding of multi-dimensional social perceptions over SMS, tweets, multimedia Web content, and electronic news media.
  • 289. Twitris: Architecture
  • 290. Twitris: Functional Overview
  • 292. Twitris: Event Summarization 1
  • 293. Twitris: Event Summarization 2 Sentiment analysis ‣ using statistical and machine learning techniques
  • 294. Twitris: Event Summarization 3 Entity-relationship graph ‣ using semantically annotated DBpedia entities mentioned in the tweets
  • 295. Twitris: Demo, Quick Show http://twitris.knoesis.org/ http://knoesis1.wright.edu/sidfot/
  • 296. Twitris: Ongoing Work
  • 297. Twitris: Knowledge-Enabled Computing Domain models to enhance understanding of the content
  • 298. Twitris: Coordination A great role in military and NGO rescue operations during emergencies: the Haiti and Chile earthquakes
  • 299. Twitris: Coordination Coordinating needs and resources in disaster situations ‣ Analyze SMS and Web reports from the disaster location ‣ Use domain models for efficient and timely coordination
  • 300. Twitris: Socio-Cultural-Behavior Model as Lens Modeling relationships between social behavior, roles, social and cultural values, etc.
  • 301. Collaboration We “simply do not have enough genes to program the brain fully in advance,” we must work together, extending and supporting our own intelligence with “social prosthetic” systems that make up for our missing cognitive and emotional capacities: “Evolution has allowed our brains to be configured during development so that we are ‘plug compatible’ with other humans, so that others can help us extend ourselves.” - Harvard "Group Brain Project"
  • 302. Beginnings Open Source ‣ Linux, Apache, ... Social Networks ‣ Facebook, Twitter, ... Crowd Sourcing ‣ Wikipedia, Kiva, Ushahidi, Kiirti, SwiftRiver, Sahana, ... Collaborative Governance ‣ Peer-to-Patent, ...
  • 303. http://gomadam.org/tutorial @namelessnerd
  • 304. Popular Initiatives Facebook + Twitter ‣ Iran post-election protests ‣ Tunisia, Egypt, Libya, Bahrain, ... Ushahidi ‣ Kenya violence ‣ India, Lebanon, Afghanistan, and Sudan elections ‣ Haiti earthquake ‣ Pakistan floods
  • 305. Popular Initiatives Kiirti ‣ BBMP election monitoring ‣ Bangalore AutoWatch
  • 306. FixOurCity Process Flow FixOurCity allows citizens to report, view and discuss civic issues in their locality.
  • 307. FixOurCity Backend Built on top of the FixMyCity open-source codebase. Stage I ‣ Report by area/ward and street ‣ Integration with Google Maps ‣ Displays ward member name/contact details ‣ Select category of issue, description and severity ‣ Confirmation through email to avoid misuse
  • 308. FixOurCity Backend Stage II/III ‣ Normalize incoming reports to official wards and categories ‣ Integration with the Corporation website to allow auto-forwarding and updating of reports
  • 309. Ushahidi Features Information Collection: SMS (FrontlineSMS, Clickatell), Email, Web. Visualization/Interactive Mapping: Timeline, Category, Geo-spatial. Alerts: Geo-spatial. Admin: User Management, Report Moderation/Creation, Site Statistics.
  • 310. SwiftRiver Architecture - I Enables filtering and verification of real-time data from channels like Twitter, SMS, Email and RSS feeds.
  • 311. Kiirti Features Kiirti allows you to set up your own instance of the Ushahidi Platform without having to install it on your own web server. And, it provides pre-integrated Voice and SMS reporting capabilities within India.
  • 312. Kiirti - Flywheel of Engagement
  • 313. Sahana Features Sahana: a Free and Open Source Disaster Management system. A web-based collaboration tool that addresses the common coordination problems during a disaster between government groups, civil society (NGOs) and the victims themselves.
  • 315. Sahana Features Requests Management: Tracks requests for aid and matches them against donors who have pledged aid. Volunteer Management: Manage volunteers by capturing their skills, availability and allocation.
  • 319. Sahana Features Missing Persons Registry: Report and search for missing persons. Disaster Victim Identification. Shelter Registry - Tracks the location, distribution, capacity and breakdown of victims in shelters.
  • 320. Sahana Features Hospital Management System - Hospitals can share information on resources & needs. Organization Registry - "Who is doing What & Where". Allows relief agencies to coordinate their activities. Ticketing - Master Message Log to process incoming reports & requests. Delphi Decision Maker - Supports the decision making of large groups of experts.
  • 327. Sahana Features Mapping - Situation Awareness & Geospatial Analysis. Messaging - Sends & Receives Alerts via Email & SMS. Document Library - A library of digital resources, such as Photos & Office documents.
  • 328. Peer to Patent Peer to Patent is a historic initiative by the United States Patent and Trademark Office (USPTO) that opens the patent examination process to public participation for the first time. It is an online system that aims to improve the quality of issued patents by enabling the public to supply the USPTO with information relevant to assessing the claims of pending patent applications.
  • 329. Twitris Architecture Twitris 2.0 is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. It addresses challenges in large-scale processing of social data while preserving spatio-temporal-thematic properties.
  • 330. Future Possibilities Online Dispute Resolution ‣ 30M+ pending cases in India's courts. Public Policy Reviews. Crisis Management. Effective Local Governance.
  • 331. References
    http://www.nascio.org/events/2009Midyear/documents/NASCIO-KeynoteNoveck.pdf
    http://citizensensing.posterous.com/
    [MM-10] Eni Mustafaraj and Panagiotis Metaxas. From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 2010.
    [AHTS-10] Pramod Anantharam, Cory A. Henson, Krishnaprasad Thirunarayan, and Amit P. Sheth. Trust Model for Semantic Sensor and Social Networks: A Preliminary Report. National Aerospace & Electronics Conference (NAECON), Dayton, Ohio, July 14-16, 2010.
    [TAHS-09] K. Thirunarayan, Dharan K. Althuru, Cory A. Henson, and Amit P. Sheth. A Local Qualitative Approach to Referral and Functional Trust. In: Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI-09), pp. 574-588, December 2009.
  • 332. References
    B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to polls: Linking text sentiment to public opinion time series. In International AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.
    Sitaram Asur and Bernardo A. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
    A. Sheth. Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness. February 17, 2009.
    M. Nagarajan et al. Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences. Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009, Poland.
    Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, and Amit Sheth. Multimodal Social Intelligence in a Real-Time Dashboard System. To appear in a special issue of the VLDB Journal on Data Management and Mining for Social Networks and Social Media, 2010.
  • 333. References
    A. Sheth, C. Thomas, and P. Mehra. Continuous Semantics to Analyze Real-Time Data. IEEE Internet Computing, November-December 2010, pp. 80-85.
    [NPS-10] M. Nagarajan, H. Purohit, and A. Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices. 4th Intl AAAI Conference on Weblogs and Social Media, ICWSM 2010.
    [RGAH-10] D. Romero, W. Galuba, S. Asur, and B. Huberman. Influence and Passivity in Social Media. Arxiv preprint, arXiv:1008.1253, 2010.
    [LLDM-10] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29-123, 2009.
    [CHBG-10] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi. Measuring user influence in twitter: The million follower fallacy. In ICWSM, 2010.
    [BP-98] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, Vol. 30, 1-7, 1998.
  • 334. References
    [K-99] Jon Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (5): 604-632, 1999.
    [AB-02] R. Albert and A.L. Barabasi. Statistical Mechanics of Complex Networks. Rev. Modern Physics, vol. 74, no. 1, pp. 47-97, 2002.
    [WLJH-10] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. TwitterRank: finding topic-sensitive influential twitterers. WSDM, 2010.
    [BCDMJNRM-09] N. Banerjee, D. Chakraborty, K. Dasgupta, S. Mittal, A. Joshi, S. Nagar, A. Rai, and S. Madan. User interests in social media sites: an exploration with micro-blogs. CIKM 09.
    [RCD-10] A. Ritter, C. Cherry, and B. Dolan. Unsupervised modeling of Twitter conversations. In Human Language Technologies: ACL (HLT 10), 2010.
    [WS-10] D.J. Watts and S.H. Strogatz. Collective dynamics of small-world networks. Nature 393 (6684): 409–10, 1998.
  • 335. References
    [NW-06] M. E. J. Newman and D. J. Watts. The Structure and Dynamics of Networks. Princeton University Press, 2006.
    [WF-92] Wasserman & Faust. Social Network Analysis, 1992.
    [EK-10] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
    [MW-10] A. Marin and B. Wellman. Handbook of Social Network Analysis, 2010.
    [B-06] H. Balakrishnan. Algorithms for Discovering Communities in Complex Networks. Ph.D. Dissertation, University of Central Florida, Orlando, FL, USA. Advisor: Narsingh Deo. 2006.
    [CLSCK-10] M. D. Choudhury, Y-R. Lin, H. Sundaram, K. S. Candan, L. Xie, and A. Kelliher. How Does the Sampling Strategy Impact the Discovery of Information Diffusion in Social Media? ICWSM 2010.
    [LAH-07] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Trans. Web 1, 1, Article 5, May 2007.