Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Perceptions. Amit P. Sheth (amit@knoesis.org), LexisNexis Ohio Eminent Scholar, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH. http://knoesis.org. Thanks: Kno.e.sis team, esp. Wenbo Wang, Chen Lu, Cory, Hemant, Pavan.
Ohio Center of Excellence in Knowledge-enabled Computing. Semantics as core enabler, enhancer @ Kno.e.sis: one of the two largest academic groups in Semantic Web; multidisciplinary.
Semantics & Semantic Web in 1999-2002
ATTRIBUTE & KEYWORD QUERYING. SEMANTIC BROWSING. Uniform view of worldwide distributed assets of similar type. Targeted e-shopping/e-commerce assets access. Taalee Semantic/Faceted Search & Browsing (1999-2001). BLENDED BROWSING & QUERYING. Taalee Semantic Search ….
Semantic Search/Browsing/Directory (2001-….). Search for company ‘Commerce One’. Links to news on companies that compete against Commerce One. Links to news on companies Commerce One competes against (to view news on Ariba, click on the link for Ariba). Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically.
Semantic Search/Browsing/Directory (2001-….). System recognizes ENTITY & CATEGORY. Relevant portion of the Directory is automatically presented.
Semantic Search/Browsing/Directory (2001-….). Users can explore semantically related information.
Extracting Semantic Metadata from Semistructured and Structured Sources (1999 – 2002). Semagix Freedom for building ontology-driven information systems. Managing Semantic Content on the Web.
Fast forward to 2010-2011
Search, Integration, Analysis, Discovery, Question Answering, Situational Awareness. Domain Models. Patterns / Inference / Reasoning. RDB. Relationship Web. Metadata Extraction. Metadata / Semantic Annotations. Text (formal/informal). Structured and semi-structured data. Sensor data. Multimedia content and Web data.
Let Us Start with Social Data
Jan. 2011: Egypt Protest. Images: http://bit.ly/g4yPXS, http://bit.ly/qHI7wI, http://bit.ly/qmDocA
Mar. 2011: Japan Earthquake and Tsunami. http://bit.ly/nP1E4q; Image: http://bit.ly/fl4gEJ; http://cnet.co/jdQgME; http://bit.ly/gWboib. Recently funded NSF proposal: Social Media Enhanced Organizational Sensemaking in Emergency Response.
Jul. 2011: I-75 Traffic Jam in US. Image: http://bit.ly/nqq6Wj
Citizen Sensing
  • Who? An interconnected network of people
  • What? Observe, report, collect, analyze, and disseminate information
  • How? Via text, audio, video, and built-in device sensors (and smart devices)
  • Image: http://bit.ly/nvm2iP
  • "Citizen Sensing, Social Signals, and Enriching Human Experience", Internet Computing, July-Aug. 2009
Enablers: Mobile Devices & Ubiquitous Connectivity
  • Mobile platforms hit critical mass: over 5 billion users; 1+B with internet-connected mobile devices (2010)
  • Smartphones > PCs + Notebooks > Notebooks + Netbooks (2010E)
  • 500K+ mobile phone applications
  • 74% of mobile phone users (2.4B) worldwide used SMS (2007)
  • Mobile is global, ubiquitous, 24x7
  • Built-in sensors: environmental, biometric/biomedical, ...
  • Image: http://bit.ly/mYqcPF
Enablers: Web 2.0 & Social Media
  • A huge number of users
  • 750M+ active Facebook users
  • 1+B tweets/wk; 175M+ Twitter users
  • Internet users: 2 billion
  • Image: http://bit.ly/euLETT
Role of Semantics in Citizen Sensing
  • Key to citizen sensing: extract and annotate different types of metadata (depending on application need)
  • Spatial, temporal, thematic: key phrase, named entity, relationship, topic/category, event descriptors, sentiment…
  • People, network, content
  • Semantics: provide the meaning of data
  • Various forms of semantic models: core vocabularies/nomenclatures, community-created dictionaries/folksonomies/reference databases, automatically extracted domain models, manually created taxonomies, formal ontologies
  • Deal with complexities of user-generated data; supplement well-known statistical and natural language processing (NLP) techniques
Research Application: Twitris
Twitris - Motivation
  • What were people in U.S.A. saying about Bin Laden’s death? How about people in Egypt? How about people in India?
  • TOO MANY tweets to be read each day!!!
  • Twitris now: WHEN, WHERE, and WHAT people are talking about
  • Twitris future: socio, cultural, behavioral studies
  • Image: http://bit.ly/etFezl
  • ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data – Challenges and Experiences’, 2009
Twitris: Semantic Social Web Mash-up. Select topic; select date; N-gram summaries; topic tree; spatial marker; tweet traffic; sentiment analysis; images & videos; reference news; related tweets; Wikipedia articles. More: TWITRIS
Analyzing Events from Temporal Perspective
  • How did tweets in United States on the death of Bin Laden evolve over time?
  • Dates shown: May 2nd, May 4th
  • “RT @TWlTTERWHALE: Please do not click on any links saying Osama Bin Laden EXECUTION Video! This is a virus that hacks accounts.”
  • “RT @ReallyVirtual: Here's a picture of OBL's hideout in Abbottabad, as shared by a friend @Rahat http://yfrog.com/h7w4izmj”
  • Img: http://www.twitpic.com/4t1mt0
Analyzing Events from Spatial Perspective
  • Tweets (Death of Bin Laden) in Egypt vs. tweets in India
  • Panels shown: May 2nd, Egypt; May 2nd, India
  • “RT @mvatlarge: U.S. has given Pakistani military nearly $20 billion since 9/11 for the privilege of housing bin Laden: http://is.gd/xegnFm”
  • “#Egypt foreign minister: Egygov't has no official comment but we condemn all forms of violence in international relations. #osama #obl”
A sample of current research @ Kno.e.sis demonstrating role of semantics in Citizen Sensing & Social Media Analysis
User-community Engagement Analysis
User-community Engagement
  • How do we understand the phenomenon of user participation (engagement) in topic discussions?
  • How do communities form during a product launch?
  • What factors can attract users to engage in these communities, thereby further spreading the message?
  • How quickly can we disseminate information between resource providers and people in need of resources in an emergency?
  • Image: http://itcilo.wordpress.com
Analysis Framework: People-Content-Network Analysis (PCNA). Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter. SoME @ WWW2011.
Three Sets of Features: Which set is more important?
  • Content features [characteristics of tweets posted by active friends of U]:
      • keywords: number of event-relevant keywords
      • hashtags: number of event-relevant hashtags
      • retweet: number of retweets
      • mention: number of mentions
      • url: number of relevancy-adjusted hyperlinks (an irrelevant hyperlink is given the number -1)
      • subjectivity: subjectivity scores for words and emoticons
      • Linguistic cues (LIWC1 analysis): features for the language usage; top-3 transformed features extracted using Principal Component Analysis (PCA)
  • Community features [characteristics of the active community/network under consideration]:
      • wccSize: size of the weakly connected component (WCC) that U’s friends belong to in the active network
      • wccPercent: ratio of wccSize to the size of the active network
      • connectivity: number of active friends (i.e., followees) in the community
      • communitySize: size of the active community
  • Author features [characteristics of friends that U is following]; only friends in the active community are considered:
      • logFollower: logarithm of follower count
      • logFollowee: logarithm of followee count
      • Klout[1]: an integrated measure of user influence and popularity
      • Other profile information and activity history[2]
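The community and author features above can be illustrated with a minimal Python sketch. It is not the PCNA implementation: the function names, the toy follow graph, the choice to take the component of U's first active friend, and the use of log(1 + count) to avoid log(0) are all assumptions made for illustration.

```python
import math
from collections import defaultdict, deque


def weakly_connected_component(edges, start):
    """Nodes reachable from `start` when follow-edge direction is ignored."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


def community_features(edges, active_users, friends_of_u):
    """Compute wccSize, wccPercent, connectivity and communitySize for user U.

    `edges` are (follower, followee) pairs among active users (assumed input).
    Assumption: the WCC of U's first active friend stands in for "the WCC
    which U's friends belong to".
    """
    active_friends = [f for f in friends_of_u if f in active_users]
    wcc = weakly_connected_component(edges, active_friends[0]) if active_friends else set()
    return {
        "wccSize": len(wcc),
        "wccPercent": len(wcc) / len(active_users) if active_users else 0.0,
        "connectivity": len(active_friends),
        "communitySize": len(active_users),
    }


def log_follower(follower_count):
    """logFollower; log(1 + count) is an assumption so that zero followers is defined."""
    return math.log1p(follower_count)


# Toy active network: two components, {a, b, c} and {d, e}.
EDGES = [("a", "b"), ("b", "c"), ("d", "e")]
ACTIVE = {"a", "b", "c", "d", "e"}
features = community_features(EDGES, ACTIVE, friends_of_u=["a", "c", "z"])
```

Here friend "z" is ignored because it is not in the active community, which mirrors the slide's note that only friends in the active community are considered.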
Experiments: Results. Content is the key to understanding user-community engagement. [Chart: Summary of Prediction Accuracy (%) by event type for the feature sets Network, Content, People, and All; performance axis from Low to High. Statistically significant results are in bold.]
Background Knowledge to Improve Social Data Analysis. Event-oriented community; external knowledge bases; dynamic domain model for the event; social network; user profiles; mined user interests and user types. SEMANTIC ASSOCIATION TO UNDERSTAND ENGAGEMENT LEVEL & IMPROVE IE.
Analysing the Content can be Hard…
  • Is Merry Christmas a song? If it is, which ‘Merry Christmas’, since there are 60 songs of the same name?
  • ‘So Good’ is also a song!
  • Using a domain model (e.g., MusicBrainz)
  • Using context cues from the content, e.g., new Merry Christmas tune
  • Reduce potential entity spot size (with restrictions), e.g., new albums/songs
  • ‘Multimodal Social Intelligence in a Real-Time Dashboard System’, VLDB Journal 2010
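The idea of combining a domain model with context cues can be sketched in a few lines of Python. This is only an illustration: the two-title set is a hypothetical stand-in for a domain model such as MusicBrainz, the cue words are assumed, and the function name `spot_songs` is made up here.

```python
import re

# Hypothetical stand-in for song titles drawn from a music domain model.
SONG_TITLES = {"merry christmas", "so good"}

# Assumed context cues suggesting that the text is talking about music.
CONTEXT_CUES = {"song", "tune", "album", "track", "single"}


def spot_songs(text):
    """Return known song titles found in `text`, but only if a music cue is present."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z']+", lowered))
    if not tokens & CONTEXT_CUES:
        return []  # no music context: treat title-like phrases as ordinary language
    return sorted(title for title in SONG_TITLES if title in lowered)


with_cue = spot_songs("Loving the new Merry Christmas tune")
without_cue = spot_songs("Merry Christmas to everyone, so good to be home")
```

The second example contains both title strings, yet nothing is spotted because no cue word appears, which is the ambiguity the slide describes.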
Real Time Social Media Data Analysis
Motivation: People can’t wait for information
  • Disaster management: Ushahidi (www.ushahidi.org)
  • Real-time markets: RealTimeMarkets (http://www.realtimemarkets.com/)
  • Brand tracking: Twarql (http://wiki.knoesis.org/index.php/Twarql)
  • Movie reviews: Flicktweets (www.flicktweets.com)
  • Journalism
Scenarios: Brand Tracking. Give me a stream of locations where Kinect is being mentioned right now. Give me all people that have said negative things about Kinect. How can we do this?
Twarql (Twitter Feeds through SPARQL)
  • Semantically annotate tweets with entities, hashtags, URLs, sentiments, etc.
  • Encode content in a structured format (RDF) using shared vocabularies (FOAF, SIOC, MOAT, etc.)
  • Structured querying of tweets
  • Subscribe to a stream of tweets that match a given query
  • Real-time delivery of streaming data
  • More: TWARQL
Twarql Architecture
Back to the Scenario. Give me a stream of locations where Kinect is being mentioned now. Give me all people that have said negative things about Kinect.
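The two scenario questions can be sketched in plain Python as pattern matching over annotated tweets. This is not Twarql and does not use RDF or SPARQL: the predicate names ("author", "location", "sentiment", "mentions"), the tweet fields, and the sample data are all hypothetical, and sentiment is assumed to be precomputed.

```python
def to_triples(tweet, known_entities):
    """Annotate one tweet as (subject, predicate, object) triples."""
    tid = tweet["id"]
    triples = [
        (tid, "author", tweet["user"]),
        (tid, "location", tweet["location"]),
        (tid, "sentiment", tweet["sentiment"]),
    ]
    text = tweet["text"].lower()
    for entity in known_entities:
        if entity.lower() in text:
            triples.append((tid, "mentions", entity))
    return triples


def match(triples, predicate, obj=None):
    """Triples with the given predicate and, optionally, the given object."""
    return [t for t in triples if t[1] == predicate and (obj is None or t[2] == obj)]


def locations_mentioning(stream, entity, known_entities):
    """Yield locations of tweets that mention `entity`, as the tweets arrive."""
    for tweet in stream:
        triples = to_triples(tweet, known_entities)
        if match(triples, "mentions", entity):
            for _, _, location in match(triples, "location"):
                yield location


def negative_authors(stream, entity, known_entities):
    """Collect authors of tweets that mention `entity` with negative sentiment."""
    authors = set()
    for tweet in stream:
        triples = to_triples(tweet, known_entities)
        if match(triples, "mentions", entity) and match(triples, "sentiment", "negative"):
            authors.update(obj for _, _, obj in match(triples, "author"))
    return authors


# Made-up sample stream for illustration only.
SAMPLE_STREAM = [
    {"id": "t1", "user": "alice", "location": "Dayton", "sentiment": "positive",
     "text": "Kinect is so much fun"},
    {"id": "t2", "user": "bob", "location": "Cairo", "sentiment": "negative",
     "text": "my kinect keeps losing track of me"},
    {"id": "t3", "user": "carol", "location": "Pune", "sentiment": "negative",
     "text": "traffic is terrible today"},
]
```

Because `locations_mentioning` is a generator, it can consume an open-ended stream and emit each location as soon as a matching tweet arrives, which is the subscription behavior the scenario asks for.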
Dynamic Domain Models for Semantic Analysis of Real-Time Data, aka Continuous Semantics
Motivation
  • Semantic processing using a model of the domain
  • But it is difficult to model dynamic domains on the social web:
      • spontaneous (arising suddenly)
      • real-time data requiring continuous searching and analysis
      • distributed participants with fragmented and opinionated information
      • diverse viewpoints involving topical or contentious subjects
      • feature context colored by local knowledge as well as perceptions based on different observations and their socio-cultural background
  • More: Continuous Semantics
Dynamic Model Creation. Heliopolis is a suburb of Cairo.
Dynamic Evolving Models to Underpin Semantics
  • Events (dates shown): June 13 2009, June 15 2009, June 12 2009
  • Key phrases (from tweets):
      • “Reports from Azadi Square - 4 people killed by police, people killed police who shot. More shots being fired #iranelections”
      • “Both Ahmadinejad & Mousavi declare victory in Iranian Elections.”
      • “situation in tehran University is so worrisome. police have attacked to girls dormitory #tehran #iranelection”
  • Models:
      • Ahmadinejad & Mousavi are politicians in Iran
      • Tehran University is a University in Iran
      • Azadi Square is a city square in Tehran
Sentiment/Opinion Extraction
Challenges
  • Domain/topic dependency: spotting the target of the sentiment is as important as finding the sentiment itself. E.g., “long river” (no sentiment), “long battery life” (positive), or “long time for downloading” (negative).
  • Context awareness: encoding the context information into the extracted sentiment. E.g., “must watch a movie today” (no sentiment) and “this movie is a must see” (positive).
  • Informal language (abbreviations, misspelling, slang...): using Urban Dictionary
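Target dependency can be made concrete with a tiny lookup keyed on both the sentiment word and its target. This is a sketch under stated assumptions, not the approach used in the talk: the table holds only the slide's own examples, and the function name and matching rule are hypothetical.

```python
# Polarity depends on the (modifier, target) pair, not on the modifier alone.
# Entries are the slide's examples; a real system would learn or curate these.
TARGET_POLARITY = {
    ("long", "battery life"): "positive",
    ("long", "time for downloading"): "negative",
}


def phrase_sentiment(phrase):
    """Return the polarity of a '<modifier> <target>' phrase, or 'none' if unknown."""
    lowered = phrase.lower().strip()
    for (modifier, target), polarity in TARGET_POLARITY.items():
        if lowered == f"{modifier} {target}":
            return polarity
    return "none"


examples = {p: phrase_sentiment(p)
            for p in ["long river", "long battery life", "long time for downloading"]}
```

The same word "long" yields three different outcomes, so a lexicon keyed on the word alone could not reproduce these labels.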
The Usage of Background Knowledge
Real Time Feature Stream
Web DATA evolved over time. 1990’s: static documents and files. 2000’s: dynamic user-generated content. 2010’s: real-time sensor, social, multi-media data.
Semantic Abstraction
  • Does -1, -2, -4, -4 make any sense to you?
  • Overwhelming amount of raw sensor data doesn’t make much sense to decision makers
  • Freezing temperature; rain
  • What if the data is from sensors on a highway?
  • Freezing temperature + rain => icy road
  • Close the highway? OR spread salt on the road to melt ice?
  • So what?
  • Semantic Sensor Web Demo
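The rule "freezing temperature + rain => icy road" can be written as a small abstraction function. This is a minimal sketch, not the Semantic Sensor Web implementation: it assumes the readings are in degrees Celsius, treats 0 or below as freezing, and uses a hypothetical function name and feature labels.

```python
def abstract_road_conditions(temperatures_c, raining):
    """Turn raw readings into human-level features.

    Assumptions: temperatures are in degrees Celsius and "freezing" means
    every reading is at or below 0.
    """
    features = []
    freezing = bool(temperatures_c) and max(temperatures_c) <= 0
    if freezing:
        features.append("freezing temperature")
    if raining:
        features.append("rain")
    if freezing and raining:
        features.append("icy road")  # the combined, decision-relevant abstraction
    return features


highway = abstract_road_conditions([-1, -2, -4, -4], raining=True)
```

A decision maker sees three labels instead of a list of numbers, and only the last one bears directly on whether to close the highway or spread salt.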
A cross-country flight from New York to Los Angeles on a Boeing 737 plane generates a massive 240 terabytes of data (GigaOmni Media). But a pilot or a ground engineer at the destination is interested in a very small number of events and associated observational data that are relevant to their work. Higginbotham, S. (2010, September). Sensor Networks Top Social Networks for Big Data. Gigaom.com. http://gigaom.com/cloud/sensor-networks-top-social-networks-for-big-data-2/.
Background Knowledge. Abstraction: from a huge amount of raw sensor data to features representing real-world events (Blizzard, Rain Storm). More: Semantic Sensor Web
Weather Alert Application. Detection of events, such as blizzards, from weather station observations.
Evaluation. Data used: Nevada Blizzard (April 1st – April 6th). Query: show me which places had a blizzard in the past 24 hrs. 70% data clear; 30% feature observed.
Evaluation (cont.). Abstraction can make sense out of raw data and greatly reduce the size of data. Data used: Nevada Blizzard (April 1st – April 6th). [Chart: amount of raw sensor data vs. amount of abstraction data.] Semantic Scalability.
Traffic Application. Sensor data: 10 passing cars per minute. What might be the reason? Reasons can be found via other types of data: tweets, newspapers, etc.
Integration and Abstraction of Traffic Data. Stratified explanation. Knowledge Model Empowered Abstraction. Knowledge models. Different types of data.
Take Home Message. The amount of citizen sensing (and machine sensing) data is huge, varied, and growing rapidly. Search and sift won’t work.
Take Home Message (Cont.). Semantics play a key role in inferring the "meaning" behind the data. This requires progress from keywords -> entities -> relationships -> events, from raw data to human-centric abstractions.
Take Home Message (Cont.). A wide variety of semantic models and KBs (vocabularies, social dictionaries, community-created semi-structured knowledge, domain-specific datasets, ontologies) empower semantic solutions. This can lead to Semantic Scalability – scalability that is meaningful to human activities and decision making.
Interested in more?
  • Kno.e.sis Wiki for the following and more:
      • Computing for Human Experience
      • Continuous Semantics to Analyze Real-Time Data
      • Semantic Modeling for Cloud Computing
      • Citizen Sensing, Social Signals, and Enriching Human Experience
      • Semantics-Empowered Social Computing
      • Semantic Sensor Web
      • Traveling the Semantic Web through Space, Theme and Time
      • Relationship Web: Blazing Semantic Trails between Web Resources
      • SA-REST: Semantically Interoperable and Easier-to-Use Services and Mashups
      • Semantically Annotating a Web Service
      • Tutorial: Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications (WWW2011)
  • Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research (Semantic Search), IBM Research (Analysis of Social Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).

Editor's Notes

  • #4 Kno.e.sis has 15 faculty in Computer Science, life sciences and health care, cognitive science and business. It has about 50 PhD students and postdocs, about 2/3 of these in Computer Science. Its faculty members have 40 labs, and the center occupies a majority of the 50K sqft Joshi Research Center. Its students are highly successful, e.g., tenure-track faculty @ Case Western Reserve Univ or Researcher at IBM Almaden. It has received recent funding from Microsoft Research, IBM Research, HP Labs, Google, and small companies (Janya, EZdi,…) and collaborates with many more (Yahoo! Labs, NLM, …).
  • #5 Taalee (subsequently Voquette and Semagix) was founded in 1999 as an Audio/Video Web Search Company (focus on A/V mainly for scalability and market focus reasons; service name: MediaAnywhere). Domain models/ontologies were created in major areas (many more than what you can find on Bing in 2011) and automatically populated to build knowledge bases (populated ontologies or WorldModel) from a variety of structured and semistructured sources, and periodically kept up to date. This was then used for semantic annotation/metadata extraction to drive semantic search, browsing, and other applications over data crawled from Web sites.
  • #6 MediaAnywhere’s search snapshot.
  • #9 The important thing is that the system knew that Robert Duval is a movie actor, is a different person from David Duval, who is a golfer and a sportsperson, and had an understanding of a variety of relationships Robert Duval participates in – such as
  • #12 Many types of data: Web 2.0/microblogging/social networking with informal text, Blogs/News/Documents with structured text, structured and semistructured data, multimedia data, sensor data.Many types of domain models: formal ontologies, folksonomies, nomenclatures or core vocabularies, community created/managed dictionaries, …Extraction of semantic metadata (labels referenced with domain models) make it possible to build many applications.
  • #32 There is an event oriented community being formed during an eventDynamic domain model would give enhanced understanding of what is going on in the event, and hence, the communityCommunity has people, sub-community of users in the community will form an implicit network for ‘resource sharing’ and ‘resource provider’ How do we get to know such people? Semantic analysis of user profile description, with help of external knowledge bases to tell us, ‘what is a user interest I’ will tell us, ‘who is what’ – Blogger, news journalist, trustee, Red Cross Member etc. and also, ‘who’ is interested in ‘what’This information of mined user interests and types, from users profiles, can be leveraged to perform semantic association with dynamic domain model of the event
  • #38 Architecture of twarqlStream tweetsCovert it into RDFUse SPARQL Query to filter it
  • #41 Again, the SPARQL query is a subset of the domain model. Modifying it will help us stay up to date with the information. Therefore we use Doozer.
  • #46 We use background knowledge to help identify the entities mentioned in the text; e.g., knowledge from IMDB and Freebase is used to determine whether a noun phrase in the text is the name of a movie or a person. Lexical resources such as Urban Dictionary are used to help identify the sentiment clues in the text. Urban Dictionary is a popular online slang dictionary with word definitions written by users. Each word is associated with a list of related words that help interpret it, and with many glossary definitions given by different users. Both the related-words list and the glossary definitions can be used to help determine the sentiment of the spotted word. E.g., the word “wicked” has a list of related words, and most of those words carry positive sentiment, so we can infer that “wicked” is very likely a positive sentiment clue. In addition, there is a user-supplied definition of “wicked” saying that it has different meanings in different countries. Given this knowledge, if we know the location of the author who wrote the tweet, we can infer whether “wicked” in the tweet is used as a sentiment clue, and whether it is positive, negative or neutral.
  • #50 Consider the number of observation records generated by a commercial flight. A single flight from New York to Los Angeles on a twin-engine Boeing 737 generates around 240 terabytes of data; and, on any given day, there are around 28,537 such commercial flights in the United States (Higginbotham, 2010). So, for only a single day, the amount of sensor data generated can reach the petabyte scale. But how much of this sensor data is actually useful for decision-making? The pilot may need to understand the general condition of the plane and the external environment during flight; the ground crew may need to know the speed, location and trajectory of the aircraft; and the mechanic may need to know of any anomalous behavior detected during the flight. Much of this knowledge must be derived from the low-level observational data, but only the high-level inferences may be required for actual decision-making.
  • #54 The architecture illustrates an abstraction (or explanation) of traffic data through a stratified interpretation of machine sensor observations, information on the Web, and real-time social data. This process involves:
      + an integration of various domain (e.g., traffic) and domain-independent (e.g., time, space, sensor) ontologies to annotate and interpret the data,
      + observations from traffic speed sensors on the road to detect traffic jams (slow-moving traffic),
      + determining the cause of the traffic jam through a stratified explanation process:
          - first, check whether normal/everyday events could explain it (e.g., rush hour),
          - second, check whether planned events could explain it (e.g., road construction, music concerts, and sporting events) by accessing background knowledge on the Web (such as US Department of Transportation, BuckeyeTraffic, or Eventful),
          - third, check whether unplanned events could explain it (e.g., a traffic accident / wreck) by accessing social media (such as Twitter).
  • #56 The architecture illustrates an abstraction (or explanation) of traffic data through a stratified interpretation of machine sensor observations, information on the Web, and real-time social data. This process involves:
      + an integration of various domain (e.g., traffic) and domain-independent (e.g., time, space, sensor) ontologies to annotate and interpret the data,
      + observations from traffic speed sensors on the road to detect traffic jams (slow-moving traffic),
      + determining the cause of the traffic jam through a stratified explanation process:
          - first, check whether normal/everyday events could explain it (e.g., rush hour),
          - second, check whether planned events could explain it (e.g., road construction, music concerts, and sporting events) by accessing background knowledge on the Web (such as US Department of Transportation, BuckeyeTraffic, or Eventful),
          - third, check whether unplanned events could explain it (e.g., a traffic accident / wreck) by accessing social media (such as Twitter).
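
The stratified explanation process in the traffic notes (#54, #56) can be sketched roughly as follows. This is only an illustration, not the system shown in the talk: the speed threshold, the rush-hour windows, the `Event` record, and the function names are all assumptions, and the planned and unplanned event lists stand in for what would really come from Web sources and from social media.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional

# Illustrative assumptions, not values from the talk.
SLOW_SPEED_MPH = 20.0
RUSH_HOURS = [(time(7, 0), time(9, 0)), (time(16, 0), time(18, 30))]


@dataclass
class Event:
    """A hypothetical event record (planned, or reported on social media)."""
    description: str
    road: str
    start: datetime
    end: datetime


def covers(event: Event, road: str, when: datetime) -> bool:
    """True if the event is on the same road and spans the observation time."""
    return event.road == road and event.start <= when <= event.end


def explain_jam(road: str, when: datetime, speed_mph: float,
                planned: List[Event], social: List[Event]) -> Optional[str]:
    """Return the first explanation layer that accounts for a detected jam."""
    if speed_mph >= SLOW_SPEED_MPH:
        return None  # no jam detected, nothing to explain

    # Layer 1: normal/everyday events (weekday rush hour).
    if when.weekday() < 5 and any(s <= when.time() <= e for s, e in RUSH_HOURS):
        return "everyday: rush hour"

    # Layer 2: planned events taken from background knowledge on the Web.
    for ev in planned:
        if covers(ev, road, when):
            return f"planned: {ev.description}"

    # Layer 3: unplanned events reported on social media.
    for ev in social:
        if covers(ev, road, when):
            return f"unplanned: {ev.description}"

    return "unexplained"
```

The layers are tried in order from cheapest and most common to least predictable, so social media is consulted only when everyday patterns and planned events fail to explain the slow traffic.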