mumbai, india
november 26, 2008
another chapter in the war against civilization
and
the world saw it
Through the eyes of the people
the world read it
Through the words of the people
PEOPLE told their stories to PEOPLE
A powerful new era in information dissemination had taken firm ground
Making it possible for us to create a global network of citizens
Citizen Sensors – Citizens observing, processing, transmitting, reporting
Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness
Amit P. Sheth, LexisNexis Eminent Scholar
Kno.e.sis Center, Wright State University
Geocoder (Reverse Geo-coding); Address to location database; 18 Hormusji Street, Colaba; Vasant Vihar; Image Metadata; latitude: 18° 54′ 59.46″ N, longitude: 72° 49′ 39.65″ E; Structured Meta Extraction; Nariman House; Income Tax Office; Identify and extract information from tweets; Spatio-Temporal Analysis
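The image metadata above gives the location in degrees, minutes and seconds, while the spatial queries in this deck use decimal degrees (18.916517°N 72.827682°E). A minimal sketch of that conversion follows; the function name `dms_to_decimal` is an illustrative assumption, not something taken from the slides.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West are returned as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# Coordinates from the image-metadata label on the slide.
latitude = dms_to_decimal(18, 54, 59.46, "N")
longitude = dms_to_decimal(72, 49, 39.65, "E")
print(round(latitude, 6), round(longitude, 6))
```

The result agrees with the decimal coordinates used in the spatial-analysis slides to within rounding of the last digit.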
Research Challenge #1: Spatio-Temporal and Thematic analysis
  • What else happened “near” this event location?
  • What events occurred “before” and “after” this event?
  • Any message about “causes” for this event?
Spatial Analysis: Which tweets originated from an address near 18.916517°N 72.827682°E?
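One way to read "near" is as a great-circle distance threshold around the event location. The sketch below uses the haversine formula; the `Tweet` type, the 1 km radius and the sample tweet coordinates are assumptions made for illustration only, since the slides do not say how nearness is computed.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Tweet:
    text: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tweets_near(tweets: List[Tweet], lat: float, lon: float, radius_km: float = 1.0) -> List[Tweet]:
    """Keep only tweets whose origin lies within radius_km of (lat, lon)."""
    return [t for t in tweets if haversine_km(t.lat, t.lon, lat, lon) <= radius_km]


# Hypothetical sample data: the coordinates are made up for the example.
sample = [
    Tweet("posted a few streets from the event", 18.9170, 72.8280),
    Tweet("posted from another city", 28.6139, 77.2090),
]
nearby = tweets_near(sample, 18.916517, 72.827682)
print([t.text for t in nearby])
```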
Which tweets originated during Nov 27th 2008, from 11PM to 12PM?
Giving us: Tweets originated from an address near 18.916517°N, 72.827682°E during the time interval 27th Nov 2008 between 11PM and 12PM?
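The combined question is a conjunction of a spatial and a temporal predicate. The sketch below is illustrative only: it assumes the interval on the slide means 11 PM to midnight, uses naive timestamps in a single time zone, and approximates "near" with a small bounding box. The `Tweet` type, the helper names and the sample data are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple


class Tweet(NamedTuple):
    text: str
    lat: float
    lon: float
    posted_at: datetime


def posted_during(tweet: Tweet, start: datetime, end: datetime) -> bool:
    """True if the tweet falls in the half-open interval [start, end)."""
    return start <= tweet.posted_at < end


def in_bounding_box(tweet: Tweet, lat: float, lon: float, half_width_deg: float = 0.01) -> bool:
    """Crude nearness test: within half_width_deg of the point on both axes."""
    return abs(tweet.lat - lat) <= half_width_deg and abs(tweet.lon - lon) <= half_width_deg


def spatio_temporal_filter(tweets: List[Tweet], lat: float, lon: float,
                           start: datetime, end: datetime) -> List[Tweet]:
    """Tweets that satisfy both the spatial and the temporal condition."""
    return [t for t in tweets if in_bounding_box(t, lat, lon) and posted_during(t, start, end)]


# Assumed reading of the slide's interval: 11 PM on Nov 27 up to midnight.
start = datetime(2008, 11, 27, 23, 0)
end = datetime(2008, 11, 28, 0, 0)

sample = [
    Tweet("near and in the interval", 18.9170, 72.8280, datetime(2008, 11, 27, 23, 30)),
    Tweet("near but too early", 18.9170, 72.8280, datetime(2008, 11, 27, 9, 0)),
    Tweet("in the interval but far away", 28.6139, 77.2090, datetime(2008, 11, 27, 23, 30)),
]
matches = spatio_temporal_filter(sample, 18.916517, 72.827682, start, end)
print([t.text for t in matches])
```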
Research Challenge #2: Understanding and Analyzing Casual Text
  • Casual text
  • Microblogs are often written in SMS-style language
  • Slang, abbreviations
Understanding Casual Text
  • Not the same as news articles or scientific literature
  • Grammatical errors: implications on NL parser results
  • Inconsistent writing style: implications on learning algorithms that generalize from corpus
Nature of Microblogs
  • Additional constraint of limited context
  • Max. of x chars in a microblog
  • Context often provided by the discourse
  • Entity identification and disambiguation
  • Pre-requisite to other sophisticated information analytics
NL understanding is hard to begin with..
  • Not so hard: “commando raid appears to be nigh at Oberoi now”
  • Oberoi = Oberoi Hotel, Nigh = high
  • Challenging: new wing, live fire @ taj 2nd floor on iDesi TV stream
  • Fire on the second floor of the Taj hotel, not on iDesi TV
Social Context surrounding content
  • Social context in which a message appears is also an added valuable resource
  • Post 1: “Hareemane House hostages said by eyewitnesses to be Jews. 7 Gunshots heard by reporters at Taj”
  • Follow up post: that is Nariman House, not (Hareemane)
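In the slides the correction comes from a follow-up post, that is, from social context. As a rough stand-in, and not the method described in the deck, the sketch below shows how a misspelled place mention could be matched against a list of known places using string similarity from Python's standard `difflib` module. The gazetteer contents and the function name `closest_known_place` are assumptions for illustration.

```python
import difflib
from typing import List, Optional


def closest_known_place(mention: str, gazetteer: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the gazetteer entry most similar to the mention, or None.

    Comparison is case-insensitive; cutoff is difflib's similarity threshold.
    """
    by_lowercase = {name.lower(): name for name in gazetteer}
    matches = difflib.get_close_matches(mention.lower(), list(by_lowercase), n=1, cutoff=cutoff)
    return by_lowercase[matches[0]] if matches else None


# Hypothetical gazetteer built from place names that appear in the slides.
gazetteer = ["Nariman House", "Taj Hotel", "Oberoi Hotel"]
resolved = closest_known_place("Hareemane House", gazetteer)
print(resolved)
```

String similarity alone would not separate two genuinely different places with similar names, which is where the follow-up posts described above add value.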
Research Opportunities
  • NER and disambiguation in casual, informal text is a budding area of research
  • Another important area of focus: combining information of varied quality from a corpus (statistical NLP), domain knowledge (tags, folksonomies, taxonomies, ontologies), and social context (explicit and implicit communities)
What Drives the Spatio-Temporal-Thematic Analysis and Casual Text Understanding
  • Semantics with the help of Domain Models (ontologies, folksonomies)
And who creates these models?
  • YOU, ME, We DO!
Domain Knowledge: A key driver
  • Places that are nearby ‘Nariman house’: spatial query
  • Messages originated around this place: temporal analysis
  • Messages about related events / places: thematic analysis
Research Challenge #3: But where does the Domain Knowledge come from?
  • Community driven knowledge extraction
  • How to create models that are “socially scalable”?
  • How to organically grow and maintain this model?
The Wisdom of the Crowds
  • The most comprehensive and up-to-date account of the present state of knowledge is given by:
  • Everybody
  • The Web in general
Blogs
Collecting Knowledge: Wikipedia
  Wikipedia = Concise concept descriptions
            + An article title denotes a concept
            + Community takes care of disambiguation
            + Large, highly connected, sparsely annotated graph structure that connects named entities
            + Category hierarchy
Goal: Harness the Wisdom of the Crowds to automatically define a domain with up-to-date concepts
  • We can safely take advantage of existing (semi)structured knowledge sources
Collecting Instances
Creating a Hierarchy
Creating a Hierarchy
Hierarchy Creation - summary
Snapshot of final Topic Hierarchy
Great to know Explosion and Fire are related!
  • But, knowing Explosion “causes” fire is powerful
  • Relationships at the heart of semantics!
Identifying relationships: Hard, harder than many hard things
  • But NOT that Hard, When WE do it
Games with a purpose
  • Get humans to give their solitaire time
  • Solve real hard computational problems
  • Image tagging, identifying part of an image
  • Tag a tune, Squigl, Verbosity, and Matchin
  • Pioneered by Luis von Ahn
OntoLablr: Relationship Identification Game; leads to; causes; Explosion; Traffic congestion
And the infrastructure: Semantic Sensor Web
  • How can we annotate and correlate the knowledge from machine sensors around the event location?
Research Challenge #4: Semantic Sensor Web
Semantically Annotated O&M
  <swe:component name="time">
    <swe:Time definition="urn:ogc:def:phenomenon:time" uom="urn:ogc:def:unit:date-time">
      <sa:swe rdfa:about="?time" rdfa:instanceof="time:Instant">
        <sa:sml rdfa:property="xs:date-time"/>
      </sa:swe>
    </swe:Time>
  </swe:component>
  <swe:component name="measured_air_temperature">
    <swe:Quantity definition="urn:ogc:def:phenomenon:temperature" uom="urn:ogc:def:unit:fahrenheit">
      <sa:swe rdfa:about="?measured_air_temperature" rdfa:instanceof="senso:TemperatureObservation">
        <sa:swe rdfa:property="weather:fahrenheit"/>
        <sa:swe rdfa:rel="senso:occurred_when" resource="?time"/>
        <sa:swe rdfa:rel="senso:observed_by" resource="senso:buckeye_sensor"/>
      </sa:swe>
    </swe:Quantity>
  </swe:component>
  <swe:value name="weather-data">
    2008-03-08T05:00:00,29.1
  </swe:value>
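To make the data record concrete, here is a minimal sketch of reading the `weather-data` value. It assumes the value is a comma-separated record whose fields follow the order of the declared components (time first, then measured air temperature in Fahrenheit). The function name `parse_weather_value` is hypothetical.

```python
from datetime import datetime
from typing import Tuple


def parse_weather_value(raw: str) -> Tuple[datetime, float]:
    """Split a 'timestamp,temperature' record into typed fields.

    Assumes ISO 8601 time first, then temperature in Fahrenheit.
    """
    timestamp_text, temperature_text = raw.strip().split(",")
    return datetime.fromisoformat(timestamp_text), float(temperature_text)


# The record shown in the slide, with surrounding whitespace as in the markup.
observed_at, temperature_f = parse_weather_value("\n\t2008-03-08T05:00:00,29.1\n")
print(observed_at.isoformat(), temperature_f)
```

The semantic annotations are what tell a consumer that the first field is a `time:Instant` and the second a `senso:TemperatureObservation`; the parser above hard-codes that knowledge instead of reading it from the markup.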
Semantic Sensor ML – Adding Ontological Metadata
  • Domain Ontology: Person, Company
  • Spatial Ontology: Coordinates, Coordinate System
  • Temporal Ontology: Time Units, Timezone
  • Mike Botts, "SensorML and Sensor Web Enablement," Earth System Science Center, UAB Huntsville
Semantic Query: Semantic Temporal Query
  • Model-references from SML to OWL-Time ontology concepts provide the ability to perform semantic temporal queries
  • Supported semantic query operators include:
  • contains: user-specified interval falls wholly within a sensor reading interval (also called inside)
  • within: sensor reading interval falls wholly within the user-specified interval (inverse of contains or inside)
  • overlaps: user-specified interval overlaps the sensor reading interval
  • Example SPARQL query defining the temporal operator ‘within’
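The three operators can be stated as plain interval comparisons. The sketch below is written in Python rather than SPARQL and is not the query referred to on the slide; the `Interval` type is an assumption, and treating intervals that merely touch at an endpoint as non-overlapping is a design choice made here, not something the slides specify.

```python
from datetime import datetime
from typing import NamedTuple


class Interval(NamedTuple):
    start: datetime
    end: datetime


def contains(user: Interval, reading: Interval) -> bool:
    """User-specified interval falls wholly within the sensor reading interval."""
    return reading.start <= user.start and user.end <= reading.end


def within(user: Interval, reading: Interval) -> bool:
    """Sensor reading interval falls wholly within the user-specified interval."""
    return user.start <= reading.start and reading.end <= user.end


def overlaps(user: Interval, reading: Interval) -> bool:
    """The two intervals share some span of time (touching endpoints excluded)."""
    return user.start < reading.end and reading.start < user.end


# Hypothetical example: a one-hour query window and a 30-minute reading inside it.
user_window = Interval(datetime(2008, 3, 8, 5, 0), datetime(2008, 3, 8, 6, 0))
reading = Interval(datetime(2008, 3, 8, 5, 15), datetime(2008, 3, 8, 5, 45))

print(within(user_window, reading), contains(user_window, reading), overlaps(user_window, reading))
```

Swapping the two arguments of `within` gives `contains`, which matches the slide's description of the two operators as inverses.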
Kno.e.sis’ Semantic Sensor Web
Synthetic but realistic scenario: an image taken from a raw satellite feed
Synthetic but realistic scenario: an image taken by a camera phone with an associated label, “explosion.”
Synthetic but realistic scenario: textual messages (such as tweets) using STT analysis

Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness

  • 53.
    Synthetic but realistic scenario: Correlating to get
  • 54.
    Create better views (smart mashups)
  • 55.
    A few more things
      • Use of background knowledge
      • Event extraction from text: time and location extraction
      • Such information may not be present
      • Someone from Washington DC can tweet about Mumbai
      • Scalable semantic analytics
      • Subgraph and pattern discovery
      • Meaningful subgraphs like relevant and interesting paths
      • Ranking paths
  • 56.
    The Sum of the Parts
      • Spatio-Temporal analysis: find out where and when
      • + Thematic: what and how
      • + Semantic Extraction from text, multimedia and sensor data: tags, time, location, concepts, events
      • + Semantic models & background knowledge: making better sense of STT
      • Integration
      • + Semantic Sensor Web: the platform
      • = Situational Awareness
  • 57.
    Search; Integration; Analysis; Discovery; Question Answering; Situational Awareness; Domain Models; Patterns / Inference / Reasoning; RDB; Relationship Web; Meta data / Semantic Annotations; Metadata Extraction; Multimedia Content and Web data; Text; Sensor Data; Structured and Semi-structured data
  • 58.
    Interested in more background?
      • Semantics-Empowered Social Computing
      • Semantic Sensor Web
      • Traveling the Semantic Web through Space, Theme and Time
      • Relationship Web: Blazing Semantic Trails between Web Resources
    Contact/more details: amit @ knoesis.org
    Special thanks: Karthik Gomadam, Meena Nagarajan, Christopher Thomas
    Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research and IBM Research (Analysis of Social Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).

Editor's Notes

  • #22 Microblogs are one of the most powerful ways of talking of CSD
  • #24 Implicit social context created by people responding to other messages. In this example we are showing how the system can identify that it is Nariman and not Hareemane
  • #28 In the scenario, what techniques and technologies are being brought together? Semantic + Social Computing + Mobile Web
  • #42 Users are shown two images along with labels. Labels are obtained from GI or a similar data source. Users add relationships. When 2 users agree, the labels are tagged with this relationship. Multiple relationships: using ML techniques, the system will learn.
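The agreement rule in note #42 can be sketched as counting distinct players per proposed relationship. This is an illustrative reading of the note, not the OntoLablr implementation; the vote format, the player identifiers and the function name `agreed_relationships` are assumptions.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]       # (subject label, relationship, object label)
Vote = Tuple[str, str, str, str]    # (player, subject label, relationship, object label)


def agreed_relationships(votes: Iterable[Vote], min_agreement: int = 2) -> Set[Triple]:
    """Accept a relationship once at least min_agreement distinct players proposed it."""
    supporters = defaultdict(set)
    for player, subject, relation, obj in votes:
        supporters[(subject, relation, obj)].add(player)
    return {triple for triple, players in supporters.items() if len(players) >= min_agreement}


# Hypothetical votes using labels and relationships that appear in the slides.
votes = [
    ("player_a", "Explosion", "causes", "Fire"),
    ("player_b", "Explosion", "causes", "Fire"),
    ("player_a", "Explosion", "leads to", "Traffic congestion"),
    ("player_b", "Explosion", "causes", "Traffic congestion"),
]
accepted = agreed_relationships(votes)
print(accepted)
```

Counting distinct players rather than raw votes means one player repeating the same answer cannot force a relationship through.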