Approximate Semantic Matching of Heterogeneous Events

Event-based systems are loosely coupled in space, time and
synchronization, providing a scalable infrastructure for
information exchange and distributed workflows. However,
event-based systems are tightly coupled, via event subscriptions
and patterns, to the semantics of the underlying event schema and
values. The high degree of semantic heterogeneity of events in
large and open deployments, such as smart cities and the sensor
web, makes it difficult to develop and maintain event-based
systems. To address semantic coupling in event-based
systems, we propose vocabulary-free subscriptions together with
approximate semantic matching of events. This paper
examines the requirement of event semantic decoupling and
discusses approximate semantic event matching and its
consequences for event processing systems. We
introduce a semantic event matcher and evaluate the suitability of
an approximate hybrid matcher based on both thesaurus-based and
distributional semantics-based similarity and relatedness
measures. The matcher is evaluated over a structured
representation of Wikipedia and Freebase events. Initial
evaluations show that the approach matches events with a
maximal combined precision-recall F1 score of 75.89% on
average across all experiments, using a set of 7
subscriptions. The evaluation shows that a hybrid approach to
semantic event matching outperforms an approach based on a single
similarity measure.

    Presentation Transcript

    • Approximate Semantic Matching of Heterogeneous Events
      Digital Enterprise Research Institute, www.deri.ie
      Souleiman Hasan, Sean O’Riain, Edward Curry
      Digital Enterprise Research Institute (DERI), National University of Ireland, Galway (NUIG)
      In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012), Berlin, Germany, 2012.
      Stefan.Decker@deri.org http://www.StefanDecker.org/ Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
    • Outline
      - Introduction
        - Smart Environments
        - Motivational Scenario
      - Related Work
      - Proposal
        - Approximate Semantic Matching
      - Experiments
        - Wikipedia
        - Freebase
      - Conclusions
      - Q&A
    • Smart Environments
      - Smart homes, grids, cities…
      - Internet of Things, Sensor Web…: by 2020, 50 billion devices connected to mobile networks (OECD, 2012)
      - Non-technical users
      - High heterogeneity
      - Trend toward dynamic, data-driven decision making
      - Events/situations of interest, e.g. "Soccer match played in Berlin", "New free parking space near me", …
    • Motivational Scenario: Enterprise
      - Stakeholders: CIO, CSO, Helpdesk, Maintenance Personnel
      - Situations of interest: company CO2 emissions performance; energy usage by the global IT department; PUE of the data center in Dublin
      - Various terms are used for the same concepts: energy consumption, energy usage…; room, space, zone…
      - Dynamic environments: new events from equipment joining and leaving (e.g. kWhs used by server 172.16.0.8)
      - Entities in the scenario: building, data center, server
    • Requirements
      - Handling of semantically heterogeneous events
      - Handling of dynamic environments, where event types change as sources join and leave
      - Low cost of rule management
      - Usability
      - Precision
    • Event Processing
      - Situation of interest, stated in natural language by non-technical users: "When a floor is empty and its energy usage for an hour is above a threshold w.r.t. the budget, then it is an excessive usage."
      - A developer translates it into a rule; the users are separated from the engine UI
      - Rules are tied to the event vocabulary: high cost in case of heterogeneity or change
      - CEP engine components: Interface and Parser, Rules Repository, Execution, Pattern Matcher, Event Templates Repository, Single Event Matcher
      - Event processing rule (EPL):
        INSERT INTO ExcessiveEnergyUsageByFloor
        SELECT a.floor AS floor
        FROM PATTERN [(a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor(a.floor=b.floor))].WIN:TIME(1 hour)
        WHERE b.usage > GetAcceptableThreshold(a.budgetValue)
        GROUP BY a.floor
      - Example events from ERP and BMS sources: {PC NO XDG26359, Floor: 1st, usage: 3 kWh}; {VM: vmdgsit01.deri.ie, Floor: 1st, usage: 15 kWh}
    • Exact Event Processing Paradigm
      Requirement | How the paradigm addresses it
      Semantic heterogeneity | Does not scale to highly heterogeneous environments
      Dynamic environment | Does not scale to highly dynamic environments
      Rule management | High cost under large heterogeneity and dynamicity
      Usability | Low
      Precision | 100% (typically)
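The following minimal sketch contrasts exact matching with the approximate approach. It rests on our own simplifying assumption that events and subscriptions are sets of (property, value) statements. `exact_match` and the sample data are hypothetical; the values are borrowed from the World Cup example used later in the talk. A subscription that means the same thing in a different vocabulary does not match.

```python
from typing import FrozenSet, Tuple

Statement = Tuple[str, str]  # (property, value); a simplification of RDF triples

def exact_match(subscription: FrozenSet[Statement], event: FrozenSet[Statement]) -> bool:
    """Hypothetical exact matcher: every subscription statement must occur verbatim in the event."""
    return subscription <= event

event = frozenset({
    ("type", "Football Match"),
    ("team", "Spain National Football Team"),
    ("location", "Johannesburg"),
})

# Uses the event's own vocabulary: matches.
sub_same_vocab = frozenset({("type", "Football Match"), ("team", "Spain National Football Team")})
# Same intent, different terms ("Soccer Match", "Spain", "place"): no match.
sub_other_vocab = frozenset({("type", "Soccer Match"), ("team", "Spain"), ("place", "South Africa")})

print(exact_match(sub_same_vocab, event))   # True
print(exact_match(sub_other_vocab, event))  # False
```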
    • Decoupling in Event Systems
      - Space: producers and consumers do not know each other
      - Time: participants do not need to be actively involved in the interaction at the same time
      - Synchronization: event producers and consumers are not blocked while sending/receiving events
      [Diagram: event producer and event consumer, decoupled in space, time and synchronization]
    • Decoupling in Event Systems
      - Principle: "Removal of explicit dependencies between participants" (Eugster et al., 2003)
      - Outcome: scalability
    • Semantic Coupling
      - Current event-based systems keep an explicit semantic dependency (event types, properties, values) between participants
      - Limited scalability in highly heterogeneous and dynamic environments
      [Diagram: producer and consumer decoupled in space, time and synchronization, but still coupled semantically]
    • Current Approaches
      - Ontology-based: (Petrovic et al., 2003), (Zhang & Ye, 2008)…
        - Does not "remove explicit dependency"
        - Hard to reach ontology agreement a priori at a large scale of heterogeneity and dynamism
        - Medium usability, typically 100% precision
      - Fuzzy sets: (Liu & Jacobsen, 2002)
        - Address only numerical event values, as opposed to string values, in subscriptions
        - Medium usability, high precision
    • Proposed Approach
      - Approximate semantic matching of events, in four stages:
        1. Types & properties: compute possible mappings between subscription and event types/properties
        2. Values: compute possible mappings between subscription and event values
        3. Pick the best overall mapping
        4. Post-matching event processing
    • Background
      - Semantic similarity: $f: \mathit{Terms} \times \mathit{Terms} \to [0,1]$, where $term_1, term_2 \in \mathit{Terms}$
        - $f(term_1, term_2) = 0$: absolute semantic mismatch
        - $f(term_1, term_2) = 1$: exact match
        - E.g. Football Match and Soccer Match are similar
      - Relatedness: a generalization of similarity
        - E.g. Football Match and Referee are related but not similar
      - Thesaurus-based measures: e.g. WordNet-based
      - Distributional semantics-based measures: e.g. Wikipedia ESA
        - The more Wikipedia articles two terms occur in together, the more related they are
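The distributional idea behind Wikipedia ESA (Explicit Semantic Analysis) can be sketched as follows, using a tiny hypothetical four-article corpus in place of Wikipedia. Each word is mapped to a vector of TF-IDF weights over articles, a multi-word term is the sum of its word vectors, and relatedness is the cosine of two such vectors. Because all weights are non-negative, the cosine lies in [0, 1], which fits the definition of $f$ above. The corpus, the function names and the weighting details are illustrative assumptions. This is not the ESA implementation used in the experiments.

```python
import math
from collections import Counter

# Hypothetical stand-in for Wikipedia: article title -> article text.
ARTICLES = {
    "Association football": "football soccer match referee team goal",
    "FIFA World Cup": "football soccer match team tournament",
    "Referee": "referee match rules official",
    "Cooking": "recipe kitchen food",
}

def build_index(articles):
    """Map each word to its concept vector {article title: tf-idf weight}."""
    n = len(articles)
    tokenized = {title: text.split() for title, text in articles.items()}
    df = Counter(word for words in tokenized.values() for word in set(words))
    index = {}
    for title, words in tokenized.items():
        for word, count in Counter(words).items():
            index.setdefault(word, {})[title] = count * math.log(n / df[word])
    return index

INDEX = build_index(ARTICLES)

def concept_vector(term):
    """Sum the concept vectors of the words in a (possibly multi-word) term."""
    vec = Counter()
    for word in term.lower().split():
        for title, weight in INDEX.get(word, {}).items():
            vec[title] += weight
    return vec

def esa_relatedness(a, b):
    """Cosine similarity of the two concept vectors (0.0 if either is empty)."""
    va, vb = concept_vector(a), concept_vector(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0

print(round(esa_relatedness("Football Match", "Soccer Match"), 2))  # similar terms: highest
print(round(esa_relatedness("Football Match", "Referee"), 2))       # related, not similar
print(esa_relatedness("Football Match", "recipe food"))             # unrelated: 0.0
```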
    • Proposed Approach Instantiation: Example
      - Event: type "Football Match"; name "2010 FIFA World Cup Final"; referee "Howard Webb"; team "Spain National Football Team"; team "Netherlands National Football Team"; location "Johannesburg"; location "FNB stadium"
      - Subscription: event type "Soccer Match"; event team "Spain"; event place "South Africa"
    • Proposed Approach Instantiation: Types & Properties Mappings
      - Subscription properties: type, team, place
      - Event properties: type, name, referee, team, location
      [Figure: precision vs. recall of WordNet-based similarity measures: Lin, Jiang&Conrath, Leacock&Chodorow, Lesk, Path, Resnik, Gloss Vector, WuPalmer]
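The Jiang&Conrath measure compared above is information-content based. Concept probabilities are estimated from corpus counts, $IC(c) = -\log p(c)$, and the distance between two concepts is $IC(c_1) + IC(c_2) - 2\,IC(\mathrm{lcs}(c_1, c_2))$, with similarity taken as its inverse. Below is a minimal sketch over a hypothetical toy taxonomy with made-up counts, standing in for WordNet. It ranks event properties for the subscription property "place". Real WordNet-based implementations, such as NLTK's `jcn_similarity`, handle synsets, multiple inheritance and zero distances differently.

```python
import math

# Hypothetical toy taxonomy (child -> parent), a stand-in for WordNet hypernyms.
PARENT = {
    "location": "entity", "place": "location", "region": "location",
    "group": "entity", "team": "group",
    "person": "entity", "referee": "person",
}
# Hypothetical corpus counts of each concept itself (descendants excluded).
OWN_COUNT = {
    "entity": 10, "location": 10, "place": 20, "region": 10,
    "group": 10, "team": 20, "person": 10, "referee": 10,
}
TOTAL = sum(OWN_COUNT.values())

def ancestors(c):
    """Return c followed by its ancestors up to the root."""
    path = [c]
    while c in PARENT:
        c = PARENT[c]
        path.append(c)
    return path

def information_content(c):
    """IC(c) = -log p(c), where p(c) counts c and all of its descendants."""
    count = sum(n for d, n in OWN_COUNT.items() if c in ancestors(d))
    return -math.log(count / TOTAL)

def lowest_common_subsumer(c1, c2):
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)

def jiang_conrath(c1, c2):
    """1 / J&C distance; infinite for identical concepts (distance 0)."""
    dist = (information_content(c1) + information_content(c2)
            - 2 * information_content(lowest_common_subsumer(c1, c2)))
    return math.inf if dist == 0 else 1.0 / dist

def rank_candidates(sub_prop, event_props):
    """Event properties ordered by decreasing J&C similarity (rank 1 first)."""
    return sorted(event_props, key=lambda p: jiang_conrath(sub_prop, p), reverse=True)

print(rank_candidates("place", ["team", "referee", "location"]))  # ['location', 'team', 'referee']
```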
    • Proposed Approach Instantiation: Types & Properties Mappings
      - Determine the top m correspondence candidates by $\mathrm{RankSim}_{\mathrm{Jiang\&Conrath}}(p_s, p_e)$
      - Measure property relatedness: $f_P(p_s, p_e) = \min\bigl(1,\ m - \mathrm{RankSim}_{\mathrm{Jiang\&Conrath}}(p_s, p_e) + 1\bigr) \cdot \mathrm{ESA}_{\mathrm{Wikipedia}}(p_s, p_e)$
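The factor $\min(1, m - \mathrm{RankSim} + 1)$ in $f_P$ acts as a gate. It equals 1 for any candidate ranked within the top m, so $f_P$ is then just the ESA score, and it drops to 0 at rank $m + 1$. For ranks beyond $m + 1$ the literal expression turns negative. The sketch below therefore clamps the gate at 0, which is our assumption. The choice m = 2 is hypothetical, and the slide percentages (90% and 40%) are treated as the ESA scores.

```python
def property_relatedness(rank: int, esa: float, m: int) -> float:
    """f_P = Min(1, m - rank + 1) * ESA, with the gate additionally clamped at 0 (assumption).

    rank: 1-based rank of the event property among the candidates, ordered by
          Jiang&Conrath similarity to the subscription property.
    esa:  Wikipedia ESA relatedness of the two property terms, in [0, 1].
    m:    number of top candidates kept.
    """
    gate = max(0, min(1, m - rank + 1))  # 1 when rank <= m, 0 otherwise
    return gate * esa

M = 2  # hypothetical choice of m
print(property_relatedness(1, 0.90, M))  # place -> location (Top 1): 0.9
print(property_relatedness(2, 0.40, M))  # place -> name (Top 2): 0.4
print(property_relatedness(3, 0.50, M))  # outside the top m: 0.0
```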
    • Proposed Approach Instantiation: Types & Properties Mappings
      - Top 1 candidates (subscription → event): type → type; place → location (90%); team → team
      - Top 2 candidates: type → type; place → name (40%); team → referee
    • Proposed Approach Instantiation: Values Mappings
      - Subscription values: Soccer Match, Spain, South Africa
      - Event values: Football Match, Howard Webb, Spain National Football Team, Netherlands National Football Team, Johannesburg, FNB stadium
      - Measure value relatedness: $f_V(v_s, v_e) = \mathrm{ESA}_{\mathrm{Wikipedia}}(v_s, v_e)$
    • Proposed Approach Instantiation: Values Mappings
      - Spain → Spain National Football Team: 95%
      - Spain → Netherlands National Football Team: 30%
    • Proposed Approach Instantiation: Pick Best Overall Mapping
      - Calculate statement relatedness: $f_{STMT} = f_P(p_s, p_e) \cdot f_V(v_s, v_e)$
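Here is a worked instance using the value scores from the slides, 95% and 30%, taken as $f_V$. It assumes $f_P(\text{team}, \text{team}) = 1$: identical terms are an exact match ($f = 1$, as defined in the Background slide), and "team" ranks within the top m. The subscription statement (team, Spain) therefore corresponds more strongly to the Spain statement of the event:

```latex
\begin{align*}
f_{STMT}\bigl((\text{team}, \text{Spain}), (\text{team}, \text{Spain National Football Team})\bigr)
  &= f_P(\text{team}, \text{team}) \cdot f_V(\text{Spain}, \text{Spain National Football Team})
   = 1 \times 0.95 = 0.95\\
f_{STMT}\bigl((\text{team}, \text{Spain}), (\text{team}, \text{Netherlands National Football Team})\bigr)
  &= f_P(\text{team}, \text{team}) \cdot f_V(\text{Spain}, \text{Netherlands National Football Team})
   = 1 \times 0.30 = 0.30
\end{align*}
```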
    • Proposed Approach Instantiation: Pick Best Overall Mapping
      - Determine the corresponding event statement by maximal $f_{STMT}$
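Selecting the corresponding event statement by maximal $f_{STMT}$ can be sketched as an argmax over the event's statements. The score tables are assumed to be precomputed $f_P$ and $f_V$ values, and pairs missing from them count as 0. Both are assumptions for illustration. The 0.95 and 0.30 value scores come from the slides. The property score for team → referee and the value score for Spain → Howard Webb are hypothetical. How the per-statement choices are then combined into an overall event score is not shown on the slides and is not attempted here.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str]  # (property, value)

def best_correspondence(sub_stmt: Statement,
                        event_stmts: List[Statement],
                        f_p: Dict[Tuple[str, str], float],
                        f_v: Dict[Tuple[str, str], float]) -> Tuple[Statement, float]:
    """Return the event statement maximizing f_STMT = f_P * f_V, together with that score."""
    ps, vs = sub_stmt

    def f_stmt(stmt: Statement) -> float:
        pe, ve = stmt
        return f_p.get((ps, pe), 0.0) * f_v.get((vs, ve), 0.0)

    best = max(event_stmts, key=f_stmt)
    return best, f_stmt(best)

event = [
    ("type", "Football Match"),
    ("referee", "Howard Webb"),
    ("team", "Spain National Football Team"),
    ("team", "Netherlands National Football Team"),
    ("location", "Johannesburg"),
]
f_p = {("team", "team"): 1.0, ("team", "referee"): 0.2}  # team->referee score is hypothetical
f_v = {
    ("Spain", "Spain National Football Team"): 0.95,
    ("Spain", "Netherlands National Football Team"): 0.30,
    ("Spain", "Howard Webb"): 0.10,  # hypothetical
}

stmt, score = best_correspondence(("team", "Spain"), event, f_p, f_v)
print(stmt, score)  # ('team', 'Spain National Football Team') 0.95
```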
    • Proposed Approach Instantiation: Post-Matching Event Processing
      - Rank within a window
      - Complex event processing
      - …
    • Experiments Overview
      - Methodology
        - Prepare an event set that reflects the required semantic heterogeneity (Wikipedia events)
        - Prepare a gold-standard set of subscriptions that stresses multiple aspects of semantic coupling
        - Validate the suitability of semantic approximation from a precision perspective
        - Use a different event set with the same subscriptions to validate the low maintenance cost (Freebase events)
      - Evaluation criteria
        - Average interpolated precision-recall curve at 11 recall points
        - Maximal F1 score over the average curve
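The two evaluation criteria can be sketched with the standard information-retrieval definitions. Interpolated precision at recall level r is the highest precision reached at any rank whose recall is at least r. The 11-point curves are averaged across subscriptions, and the maximal F1 is taken over the points of the averaged curve. The exact averaging used in the experiments may differ, and the two ranked result lists in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple

RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

def interpolated_pr(ranked_relevance: Sequence[bool], num_relevant: int) -> List[float]:
    """11-point interpolated precision for one ranked result list.

    Precision at level r is the maximum precision at any rank whose recall
    is >= r, or 0.0 if that recall is never reached.
    """
    points: List[Tuple[float, float]] = []  # (recall, precision) after each rank
    hits = 0
    for k, relevant in enumerate(ranked_relevance, start=1):
        hits += relevant
        points.append((hits / num_relevant, hits / k))
    return [max((p for r, p in points if r >= level), default=0.0) for level in RECALL_LEVELS]

def average_curve(curves: List[List[float]]) -> List[float]:
    """Point-wise average of several 11-point curves (e.g. one per subscription)."""
    return [sum(values) / len(values) for values in zip(*curves)]

def max_f1(curve: List[float]) -> Tuple[float, float, float]:
    """Maximal F1 over the points of a curve, returned as (F1, recall, precision)."""
    best = (0.0, 0.0, 0.0)
    for r, p in zip(RECALL_LEVELS, curve):
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        best = max(best, (f1, r, p))
    return best

curves = [
    interpolated_pr([True, False, True, False, False], num_relevant=2),  # hypothetical subscription A
    interpolated_pr([False, True, True], num_relevant=2),                # hypothetical subscription B
]
avg = average_curve(curves)
print(max_f1(avg))  # approximately (0.8, 1.0, 0.667)
```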
    • Experiment 1: Wikipedia Events
      Event set statistics:
      - Source: structured Wikipedia infoboxes, DBpedia, 31 August 2011
      - Collection: triples directly associated with instances of the dbpedia-owl:Event class
      - Data model: RDF
      - Total # of events: 20,156
      - Total # of distinct event types: 4,950
      - Total # of distinct event properties: 1,459
      - Total # of distinct event values: 500,717
      - Total # of triples: 1,502,599
      - Average # of distinct types per event: 7.42
      - Average # of distinct properties per event: 30.52
      - Average # of distinct values per event: 54.16
      - Average # of triples per event: 64.67
    • Experiment 1: Wikipedia Events
      Example event types:
      - Football Match
      - Race
      - Music Festival
      - Space Mission
      - Election
      - 10th-Century BC Conflicts
      - Academic Conference
      - Aviation Accident
      - …
    • Experiment 1: Subscription Set
      - Manually created gold-standard set of subscriptions
      ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
      1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | NO | NO | NO
      2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 2 | 2 | NO | YES | NO
      3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic
      4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 29 | 6 | YES | YES | Semantic + Syntactic
      5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 2 | 2 | YES | YES | Semantic + Syntactic
      6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 505 | 603 | NO | YES | Background knowledge
      7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 20 | 123,774 | NO | YES | Background knowledge
    • Experiment 1: Subscription Set (annotated)
      - The previous table with its columns annotated: description, subscription template, # of relevant events, # of needed exact rules, and event type / event properties / literals and resources approximation
      - Example, subscription 3 (Events taking place in Wembley stadium): the single approximate subscription {event type "Event"; event place "Wembley Stadium"} has 219 relevant events, for which an exact matcher needs 5 rules, e.g.:
        SPARQL pattern 1: ?event rdf:type dbpedia-owl:Event. ?event dbpprop:stadium dbpedia:Wembley_Stadium.
        SPARQL pattern 2: ?event rdf:type dbpedia-owl:Event. ?event dbpedia-owl:location dbpedia:Wembley_Stadium.
        …
    • Experiment 1: Results
      [Figure: precision-recall curves of Jiang&Conrath and Wikipedia ESA for "Events taking place in Wembley stadium" and "Football matches played in the UK"; histogram (frequency) of semantic similarity or relatedness scores, log scale, for Jiang&Conrath and Wikipedia ESA]
      - Need for a hybrid matcher that combines both
    • Experiment 1: Results
      - The hybrid matcher outperforms matchers based on a single similarity or relatedness measure.
      Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
      Maximal F1 score | 70.06% | 44.26% | 75.45%
      Recall | 80% | 80% | 90%
      Precision | 62.31% | 30.59% | 64.94%
      [Figure: average interpolated precision-recall curves of Jiang&Conrath, Wikipedia ESA and Hybrid]
    • Experiment 2: Freebase Event Set
      Event set statistics:
      - Source: Freebase events dump, 1 December 2011 (current triples)
      - Collection: triples directly associated with instances of the "fbase:time.event" class
      - Data model: RDF
      - Total # of events: 84,529
      - Total # of distinct event types: 858
      - Total # of distinct event properties: 1,242
      - Total # of distinct event values: 1,199,627
      - Total # of triples: 1,859,338
      - Average # of distinct types per event: 3.33
      - Average # of distinct properties per event: 10.67
      - Average # of distinct values per event: 21.66
      - Average # of triples per event: 21.99
    • Experiment 2: Subscription Set
      - Same subscriptions as in Experiment 1
      ID | Description | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
      1 | Football matches played by Spain in the FNB stadium | 1 | 1 | YES | YES | NO
      2 | Football matches played in the FNB stadium | 8 | 2 | YES | YES | NO
      3 | Events taking place in Wembley stadium | 29 | 5 | NO | YES | NO
      4 | Charity events taking place in Wembley stadium | 0 | - | - | - | -
      5 | Charity Rock events taking place in Wembley stadium | 0 | - | - | - | -
      6 | Football matches played in the UK | 34 | 1,398 | YES | YES | Background knowledge
      7 | Football matches played by a South American team in Europe | 2 | 219,600 | YES | YES | Background knowledge
    • Experiment 2: Results
      - The hybrid matcher gives similar results on Freebase as on DBpedia.
      Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
      Maximal F1 score | 44.60% | 70.73% | 76.33%
      Recall | 60% | 80% | 80%
      Precision | 35.49% | 63.39% | 72.98%
      [Figure: average interpolated precision-recall curves of Jiang&Conrath, Wikipedia ESA and Hybrid]
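Each maximal F1 score in the two results tables is the harmonic mean of the precision and recall listed beneath it, $F_1 = 2PR/(P + R)$. The snippet below recomputes the scores from the tabulated values. They agree with the reported figures to within 0.01 percentage points; the small residual for the Experiment 1 hybrid matcher comes from the rounded precision.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) at the maximal-F1 point, copied from the two results tables.
RESULTS = {
    "Experiment 1 (Wikipedia)": {
        "Jiang&Conrath": (0.6231, 0.80), "Wikipedia ESA": (0.3059, 0.80), "Hybrid": (0.6494, 0.90),
    },
    "Experiment 2 (Freebase)": {
        "Jiang&Conrath": (0.3549, 0.60), "Wikipedia ESA": (0.6339, 0.80), "Hybrid": (0.7298, 0.80),
    },
}

for experiment, matchers in RESULTS.items():
    for matcher, (p, r) in matchers.items():
        print(f"{experiment:26} {matcher:14} F1 = {f1(p, r):.2%}")
```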
    • Conclusions
      - The approximate semantic matcher addresses the maintenance cost of subscriptions/rules in heterogeneous and dynamic environments
      - The approximate semantic matcher is suitable when less than 100% precision is acceptable
        Metric | Exact matcher | Approximate semantic matcher
        Number of required subscriptions | 345,000 | 7
        Maximal F1 score | 100% | 75.89%
      - A hybrid matcher outperforms a matcher based on a single similarity or relatedness measure
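The two headline figures can be reproduced from the earlier tables, under our own reading of them. First, assume the 345,000 required subscriptions is the total of the "# of needed exact rules" columns over both experiments; subscriptions 4 and 5 have no relevant Freebase events. Second, assume 75.89% is the mean of the hybrid matcher's maximal F1 in the two experiments. Both interpretations are assumptions, not statements from the slides.

```python
# "# of needed exact rules" per subscription, copied from the two subscription tables.
EXACT_RULES_WIKIPEDIA = [1, 2, 5, 6, 2, 603, 123_774]
EXACT_RULES_FREEBASE = [1, 2, 5, 1_398, 219_600]  # subscriptions 4 and 5 have no relevant events

total_exact_rules = sum(EXACT_RULES_WIKIPEDIA) + sum(EXACT_RULES_FREEBASE)
print(total_exact_rules)  # 345399, i.e. roughly 345,000

# Hybrid matcher's maximal F1 (%) in Experiment 1 and Experiment 2.
hybrid_f1 = [75.45, 76.33]
average_f1 = round(sum(hybrid_f1) / len(hybrid_f1), 2)
print(average_f1)  # 75.89
```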
    • Future Work
      - The subscription set needs to be extended to be more representative.
      - The approximate semantic matcher generates "uncertain" results, whose impact on further event processing functions, such as CEP, needs to be studied.