A Graph-based Approach to Learn Semantic
Descriptions of Data Sources

Mohsen Taheriyan
Craig Knoblock
Pedro Szekely
Jose Luis Ambite
Problem:
How to learn semantic descriptions?
First, what is a semantic description?
Semantic Description
Describing the source in terms of the concepts and relationships
defined by the domain ontology
Domain Ontology
(Figure: ontology graph. Classes: Person, Organization, Event, Place, City, State. Link labels: bornIn, livesIn, worksFor, ceo, organizer, location, nearby, isPartOf, state, name, birthdate, phone, postalCode, title, startDate. Legend: object property, data property, subClassOf.)
Source
Column 1         Column 2   Column 3    Column 4       Column 5
Bill Gates       Oct 1955   Microsoft   Seattle        WA
Mark Zuckerberg  May 1984   Facebook    White Plains   NY
Larry Page       Mar 1973   Google      East Lansing   MI
Semantic Types
Person.name      Person.birthdate  Organization.name  City.name      State.name
Column 1         Column 2          Column 3           Column 4       Column 5
Bill Gates       Oct 1955          Microsoft          Seattle        WA
Mark Zuckerberg  May 1984          Facebook           White Plains   NY
Larry Page       Mar 1973          Google             East Lansing   MI
Relationships
Person bornIn City
City state State
Person worksFor Organization
(Figure: these links connect the classes whose data properties, name and birthdate, are the semantic types of Columns 1 to 5 of the sample source.)

This semantic model is converted to a semantic
description in R2RML
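As a rough illustration of what a semantic model holds, the sketch below stores the semantic types and relationships of the sample source as plain Python data and lists the statements they imply for one row. The names `SEMANTIC_TYPES`, `RELATIONSHIPS` and `describe_row` are hypothetical, and the direction of each relationship is read off the slide figure; this is not the R2RML output that Karma produces.

```python
# Semantic types: each column maps to (ontology class, data property).
SEMANTIC_TYPES = {
    "Column 1": ("Person", "name"),
    "Column 2": ("Person", "birthdate"),
    "Column 3": ("Organization", "name"),
    "Column 4": ("City", "name"),
    "Column 5": ("State", "name"),
}

# Relationships between classes (assumed directions, read from the figure).
RELATIONSHIPS = [
    ("Person", "bornIn", "City"),
    ("City", "state", "State"),
    ("Person", "worksFor", "Organization"),
]


def describe_row(row):
    """Return (subject, property, object) statements implied for one row."""
    statements = [
        (cls, prop, row[column])
        for column, (cls, prop) in SEMANTIC_TYPES.items()
    ]
    statements.extend(RELATIONSHIPS)
    return statements


row = {
    "Column 1": "Bill Gates",
    "Column 2": "Oct 1955",
    "Column 3": "Microsoft",
    "Column 4": "Seattle",
    "Column 5": "WA",
}

for statement in describe_row(row):
    print(statement)
```

The five data statements come from the semantic types and the three class-level statements from the relationships; together they are the information a semantic description has to encode.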
Previous approach to learn semantic
descriptions
Karma (http://www.isi.edu/integration/karma, @KarmaSemWeb)
(Figure: Karma pipeline. Inputs: Sample Data and Domain Ontology. Steps: Learn Semantic Types (CRF), then Extract Relationships (Steiner Tree). Output: Semantic Model.)
Refining The Model
(Figures: Initial Model and Refined Model.)

• Previous work does not learn from the changes the user makes to the relationships
• The user has to go through the refinement process each time
Our new approach to learn semantic
descriptions
Key Idea
• Sources in the same domain often have
similar data
• Exploit knowledge of existing source
models
• Leverage relationships in known source
models to hypothesize relationships for
new sources
Approach
Inputs: Known Source Models (S1, S2, …, Sn), Domain Ontology, New Source
Steps (figure): Learn Semantic Types (CRF), Construct Graph G, Generate Candidate Models, Rank Results
Example
Inputs: Domain Ontology, Known Source Models, New Source

Known Source Models
S1 = personalInfo(name, birthdate, city, state, workplace)
     links: worksFor (Person, Organization), bornIn (Person, City), state (City, State)
S2 = getCities(state, city)
     links: state (City, State)
S3 = businessInfo(company, ceo, city, state)
     links: ceo (Organization, Person), location (Organization, City), isPartOf (City, State)

New Source
S4 = postalCodeLookup(zipcode, city, state)
Build a Graph from Known Models
• Create a component in G for each known source model
– Only add if the model is not a subgraph of an existing
component

• Annotate links with list of supporting models

Component 1 (S1 = personalInfo, S2 = getCities; S2 is a subgraph of S1, so it only adds support):
Person worksFor Organization {s1}
Person bornIn City {s1}
City state State {s1,s2}
Person name Person.name {s1}
Person birthdate Person.birthdate {s1}
Organization name Org.name {s1}
City name City.name {s1,s2}
State name State.name {s1,s2}

Component 2 (S3 = businessInfo):
Organization ceo Person {s3}
Organization location City {s3}
City isPartOf State {s3}
Organization name Org.name {s3}
Person name Person.name {s3}
City name City.name {s3}
State name State.name {s3}
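The component-building rule above can be sketched as follows. This is a simplified illustration, not the authors' implementation: each model is a set of (source, label, target) triples, the subgraph test is a plain set-containment check on those triples, and the function name `build_components` as well as the link directions in the sample data are assumptions read off the slides.

```python
def build_components(models):
    """Group known source models into graph components.

    models: dict mapping a model name to a set of
            (source, label, target) triples.
    Returns a list of components; each component maps a triple to the
    set of model names that support it.
    """
    components = []
    for name, triples in models.items():
        for component in components:
            # Simplified subgraph test: every triple already exists.
            if all(t in component for t in triples):
                for t in triples:
                    component[t].add(name)
                break
        else:
            # Not contained in any existing component: add a new one.
            components.append({t: {name} for t in triples})
    return components


models = {
    "s1": {
        ("Person", "worksFor", "Organization"),
        ("Person", "bornIn", "City"),
        ("City", "state", "State"),
        ("Person", "name", "Person.name"),
        ("Person", "birthdate", "Person.birthdate"),
        ("Organization", "name", "Org.name"),
        ("City", "name", "City.name"),
        ("State", "name", "State.name"),
    },
    "s2": {
        ("City", "state", "State"),
        ("City", "name", "City.name"),
        ("State", "name", "State.name"),
    },
    "s3": {
        ("Organization", "ceo", "Person"),
        ("Organization", "location", "City"),
        ("City", "isPartOf", "State"),
        ("Organization", "name", "Org.name"),
        ("Person", "name", "Person.name"),
        ("City", "name", "City.name"),
        ("State", "name", "State.name"),
    },
}

components = build_components(models)
print(len(components))
print(sorted(components[0][("City", "state", "State")]))
```

In this sketch the result depends on the order in which models are processed: a small model seen before a larger one that contains it would get its own component.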
Build a Graph from Known Models
• Connect graph components using all paths inferred from the
ontology

(Figure: Components 1 and 2 joined by additional links inferred from the ontology, labeled worksFor, ceo, organizer, location and isPartOf, which also bring in the classes Event and Place.)
Build a Graph from Known Models
• Assign low weight = ε to links within a component (black links)
• Weight the other links (green links) using the following quantities:
M = known source models
$W_{max}$ = number of links in M ($\geq |E_G|$) = 18
$c_1(e)$ = number of links in M whose <label, source, target> match e
$c_2(e)$ = number of links in M whose <label> matches e
$w_e = \min\left(W_{max} - c_1(e),\; W_{max} - \frac{c_2(e)}{W_{max}}\right)$
(Figure: graph G with weights on the inferred links; the values shown are 17, 17.94 and 18, for example worksFor 17 and organizer 18.)
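The weight formula on this slide can be checked against the values in the figure with a short sketch. The list of links in M below is a reading of the three known models (8 links for S1, 3 for S2, 7 for S3, giving the 18 stated on the slide); the triples' directions and the function name `link_weight` are assumptions made for illustration.

```python
# Links of the known source models as (label, source, target) triples.
S1 = [
    ("worksFor", "Person", "Organization"),
    ("bornIn", "Person", "City"),
    ("state", "City", "State"),
    ("name", "Person", "Person.name"),
    ("birthdate", "Person", "Person.birthdate"),
    ("name", "Organization", "Org.name"),
    ("name", "City", "City.name"),
    ("name", "State", "State.name"),
]
S2 = [
    ("state", "City", "State"),
    ("name", "City", "City.name"),
    ("name", "State", "State.name"),
]
S3 = [
    ("ceo", "Organization", "Person"),
    ("location", "Organization", "City"),
    ("isPartOf", "City", "State"),
    ("name", "Organization", "Org.name"),
    ("name", "Person", "Person.name"),
    ("name", "City", "City.name"),
    ("name", "State", "State.name"),
]
M = S1 + S2 + S3
W_MAX = len(M)  # number of links in M


def link_weight(label, source, target):
    """Weight of an ontology-inferred link: min(Wmax - c1, Wmax - c2/Wmax)."""
    c1 = sum(1 for link in M if link == (label, source, target))
    c2 = sum(1 for link in M if link[0] == label)
    return min(W_MAX - c1, W_MAX - c2 / W_MAX)


print(W_MAX)
print(link_weight("worksFor", "Person", "Organization"))
print(round(link_weight("isPartOf", "City", "Place"), 2))
print(link_weight("organizer", "Event", "Person"))
```

A link whose full triple already occurs in a known model is cheapest (17), a link that only shares its label with a known link is slightly cheaper than the maximum (17.94), and a link never seen before gets the full weight (18).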
Learn Semantic Types (Previous Work)
• A CRF-based model to assign a Semantic Type to each
column from its data
• Semantic Type
– Ontology Class
– Data Property + Domain

Example (using the Domain Ontology): S4 = postalCodeLookup(zipcode, city, state)
zipcode: Place.postalCode
city: City.name
state: State.name
Generate Candidate Models
• Map learned semantic types to nodes in graph G
– There might be multiple mappings

• Compute Steiner tree (minimal tree) for each mapping
(Figure: graph G next to the new source S4 = postalCodeLookup(zipcode, city, state), whose semantic types Place.postalCode, City.name and State.name are to be mapped to nodes of G.)
(Figures: Mapping 1, Mapping 2, Mapping 3 and Mapping 4, each shown on graph G first as the mapped nodes and then with its Steiner tree; every mapping adds a postalCode link for Place.postalCode.)
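To make the Steiner-tree step concrete, here is a minimal sketch that connects a set of mapped nodes (the terminals) in a weighted graph. It uses a shortest-path heuristic, which only approximates the minimal tree and is not necessarily the algorithm used in Karma. The small graph, the value chosen for ε (0.01) and the weight on the postalCode link are hypothetical.

```python
import heapq


def steiner_tree_approx(edges, terminals):
    """Approximate a minimum Steiner tree with a shortest-path heuristic.

    edges: iterable of (u, v, weight) for an undirected graph.
    terminals: nodes that must be connected.
    Returns (tree_edges, total_weight).
    """
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    terminals = list(terminals)
    in_tree = {terminals[0]}
    remaining = set(terminals[1:])
    tree_edges = set()
    total = 0.0

    while remaining:
        # Dijkstra started from every node already in the tree.
        dist = {n: 0.0 for n in in_tree}
        prev = {}
        heap = [(0.0, n) for n in sorted(in_tree)]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue
            if u in remaining:
                target = u  # closest terminal not yet connected
                break
            for v, w in adjacency.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = (u, w)
                    heapq.heappush(heap, (nd, v))
        if target is None:
            raise ValueError("terminals are not connected")
        # Add the path from the reached terminal back to the tree.
        node = target
        while node not in in_tree:
            parent, w = prev[node]
            tree_edges.add((parent, node, w))
            total += w
            in_tree.add(node)
            node = parent
        remaining.discard(target)
    return tree_edges, total


EPS = 0.01  # hypothetical value for the low in-component weight
graph = [
    ("City", "City.name", EPS),
    ("State", "State.name", EPS),
    ("City", "State", EPS),               # state link inside a component
    ("Place", "Place.postalCode", EPS),   # assumed weight
    ("City", "Place", 17.94),             # isPartOf inferred from ontology
    ("State", "Place", 17.94),            # isPartOf inferred from ontology
]
tree, cost = steiner_tree_approx(
    graph, ["Place.postalCode", "City.name", "State.name"]
)
print(len(tree), round(cost, 2))
```

Because in-component links are nearly free, the tree reuses the City to State link from the known models and pays for only one inferred link to reach Place.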
Rank Source Models
• Rank the candidates based on:
– Cost: sum of the weights
– Coherence: prefer the models with a higher number of supporting models

Rank 1: Candidate 1
Rank 2: Candidate 4
Rank 3: Candidate 2
Rank 3: Candidate 3
(Figures: the four candidate models. Each connects City, State and Place through isPartOf or state links and attaches City.name, State.name and Place.postalCode; links are annotated with their supporting models, {s1,s2} or {s3}.)
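One way to combine the two ranking criteria is sketched below. The slides do not say how coherence is computed or how it is traded off against cost, so both choices here are assumptions: coherence is taken as the largest number of links supported by a single known model, and candidates are ordered by coherence first and cost second. The candidate data is made up for illustration.

```python
from collections import Counter


def coherence(link_supports):
    """Largest number of links that share one supporting known model."""
    counts = Counter(m for support in link_supports for m in support)
    return max(counts.values(), default=0)


def rank(candidates):
    """Order candidates by higher coherence, then by lower cost."""
    return sorted(
        candidates,
        key=lambda c: (-coherence(c["supports"]), c["cost"]),
    )


# Hypothetical candidates: one support set per link, plus a total cost.
candidates = [
    {"name": "B", "cost": 18.0,
     "supports": [{"s3"}, {"s3"}, {"s1", "s2"}, set()]},
    {"name": "C", "cost": 35.9,
     "supports": [{"s3"}, {"s3"}, {"s3"}, set()]},
    {"name": "A", "cost": 18.0,
     "supports": [{"s1", "s2"}, {"s1", "s2"}, {"s1", "s2"}, set()]},
]

print([c["name"] for c in rank(candidates)])
```

Candidates A and C both have three links backed by a common known model, so cost breaks the tie; B mixes links from different models and ranks last.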
Evaluation
• Dataset 1
– 17 data sources containing overlapping data
– Semantic descriptions created manually using DBpedia, FOAF,
GeoNames, and WGS84 ontologies

• Dataset 2
– 6 museum sources
– Semantic descriptions created by domain experts using EDM,
SKOS, and FOAF ontologies

• Learned the model of each source, taking the models of the other sources as input
• Computed the Graph Edit Distance (GED) between the
learned model and the correct one
– Operations: node insertion, node deletion, edge insertion, edge
deletion, edge relabeling

• Compared the results with our previous work in Karma
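The edit operations listed above can be counted with a simplified sketch. It assumes that nodes are identified by their labels, so the correspondence between the two graphs is known, and that there is at most one link per (source, target) pair; a general GED computation has to search over node correspondences. The function name `simple_ged` and the two tiny models are hypothetical.

```python
def simple_ged(nodes_a, edges_a, nodes_b, edges_b):
    """Count edit operations turning graph A into graph B.

    nodes_*: sets of node labels.
    edges_*: dicts mapping (source, target) to a link label.
    """
    node_deletions = len(nodes_a - nodes_b)
    node_insertions = len(nodes_b - nodes_a)
    edge_deletions = sum(1 for k in edges_a if k not in edges_b)
    edge_insertions = sum(1 for k in edges_b if k not in edges_a)
    edge_relabelings = sum(
        1 for k in edges_a if k in edges_b and edges_a[k] != edges_b[k]
    )
    return (node_deletions + node_insertions
            + edge_deletions + edge_insertions + edge_relabelings)


# Hypothetical learned model versus correct model.
learned_nodes = {"City", "State", "Place"}
learned_edges = {("City", "State"): "state", ("City", "Place"): "isPartOf"}
correct_nodes = {"City", "State", "Place"}
correct_edges = {("City", "State"): "isPartOf", ("City", "Place"): "isPartOf"}

print(simple_ged(learned_nodes, learned_edges, correct_nodes, correct_edges))
```

Here the only difference is the label on the City to State link, so a single edge relabeling is counted.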

Results - Dataset 1
Source Signature | #Attributes | GED, Previous work | GED, New Approach (Rank 1)
nearestCity(lat, lng, city, state, country) | 5 | 6 | 1
findRestaurant(zipcode, restaurantName, phone, address) | 4 | 1 | 0
zipcodesInCity(city, state, postalCode) | 3 | 3 | 1
parseAddress(address, city, state, zipcode, country) | 5 | 6 | 1
citiesOfState(state, city) | 2 | 1 | 0
ocean(lat, lng, name) | 3 | 2 | 1
postalCodeLookup(zipCode, city, state, country) | 4 | 6 | 1
country(lat, lng, code, name) | 4 | 2 | 0
companyCEO(company, name) | 2 | 1 | 0
personalInfo(firstname, lastname, birthdate, brithCity, birthCountry) | 5 | 4 | 1
businessInfo(company, phone, homepage, city, country, name) | 6 | 10 | 8
restaurantChef(restaurant, firstname, lastname) | 3 | 2 | 1
findSchool(city, state, name, code, homepage, ranking, dean) | 7 | 8 | 6
employees(organization, firstname, lastname, birthdate) | 4 | 1 | 2
education(person, hometown, homecountry, school, city, country) | 6 | 9 | 4
administrativeDistrict(city, province, country) | 3 | 4 | 1
capital(country, city) | 2 | 2 | 1
TOTAL | 68 | 68 | 29

57% improvement
Results - Dataset 2
Source Signature | #Attributes | GED, Previous work | GED, New Approach (Rank 1)
S1(Attribution, BeginDate, EndDate, Title, Dated, Medium, Dimensions) | 7 | 1 | 0
S2(ObjectID, ObjectTitle, ObjectWorkType, ArtistName, ArtistBirthDate, ArtistDeathDate, ObjectEarliestDate, ObjectRights, ObjectFacetValue1) | 8 | 2 | 3
S3(death, birth, name) | 3 | 0 | 0
S4(accessionNumber, artist, creditLine, dimensions, imageURL, materials, relatedArtworksURL, creationDate, provenance, keywordValues) | 10 | 9 | 6
S5(AccessionNumber, Classification, CreditLine, Date, Description, DimensionsOrphan, WhatValues, Who, image, relatedArtworksValues) | 10 | 9 | 5
S6(Artist, ArtistBornDate, ArtistDiedDate, Classification, Copyright, CreditLine, Image, KeywordValues, Ref, SitterValues) | 10 | 8 | 6
TOTAL | 68 | 29 | 20

31% improvement
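The improvement figures quoted for the two datasets follow from the total GED values, read as the relative reduction in total GED. A short check, where the function name `improvement` is only illustrative:

```python
def improvement(previous_ged, new_ged):
    """Relative reduction in total GED, as a percentage."""
    return (previous_ged - new_ged) / previous_ged * 100


# Totals reported for Dataset 1 (68 vs. 29) and Dataset 2 (29 vs. 20).
print(round(improvement(68, 29)))  # 57
print(round(improvement(29, 20)))  # 31
```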
Related Work
• Writing semantic descriptions by hand
– R2RML, SWRL
– Tedious and time-consuming task
– Requires expertise in Semantic Web technologies

• Semantic annotation of Web services and Web tables
– Very limited in learning the relationships

• Learning Semantic Definitions of Online Information
Sources [Carman, Knoblock, 2007]
– Learns LAV rules from known sources
– Can only learn descriptions that overlap known sources

Discussion
• Automatically build rich semantic descriptions of
data sources
• Exploit the background knowledge from (i) the
domain ontology, and (ii) the known source
models
• Semantic descriptions are the key ingredients to
automate many tasks, e.g.,
– Source Discovery
– Data Integration
– Service Composition

Future Work
• Investigate how to create a more compact graph
– Consolidate the overlapping segments of the known semantic models

• Relax the problem by removing the constraint that the correct
semantic type of each attribute is known
– CRF part returns a set of candidate semantic types along with their
confidence values

• Use the data available in Linked Open Data (LOD) cloud to learn
more accurate models
• Put the user in the loop
– Integrate the new approach into Karma
– The user refines one of the suggested models
– The new model will be added to the graph as a new pattern

35

More Related Content

Recently uploaded

AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 

Recently uploaded (20)

AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 

A Graph-based Approach to Learn Semantic Descriptions of Data Sources

  • 1. A Graph-based Approach to Learn Semantic Descriptions of Data Sources Mohsen Taheriyan Craig Knoblock Pedro Szekely Jose Luis Ambite
  • 2. Problem: How to learn semantic descriptions?
  • 3. First, what is a semantic description?
  • 4. Semantic Description Describing the source in terms of the concepts and relationships defined by the domain ontology Domain Ontology bornIn nearby birthdate isPartOf livesIn Place name Person organizer name ceo worksFor Event Organization phone postalCode location City state State object property data property subClassOf title startDate name Source 3 Column 1 Bill Gates Mark Zuckerberg Larry Page Column 2 Oct 1955 May 1984 Mar 1973 Column 3 Microsoft Facebook Google Column 4 Seattle White Plains East Lansing Column 5 WA NY MI
  • 5. Semantic Types Person Person Organization name State name birthdate Column 1 Column 2 Column 3 Column 4 Column 5 Bill Gates Oct 1955 Microsoft Seattle WA Mark Zuckerberg May 1984 Facebook White Plains NY Larry Page 4 name City name Mar 1973 Google East Lansing MI
  • 6. Relationships bornIn Person name City state worksFor name birthdate Organization State name name Column 1 Column 2 Column 3 Column 4 Column 5 Bill Gates Oct 1955 Microsoft Seattle WA Mark Zuckerberg May 1984 Facebook White Plains NY Larry Page Mar 1973 Google East Lansing MI This semantic model is converted to a semantic description in R2RML 5
  • 7. Previous approach to learn semantic descriptions
  • 10. Refining The Model Refined Model • Previous work does not learn the changes done by the user in relationships • User has to go through the refinement process each time 9
  • 11. Our new approach to learn semantic descriptions
  • 12. Key Idea • Sources in the same domain often have similar data • Exploit knowledge of existing source models • Leverage relationships in known source models to hypothesize relationships for new sources 11
  • 13. Approach Inputs … S1 S2 Sn Known Source Models Domain Ontology New Source C R F Learn Semantic Types Construct Graph G Generate Candidate Models 12 Rank Results
  • 14. Example Domain Ontology Known Source Models worksFor Organization City Person name state State state name birthdate name| birthdate| name name city| state| S1 = personalInfo workplace State name isPartOf City ceo City name state | city S2 = getCities Person name State name name company| ceo| city| New Source S4 = postalCodeLookup(zipcode, city, state) 13 location Organization bornIn name state S3 = businessInfo
  • 15. Build a Graph from Known Models • Create a component in G for each known source model – Only add if the model is not subgraph of an existing component • Annotate links with list of supporting models S1 = personalInfo name Organization worksFor {s1} {s1} bornIn Org.name state {s1} City Person name {s1} {s1} {s1} birthdate Person.name {s1} name City.name Person.birthdate Component 1 14 State {s1} name State.name
  • 16. Build a Graph from Known Models • Create a component in G for each known source model – Only add if the model is not subgraph of an existing component • Annotate links with list of supporting models S1 = personalInfo S2 = getCities name Organization worksFor {s1} {s1} bornIn Org.name state {s1} City State {s1,s2} Person name {s1} {s1,s2} name {s1} birthdate Person.name City.name Person.birthdate Component 1 15 {s1,s2} name State.name
  • 17. Build a Graph from Known Models • Create a component in G for each known source model – Only add if the model is not subgraph of an existing component • Annotate links with list of supporting models S1 = personalInfo S2 = getCities name Organization worksFor {s1} {s1} bornIn Org.name state {s1} City State {s1,s2} Person name {s1} {s1,s2} name {s1} birthdate Person.name City.name Person.birthdate Component 1 16 {s1,s2} name State.name S3 = businessInfo Organization location {s3} name {s3} City ceo {s3} isPartOf State {s3} Person {s3} name {s3} name Org.name name {s3} State.name City.name Person.name Component 2
  • 18. Build a Graph from Known Models • Connect graph components using all paths inferred from the ontology name Organization worksFor {s1} {s1} bornIn Org.name state {s1} City State {s1,s2} Person name {s1} {s1,s2} name {s1} birthdate City.name Person.name {s1,s2} worksFor ceo name State.name organizer Organization Person.birthdate location name {s3} Person Event isPartOf location {s3} name isPartOf {s3} State {s3} name Org.name isPartOf location Place name {s3} Person.name isPartOf isPartOf 17 City ceo {s3} organizer location {s3} State.name City.name
  • 19. Build a Graph from Known Models • Assign a low weight ε to links within a component (black links) • Weight the other links (green links) according to how many links in the known models match them – M = known source models – Wmax = number of links in M (≥ |EG|) = 18 – c1(e) = number of links in M whose <label, source, target> match e – c2(e) = number of links in M whose <label> matches e – we = min(Wmax − c1(e), Wmax − c2(e)/Wmax) • [Figure: the connected graph, with green links weighted 17, 17.94, or 18]
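The weight formula on this slide can be checked with a few lines of Python. This is a sketch under assumptions: `KNOWN_LINKS` is a hypothetical encoding of the 18 links in the three known models (link directions are guesses from the diagrams), and `link_weight` is an illustrative name. With Wmax = 18 it gives the three values that appear on the green links in the figure.

```python
# Hypothetical (label, source, target) encoding of the 18 links in M.
KNOWN_LINKS = [
    # s1 = personalInfo
    ("name", "Organization", "Org.name"),
    ("worksFor", "Person", "Organization"),
    ("bornIn", "Person", "City"),
    ("state", "City", "State"),
    ("name", "City", "City.name"),
    ("name", "Person", "Person.name"),
    ("birthdate", "Person", "Person.birthdate"),
    ("name", "State", "State.name"),
    # s2 = getCities
    ("state", "City", "State"),
    ("name", "City", "City.name"),
    ("name", "State", "State.name"),
    # s3 = businessInfo
    ("location", "Organization", "City"),
    ("name", "Organization", "Org.name"),
    ("ceo", "Organization", "Person"),
    ("isPartOf", "City", "State"),
    ("name", "City", "City.name"),
    ("name", "State", "State.name"),
    ("name", "Person", "Person.name"),
]


def link_weight(link, known_links):
    """Weight of a link that connects components (a green link)."""
    w_max = len(known_links)
    # c1: known links matching label, source, and target.
    c1 = sum(1 for known in known_links if known == link)
    # c2: known links matching the label only.
    c2 = sum(1 for known in known_links if known[0] == link[0])
    return min(w_max - c1, w_max - c2 / w_max)


w_exact = link_weight(("worksFor", "Person", "Organization"), KNOWN_LINKS)
w_label = link_weight(("location", "Event", "Place"), KNOWN_LINKS)
w_unseen = link_weight(("organizer", "Event", "Person"), KNOWN_LINKS)
```

A link seen once with the same label, source, and target gets 17; a link whose label alone was seen once gets 18 − 1/18, which rounds to 17.94; a link never seen keeps the maximum weight 18.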
  • 20. Learn Semantic Types (Previous Work) • A CRF-based model to assign a Semantic Type to each column from its data • Semantic Type – Ontology Class – Data Property + Domain • [Figure: using the domain ontology, S4 = postalCodeLookup(zipcode, city, state) is typed as Place.postalCode, City.name, State.name]
  • 21. Generate Candidate Models • Map learned semantic types to nodes in graph G – There might be multiple mappings • Compute Steiner tree (minimal tree) for each mapping • [Figure: S4 = postalCodeLookup(zipcode, city, state), typed as Place.postalCode, City.name, State.name, shown next to the weighted graph]
  • 22–23. Generate Candidate Models • Mapping 1 • [Figure: the semantic types of S4 mapped onto the weighted graph, with a postalCode link added for Place.postalCode]
  • 24–25. Generate Candidate Models • Mapping 2 • [Figure: same graph, second mapping]
  • 26–27. Generate Candidate Models • Mapping 3 • [Figure: same graph, third mapping]
  • 28–29. Generate Candidate Models • Mapping 4 • [Figure: same graph, fourth mapping]
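The Steiner tree step can be illustrated on a toy graph. The sketch below is not the algorithm used in the presentation (which is not specified on the slides): it is an exact brute-force search that is only practical for very small graphs, it treats links as undirected, and the edge list, the ε value of 0.01, and the weight of the postalCode link are all hypothetical.

```python
from itertools import combinations


def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm on the subgraph induced by `nodes`.

    Returns (cost, tree_edges), or None if the subgraph is not connected.
    """
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    cost, tree = 0.0, []
    for u, v, weight in sorted(edges, key=lambda edge: edge[2]):
        if u in parent and v in parent:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v
                cost += weight
                tree.append((u, v, weight))
    if len(tree) != len(nodes) - 1:
        return None
    return cost, tree


def steiner_tree(edges, terminals):
    """Cheapest tree spanning all terminals, by trying every set of extra nodes."""
    all_nodes = {node for u, v, _ in edges for node in (u, v)}
    optional = sorted(all_nodes - set(terminals))
    best = None
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            result = minimum_spanning_tree(set(terminals) | set(extra), edges)
            if result is not None and (best is None or result[0] < best[0]):
                best = result
    return best


# Toy graph: 0.01 stands in for epsilon on links inside a component.
EDGES = [
    ("City", "City.name", 0.01),
    ("State", "State.name", 0.01),
    ("City", "State", 0.01),
    ("Place", "Place.postalCode", 18.0),  # hypothetical weight
    ("City", "Place", 17.94),
    ("State", "Place", 17.94),
    ("City", "Event", 18.0),
    ("Event", "Place", 18.0),
]
TERMINALS = ["City.name", "State.name", "Place.postalCode"]

cost, tree = steiner_tree(EDGES, TERMINALS)
tree_nodes = {node for u, v, _ in tree for node in (u, v)}
```

The search keeps the three low-weight links, one isPartOf-style link to Place, and the postalCode link, and it leaves out the Event node because routing through it costs more.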
  • 30. Rank Source Models • Rank the candidates based on: – Cost: sum of the weights – Coherence: prefer the models with a higher number of supporting models • Rank 1: Candidate 1 • Rank 2: Candidate 4 • Rank 3: Candidate 2 • Rank 3: Candidate 3 • [Figure: the four candidate models over City, State, and Place, built from state, isPartOf, name, and postalCode links annotated {s1,s2} or {s3}]
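The slide names the two ranking criteria but not how they are combined. One plausible reading, sketched below, sorts by cost first and breaks ties in favor of links with more supporting models. Everything here is an assumption for illustration: the function `rank_candidates`, the tie-breaking rule, the ε value, and the link lists, which are only loosely modeled on the four candidates in the figure.

```python
def rank_candidates(candidates):
    """Order candidate names by ascending cost, then by descending support.

    `candidates` maps a name to a list of (weight, supporting_models) links.
    """
    def sort_key(item):
        _, links = item
        cost = round(sum(weight for weight, _ in links), 6)
        support = sum(len(models) for _, models in links)
        return (cost, -support)

    return [name for name, _ in sorted(candidates.items(), key=sort_key)]


EPS = 0.01  # hypothetical value for the low in-component weight

# Hypothetical link lists: (weight, set of supporting known models).
candidates = {
    "Candidate 1": [(17.94, set()), (18.0, set())] + [(EPS, {"s1", "s2"})] * 3,
    "Candidate 2": [(17.94, set()), (17.94, set()), (18.0, set()),
                    (EPS, {"s1", "s2"}), (EPS, {"s3"})],
    "Candidate 3": [(17.94, set()), (17.94, set()), (18.0, set()),
                    (EPS, {"s3"}), (EPS, {"s1", "s2"})],
    "Candidate 4": [(17.94, set()), (18.0, set())] + [(EPS, {"s3"})] * 3,
}

ranking = rank_candidates(candidates)
```

With these made-up inputs, Candidates 1 and 4 tie on cost and Candidate 1 wins on support, while Candidates 2 and 3 tie on both criteria, which is consistent with the shared rank on the slide.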
  • 31. Evaluation • Dataset 1 – 17 data sources containing overlapping data – Semantic descriptions created manually using DBpedia, FOAF, GeoNames, and WGS84 ontologies • Dataset 2 – 6 museum sources – Semantic descriptions created by domain experts using EDM, SKOS, and FOAF ontologies • Learned the model of each source, taking the models of the other sources as known input • Computed the Graph Edit Distance (GED) between the learned model and the correct one – Operations: node insertion, node deletion, edge insertion, edge deletion, edge relabeling • Compared the results with our previous work in Karma
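The five edit operations listed on this slide can be counted with a short Python function. This is a simplified sketch, not the evaluation code behind the slides: it assumes unit cost per operation, that node ids in the learned and correct models are already aligned, and that at most one link exists per (source, target) pair. A general graph edit distance would also search over node alignments.

```python
def edit_distance(learned_nodes, learned_edges, correct_nodes, correct_edges):
    """Count unit-cost edits that turn the learned model into the correct one.

    Nodes are collections of node ids; edges are dicts {(source, target): label}.
    """
    # Node insertions and deletions.
    distance = len(set(learned_nodes) ^ set(correct_nodes))
    for pair in set(learned_edges) | set(correct_edges):
        if pair not in learned_edges or pair not in correct_edges:
            distance += 1  # edge insertion or deletion
        elif learned_edges[pair] != correct_edges[pair]:
            distance += 1  # edge relabeling
    return distance


# Hypothetical learned and correct models.
learned_nodes = {"City", "State", "Place"}
learned_edges = {("City", "State"): "isPartOf", ("City", "Place"): "isPartOf"}
correct_nodes = {"City", "State"}
correct_edges = {("City", "State"): "state"}

ged = edit_distance(learned_nodes, learned_edges, correct_nodes, correct_edges)
```

Here the learned model needs one node deletion (Place), one edge deletion (City to Place), and one relabeling (isPartOf to state).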
  • 32. Results - Dataset 1 • GED per source, listed as: source signature | #attributes | previous work | new approach (rank 1)
    – nearestCity(lat, lng, city, state, country) | 5 | 6 | 1
    – findRestaurant(zipcode, restaurantName, phone, address) | 4 | 1 | 0
    – zipcodesInCity(city, state, postalCode) | 3 | 3 | 1
    – parseAddress(address, city, state, zipcode, country) | 5 | 6 | 1
    – citiesOfState(state, city) | 2 | 1 | 0
    – ocean(lat, lng, name) | 3 | 2 | 1
    – postalCodeLookup(zipCode, city, state, country) | 4 | 6 | 1
    – country(lat, lng, code, name) | 4 | 2 | 0
    – companyCEO(company, name) | 2 | 1 | 0
    – personalInfo(firstname, lastname, birthdate, brithCity, birthCountry) | 5 | 4 | 1
    – businessInfo(company, phone, homepage, city, country, name) | 6 | 10 | 8
    – restaurantChef(restaurant, firstname, lastname) | 3 | 2 | 1
    – findSchool(city, state, name, code, homepage, ranking, dean) | 7 | 8 | 6
    – employees(organization, firstname, lastname, birthdate) | 4 | 1 | 2
    – education(person, hometown, homecountry, school, city, country) | 6 | 9 | 4
    – administrativeDistrict(city, province, country) | 3 | 4 | 1
    – capital(country, city) | 2 | 2 | 1
    – TOTAL | 68 | 68 | 29
    • 57% improvement
  • 33. Results - Dataset 2 • GED per source, listed as: source signature | #attributes | previous work | new approach (rank 1)
    – S1(Attribution, BeginDate, EndDate, Title, Dated, Medium, Dimensions) | 7 | 1 | 0
    – S2(ObjectID, ObjectTitle, ObjectWorkType, ArtistName, ArtistBirthDate, ArtistDeathDate, ObjectEarliestDate, ObjectRights, ObjectFacetValue1) | 8 | 2 | 3
    – S3(death, birth, name) | 3 | 0 | 0
    – S4(accessionNumber, artist, creditLine, dimensions, imageURL, materials, relatedArtworksURL, creationDate, provenance, keywordValues) | 10 | 9 | 6
    – S5(AccessionNumber, Classification, CreditLine, Date, Description, DimensionsOrphan, WhatValues, Who, image, relatedArtworksValues) | 10 | 9 | 5
    – S6(Artist, ArtistBornDate, ArtistDiedDate, Classification, Copyright, CreditLine, Image, KeywordValues, Ref, SitterValues) | 10 | 8 | 6
    – TOTAL | 68 | 29 | 20
    • 31% improvement
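The slides do not say how the improvement percentages were computed. Reading them as the relative reduction in total GED reproduces both figures, as the short check below shows. The per-source GED values are copied from the two results slides; the function name `improvement` is an assumption.

```python
def improvement(previous_total, new_total):
    """Relative reduction in total GED, as a fraction of the previous total."""
    return (previous_total - new_total) / previous_total


# Per-source GED values from the results slides, in slide order.
dataset1_previous = [6, 1, 3, 6, 1, 2, 6, 2, 1, 4, 10, 2, 8, 1, 9, 4, 2]
dataset1_new = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 8, 1, 6, 2, 4, 1, 1]
dataset2_previous = [1, 2, 0, 9, 9, 8]
dataset2_new = [0, 3, 0, 6, 5, 6]

gain1 = improvement(sum(dataset1_previous), sum(dataset1_new))
gain2 = improvement(sum(dataset2_previous), sum(dataset2_new))
```

For Dataset 1 this is (68 − 29)/68, about 57%; for Dataset 2 it is (29 − 20)/29, about 31%.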
  • 34. Related Work • Writing semantic descriptions by hand – R2RML, SWRL – Tedious and time-consuming task – Requires expertise in Semantic Web (SW) technologies • Semantic annotation of Web services and Web tables – Very limited in learning the relationships • Learning Semantic Definitions of Online Information Sources [Carman, Knoblock, 2007] – Learns LAV rules from known sources – Can only learn descriptions that overlap known sources
  • 35. Discussion • Automatically build rich semantic descriptions of data sources • Exploit the background knowledge from (i) the domain ontology, and (ii) the known source models • Semantic descriptions are the key ingredients to automate many tasks, e.g., – Source Discovery – Data Integration – Service Composition
  • 36. Future Work • Investigate how to create a more compact graph – Consolidate the overlapping segments of the known semantic models • Relax the problem by removing the constraint that the correct semantic type of each attribute is known – CRF part returns a set of candidate semantic types along with their confidence values • Use the data available in the Linked Open Data (LOD) cloud to learn more accurate models • Put the user in the loop – Integrate the new approach into Karma – The user refines one of the suggested models – The new model will be added to the graph as a new pattern

Editor's Notes

  1. M = set of known source models; k = total number of links in M (≥ |EG|); c1(e) = number of links in M whose <label, source, target> match e; c2(e) = number of links in M whose <label, source> match e; c3(e) = number of links in M whose <label, target> match e; c4(e) = number of links in M whose <label> matches e; we = min(k − ε/[k − c1], k − ε/[k · (k − c2)], k − ε/[k · (k − c3)], k − ε/[k · k · (k − c4)])
  2. If the number of mappings from semantic types to nodes in the graph is too large, keep only the top k most promising mappings: score each mapping by its coherence (how many of its nodes are in the same pattern).
  3. If there is a large number of mappings, we sort them by coherence: how many nodes of each mapping belong to the same pattern?
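The pruning described in notes 2 and 3 can be sketched as follows. This is an illustration under assumptions, not the authors' code: the node ids with `#1` and `#2` suffixes, the `pattern_of` lookup, and the exact coherence score (the largest number of mapped nodes that share one pattern) are hypothetical choices consistent with the notes.

```python
from collections import Counter
import heapq


def coherence(mapping, pattern_of):
    """Largest number of nodes in the mapping that belong to the same pattern."""
    counts = Counter(
        pattern_of[node] for node in mapping if pattern_of.get(node) is not None
    )
    return max(counts.values(), default=0)


def top_k_mappings(mappings, pattern_of, k):
    """Keep the k mappings with the highest coherence score."""
    return heapq.nlargest(k, mappings, key=lambda m: coherence(m, pattern_of))


# Hypothetical graph nodes: City.name and State.name each occur in two patterns.
pattern_of = {
    "City.name#1": 1, "State.name#1": 1,
    "City.name#2": 2, "State.name#2": 2,
}
mappings = [
    ("City.name#1", "State.name#1"),
    ("City.name#1", "State.name#2"),
    ("City.name#2", "State.name#1"),
    ("City.name#2", "State.name#2"),
]

kept = top_k_mappings(mappings, pattern_of, k=2)
```

With k = 2, the two mappings that stay inside a single pattern are kept and the two mixed ones are dropped.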