Lecture 05-SchemaMatching.ppt

Schema Matching and Integration
IIS 651 (S 2022)

Outline
• Schema and Schema Matching
• Schema Heterogeneity & Data Interoperability
• Large Scale Scenarios concerning Schema Matching and Integration
• Related Work
• Our approach to handle Large Scale Scenario
• PORSCHE (Performance Oriented Schema Mediation)
• Future Research Directions

Schema
Origin in Greek, meaning "shape" or "plan"
From a computer science perspective:
• a description of the relationships among data/information in some structured way, or
• a set of rules defining those relationships, or
• a model to represent the data
For example
• Relational Schema
• XML Schema
• Class Diagram ...

Relational Database Schema
Database: books
• book (book_id, title)
• author (author_id, name)
• publisher (pub_id, name)
• detail (book_id, author_id, pub_id)

XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="time">
    <xs:complexType/>
  </xs:element>
  <xs:element name="day">
    <xs:complexType/>
  </xs:element>
  <xs:element name="courseCode">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="time"/>
        <xs:element ref="day"/>
        <xs:element ref="Instructor"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="arizonaCourses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="courseCode"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Instructor">
    <xs:complexType/>
  </xs:element>
</xs:schema>

Web Interface Form Schema
(Figure: an airline flight-search web form with the following fields)
• From city or airport*, To city or airport*
  If you are unsure of the spelling of a city or airport, enter the first 3 or more letters followed by an asterisk (*).
• Departure date (Jul 2008, 23, Wednesday), Departure time (Any Time)
• Return date (Thursday), Return time
• Traveler types: Adults (12-64 yrs) 1, Children (2-11 yrs) 0, Seniors (65+ yrs) 0, Infants (0-23 months) 0
• Cabin type: Coach
• Direct or Non-Stop flights only
• More search options

Schema Matching
• Takes two schemas/ontologies as input and produces a
mapping between elements of the two schemas that
correspond semantically to each other [Halevy05]
Source A: Books
price    book-title            author-name
26,60    Harry Potter          J. K. Rowling
11,50    Marie Des Intrigues   Juliette Benzoni
16,50    Nous Les Dieux        Bernard Werber
24       Pompei                Robert Harris
Source B: Books
listed-price    title    a-fname    a-lname
1-1 match: price ↔ listed-price, book-title ↔ title
complex match: author-name ↔ a-fname, a-lname
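A minimal Python sketch of how the two kinds of correspondence might be represented and applied. The dictionary layout, the `translate` helper, and the sample record are illustrative assumptions, and reading the complex match as "author-name = a-fname + space + a-lname" is one plausible interpretation of the figure rather than something the slide spells out.

```python
# Hypothetical representation of the mapping between Source B and Source A.
# A 1-1 match pairs one element with one element; the complex match combines
# several Source B elements into a single Source A element.
ONE_TO_ONE = {
    "listed-price": "price",
    "title": "book-title",
}
COMPLEX = {
    # target element in A: (source elements in B, combining function)
    "author-name": (("a-fname", "a-lname"), lambda first, last: f"{first} {last}"),
}


def translate(record_b):
    """Rewrite a Source B record into the Source A schema."""
    record_a = {}
    for b_name, a_name in ONE_TO_ONE.items():
        record_a[a_name] = record_b[b_name]
    for a_name, (b_names, combine) in COMPLEX.items():
        record_a[a_name] = combine(*(record_b[n] for n in b_names))
    return record_a


# Sample record: the author name from the Source A table, split by assumption.
row_b = {"listed-price": "24", "title": "Pompei",
         "a-fname": "Robert", "a-lname": "Harris"}
row_a = translate(row_b)
print(row_a)
```

Keeping the complex match as data (source elements plus a combining function) lets the same `translate` loop handle both kinds of correspondence.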

Applications of Schema Matching
• Data Interoperability
• Data Integration
• Data Warehousing
• Catalogue Integration
• Web Services Discovery and
Composition
• Query over the Web
• ...
• Data Exchange
• E-commerce
• Agents Communication
• ...
Static: the contributing schema set is not evolving, so matching and mapping is a one-time process
Dynamic: the contributing schema set is evolving, so matching and mapping also evolve

Schema Heterogeneity &
Data Interoperability
• A key roadblock for information integration!
• Different data sources speak their own schema
(Figure: a consumer querying three data sources, each with its own schema)
Consumer: Hotel, Restaurant, AdventureSports, HistoricalSites
Data Source: Hotels, Youth Centers
Data Source: Lodges, Restaurants
Data Source: Beaches, Volcanoes

Schema Integration and Mediation
• The schemas of all concerned data sources are merged into one schema without any concept redundancy, i.e. similar concepts are represented by a single concept
• The schemas of all input data sources are mapped to this integrated schema, also called the mediated schema
(Figure: the consumer schema (Hotel, Restaurant, AdventureSports, HistoricalSites) connected to the three data sources through a Mediation layer)

Mediation
Schema Mapping is key to any data sharing architecture
[Tomasic et al. IEEE TKDE 1998]
(Figure: mediator architecture. A user query is posed against the Mediated Schema; mappings turn it into sub-queries, and each sub-query goes through a wrapper to Source 1, Source 2, ..., Source n.)
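The flow in this architecture can be sketched in a few lines of Python. Everything here is a toy stand-in under stated assumptions: the `MAPPINGS` table (including the idea that "Lodges" corresponds to "Hotel"), the in-memory `SOURCES`, and the `wrapper` and `mediate` functions are hypothetical names, not part of any system cited on the slide.

```python
# Mediated-schema concept -> {source name: source concept}. These
# correspondences are assumed for illustration only.
MAPPINGS = {
    "Hotel": {"source1": "Hotels", "source2": "Lodges"},
    "Restaurant": {"source2": "Restaurants"},
}

# Toy source contents standing in for real databases (hypothetical data).
SOURCES = {
    "source1": {"Hotels": ["Hotel A"], "Youth Centers": ["Center B"]},
    "source2": {"Lodges": ["Lodge C"], "Restaurants": ["Restaurant D"]},
}


def wrapper(source_name, source_concept):
    """Stand-in wrapper: run a sub-query in the source's own vocabulary."""
    return SOURCES[source_name].get(source_concept, [])


def mediate(concept):
    """Rewrite a query on the mediated schema into sub-queries and merge results."""
    results = []
    for source_name, source_concept in MAPPINGS.get(concept, {}).items():
        results.extend(wrapper(source_name, source_concept))
    return results


hotels = mediate("Hotel")
print(hotels)
```

The user only names concepts of the mediated schema; the mappings decide which sources are contacted and in which vocabulary.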

Schema Matching, Mapping, Integration & Mediation
• Matching: finding similarities between schemas
• Mapping: final correspondences between elements of two schemas
• Merging/Integration: based upon schema mappings, merging schemas into one schema
• Mediation: mappings from source schemas to the integrated schema for data interoperability
(Figure: schema S1 with elements B, C and schema S2 with elements B1, C1, C2 are matched and mapped, then merged into an integrated schema Si with elements B, C, C1, to which S1 and S2 are mapped)

Different Research Domains - Mediation
(Figure: Mediation at the center, surrounded by related research domains)
• Distributed Databases
• Data Warehousing
• Data Mining
• Information Retrieval
• Knowledge Extraction
• ...

Large Scale Scenario
• Creating a mediated schema from two large schemas (with thousands of nodes)
  • For example, Open Applications Group Integration Specification (OAGIS) [1] XML schema instances, with thousands of elements
• Creating a mediated schema from a large set of schemas (hundreds of schemas with thousands of nodes)
  • For example, creating a mediated web interface input form (schema) from the hundreds of web interface forms (schemas) in the travel domain [2]
Large scale schema matching and integration requires an automated approach
[1] http://www.openapplications.org/
[2] http://metaquerier.cs.uiuc.edu

Related Work
Tuning approach
• Pre-Match: eTuner [Lee&Doan 07]
• Amid-Match: SCIA [Wang et al 07]
• Post-Match: COMA++ [Do et al 07, Manakanatas06]
Large Scale Schema Matching and Integration Approaches
• Strategies: Incremental, Holistic
• Techniques: Fragmentation; Clustering (Element Level, Schema Level); Mining (Data-mining, Tree-mining)
• Systems: COMA++ [Do&Rahm07], BellFlower [Smiljanic06], DCM [He et al 04], xClust [Lee et al 02], PORSCHE [Saleem et al 08]

An approach to handle Large Scale Scenario
• Handle Schemas as Trees
• Apply the Clustering Method
• Use Tree Mining
• Devise Hybrid Approach
Result: an automated approach with good time performance and approximate match quality

Schemas as trees – Web Interface Forms
(Figure: two schema trees rooted at absTravel, derived from flight-search web forms)
Tree 1 nodes: From, D_City, To, A_City, Departure, Date, D_Month, D_Day, D_Time, Return, Date, R_Month, R_Day, R_Time, CabinType, TravelerTypes, Adults, Children, Seniors, Infants
Tree 2 nodes: D_City, D_Day, Return, D_Month, Departure, A_City, D_Time, CabinType, Adults, Children, Seniors, Infants, D_Day, D_Month, D_Time, TravelerTypes, From, To, Date, Date
[He et al. KDD 2004]

Schemas as trees – Relational Database
(Figure: the relational schema "books" drawn as a tree, with the tables as children of the root and the columns as their children)
books
  book: book_id, title
  author: author_id, name
  publisher: pub_id, name
  detail: book_id, author_id, pub_id
[Lee et al. CIKM 2006]

Schemas as trees – XML Schema
(XML schema: the same document shown on the "XML Schema" slide)
(Figure: the schema drawn as a tree rooted at arizonaCourses, with courseCode beneath it and the nodes day, time, place, and instructor below)
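One way such a tree could be derived from the XML schema is sketched below in Python with the standard library's `xml.etree.ElementTree`. The `build_tree` helper and the nested-dict tree layout are assumptions made for illustration; the sketch only follows `ref` attributes between global element declarations and assumes the references are not recursive.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema from the slide, without the XML declaration.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="time"><xs:complexType/></xs:element>
  <xs:element name="day"><xs:complexType/></xs:element>
  <xs:element name="courseCode">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="time"/>
        <xs:element ref="day"/>
        <xs:element ref="Instructor"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="arizonaCourses">
    <xs:complexType>
      <xs:sequence><xs:element ref="courseCode"/></xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Instructor"><xs:complexType/></xs:element>
</xs:schema>"""


def build_tree(xsd_text, root_name):
    """Return a nested dict {label: {child_label: {...}}} for root_name."""
    schema = ET.fromstring(xsd_text)
    # Global element declarations are the direct children of xs:schema.
    declared = {e.get("name"): e for e in schema.findall(XS + "element")}

    def expand(name):
        # Follow every ref= found anywhere below this declaration.
        refs = [e.get("ref") for e in declared[name].iter(XS + "element")
                if e.get("ref") is not None]
        return {ref: expand(ref) for ref in refs}

    return {root_name: expand(root_name)}


tree = build_tree(XSD, "arizonaCourses")
print(tree)
```

For this schema the result nests courseCode under arizonaCourses, with time, day, and Instructor as leaves.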

A speculatively rooted tree for rRNA genes

Schema Tree Benefit
• A tree structure for a data model inherently captures the contextual meaning of descendant nodes: a node is interpreted through its ancestors.
(Figure: two schema trees, S1 with nodes A, B, C, D and S2 with nodes A1, B1, C1, C11, D, D, X, shown in two layouts)

Element Level Clustering
• Clustering reduces the target search space
• Schema elements are clustered based on label similarity
(Figure: schema trees S1, S2, S3, ..., Si with nodes A, B, C, D, A1, B1, C1 to C5)
Node label similarity: C ≈ C1 ≈ C2 ≈ C3 ≈ C4 ≈ C5
(Figure: typical matching scenario, with elements s1, s2, ..., sm compared against elements t1, t2, ..., tn, and a1, a2, ..., aq)
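A small Python sketch of label-based clustering, using the element names from the Books example. The token-set Jaccard measure, the 0.5 threshold, and the single-link grouping are assumptions chosen for illustration; the slides do not say which similarity measure or clustering method is used.

```python
import re
from itertools import combinations


def tokens(label):
    """Split a label such as 'listed-price' or 'bookTitle' into lowercase tokens."""
    return frozenset(t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", label))


def jaccard(a, b):
    """Token-set overlap between two labels, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster(labels, threshold=0.5):
    """Single-link clustering: labels joined by any pair at or above threshold."""
    parent = {label: label for label in labels}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(labels, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(sorted(g) for g in groups.values())


labels = ["price", "listed-price", "book-title", "title",
          "author-name", "a-fname", "a-lname"]
clusters = cluster(labels)
print(clusters)
```

Once labels are grouped, later matching steps only need to compare elements inside the same cluster, which is where the search-space reduction comes from. A purely token-based measure cannot group author-name with a-fname and a-lname, which hints at why complex matches need more than label similarity.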

Tree Mining Aspect
• Tree mining finds frequent sub-trees in a given set of trees
• This is similar to schema matching, which finds similar concepts among a set of schemas
• Data structures that support tree mining algorithms can therefore be used for schema matching
• This helps in handling the Large Scale Scenario
• It preserves the context of nodes
(Figure: two trees, computers with children Desktop and notebook, and Software with children Desktop and notepad)
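The point about node context can be illustrated with the computers and Software trees. The Python sketch below assumes the figure shows Desktop as a child of both roots; the nested-dict representation and the `contexts` helper are hypothetical.

```python
# Schema trees as nested dicts: {label: {child_label: {...}}}.
computers = {"computers": {"Desktop": {}, "notebook": {}}}
software = {"Software": {"Desktop": {}, "notepad": {}}}


def contexts(tree, label, path=()):
    """Return every root-to-node path that ends at a node named label."""
    found = []
    for name, children in tree.items():
        here = path + (name,)
        if name == label:
            found.append("/".join(here))
        found.extend(contexts(children, label, here))
    return found


print(contexts(computers, "Desktop"))
print(contexts(software, "Desktop"))
```

The two Desktop nodes share a label but have different paths (computers/Desktop and Software/Desktop), so a matcher that compares paths rather than bare labels can tell them apart.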

Tree mining example
• Element Level Matching (sub-tree size 1)
• Structure Level Matching (sub-tree size > 1)
(Figure: a set of example trees with single-letter node labels such as a, b, d, f, h, i, n, p, r, t, from which frequent sub-trees are mined)
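The two levels can be sketched in Python by counting, across a set of trees, how many trees contain each label (sub-tree size 1) and each parent-child edge (sub-tree size 2). This is a deliberately simplified stand-in for a real tree mining algorithm, which would also grow larger sub-trees; the `TREES` data and the function names are hypothetical and do not reproduce the trees in the figure.

```python
from collections import Counter

# Hypothetical schema trees as nested dicts {label: {child: {...}}}.
TREES = [
    {"b": {"a": {"n": {}}, "p": {}, "t": {}}},
    {"b": {"a": {"n": {}}, "f": {}, "t": {}}},
    {"b": {"a": {}, "d": {}, "t": {}}},
]


def patterns(tree):
    """Collect the labels (size 1) and parent-child edges (size 2) of one tree."""
    labels, edges = set(), set()

    def walk(node, parent):
        for name, children in node.items():
            labels.add(name)
            if parent is not None:
                edges.add((parent, name))
            walk(children, name)

    walk(tree, None)
    return labels, edges


def frequent(trees, min_support):
    """Keep patterns that occur in at least min_support trees."""
    label_count, edge_count = Counter(), Counter()
    for tree in trees:
        labels, edges = patterns(tree)
        label_count.update(labels)  # sets, so each tree counts once per pattern
        edge_count.update(edges)

    def keep(counts):
        return sorted(p for p, c in counts.items() if c >= min_support)

    return keep(label_count), keep(edge_count)


freq_labels, freq_edges = frequent(TREES, min_support=3)
print(freq_labels)
print(freq_edges)
```

Frequent labels correspond to element level matches, while frequent edges carry structure, since a label only counts together with its parent.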

Hybrid Approach
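(Figure: schema trees, clustering, and tree mining combined to support matching, mapping, integration, and mediation)
• Stages: Matching, Mapping, Integration, Mediation
• Techniques: Schema Trees, Clustering, Tree Mining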

Database Research Advances Reports
• https://dsf.berkeley.edu/claremont/claremontreport08.pdf
• https://beckman.cs.wisc.edu/beckman-report2013.pdf
• https://link.springer.com/article/10.1007/s10796-017-9819-2
• https://sigmodrecord.org/publications/sigmodRecord/1912/pdfs/07_Reports_Abadi.pdf (last one, 2018 …)
• https://www.sciencedirect.com/science/article/pii/S030643790800015X
• https://vldb.org/2021/?papers-research
