semantic integration.ppt
1. Semantic integration of data in database systems and ontologies
Ing. Petra Šeflová
Technical University of Liberec
Faculty of Mechatronics
2.
Integration of data
- merging a set of given schemas into a global schema
Semantic integration
- part of the broader concept of data integration
- focuses on data exchange between applications with respect to their meaning, content and required business rules
3.
Example: integration of data
[Figure: A data integration system in the real estate domain. The user query "Find houses with four bathrooms and price under $500,000" is posed against a mediated schema; three wrappers connect the mediated schema to the source schemas of homeseekers.com, realestate.com and greathomes.com.]
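The mediated-schema architecture of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-source field names and the sample listings are hypothetical (none of them come from the real sites), and each wrapper is simply a function that renames source fields into a mediated schema assumed to expose `bathrooms` and `price`.

```python
# Hypothetical source data: each site uses its own field names.
HOMESEEKERS = [{"baths": 4, "cost": 450_000}, {"baths": 2, "cost": 300_000}]
REALESTATE = [{"num_bathrooms": 4, "list_price": 520_000}]
GREATHOMES = [{"bathrooms": 4, "price_usd": 480_000}]


def make_wrapper(rows, field_map, source):
    """Return a function that yields the source's rows in the mediated schema."""
    def wrapper():
        for row in rows:
            # field_map: mediated field name -> local field name (assumed mapping)
            record = {mediated: row[local] for mediated, local in field_map.items()}
            record["source"] = source
            yield record
    return wrapper


WRAPPERS = [
    make_wrapper(HOMESEEKERS, {"bathrooms": "baths", "price": "cost"},
                 "homeseekers.com"),
    make_wrapper(REALESTATE, {"bathrooms": "num_bathrooms", "price": "list_price"},
                 "realestate.com"),
    make_wrapper(GREATHOMES, {"bathrooms": "bathrooms", "price": "price_usd"},
                 "greathomes.com"),
]


def query_mediated(predicate):
    """Pose one query over the mediated schema and fan it out to all wrappers."""
    return [rec for wrapper in WRAPPERS for rec in wrapper() if predicate(rec)]


# "Find houses with four bathrooms and price under $500,000"
hits = query_mediated(lambda h: h["bathrooms"] == 4 and h["price"] < 500_000)
```

The user writes the query once against the mediated schema; knowledge of each source's vocabulary lives only in the field maps, which is exactly the part that semantic matching tries to produce automatically.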
5.
Key commonalities of semantic integration applications
• Use structured representations (e.g. relational schemas and XML DTDs)
• Must resolve heterogeneities with respect to the schemas and their data
• Enable manipulation of the schemas
– Merging the schemas
– Computing differences
• Enable translation of data and queries across the schemas/ontologies
6.
• Database schema
– Defines the physical layout of a system (database)
• Ontology
– A system of knowledge about the world
• Makes no claim to coherence (many partial ontologies exist)
• Frequently a purpose-built artefact
– Definition after Gruber: an ontology is a formal, explicit specification of a shared conceptualization.
7.
Problems of semantic integration
• The semantics of elements can be inferred from only a few information sources
– Creators of the data
– Documentation
– Associated schema and data
• Schema elements are typically matched based on clues in the schema and data
• Schema and data clues are often incomplete
• Matching is often subjective, depending on the application
8.
Matching process
• Takes as input two schemas/ontologies, each consisting of a set of discrete entities, and determines as output the relationships holding between these entities
9.
Schema S

Houses
Location      Price ($)   Agent-id
Atlanta, GA   360,000     32
Raleigh, NC   430,000     15

Agents
Id   Name         city      state
32   Mike Brown   Atlanta   GA
15   Jean Laup    Raleigh   NC

Schema T

Area          list-price   Agent-address   Agent-name
Denver, CO    550,000      Boulder, CO     Laura Smith
Atlanta, GA   370,800      Athens, GA      Mike Brown

Example: the schemas of two relational databases S and T on house listings, and the semantic correspondences between them
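To make "semantic correspondence" concrete, the sketch below translates rows of schema S into the shape of schema T. The correspondences are assumptions read off the column names, not taken from the slide: `Location` to `Area` and `Name` to `Agent-name` as one-to-one matches, and `city` plus `state` combined into `Agent-address`. Copying `Price ($)` straight into `list-price` is a deliberate simplification; the sample values suggest the real relationship is more complex.

```python
# Schema S tables, using rows like those in the example above.
houses = [
    {"location": "Atlanta, GA", "price": 360_000, "agent_id": 32},
    {"location": "Raleigh, NC", "price": 430_000, "agent_id": 15},
]
agents = [
    {"id": 32, "name": "Mike Brown", "city": "Atlanta", "state": "GA"},
    {"id": 15, "name": "Jean Laup", "city": "Raleigh", "state": "NC"},
]

# Assumed correspondences: T column -> expression over a joined (house, agent) pair.
CORRESPONDENCES = {
    "area": lambda h, a: h["location"],                          # one-to-one
    "list_price": lambda h, a: h["price"],                       # simplification
    "agent_address": lambda h, a: f'{a["city"]}, {a["state"]}',  # two columns -> one
    "agent_name": lambda h, a: a["name"],                        # one-to-one
}


def translate_s_to_t(houses, agents):
    """Join Houses with Agents on agent-id = id and emit T-shaped rows."""
    agents_by_id = {a["id"]: a for a in agents}
    return [
        {col: expr(h, agents_by_id[h["agent_id"]])
         for col, expr in CORRESPONDENCES.items()}
        for h in houses
    ]


t_rows = translate_s_to_t(houses, agents)
```

Note that `agent_address` needs a join across two S tables and a combination of two columns, which is why matching cannot always be reduced to pairing up single column names.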
11.
Rule-based solutions
• Many early as well as current matching solutions employ hand-crafted rules
• Exploit schema information
– Element names
– Data types
– Structures
– Integrity constraints
• Can provide a quick and concise method to capture valuable user knowledge about the domain
12.
Rule-based solutions
• Benefits
– Relatively inexpensive
– Do not require training
– Operate only on the schema
• Drawbacks
– Cannot exploit data instances effectively
– Cannot exploit previous matching efforts
• Examples
– TranScm
– DIKE
– MOMIS
– CUPID
13.
• TranScm
– Employs rules such as "two elements match if they have the same name (allowing synonyms) and the same number of subelements"
• DIKE
– Computes the similarity between two schema elements based on the similarity of the characteristics of the elements and the similarity of related elements
• MOMIS
– Computes the similarity of schema elements as a weighted sum of the similarities of name, data type and substructure
• CUPID
– Employs rules that categorize elements based on names, data types and domains
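Two of the rule styles described above can be sketched as follows. This is a toy illustration, not the actual TranScm or MOMIS implementation: the synonym table, the element dictionaries and the weights `(0.6, 0.2, 0.2)` are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real system would consult a thesaurus.
SYNONYMS = {frozenset({"location", "area"}), frozenset({"price", "list-price"})}


def names_match(a, b):
    """True if the names are equal or listed as synonyms (case-insensitive)."""
    a, b = a.lower(), b.lower()
    return a == b or frozenset({a, b}) in SYNONYMS


def name_rule_match(e1, e2):
    """Rule: same name (allowing synonyms) and same number of subelements."""
    return (names_match(e1["name"], e2["name"])
            and len(e1["children"]) == len(e2["children"]))


def weighted_similarity(e1, e2, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of name, data type and substructure similarity."""
    w_name, w_type, w_struct = weights
    if names_match(e1["name"], e2["name"]):
        name_sim = 1.0
    else:
        name_sim = SequenceMatcher(
            None, e1["name"].lower(), e2["name"].lower()).ratio()
    type_sim = 1.0 if e1["type"] == e2["type"] else 0.0
    n1, n2 = len(e1["children"]), len(e2["children"])
    # Substructure is approximated by the ratio of subelement counts.
    struct_sim = 1.0 if n1 == n2 == 0 else min(n1, n2) / max(n1, n2)
    return w_name * name_sim + w_type * type_sim + w_struct * struct_sim


location = {"name": "Location", "type": "string", "children": []}
area = {"name": "Area", "type": "string", "children": []}
agent_id = {"name": "Agent-id", "type": "int", "children": []}
```

With these inputs, `name_rule_match(location, area)` holds because of the synonym entry, while `weighted_similarity(location, agent_id)` scores lower than `weighted_similarity(location, area)`. Both functions look only at schema information, which illustrates why such rules need no training but also cannot use data instances.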
14.
Learning-based solutions
• Exploit both schema and data information
• Can exploit previous matching efforts
• Examples
– SemInt system
– LSD system
– iMAP system
– Autoplex
– Automatch
15.
• SemInt
– Uses a neural-network learning approach
– Matches schema elements based on attribute specifications and statistics of data content
• LSD
– Employs Naive Bayes over data instances
– Develops a novel learning solution that exploits the hierarchical nature of XML data
• iMAP
– Matches the schemas of two sources by analyzing the descriptions of objects that are found in both sources
• Autoplex and Automatch
– Use a Naive Bayes learning approach that exploits data instances to match elements
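The idea of matching by learning from data instances can be illustrated with a small Naive Bayes classifier written from scratch. It is a minimal sketch, not the LSD, Autoplex or Automatch algorithm: the tokenizer, the Laplace smoothing and the tiny training set (taken from the listing example earlier in the deck) are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(value):
    """Crude features: lowercase words, with every digit run mapped to '<num>'."""
    return ["<num>" if t.isdigit() else t
            for t in re.findall(r"[a-z]+|\d+", value.lower())]


class NaiveBayesMatcher:
    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.instance_counts = Counter()          # label -> number of values seen
        self.vocabulary = set()

    def train(self, label, values):
        """Learn from data values of a column whose meaning is already known."""
        for value in values:
            self.instance_counts[label] += 1
            for tok in tokens(value):
                self.token_counts[label][tok] += 1
                self.vocabulary.add(tok)

    def log_score(self, label, value):
        total = sum(self.token_counts[label].values())
        prior = self.instance_counts[label] / sum(self.instance_counts.values())
        score = math.log(prior)
        for tok in tokens(value):
            # Laplace smoothing keeps unseen tokens from zeroing the score.
            count = self.token_counts[label][tok] + 1
            score += math.log(count / (total + len(self.vocabulary) + 1))
        return score

    def match_column(self, values):
        """Return the label with the highest summed log-score over all values."""
        return max(self.instance_counts,
                   key=lambda lab: sum(self.log_score(lab, v) for v in values))


matcher = NaiveBayesMatcher()
matcher.train("area", ["Denver, CO", "Atlanta, GA"])
matcher.train("list-price", ["550,000", "370,800"])
matcher.train("agent-name", ["Laura Smith", "Mike Brown"])

predicted = matcher.match_column(["360,000", "430,000"])
```

Here a column of unseen prices is assigned to `list-price` purely from the shape of its values, without looking at the column name. Retraining on every confirmed match is how a learning-based system reuses previous matching efforts.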
17.
Input dimensions
• Concern the kind of input on which the algorithms operate
• First dimension
– Algorithms differ in the data/conceptual models in which the ontologies or schemas are expressed
• Second dimension
– Depends on the kind of data the algorithms exploit
– Different approaches exploit different information from the input data/conceptual models
• Schema-level information
• Instance data
• Both
18.
Process dimensions
• The matching process can be classified by its general properties
• One distinction is the approximate or exact nature of its computation
– Exact algorithms compute the absolute solution to a problem
– Approximate algorithms sacrifice exactness for performance
• Three large classes, based on intrinsic input, external resources or some semantic theory
– Syntactic
– External
– Semantic
19.
Output dimensions
• Concern the form of the result that matchers produce
– Is it a one-to-one correspondence?
– Is any relation suitable?
– Does it have to be a final mapping element?
• A system can deliver a graded answer
– The correspondence holds with 98% confidence
– The correspondence holds with 4/5 probability
• Or an all-or-nothing answer
• Correspondences can be determined using distance measures
• Kinds of relations between entities that a system can provide
– Equivalence
– Subsumption
– Incompatibility
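The output dimensions listed above can be made concrete with a small data structure. The sketch below is an assumption-laden illustration: the `Correspondence` class, the element names, the default threshold of 0.8 and the greedy one-to-one selection are hypothetical choices, not part of any surveyed system.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    EQUIVALENCE = "="
    SUBSUMPTION = "<="
    INCOMPATIBILITY = "_|_"


@dataclass(frozen=True)
class Correspondence:
    source: str
    target: str
    relation: Relation
    confidence: float  # graded answer in [0, 1]


def all_or_nothing(correspondences, threshold=0.8):
    """Collapse graded answers into a yes/no decision per correspondence."""
    return [c for c in correspondences if c.confidence >= threshold]


def one_to_one(correspondences):
    """Greedily keep the most confident correspondence per source and target."""
    used_sources, used_targets, kept = set(), set(), []
    for c in sorted(correspondences, key=lambda c: c.confidence, reverse=True):
        if c.source not in used_sources and c.target not in used_targets:
            kept.append(c)
            used_sources.add(c.source)
            used_targets.add(c.target)
    return kept


candidates = [
    Correspondence("S.location", "T.area", Relation.EQUIVALENCE, 0.98),
    Correspondence("S.location", "T.agent-address", Relation.EQUIVALENCE, 0.80),
    Correspondence("S.name", "T.agent-name", Relation.EQUIVALENCE, 0.80),
    Correspondence("S.city", "T.agent-address", Relation.SUBSUMPTION, 0.60),
]
```

`all_or_nothing(candidates)` keeps the three candidates at or above 4/5, whereas `one_to_one(candidates)` drops the second one because `S.location` is already matched with higher confidence. The two post-processing steps answer different questions, so a system has to state which output form it delivers.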
20.
Classification of elementary schema-based matching approaches
[Figure: classification tree of schema-based matching techniques. Granularity/input interpretation layer: element-level (syntactic, external) and structure-level (syntactic, external, semantic). Basic techniques layer: string-based, language-based, linguistic resources, constraint-based, alignment reuse, upper-level formal ontologies, graph-based, taxonomy-based, repository of structures, model-based. The same techniques are also grouped as terminological (linguistic), structural (internal, relational) and semantic.]
21.
Element-level vs structure-level
• Element-level matching techniques compute mapping elements by analyzing entities in isolation, ignoring their relations with other entities
• Structure-level techniques compute mapping elements by analyzing how entities appear together in a structure
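The difference between the two granularities shows up clearly on a small example. In this sketch, which is illustrative only, the element-level score uses string similarity of names from Python's `difflib`, and the structure-level score is the Jaccard overlap of child names; `schema_b` and its tables are hypothetical.

```python
from difflib import SequenceMatcher


def element_level(name_a, name_b):
    """Compare two entities in isolation, using only their own names."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def structure_level(tree_a, tree_b, node_a, node_b):
    """Compare two entities through their children (Jaccard overlap of names)."""
    kids_a = {k.lower() for k in tree_a.get(node_a, [])}
    kids_b = {k.lower() for k in tree_b.get(node_b, [])}
    if not kids_a and not kids_b:
        return 0.0  # no structure to compare
    return len(kids_a & kids_b) / len(kids_a | kids_b)


# Each schema maps a table name to its column names.
schema_a = {"Agents": ["id", "name", "city", "state"]}
schema_b = {
    "Contacts": ["id", "name", "city", "state"],  # hypothetical table
    "Agents": ["code", "label"],                  # same name, different content
}

same_name = element_level("Agents", "Agents")
same_name_structure = structure_level(schema_a, schema_b, "Agents", "Agents")
other_name_structure = structure_level(schema_a, schema_b, "Agents", "Contacts")
```

The two tables called `Agents` are a perfect element-level match but share no columns, while `Agents` and `Contacts` differ in name yet have identical structure. This is the reason many matchers combine both kinds of evidence.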
22.
Internal vs external techniques
• Internal
– Exploit information that comes only with the input schemas/ontologies
– Syntactic interpretation of the input
– Semantic interpretation of the input
• External
– Exploit auxiliary (external) resources of a domain to interpret the input
– Resources:
• Human input
• A thesaurus expressing the relationships between terms
Schema Matching vs Ontology Matching
Differences
Database schema often do not provide explicit
semantics for their data
Semantics is usually specified explicitly at design-
time
Usually performed with the help of techniques trying
to guess the meaning encoded in the schemas
Ontologies are logical systems that themselves
obey some formal semantics
Primarily try to exploit knowledge explicitly encoded
in the ontologies
24.
Schema matching vs ontology matching
Commonalities
• Ontologies and schemas are similar in the sense that they:
– Provide a vocabulary of terms that describes a domain of interest
– Constrain the meaning of the terms used in the vocabulary
• Schemas and ontologies are both found in environments such as the Semantic Web
25.
Sources:
• Natalya F. Noy: Semantic Integration: A Survey of Ontology-Based Approaches
• AnHai Doan, Alon Y. Halevy: Semantic Integration Research in the Database Community: A Brief Survey
• P. Shvaiko, J. Euzenat: A Survey of Schema-Based Matching Approaches
• G. Antoniou, F. van Harmelen: A Semantic Web Primer
• R. Araújo, H. Sofia Pinto: Toward Semantics-based ontology similarity
• H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Hübner: Ontology-Based Integration of Information: A Survey of Existing Approaches
• E. Rahm, P. A. Bernstein: A Survey of Approaches to Automatic Schema Matching