The document discusses schema matching and integration for large scale scenarios. It proposes handling schemas as trees and applying a hybrid approach involving clustering the schema elements based on label similarity, performing tree mining to find similar sub-trees, and developing a mediated schema through an automated matching, mapping, and integration process to enable data interoperability across large, heterogeneous schemas.
2. Outline
• Schema and Schema Matching
• Schema Heterogeneity & Data Interoperability
• Large Scale Scenarios concerning Schema Matching and Integration
• Related Work
• Our approach to handle Large Scale Scenario
• PORSCHE (Performance Oriented Schema Mediation)
• Future Research Directions
3. Schema
Origin in Greek, meaning "shape" or "plan"
From a computer science perspective:
• a description of the relationships among data/information in some structured way, or
• a set of rules defining those relationships, or
• a model to represent the data
For example:
• Relational Schema
• XML Schema
• Class Diagram ...
6. Web Interface Form Schema
[Figure: airline flight-search web form with the following fields]
• From city or airport* / To city or airport* (If you are unsure of the spelling of a city or airport, enter the first 3 or more letters followed by an asterisk (*).)
• Departure date: Jul 2008, 23 (Wednesday); Departure time: Any Time
• Return date: Jul 2008, 24 (Thursday); Return time: Any Time
• Traveler types: Adults (12-64 yrs): 1; Children (2-11 yrs): 0; Seniors (65+ yrs): 0; Infants (0-23 months): 0
• Cabin type: Coach
• Direct or Non-Stop flights only
• More search options
7. Schema Matching
• Takes two schemas/ontologies as input and produces a mapping between elements of the two schemas that correspond semantically to each other [Halevy05]
• Match types shown: 1-1 match, complex match

Books (Source A)
price    book-title             author-name
26,60    Harry Potter           J. K. Rowling
11,50    Marie Des Intrigues    Juliette Benzoni
16,50    Nous Les Dieux         Bernard Werber
24       Pompei                 Robert Harris

Books (Source B)
listed-price    title    a-fname    a-lname
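The two match types can be illustrated with a small sketch. The pairing of columns below is an assumption inferred from the column names (the slide text does not list the correspondences), and `Match`, `split_name`, and `translate` are hypothetical helper names, not part of any tool cited in the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Match:
    source: str                                   # element of Source A
    targets: Tuple[str, ...]                      # element(s) of Source B
    transform: Callable[[str], Tuple[str, ...]]   # value conversion

    @property
    def kind(self) -> str:
        # One target element is a 1-1 match, several make a complex match.
        return "1-1" if len(self.targets) == 1 else "complex"


def split_name(full_name: str) -> Tuple[str, str]:
    """Naive split: the last token is the last name, the rest the first name."""
    first, _, last = full_name.rpartition(" ")
    return first, last


# Assumed correspondences (hypothetical, inferred from the labels only).
MATCHES: List[Match] = [
    Match("price", ("listed-price",), lambda v: (v,)),
    Match("book-title", ("title",), lambda v: (v,)),
    Match("author-name", ("a-fname", "a-lname"), split_name),
]


def translate(row_a: Dict[str, str]) -> Dict[str, str]:
    """Rewrite one Source A row using Source B's column names."""
    row_b: Dict[str, str] = {}
    for m in MATCHES:
        for column, value in zip(m.targets, m.transform(row_a[m.source])):
            row_b[column] = value
    return row_b


row = {"price": "24", "book-title": "Pompei", "author-name": "Robert Harris"}
result = translate(row)
```

Under these assumptions the complex match is the only one that needs a value transformation; the 1-1 matches only rename a column.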
8. Applications of Schema Matching
• Data Interoperability
• Data Integration
• Data Warehousing
• Catalogue Integration
• Web Services Discovery and Composition
• Query over the Web
• Data Exchange
• E-commerce
• Agents Communication
• ...
• Static: Contributing Schema Set Not Evolving >> Matching and Mapping is a one-time process
• Dynamic: Contributing Schema Set Evolving >> Matching and Mapping also evolve
10. Schema Heterogeneity & Data Interoperability
• A key roadblock for information integration!
• Different data sources speak their own schema
[Figure: a Consumer and three Data Sources. Concept labels shown: "Hotels, Youth Centers"; "Lodges, Restaurants"; "Beaches, Volcanoes"; and "Hotel, Restaurant, AdventureSports, HistoricalSites"]
12. Schema Integration and Mediation
• The schemas of all concerned data sources are merged into one schema without any concept redundancy, i.e., similar concepts are represented by a single concept
• The schemas of all input data sources are mapped to this integrated schema, also called the mediated schema
[Figure: the Consumer and three Data Sources of slide 10, with the same concept labels, now connected through a Mediation layer]
13. Mediation
Schema Mapping is key to any data sharing architecture
[Tomasic et al. IEEE TKDE 1998].
[Figure: mediator architecture. A User Query is posed against the Mediated Schema; mappings relate it to Source 1, Source 2, ..., Source n; each source sits behind a wrapper and receives a sub-query]
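The architecture above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system described in the cited paper: the source contents, the `MAPPINGS` table (including the choice to map "Lodges" to the mediated concept "Hotel"), and the function names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Wrapper = Callable[[str], List[Row]]


def make_wrapper(data: Dict[str, List[Row]]) -> Wrapper:
    """Build a wrapper that answers a sub-query for one local concept."""
    def wrapper(local_concept: str) -> List[Row]:
        return data.get(local_concept, [])
    return wrapper


# Hypothetical sources, each speaking its own schema.
SOURCES: Dict[str, Wrapper] = {
    "source1": make_wrapper({"Hotels": [{"name": "Hotel A"}]}),
    "source2": make_wrapper({"Lodges": [{"name": "Lodge B"}]}),
}

# Assumed mappings: mediated concept -> (source, local concept) pairs.
MAPPINGS: Dict[str, List[Tuple[str, str]]] = {
    "Hotel": [("source1", "Hotels"), ("source2", "Lodges")],
}


def answer(mediated_concept: str) -> List[Row]:
    """Rewrite a user query into one sub-query per mapped source."""
    results: List[Row] = []
    for source, local_concept in MAPPINGS.get(mediated_concept, []):
        results.extend(SOURCES[source](local_concept))
    return results


hotels = answer("Hotel")
```

The user only names the mediated concept; which sources are contacted, and under which local label, is decided entirely by the mappings.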
14. Schema Matching, Mapping, Integration & Mediation
[Figure: source schemas S1 (nodes B, C) and S2 (nodes B1, C1, C2) shown at each step, and the integrated schema Si (nodes B, C, C1)]
• Matching: Finding similarities between schemas
• Mapping: Final correspondences between elements of two schemas
• Merging/Integration: Based upon schema mappings, merging schemas into one schema
• Mediation: Mappings from source schemas to the integrated schema for data interoperability
15. Different Research Domains - Mediation
[Figure: Mediation linked to the following research domains]
• Distributed Databases
• Data Warehousing
• Data Mining
• Information Retrieval
• Knowledge Extraction
• ...
17. Large Scale Scenario
• Creating a mediated schema from two large schemas (with thousands of nodes).
• For example, Open Applications Group Integration Specification (OAGIS) [1] XML schema instances with thousands of elements
• Creating a mediated schema from a large set of schemas (with hundreds of schemas and thousands of nodes)
• For example, creating a mediated web interface input form (schema) from the hundreds of web interface forms (schemas) related to the travel domain [2]
Large-scale schema matching and integration requires an automated approach
[1] http://www.openapplications.org/
[2] http://metaquerier.cs.uiuc.edu
18. Related Work
Tuning approach
• Pre-Match: eTuner [Lee&Doan 07]
• Amid-Match: SCIA [Wang et al 07]
• Post-Match: COMA++ [Do et al 07, Manakanatas06]
Large Scale Schema Matching and Integration Approaches
• Strategies: Incremental, Holistic
• Techniques: Fragmentation; Clustering (Element Level, Schema Level); Mining (Data-mining, Tree-mining)
• Systems: COMA++ [Do&Rahm07], BellFlower [Smiljanic06], DCM [He et al 04], xClust [Lee et al 02], PORSCHE [Saleem et al 08]
19. An approach to handle Large Scale Scenario
• Handle Schemas as Trees
• Apply the Clustering Method
• Use Tree Mining
• Devise Hybrid Approach
Result: Automated Approach having Good Time Performance with Approximate Match Quality
20. Schemas as trees – Web Interface Forms
[Figure: the flight-search web form of slide 6 next to its schema tree. The tree is rooted at absTravel, with nodes From (D_City), To (A_City), Departure (Date, D_Month, D_Day, D_Time), Return (Date, R_Month, R_Day, R_Time), CabinType, and TravelerTypes (Adults, Children, Seniors, Infants)]
[He et al. KDD 2004]
21. Schemas as trees – Relational Database
[Figure: a relational schema and its tree form rooted at books, with tables book (book_id, title), author (author_id, name), publisher (pub_id, name), and detail (book_id, author_id, pub_id)]
[Lee et al. CIKM 2006]
22. Schemas as trees – XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="time">
    <xs:complexType/>
  </xs:element>
  <xs:element name="day">
    <xs:complexType/>
  </xs:element>
  <xs:element name="courseCode">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="time"/>
        <xs:element ref="day"/>
        <xs:element ref="Instructor"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="arizonaCourses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="courseCode"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Instructor">
    <xs:complexType/>
  </xs:element>
</xs:schema>
[Figure: the schema drawn as a tree with node labels arizonaCourses, courseCode, day, time, place, instructor]
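As a rough sketch of how an XML Schema like this one could be turned into a tree, the snippet below follows `ref` attributes between global element declarations using Python's standard `xml.etree.ElementTree`. The function name `schema_tree` and the `(label, children)` tuple representation are assumptions made for illustration; locally declared (inline) elements and other XSD features are deliberately ignored.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Shortened copy of the slide's schema (XML declaration omitted).
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="time"><xs:complexType/></xs:element>
  <xs:element name="day"><xs:complexType/></xs:element>
  <xs:element name="Instructor"><xs:complexType/></xs:element>
  <xs:element name="courseCode">
    <xs:complexType mixed="true"><xs:sequence>
      <xs:element ref="time"/>
      <xs:element ref="day"/>
      <xs:element ref="Instructor"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="arizonaCourses">
    <xs:complexType><xs:sequence>
      <xs:element ref="courseCode"/>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""


def schema_tree(xsd_text, root_name):
    """Return (label, [children]) for the element named root_name."""
    schema = ET.fromstring(xsd_text)
    # Index the global element declarations by their name attribute.
    decls = {e.get("name"): e for e in schema.findall(f"{XS}element")}

    def build(name, ancestors):
        children = []
        if name not in ancestors:          # stop on recursive references
            for elem in decls[name].iter(f"{XS}element"):
                target = elem.get("ref")   # only ref'd children are followed
                if target is not None:
                    children.append(build(target, ancestors | {name}))
        return (name, children)

    return build(root_name, frozenset())


tree = schema_tree(XSD, "arizonaCourses")
```

The resulting nested tuples keep the document order of the `xs:sequence`, which is convenient when the tree is later traversed for matching.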
24. Schema Tree Benefit
• Tree structure for a data model inherently supports the contextual meanings of the descendant nodes.
[Figure: schema trees S1 (node labels A, B, C, D) and S2 (node labels A1, B1, C1, C11, D, D, X), each drawn twice; the label D occurs under different ancestors]
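One way to make the "context" of a node concrete is to record the path of ancestor labels leading to it. The sketch below does this for a small hypothetical tree (its labels borrow from the relational example but the tree itself is an assumption); `paths` and `contexts` are illustrative names.

```python
def paths(tree, prefix=()):
    """Yield the root-to-node label path of every node in the tree."""
    label, children = tree
    here = prefix + (label,)
    yield here
    for child in children:
        yield from paths(child, here)


def contexts(tree, label):
    """Return the ancestor paths under which a label occurs."""
    return [p[:-1] for p in paths(tree) if p[-1] == label]


# Hypothetical schema tree: (label, [children]).
books = ("books", [
    ("author", [("name", [])]),
    ("publisher", [("name", [])]),
])

name_contexts = contexts(books, "name")
```

Here the two `name` nodes share a label but not a context, which is exactly the distinction a flat list of element names would lose.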
25. Element Level Clustering
• Clustering reduces the target search space for matching
• Schema elements are clustered based on label similarity
[Figure: schema trees S1, S2, S3, ..., Si with node labels A, B, C, A1, B1, C1, C2, C3, C4, C5, D]
Node Labels Similarity: C ≈ C1 ≈ C2 ≈ C3 ≈ C4 ≈ C5
[Figure: typical matching scenario, with elements s1 ... sm, t1 ... tn, and a1 ... aq]
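A minimal sketch of label-based clustering follows. The slides do not say which similarity measure is used, so token-level Jaccard similarity and a greedy single-pass grouping are stand-ins chosen for simplicity; the threshold of 0.5 and the function names are assumptions. The labels are the column names of the two book sources shown earlier.

```python
import re


def tokens(label):
    """Split a label on '-', '_' and spaces, lower-cased."""
    return {t for t in re.split(r"[-_\s]+", label.lower()) if t}


def similarity(a, b):
    """Jaccard similarity between the token sets of two labels."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def cluster(labels, threshold=0.5):
    """Greedy clustering: join the first cluster holding a similar label."""
    clusters = []
    for label in labels:
        for group in clusters:
            if any(similarity(label, other) >= threshold for other in group):
                group.append(label)
                break
        else:
            clusters.append([label])       # no similar cluster found
    return clusters


labels = ["price", "listed-price", "book-title", "title",
          "author-name", "a-fname", "a-lname"]
groups = cluster(labels)
```

With these stand-in choices the price and title labels group together, while the author-related labels stay apart; label similarity alone narrows the search space but does not by itself find every correspondence.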
26. Tree Mining Aspect
• Tree mining finds frequent sub-trees in a given set of trees;
• similar to schema matching, which finds similar concepts among a set of schemas
• Use of data structures supporting tree mining algorithms for schema matching is possible
• Helps in handling Large Scale Scenario
• Supports the context of nodes
[Figure: two example trees with node labels computers (Desktop, notebook) and Software (Desktop, notepad)]
27. Tree mining example
• Element Level Matching (sub-tree size 1)
• Structure Level Matching (sub-tree size > 1)
[Figure: example trees with single-letter node labels (a, b, d, f, h, i, n, p, r, t) ……]
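The idea of frequent sub-trees of size 1 and size greater than 1 can be sketched as follows. This is a deliberately reduced illustration, not the mining algorithm used in the presentation: it only counts single labels (size 1) and parent-child edges (size 2), and the three input trees are hypothetical, since the slide's figure is not recoverable from the transcript.

```python
from collections import Counter


def patterns(tree):
    """Return the set of size-1 and size-2 patterns found in one tree."""
    label, children = tree
    found = {(label,)}                       # size 1: a single node label
    for child in children:
        found.add((label, child[0]))         # size 2: a parent-child edge
        found |= patterns(child)
    return found


def frequent(trees, min_support):
    """Keep patterns that occur in at least min_support trees."""
    counts = Counter()
    for tree in trees:
        counts.update(patterns(tree))        # each tree counts once per pattern
    return {p: c for p, c in counts.items() if c >= min_support}


# Hypothetical trees as (label, [children]) tuples.
t1 = ("b", [("a", []), ("p", [("n", [])]), ("t", [])])
t2 = ("b", [("a", [("n", [])]), ("f", []), ("t", [])])
t3 = ("a", [("b", [("t", [])]), ("n", [])])

common = frequent([t1, t2, t3], min_support=3)
```

In this toy input four labels occur in every tree (element level), but only the edge from `b` to `t` is shared by all three (structure level), showing how structural patterns are stricter than label matches.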