The document describes OWSCIS (Ontology and Web Service based Cooperation of Information Sources), a system for integrating heterogeneous data sources. It follows a hybrid ontology approach, with local ontologies mapped to a reference ontology. Data sources are wrapped as web services and mapped to local ontologies. A mapping web service maps local ontologies to the reference ontology. Queries in the reference ontology are decomposed into subqueries over the local ontologies, resolved individually, and recomposed. The system was implemented in Java using Jena, JDBC, and WordNet APIs. Future work includes mapping additional data models such as XML and enriching ontologies with complex formulas.
OWSCIS Ontology and Web Service based Cooperation of Information Sources
1. OWSCIS
Ontology and Web Service based
Cooperation of Information Sources
SITIS'2007, Shanghai, China, December 2007
Raji Ghawi, Thibault Poulain, Guillermo Gomez and Nadine Cullot
Laboratoire Electronique, Informatique et Image
UMR CNRS 5158
Université de Bourgogne, Dijon, FRANCE
2. 2
Outline
Introduction
Related works
OWSCIS Architecture
Mapping Process
Querying Process
Decomposition
Resolution
Recomposition
Future Works
5. 5
Related Works (features)
Information Sources
DB, XML, files, ...
Dynamic vs. Static
Architecture
Agent-based
Mediator-based
Use of Ontology
Single, Multiple, Hybrid
Ontology Specification Language
Query Process
6. 6
1. OWSCIS
Ontology and Web Service-based Cooperation of Information Sources
Hybrid Ontology Approach
Local Ontologies and Reference Ontology
Web Service-Oriented Architecture
GAV (Global As View).
7. 7
OWSCIS Architecture
[Architecture diagram: the End User interacts with the Visualisation Web Service and the Querying Web Service; the Knowledge Base (Reference Ontology) is linked to the Mapping Web Service and to three Data Providers, each holding a Local Ontology.]
8. 8
Knowledge Base
Reference ontology
specific knowledge domain
OWL-DL
Mapping directory
correspondences between terms of the reference and local ontologies
Toolbox
tools and methods used by the mapping web service
9. 9
Data Provider
Information sources are wrapped to a local ontology (using DB2OWL for databases)
Two types of mappings:
DB ↔ Local Onto
Local Onto ↔ Ref. Onto
Local ontology has no instances
[Diagram: DB2OWL generates the Local Ontology and the DB ↔ Local Onto mappings from the Database.]
10. 10
Mapping Web Service
[Diagram: the Mapping Web Service estimates concept and role similarities between a Local Ontology and the Reference Ontology (using the Tool Box), refines the mappings estimation, and stores the results in the Mappings Directory.]
Mapping local ontologies to the reference domain ontology
Comparing ontologies using the methods defined in the toolbox
Storing produced mappings into the appropriate data provider
Updating the mappings directory
11. 11
Querying Web Service
[Diagram: the End User submits a query to the Querying Web Service, which decomposes it using the Mappings Directory, sends sub-queries to the Data Providers, and recomposes the partial results for the Visualisation Web Service.]
Users submit queries in terms of the reference ontology
A query is decomposed into a set of modular queries using the mapping directory
Each sub-query is resolved in a data provider
Partial results are recomposed giving the final query result
12. 12
Visualisation Web Service
Visualization of the reference ontology
Visualization of the queries and their results
Enriching the results of the query using the semantic information described in the reference ontology
13. 13
Mapping Process
1. Preprocessing:
cleaning up the data
2. Similarity estimation:
giving a numerical similarity estimation value to all pairs of concepts
Semantic: extract known words from the concept names and perform a semantic similarity estimation over them.
Structural: estimate structural similarity by comparing concept names as strings.
3. Refining:
solving cases where the similarity value between two concepts is neither high enough nor low enough to determine whether there is an equivalence or not.
4. Exploitation:
translating similarities from their numerical values into mappings.
Automatically: producing the overall mapping between the two ontologies
Iteratively: the program suggests what appears to be the best mapping, and lets the expert validate or reject the choice.
14. 14
Query Process: Overview
[Flow diagram: a SPARQL user query is decomposed by the Query Web Service (using the Mapping Directory) into SPARQL sub-queries. In each Data Provider, a sub-query is rewritten over the Local Ontology (Ref. Onto ↔ Local Onto mappings), translated into SQL (using the SQL statements produced by DB2OWL from the DB ↔ Local Onto mappings), and executed on the Database; the SQL result is reformulated as a SPARQL result. The Query Web Service recomposes the partial results and passes them to the Visualization Web Service.]
17. 17
SPARQL-to-SQL translation
BGP Construction
SPARQL sub-query:
SELECT ?ln ?fn
WHERE {
?s lo2:session.lecturer ?l .
?s lo2:session.module ?m .
?m lo2:module.name "Database" .
?l lo2:lastName ?ln .
?l lo2:firstName ?fn .
}

SQL statements generated for the ontology terms:
lo2:firstName (V0): SELECT person.personId AS C0, person.firstName AS C1 FROM person
lo2:lastName (V1): SELECT person.personId AS C0, person.lastName AS C1 FROM person
lo2:session.lecturer (V2): SELECT session.sessionId AS C0, lecturer.lecturerId AS C1 FROM session, lecturer WHERE session.lecturerId = lecturer.lecturerId
lo2:session.module (V3): SELECT session.sessionId AS C0, module.moduleId AS C1 FROM session, module WHERE session.moduleId = module.moduleId
lo2:module.name (V4): SELECT module.moduleId AS C0, module.moduleName AS C1 FROM module

[Query graph: ?l —:lastName→ ?ln, ?l —:firstName→ ?fn, ?s —:session.lecturer→ ?l, ?s —:session.module→ ?m, ?m —:module.name→ "Database"]
18. 18
SPARQL-to-SQL translation
Joining Statements
SELECT V1.C1 AS ln, V0.C1 AS fn
FROM
(SELECT person.personId AS C0, person.firstName AS C1 FROM person) AS V0,
(SELECT person.personId AS C0, person.lastName AS C1 FROM person) AS V1,
(SELECT session.sessionId AS C0, lecturer.lecturerId AS C1 FROM session, lecturer
WHERE (session.lecturerId = lecturer.lecturerId)) AS V2,
(SELECT session.sessionId AS C0, module.moduleId AS C1 FROM session, module
WHERE (session.moduleId = module.moduleId)) AS V3,
(SELECT module.moduleId AS C0, module.moduleName AS C1 FROM module) AS V4
WHERE (V0.C0 = V1.C0) AND (V2.C1 = V0.C0)
AND (V2.C0 = V3.C0) AND (V3.C1 = V4.C0)
AND (V4.C1 = 'Database')
[Query graph annotated with statement aliases: V0 (:firstName), V1 (:lastName), V2 (:session.lecturer), V3 (:session.module), V4 (:module.name); each edge carries its C0 (start node) and C1 (end node) columns.]
19. 19
Query Recomposition
Each sub-query result forms a new relation.
These relations are joined using shared variables.
π_?t ( R1(?ln, ?fn, ?t) ⋈_{?ln, ?fn} R2(?ln, ?fn) )

Sub-query 2 (R2):
SELECT ?ln ?fn
WHERE {
?s ro:session_lecturer ?l .
?s ro:session_module ?m .
?m ro:module_name "Database" .
?l ro:lastName ?ln .
?l ro:firstName ?fn .
}

Sub-query 1 (R1):
SELECT ?ln ?fn ?t
WHERE {
?b ro:title ?t .
?b ro:book_author ?a .
?a ro:lastName ?ln .
?a ro:firstName ?fn .
}
21. 21
Future works
Map other data models to ontologies
Object-Oriented Databases
XML documents
Ontology enrichment
Adding complex formulas to the local ontology
Translating queries over them into SQL
Interoperability is hard to achieve due to several difficulties: 1) the distribution of information sources over multiple sites; 2) the heterogeneity of information, which may occur at the syntactic, structural, and semantic levels; and 3) the instability of information sources, meaning that sources may appear or disappear, or change their contents.
Several solutions have been proposed for the interoperability problem, including significant efforts based on the use of ontologies as explicit descriptions of the semantics of information sources and as a unified model agreed upon by multiple parties. In addition, web services have recently gained increasing importance due to their autonomy and remote accessibility, so they provide a new level of interoperability.
Many systems propose to solve the problem of interoperability between distributed heterogeneous information systems using ontologies. Here, we refer to the best-known systems in this area, such as:
BUSTER, SIMS, KRAFT, COIN, Carnot, InfoSleuth, OBSERVER
Several criteria are usually used to compare ontology-based integration systems, such as: the types of information sources involved in the integration, the architecture type, the use of ontologies, the ontology representation language, and query processing.
We are developing an interoperability system called OWSCIS that uses ontology and web service technologies for the cooperation of distributed heterogeneous information systems. OWSCIS stands for Ontology and Web Service based Cooperation of Information Sources. The system follows a hybrid ontology approach: a local ontology is used for each information source, together with a global ontology that serves as a reference for the participating local ontologies. The architecture of OWSCIS is service-oriented: most of its parts are developed as web services, each performing a specific task such as mapping, querying, or visualizing. OWSCIS follows a Global As View (GAV) approach, meaning that the end user can query the different information sources through a global model, the reference ontology.
It consists of several modules and web services, each performing a specific task. A data provider is a wrapper that associates an information source with its local ontology. The knowledge base is a unique mediator that encapsulates the global ontology together with a toolbox and a directory of the participating data providers. A mapping web service establishes mappings between the local ontologies and the global one. User queries are submitted only over the reference ontology, via a querying web service that analyses the queries and decomposes them into sub-queries, which are delivered to the relevant data providers. Finally, a visualization web service presents the obtained results suitably to the end user.
First, the knowledge base module is the central part of the architecture; it includes the reference ontology, a mapping directory, and a mapping toolbox. The reference ontology describes a specific knowledge domain. It represents the global model for the local ontology models and is supposed to cover all the local domains.
The mapping directory contains information about the mappings between the reference ontology and the different local ontologies. The toolbox contains a set of tools and methods used by the mapping web service to estimate the similarity between ontology components.
A data provider encapsulates an information source incorporated into the cooperation system. It includes a local ontology representing the semantics of this information source, as well as two types of mappings: an information-source-to-local-ontology mapping and a local-ontology-to-reference-ontology mapping. When the local ontology does not already exist, it may be generated from the information source using the DB2OWL tool, which also generates a suitable mapping document. This tool is the focus of the remainder of this presentation.
The mapping web service maps the local ontologies to the reference ontology. The two ontologies are compared using the methods defined inside the knowledge base toolbox; this comparison is based on estimating the similarity between ontology components using semantic and structural methods chosen by a human expert. The produced inter-ontology mappings are stored in the data provider, and the mappings directory is updated accordingly.
End users can query the system using the querying web service. When a query (expressed in the SPARQL language) is submitted to the system, it is analyzed by this service and decomposed into a set of sub-queries using the mapping directory. The individual sub-queries are then redirected to the suitable data providers. When a SPARQL sub-query is received by a data provider service, it is translated into an SQL query using the mappings between the database and the local ontology. The SQL query is executed on the database, and its result is encapsulated as a SPARQL answer and returned to the querying web service, which collects the responses from the data provider services and recomposes them into one coherent response.
The visualization service offers different functionalities, including the visualization of the reference ontology as well as of the queries and their results.
It offers a user-friendly interface allowing navigation through the reference ontology, visualizing the concept hierarchy and the properties with their domains and ranges.
The visualization service can also show the results of a query in a dynamic and intuitive way. The main idea is to use the semantic information described in the reference ontology to enrich the query results, allowing a more intelligent visualization of them.
The mapping process is composed of four main steps: preprocessing, similarity estimation, refining and exploitation.
After a preprocessing step that cleans up the data, a similarity estimate is computed between all pairs of concepts using the methods defined in the toolbox. Both semantic and structural methods are used: on one hand, we extract known words from the concept names and perform a semantic similarity estimation over them; on the other hand, we estimate structural similarity by comparing the concept names as strings. The various results are normalized and combined using a weighted mean. The resulting table of values is then refined.
The refining step resolves cases where the similarity value between two concepts is neither high enough nor low enough to determine whether there is an equivalence.
Once similarities are estimated, they must be translated from their numerical values into mappings. This translation can be done automatically, producing the overall mapping between the two ontologies, or iteratively. In the latter case, the program suggests what appears to be the best mapping and lets the expert validate or reject the choice.
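The combination and exploitation steps above can be sketched as follows. This is a minimal illustration, not the OWSCIS toolbox itself: the structural measure here is a simple standard-library string ratio, the semantic score is assumed to be supplied externally (e.g. by a WordNet-based method), and the weights and thresholds are hypothetical.

```python
from difflib import SequenceMatcher

def structural_similarity(name_a: str, name_b: str) -> float:
    """String-based similarity between two concept names, in [0, 1]."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()

def combined_similarity(name_a, name_b, semantic_score, w_sem=0.6, w_str=0.4):
    """Weighted mean of a semantic score (supplied externally) and the
    structural score; the weights are illustrative, not the paper's."""
    return w_sem * semantic_score + w_str * structural_similarity(name_a, name_b)

def classify(score, high=0.8, low=0.3):
    """Exploitation step: high scores become mappings, low scores are
    discarded; values in between are left for the refining step / expert."""
    if score >= high:
        return "mapping"
    if score <= low:
        return "no-mapping"
    return "needs-refining"
```

In the iterative mode described above, only the `needs-refining` pairs would be presented to the expert for validation.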
When a data provider connects to the system, the DB2OWL tool generates a local ontology from the database, together with a mapping document. The mapping web service then maps this local ontology to the reference ontology, stores the resulting mappings locally in the data provider, and updates the mapping directory accordingly. A set of SQL statements is created from the database-to-ontology mapping document, one for each term in the local ontology. Users submit queries to the querying web service in terms of the reference ontology, using the SPARQL language. When a query arrives at the querying web service, it is decomposed, using the mapping directory, into a set of sub-queries, each of which is sent to one data provider. When a data provider receives a sub-query, it rewrites it in terms of the local ontology using the reference-to-local-ontology mapping document. The rewritten query is then translated into an equivalent SQL query using the SQL statements already generated. This SQL query is solved over the database, and its results are reformulated as a SPARQL result in XML format and returned to the querying web service. When all partial results have been collected from the different data providers, they are recomposed, giving the final answer to the user's query. This answer is sent to the visualization web service, which displays it suitably to the user.
To illustrate the query decomposition phase, we use an example with two local ontologies and a reference ontology.
The first ontology describes Authors and References.
The second describes Sessions with their lecturers, halls and modules.
The reference ontology merges both local ontologies (Author is considered a sub-concept of Person).
Let us consider a SPARQL query that retrieves the titles of books written by lecturers of a "Database" module. The variables represent the following concepts:
?b → book, ?a → author, ?ln → last name, ?fn → first name, ?s → session, ?l → lecturer, ?m → module.
We note that the first 4 triple patterns correspond to the first local ontology, so the first sub-query contains only these triples.
The last 5 triple patterns of the user's query correspond to the second local ontology, so the second sub-query contains only these triples.
The variables shared between sub-queries (such as ?ln and ?fn) become selected variables in each sub-query that shares them.
After this decomposition, each sub-query is rewritten in terms of its local ontology. Note that the namespace prefixes of the predicates change from <ro> to <lo1> in the first sub-query and to <lo2> in the second. Some predicate names may also change (e.g., <firstName> becomes <author.firstName> in the first sub-query).
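The decomposition step can be sketched as follows. This is a simplified illustration, not the actual OWSCIS algorithm: predicates that map to a single local ontology seed the sub-queries, ambiguous predicates (such as lastName, present in both ontologies) are attached to the sub-query with which they share a variable, and shared variables are added to each sub-query's selected variables. The directory contents below are assumptions made for the example.

```python
def query_vars(triples):
    """Variables occurring as subject or object in a set of triple patterns."""
    return {x for s, _, o in triples for x in (s, o) if x.startswith("?")}

def decompose(triples, directory, selected):
    """Split reference-ontology triple patterns into per-source sub-queries.
    `directory` maps each predicate to the local ontologies it is mapped to
    (a simplified stand-in for the OWSCIS mappings directory)."""
    groups, pending = {}, []
    for t in triples:
        onts = directory[t[1]]
        if len(onts) == 1:
            groups.setdefault(onts[0], []).append(t)
        else:
            pending.append(t)          # predicate mapped to several sources
    seed_vars = {ont: query_vars(ts) for ont, ts in groups.items()}
    for t in pending:                  # attach via a shared variable
        tvars = {x for x in (t[0], t[2]) if x.startswith("?")}
        for ont in directory[t[1]]:
            if ont in groups and tvars & seed_vars[ont]:
                groups[ont].append(t)
                break
    # Variables shared between sub-queries become selected in each of them.
    shared, parts = set(), list(groups.values())
    for i, g in enumerate(parts):
        for h in parts[i + 1:]:
            shared |= query_vars(g) & query_vars(h)
    return {ont: (sorted((shared | set(selected)) & query_vars(ts)), ts)
            for ont, ts in groups.items()}

# The example query from the presentation (books written by lecturers of a
# "Database" module); the directory entries are illustrative assumptions.
triples = [("?b", "ro:title", "?t"), ("?b", "ro:book_author", "?a"),
           ("?a", "ro:lastName", "?ln"), ("?a", "ro:firstName", "?fn"),
           ("?s", "ro:session_lecturer", "?l"), ("?s", "ro:session_module", "?m"),
           ("?m", "ro:module_name", '"Database"'),
           ("?l", "ro:lastName", "?ln"), ("?l", "ro:firstName", "?fn")]
directory = {"ro:title": ["lo1"], "ro:book_author": ["lo1"],
             "ro:session_lecturer": ["lo2"], "ro:session_module": ["lo2"],
             "ro:module_name": ["lo2"],
             "ro:lastName": ["lo1", "lo2"], "ro:firstName": ["lo1", "lo2"]}
subqueries = decompose(triples, directory, ["?t"])
```

On this example, the lo1 sub-query receives the first 4 triples and selects ?fn, ?ln and ?t, while the lo2 sub-query receives the remaining 5 triples and selects ?fn and ?ln, matching the decomposition above.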
Now, to translate a sub-query into an equivalent SQL query, a graph is built from the SPARQL query, where the nodes represent the subjects and objects of the triples and the edges represent the predicates.
On the other hand, we already have SQL statements for the ontology terms. A statement corresponding to a property has two selected columns representing the domain and the range of that property. These columns are given the aliases C0 and C1, respectively.
For each term mentioned in the query, we take its corresponding statement and give it a unique alias: V0, V1, V2, ...
Each edge in the graph is associated with the suitable SQL statement (from those mentioned above), and a unique alias is generated for this statement. The start node of the edge is associated with the first selected column of the statement (C0) and the end node with the second (C1). The set of statements representing all the edges in the graph forms the FROM clause of the final SQL query.
When two (or more) edges share a variable node, their equivalent statements are joined on the columns representing the shared node, according to the direction of each edge (incoming or outgoing). For example, edges V2 and V3 share the variable ?s as their start node, so they are joined using V2.C0 = V3.C0.
If an edge has a literal end node, the equivalent statement is restricted by a logical condition in which the column corresponding to that node equals the literal value.
The distributed sub-queries are now solved at the data provider level, and the results (formatted as XML documents) are returned to the querying web service. These results are recomposed based on the shared variables: if two queries share the same variable, their results are joined on that variable. The joined results are then projected on the variables selected by the user in the original query.
For example, sub-query 2 returned values for the variables ?fn and ?ln, and sub-query 1 returned values for ?fn, ?ln and ?t, so the results of the two sub-queries are joined on ?fn and ?ln. In other words, the results of sub-query 1 are restricted to those in which ?fn and ?ln have values in the results of sub-query 2, and are then projected on ?t, since the global query was asking for values of ?t.
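This recomposition can be sketched as a join-then-project over partial results, modeled here as lists of variable-binding rows. The sample rows below are invented for illustration; only the join-on-shared-variables behavior mirrors the description above.

```python
def recompose(relations, selected):
    """Join partial sub-query results on their shared variables, then
    project on the variables selected in the original query."""
    joined = relations[0]
    for rel in relations[1:]:
        shared = (set(joined[0]) & set(rel[0])) if joined and rel else set()
        # Keep only row pairs that agree on every shared variable.
        joined = [{**a, **b} for a in joined for b in rel
                  if all(a[v] == b[v] for v in shared)]
    return [{v: row[v] for v in selected} for row in joined]

# Hypothetical partial results for the running example.
r1 = [{"?ln": "Smith", "?fn": "John", "?t": "Database Systems"},  # sub-query 1
      {"?ln": "Doe", "?fn": "Jane", "?t": "Graph Theory"}]
r2 = [{"?ln": "Smith", "?fn": "John"}]                            # sub-query 2
titles = recompose([r1, r2], ["?t"])  # r1 rows kept only if (?ln, ?fn) occurs in r2
```

Only the row whose (?ln, ?fn) bindings appear in both relations survives the join, and the projection keeps just ?t, as in the relational-algebra expression of the slides.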
The OWSCIS architecture is partially implemented in Java using the Jena API, JDBC, the WordNet API, and other available APIs. The implemented parts are: DB2OWL, the SPARQL-to-SQL translator, and the inter-ontology mapping module.