Enabling Technologies for Interoperability
Ubbo Visser, Heiner Stuckenschmidt, Holger Wache, Thomas Vögele
TZI, Center for Computing Technologies
University of Bremen
D-28215 Bremen, Germany
Abstract

We present a new approach that aims to minimize the numerous problems that stand in the way of fully interoperable GIS. We discuss the existence of these heterogeneity problems and the fact that they must be solved to achieve interoperability. These problems are addressed on three levels: the syntactic, structural and semantic level. In addition, we identify the needs for an approach that performs semantic translation for interoperability and introduce a uniform description of contexts. Furthermore, we discuss a conceptual architecture, Buster (Bremen University Semantic Translation for Enhanced Retrieval), which can provide intelligent information integration based on a re-classification of information entities in a new context. Lastly, we demonstrate our theories by sketching a real-life scenario.

Introduction

Over the last few years, much work has been conducted on the research topic of fully interoperable GIS. Vckovski (Vckovski, 1998), for example, gives an overview of the problems regarding data integration and geographical information systems. Furthermore, the proceedings of the 2nd International Conference on Interoperating Geographic Information Systems (Interop99) (Vckovski et al., 1999) contain numerous contributions on this research topic (e.g. (Wiederhold, 1999), (Landgraf, 1999)). In addition, recent studies in areas such as data warehousing (Wiener et al., 1996) and information integration (Galhardas et al., 1998) have also addressed interoperability problems.

GIS share the need to store and process large amounts of diverse data, which are often geographically distributed. Most GIS use specific data models and databases for this purpose. This implies that making new data available to the system requires the data to be transferred into the system's specific data format, a process which is very time-consuming and tedious. Data acquisition, whether automatic or semi-automatic, often makes large-scale investment in technical infrastructure and/or manpower inevitable. These obstacles are part of the motivation behind the concept of information integration. Information integration offers a solution here because existing information can be accessed by remote systems in order to supplement their own data basis.

The advantages of successful information integration are obvious for many reasons:

• Quality improvement of data due to the availability of large and complete data.

• Improvement of existing analyses and the application of new ones.

• Cost reduction resulting from the multiple use of existing information sources.

• Avoidance of redundant data and of conflicts that can arise from redundancy.

However, before we can establish efficient information integration, difficulties arising from organizational and competence questions as well as many other technical problems have to be solved. Firstly, a suitable information source must be located which contains the data needed for a given task. Once the information source has been found, access to the data contained therein has to be provided. Furthermore, access has to be provided on both a technical and an informational level. In short, information integration not only needs to provide full accessibility to the data; it also requires that the accessed data can be interpreted by the remote system. While the problem of providing access to information has been largely solved by the advent of large-scale computer networks, the problem of processing and interpreting retrieved information remains an important research topic. This paper will address three of the problems mentioned above:

• finding suitable information sources,

• enabling a remote system to process the accessed data,

• and helping the remote system to interpret the accessed data as intended by its source.

In addressing these questions, we will explore technologies which enable systems to interoperate, always bearing in mind the special needs of GIS.
Levels of Integration

Our modern information society requires complete access to all available information. The opening of information systems towards integrated access, which has been encouraged in order to satisfy this demand, creates new challenges for many areas of computer science. In this paper, we distinguish different integration tasks that need to be solved in order to achieve complete integrated access to information:

Syntactic Integration: Many standards have evolved that can be used to integrate different information sources. Besides classical database interfaces such as ODBC, web-oriented standards such as HTML and XML are gaining importance.

Structural Integration: The first problem that goes beyond the purely syntactic level is the integration of heterogeneous structures. This problem is normally solved by mediator systems that define mapping rules between different information structures.

Semantic Integration: In the following, we use the terms semantic integration and semantic translation, respectively, to denote the resolution of semantic conflicts that make a one-to-one mapping between concepts or terms impossible.

Our approach provides an overall solution to the problem of information integration, taking into account all three levels of integration and combining several technologies, including standard markup languages, mediator systems, ontologies, and a knowledge-based classifier.

Enabling Technologies

In order to overcome the obstacles mentioned earlier, it is not sufficient to solve the heterogeneity problems separately. It is important to note that these problems can only be solved by a system that takes all three levels of integration into account. In the following subsections, we give a short introduction to what we mean by problems concerning the syntactic, structural and semantic levels.

Syntactic Integration

The typical task of syntactic data integration is to specify the information source on a syntactic level. This means that different data type problems can be solved (e.g. short int vs. int and/or long). This first data abstraction is used to re-structure the information source.

The standard technology to overcome problems on this level are wrappers. Wrappers hide the internal data structure model of a source and transform the contents to a uniform data structure model.

Structural Integration

The task of structural data integration is to re-format the data structures into a new homogeneous data structure. This can be done with the help of a formalism that is able to construct one specific information source out of numerous other information sources. This is a classical task for middleware, which can be done with CORBA (OMG, 1992) on a low level or with rule-based mediators (Wiederhold, 1992) on a higher level.

Mediators provide flexible integration of several information systems such as database management systems, GIS, or the world wide web. A mediator combines, integrates, and abstracts the information provided by the sources. Normally, the sources are encapsulated by wrappers.

Over the last few years, numerous mediators have been developed. A popular example is the rule-driven TSIMMIS mediator (Chawathe et al., 1994), (Papakonstantinou et al., 1996). The rules in the mediator describe how information from the sources can be mapped to the integrated view. In simple cases, a rule mediator converts the information of the sources into information on the integrated view. The mediator uses the rules to split a query, which is formulated with respect to the integrated view, into several sub-queries for the individual sources, and combines the results according to a query plan.

A mediator has to solve the same problems that are discussed in the federated database research area, i.e. structural heterogeneity (schematic heterogeneity) and semantic heterogeneity (data heterogeneity) (Kim and Seo, 1991), (Naiman and Ouksel, 1995), (Kim et al., 1995). Structural heterogeneity means that different information systems store their data in different structures. Semantic heterogeneity concerns the content and semantics of an information item. In rule-based mediators, rules are mainly designed to reconcile structural heterogeneity, whereas discovering semantic heterogeneity problems and reconciling them play a subordinate role. But for the reconciliation of semantic heterogeneity problems, the semantic level must also be considered. Contexts are one possibility to describe the semantic level. A context contains "meta data relating to its meaning, properties (such as its source, quality, and precision), and organization" (Kashyap and Sheth, 1997). A value has to be considered in its context and may be transformed into another context (so-called context transformation).

Semantic Integration

The semantic integration process is by far the most complicated process and presents a real challenge. As with database integration, semantic heterogeneities are the main problems that have to be solved within spatial data integration (Vckovski, 1998). Other authors from the GIS community call this problem inconsistencies (Shepherd, 1991). Worboys & Deen (Worboys and Deen, 1991) have identified two types of semantic heterogeneity in distributed geographic databases:
• Generic semantic heterogeneity: Heterogeneity resulting from field- and object-based databases.

• Contextual semantic heterogeneity: Heterogeneity based on different meanings of concepts and schemes.

Generic semantic heterogeneity is based on the different concepts of space or data models being used. In this paper, we will focus on contextual semantic heterogeneity, which is based on different semantics of the local schemata.

In order to discover semantic heterogeneities, a formal representation is needed. Lately, standardized WWW markup languages such as XML and RDF have been developed by the W3C community for this purpose (W3C, 1998), (W3C, 1999). We will describe the value of these languages for the semantic description of concepts and also argue that we need more sophisticated approaches to overcome the semantic heterogeneity problem.

Ontologies have been identified as useful for the integration/interoperation process (Visser et al., 2000). The advantages and disadvantages of this technology will be discussed in a separate subsection.

Ontologies can be used to describe information sources. However, how does the actual integration process work? This will be briefly discussed in the following subsections. We call this process semantic mapping.

XML/RDF and semantic modeling

XML and RDF have been developed for the semantic description of information sources.

XML – Exchanging Information: In order to overcome the purely visualization-oriented annotation provided e.g. by HTML, XML was proposed as an extensible language that allows users to define their own tags in order to indicate the type of their content. It follows that the main benefit of XML actually lies in the opportunity to exchange data in a structured way. Recently, this idea has been emphasized by the introduction of XML schemata, which can be seen as a definition language for data structures. In the following paragraphs, we sketch the idea behind XML and describe XML schema definitions and their potential use for data exchange.

The General Idea: A data object is said to be an XML document if it follows the guidelines for well-formed XML documents provided by the W3C community. The specification provides a formal grammar used in well-formed documents. In addition to the general grammar, the user can impose further grammatical constraints on the structure of a document using a document type definition (DTD). An XML document is valid if it has an associated type definition and complies with the grammatical constraints of that definition. A DTD specifies the elements that can be used in an XML document. In the document, elements are delimited by a start and an end tag. An element has a type and may have a set of attribute specifications, each consisting of a name and a value.

The additional constraints in a DTD refer to the logical structure of the document; this especially includes the nesting of tags inside the information body that is allowed and/or required. Further restrictions that can be expressed in a DTD concern the types of the attributes and the default values to be used when no attribute value is provided.

Schema Definitions and Mappings: An XML schema is itself an XML document defining the valid structure of an XML document in the spirit of a DTD. The elements used in a schema definition are of the type 'element' and have attributes that define the restrictions already mentioned above. The information in such an element is a list of further element definitions that have to be nested inside the defined element.

Furthermore, XML schemata have some additional features that are very useful for defining data structures, such as:

• Support for basic data types.

• Constraints on attributes, such as occurrence constraints.

• Sophisticated structures, such as type definitions derived by extending or restricting other types.

• A name-space mechanism allowing the combination of different schemata.

We will not discuss these features at length. However, it should be mentioned that the additional features make it possible to encode rather complex data structures. This enables us to map the data models of applications whose information we want to share with others onto an XML schema. From this point, we can encode our information in terms of an XML document and make it (together with the schema, which is also an XML document) available over the internet.

This procedure has great potential for the actual exchange of data. However, the user must commit to our data model in order to make use of the information. We must point out that an XML schema defines the structure of data while providing no information about its content or its potential use for others. Therefore, it lacks an important advantage of meta-information. We argued that XML is designed to provide an interchange format for weakly structured data by defining the underlying data model in a schema and by using annotations from the schema in order to clarify the role of single statements. Two things are important in this claim from the information-sharing point of view:

• XML is purely syntactic/structural in nature.

• XML describes data on the object level.
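The exchange scenario just described can be made concrete with a minimal sketch (the element names and the structural check are our own illustration, not part of any standard schema): a well-formed XML document is parsed, and a DTD-like constraint, namely which child elements are required and which attributes they carry, is checked against it.

```python
import xml.etree.ElementTree as ET

# A hypothetical GIS feature encoded as XML; tag names are our own illustration.
doc = """
<feature type="lake">
  <name>Example Lake</name>
  <area unit="km2">5.5</area>
</feature>
"""

# Parsing raises a ParseError if the document is not well-formed.
root = ET.fromstring(doc)

# A DTD-like structural constraint: 'feature' must contain 'name' and 'area'.
required_children = {"name", "area"}
assert required_children <= {child.tag for child in root}, "structural constraint violated"

print(root.find("name").text, root.find("area").get("unit"))
```

A real system would validate against a full DTD or XML schema rather than hand-written checks; the point here is only that such constraints concern structure, not meaning.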
Consequently, we have to find other approaches if we want to describe information on the meta level and define its meaning. In order to fill this gap, the RDF standard has been proposed as a data model for representing meta-data about web pages and their content using an XML syntax.

RDF – A Standard Format: The basic model underlying RDF is very simple: every kind of information about a resource, which may be a web page or an XML element, is expressed in terms of a triple (resource, property, value).

Thereby, the property is a two-placed relation that connects a resource to a certain value of that property. This value can be of a simple data type or be a resource. Additionally, the value can be replaced by a variable representing a resource that is further described by nested triples making assertions about the properties of the resource represented by the variable. Furthermore, RDF allows multiple values for a single property. For this purpose, the model contains three built-in container types called collections, namely unordered lists (bag), ordered lists (seq), and sets of alternatives (alt), providing a kind of aggregation mechanism.

A further requirement arising from the nature of the web is the need to avoid name clashes that might occur when referring to different web sites that use different RDF models to annotate meta-data. RDF defines name-spaces for this purpose. Name-spaces are defined by referring to a URL that provides the names and connecting it to a source id, which is then used to annotate each name in an RDF specification, defining the origin of that particular name: source id:name.

A standard syntax has been developed to express RDF statements, making it possible to identify the statements as meta-data and thereby providing a low-level language for expressing the intended meaning of information in a machine-processable way.

RDF/S – A Basic Vocabulary: The very simple model underlying ordinary RDF descriptions leaves a lot of freedom for describing meta-data in arbitrary ways. However, if people want to share this information, there has to be an agreement on a standard core vocabulary in terms of modeling primitives that should be used to describe meta-data. RDF Schema (RDF/S) attempts to provide such a standard vocabulary.

A closer look at the modeling components reveals that RDF/S actually borrows from frame systems well known from the area of knowledge representation. RDF/S provides a notion of concepts (class), slots (property), inheritance (SubclassOf, SubslotOf) and range restrictions (Constraint Property). Unfortunately, no well-defined semantics exist for these modeling primitives in the current state. Further, parts such as the re-identification mechanism are not well defined even on an informal level. Lastly, there is no reasoning support available, not even for property inheritance.

Semantic modeling: After introducing the W3C standards for information exchange and meta-data annotation, we have to investigate their usefulness for information integration with reference to the three levels of integration (see the Levels of Integration section). Firstly, we saw that XML is only concerned with the issue of syntactic integration. XML defines structures as well, but there are no sophisticated mechanisms for mapping different structures. Secondly, RDF is designed to provide some information on the semantic level by enabling us to include meta-information in the description of a web page. As mentioned in the last section, however, RDF in its current state fails to really provide semantic descriptions. Rather, it provides a common syntax and a basic vocabulary that can be used when describing this meta-data. Fortunately, the designers of RDF are aware that there is a strong need for an additional 'logical level' which defines a clear semantics for RDF expressions and provides a basis for integration mechanisms.

Our conclusion about current web standards is that using XML, and especially XML schemata, is a suitable way of exchanging data with a well-defined syntax and structure. Furthermore, simple RDF provides a uniform syntax for exchanging meta-information in a machine-readable format. However, in their current state neither XML nor RDF provides sufficient support for the integration of heterogeneous structures or different meanings of terms. There is a need for semantic modeling and reasoning about structure and meaning. Promising candidates for semantic modeling approaches can be found in the area of knowledge representation as well as in the distributed databases community. We will discuss some of these approaches in the following section.

Ontologies

Recently, the use of formal ontologies to support information systems has been discussed (Guarino, 1998), (Bishr and Kuhn, 1999). The term 'ontology' has been used in many ways and across different communities (Guarino and Giaretta, 1995). If we want to motivate the use of ontologies for information integration, we have to define what we mean when we refer to ontologies. In the following sections, we will introduce ontologies as an explication of some shared vocabulary or conceptualization of a specific subject matter. Further, we describe the way an ontology explicates concepts and their properties, and finally argue for the benefit of this explication in many typical application scenarios.

Shared Vocabularies and Conceptualizations: In general, each person has an individual view of the world and the things he/she has to deal with every day. However, there is a common basis of understanding in terms of the language we use to communicate with each other. Terms from natural language can therefore be assumed to be a shared vocabulary relying on a (mostly) common understanding of certain concepts with very little variety. This common understanding relies on a specific idea of how the world is organized. We often call these ideas a conceptualization of the world. These conceptualizations provide a terminology that can be used for communication between people.

The example of our natural language demonstrates that a conceptualization cannot be universally valid, but rather is shared by a limited number of persons committed to that particular conceptualization. This fact is reflected in the existence of different languages, which differ more (English and Japanese) or less (German and Dutch). Confusion becomes worse when we consider terminologies developed for special scientific or economic areas. In these cases, we often find situations where one term refers to different phenomena. The use of the term 'ontology' in philosophy and in computer science serves as an example. The consequence of this confusion is a separation into different groups that share a terminology and its conceptualization. These groups are then called information communities.

The main problem with the use of a shared terminology according to a specific conceptualization of the world is that much information remains implicit. When a mathematician talks about a binomial formula, he is referring to a wider scope than just the formula itself. Possibly, he will also consider its interpretation (the number of subsets of a certain size) and its potential uses (e.g. estimating the chance of winning in a lottery).

Ontologies set out to overcome this problem of implicit and hidden knowledge by making the conceptualization of a domain (e.g. mathematics) explicit. This corresponds to one of the definitions of the term ontology most popular in computer science (Gruber, 1993):

An ontology is an explicit specification of a conceptualization.

An ontology is used to make assumptions about the meaning of a term available. It can also be viewed as an explication of the context in which a term is normally used. Lenat (Lenat, 1998), for example, describes context in terms of twelve independent dimensions that have to be known in order to understand a piece of knowledge completely. He also demonstrates how these dimensions can be explicated using the 'Cyc' ontology.

Specification of Context Knowledge: There are many different ways in which an ontology may explicate a conceptualization and the corresponding context knowledge. The possibilities range from a purely informal natural language description of a term, corresponding to a glossary, up to a strictly formal approach with the expressive power of full first-order predicate logic or even beyond (e.g. Ontolingua (Gruber, 1991)). Jasper and Uschold (Jasper and Uschold, 1999) distinguish two ways in which the mechanisms for the specification of context knowledge by an ontology can be compared:

Level of Formality: The specification of a conceptualization and its implicit context knowledge can be done at different levels of formality. As already mentioned above, a glossary of terms can be seen as an ontology despite its purely informal character. A first step towards more formality is to prescribe a structure to be used for the description. A good example of this approach is the standard web annotation language XML (see the XML section above). A DTD is an ontology describing the terminology of a web page on a low level of formality. Unfortunately, the rather informal character of XML encourages its misuse. While the hierarchy of an XML specification was originally designed to describe layout, it can also be exploited to represent sub-type hierarchies (van Harmelen and Fensel, 1999), which may lead to confusion. Fortunately, this problem can be solved by assigning formal semantics to the structures used for the description of the ontology. An example of this is the conceptual modeling language CML (Schreiber et al., 1994). CML offers primitives for describing a domain that can be given formal semantics in terms of first-order logic (Aben, 1993). However, this formalization is only available for the structural part of a specification. Assertions about terms and the description of dynamic knowledge are not formalized, which leaves total freedom for the description. On the other hand, there are specification languages which are completely formal. A prominent example is the Knowledge Interchange Format (KIF) (Genesereth and Fikes, 1992), which was designed to enable different knowledge-based systems to exchange knowledge. KIF has been used as a basis for the Ontolingua language (Gruber, 1991), which supplies formal semantics to that language as well.

Extent of Explication: The other comparison criterion is the extent of explication that is reached by the ontology. This criterion is strongly connected with the expressive power of the specification language used. We already mentioned DTDs, which are mainly a simple hierarchy of terms. We can generalize this by saying that the least expressive specification of an ontology consists of an organization of terms in a network using two-placed relations. This idea goes back to the use of semantic networks in the seventies. Many extensions of this basic idea have been proposed. One of the most influential was the use of roles that can be filled by entities of a certain type (Brachman, 1977). This kind of value restriction can still be found in recent approaches. RDF schema descriptions (Brickley and Guha, 2000), which might become a new standard for the semantic description of web pages, are an example of this. An RDF schema contains class definitions with associated properties that can be restricted by so-called constraint-properties. However, default values and value range descriptions are not expressive enough to cover all possible conceptualizations.
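The gap in expressive power can be sketched as follows (a hypothetical illustration; the property and class names are ours, not RDF/S vocabulary): a constraint-property can only restrict the value range of a single property, whereas a class defined by a logical formula can combine conditions over several properties.

```python
# Hypothetical instance data; all names are our own illustration.
water_body = {"type": "lake", "area_km2": 5.5, "salinity_percent": 0.01}

# RDF/S-style constraint-property: restricts the value range of one property.
def area_in_range(instance):
    value = instance.get("area_km2")
    return isinstance(value, (int, float)) and value >= 0

# Logic-style class definition: membership is given by a formula over several
# properties, which plain range restrictions cannot express.
def is_freshwater_lake(instance):
    return (instance.get("type") == "lake"
            and instance.get("salinity_percent", 0.0) < 0.05)

print(area_in_range(water_body), is_freshwater_lake(water_body))
```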
to cover all possible conceptualizations. A more ex- sion purposes. Another very challenging application of
pressive power can be provided by allowing classes to ontology-based speciﬁcation is the reuse of existing soft-
be speciﬁed by logical formulas. These formulas can ware. In this case, the specifying ontology serves as a
be restricted to a decidable subset of ﬁrst order logic. basis to decide if an existing component matches the
This is the approach of description logics (Borgida and requirements of a given task.
Patel-Schneider, 1994). Nevertheless, there are also ap- Depending on the purpose of the speciﬁcation, on-
proaches that allow for even more expressive descrip- tologies of diﬀerent formal strength and expressiveness
tions. In Ontolingua for example, classes can be de- are to be utilized. While the process of communica-
ﬁned by arbitrary KIF-expressions. Beyond the ex- tion design decisions and the acquisition of additional
pressiveness of full ﬁrst-order predicate logic, there are information normally beneﬁt from rather informal and
also special purpose languages that have an extended expressive ontology representations (often graphical),
expressiveness to cover speciﬁc needs of their applica- the directed search for information needs a rather strict
tion area. Examples are; speciﬁcation languages for speciﬁcation with a limited vocabulary to limit the com-
knowledge-based systems which often including vari- putational eﬀort. At the moment, the support of semi-
ants of dynamic logic to describe system dynamics. automatic software reuse seems to be one of the most
challenging applications of ontologies, because it re-
Applications: Ontologies are useful for many dif- quires expressive ontologies with a high level of formal
ferent applications, that can be classiﬁed into several strength.
areas. Each of these areas, has diﬀerent requirements The previously discussed considerations might pro-
on the level of formality and the extend of explication voke the impression that the beneﬁts of ontologies are
provided by the ontology. We will review brieﬂy com- limited to systems analysis and design. However, an
mon application areas, namely the support of commu- important application area of ontologies is the integra-
nication processes, the speciﬁcation of systems and in- tion of existing systems. The ability to exchange infor-
formation entities and the interoperability of computer mation at run time, also known as interoperability, is
systems. an valid and important topic. The attempt to provide
Information communities are useful because they ease interoperability suﬀers from problems similar to those
communication and cooperation among members with associated with the communication amongst diﬀerent
the use of shared terminology with well deﬁned mean- information communities. The important diﬀerence be-
ing. On the other hand, the formalization of informa- ing the actors are not people able to perform abstrac-
tion communities makes communication between mem- tion and common sense reasoning about the meaning
bers from diﬀerent information communities very diﬃ- of terms, but machines. In order to enable machines
cult. Generally, because they do not agree on a common to understand each other, we also have to explicate the
conceptualization. Although, they may use the shared context of each system on a much higher level of formal-
vocabulary of natural language, most of the vocabulary ity. Ontologies are often used as Inter-Linguas in order
used in their information communities is highly spe- to provide interoperability: They serve as a common
cialized and not shared with other communities. This format for data interchange. Each system that wants
situation demands an explication and explanation of the use of terminology. Informal ontologies with a large extent of explication are a good choice to overcome these problems. While definitions have always played an important role in scientific literature, conceptual models of certain domains are rather new. Nowadays, systems analysis and related fields like software engineering rely on conceptual modeling to communicate the structure and details of a problem domain, as well as the proposed solution, between domain experts and engineers. Prominent examples of ontologies used for communication are Entity-Relationship diagrams and object-oriented modeling languages such as UML.
ER diagrams as well as UML are not only used for communication; they also serve as building plans for data and systems, guiding the process of building (engineering) the system. The use of ontologies for the description of information and systems has many benefits. The ontology can be used to identify requirements as well as inconsistencies in a chosen design. Further, it can help to acquire or search for available information. Once a system component has been implemented, its specification can be used for maintenance and extension.
A source that wants to inter-operate with other systems has to transfer its data into this common framework. Interoperability is achieved by explicitly considering contextual knowledge in the translation process.
Semantic Mapper
For an appropriate support of the integration of heterogeneous information sources, an explicit description of the semantics (i.e. an ontology) of each source is required. In principle, there are three ways in which ontologies can be applied:
• a centralized approach, where each source is related to one common domain ontology,
• a decentralized approach, where every source is related to its own ontology, or
• a hybrid approach, where every source is related to its own ontology, but the vocabulary of these ontologies stems from a common domain ontology.
A common domain ontology describes the semantics of the domain in the SIMS mediator (Arens et al., 1996). In the global domain model of such approaches, all terms of a domain are arranged in a complex structure.
Each information source is related to the terms of the global ontology (e.g. with articulation axioms (Collet et al., 1991)). However, the scalability of such a fixed and static common domain model is low (Mitra et al., 1999), because the kind of information sources that can be integrated in the future is limited.
In OBSERVER (Mena et al., 1996) and SKC (Mitra et al., 1999) it is assumed that a predefined ontology for each information source exists. Consequently, new information sources can easily be added and removed. But the comparison of the heterogeneous ontologies leads to many homonym, synonym, and similar problems, because the ontologies use their own vocabularies. In SKC (Mitra et al., 1999) the ontology of each source is described by graphs. Graph transformation rules are used to transport information from one ontology into another (Mitra et al., 2000). These rules can only solve the schematic heterogeneities between the ontologies.
In MESA (Wache et al., 1999) the third, hybrid approach is used. Each source is related to its own source ontology. In order to make the source ontologies comparable, a common global vocabulary is used, organized in a common domain ontology. This hybrid approach provides the greatest flexibility, because new sources can easily be integrated and, in contrast to the decentralized approach, the source ontologies remain comparable.
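The hybrid approach can be sketched in a few lines. The terms and mappings below are invented for illustration and are not taken from MESA: each source keeps its own ontology, but every local term is defined over a shared vocabulary, which keeps terms from different sources comparable.

```python
# Sketch of the hybrid ontology approach: every source ontology maps its
# own terms onto a shared vocabulary from a common domain ontology.
# All names below are illustrative, not taken from MESA or BUSTER.

SHARED_VOCABULARY = {"Forest", "Pasture", "Settlement"}

# Each source ontology: local term -> set of shared-vocabulary primitives
source_a = {"Wald": {"Forest"}, "Gruenland": {"Pasture"}}
source_b = {"woodland": {"Forest"}, "urban": {"Settlement"}}

def comparable(term_a: str, term_b: str) -> bool:
    """Two local terms are comparable if their definitions over the
    shared vocabulary overlap."""
    return bool(source_a[term_a] & source_b[term_b])

print(comparable("Wald", "woodland"))  # -> True (both defined via Forest)
print(comparable("Wald", "urban"))     # -> False (no shared primitive)
```

Because every definition is expressed in the common vocabulary, a new source can be added without comparing it pairwise against all existing ontologies.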
In the next section we will describe how ontologies can help to solve heterogeneity problems.
BUSTER - An Approach for Comprehensive Interoperability
In chapter 2 we described the methods needed to achieve structural, syntactic, and semantic interoperability. In this chapter, we propose the Buster approach (Bremen University Semantic Translator for Enhanced Retrieval), which provides a comprehensive solution to reconcile all heterogeneity problems.
During an acquisition phase, all information needed to provide a network of integrated information sources is acquired. This includes the acquisition of a Comprehensive Source Description (CSD) of each source, together with the Integration Knowledge (IK), which describes how the information can be transformed from one source to another.
In the query phase, a user or an application (e.g. a GIS) formulates a query against an integrated view of the sources. Several specialized components in the query phase use the acquired information, i.e. the CSD's and IK's, to select the desired data from several information sources and to transform it to the structure and the context of the query.
All software components in both phases are associated with three levels: the syntactic, the structural, and the semantic level. The components on each level deal with the corresponding heterogeneity problems. The components in the query phase are responsible for solving these problems, whereas the components in the acquisition phase use the CSD's of the sources to provide the specific knowledge for the corresponding components in the query phase. A mediator, for example, which is associated with the structural level, is responsible for the reconciliation of structural heterogeneity problems. The mediator is configured by a set of rules that describe the structural transformation of data from one source to another. These rules are acquired in the acquisition phase with the help of the rule generator.
An important characteristic of the Buster architecture is the semantic level, where two different types of tools exist for solving the semantic heterogeneity problems. This reflects the focus of the Buster system on providing a solution for this type of problem. Furthermore, the need for two types of tools shows that the reconciliation of semantic problems is very difficult and must be supported by a hybrid architecture in which different components are combined.
In the following sections we describe the two phases and their components in detail.
Query Phase
In the query phase, a user submits a query request to one or more data sources in the network of integrated data sources. In this phase, several components on different levels interact (see Fig. 1).
On the syntactic level, wrappers are used to establish a communication channel to the data source(s) that is independent of specific file formats and system implementations. Each generic wrapper covers a specific file or data format. For example, generic wrappers may exist for ODBC data sources, XML data files, or specific GIS formats. Still, these generic wrappers have to be configured for the specific requirements of a data source.
The mediator on the structural level uses the information obtained from the wrappers and "combines, integrates and abstracts" (Wiederhold, 1992) it. In the Buster approach, we use generic mediators that are configured by transformation rules (query definition rules, QDR). These rules describe, in a declarative style, how the data from several sources can be integrated and transformed to the data structure of the original source.
On the semantic level, we use two different tools specialized in solving the semantic heterogeneity problems. Both tools are responsible for the context transformation, i.e. transforming data from a source-context to a goal-context. There are several ways in which the context transformation can be applied. In Buster, we consider the functional context transformation and context transformation by re-classification (Stuckenschmidt and Wache, 2000).
In the functional context transformation, the conversion of data is done by the application of a predefined function. A function is declaratively represented in Context Transformation Rules (CTR's). These describe from which source-context to which goal-context data can be transformed by the application of which function. The context transformation rules are invoked by the CTR-Engine.
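A minimal sketch of such a rule engine might look as follows; the rule keys, context names, and conversion factors are our own illustrative assumptions, not the actual CTR syntax:

```python
# Sketch of a functional context transformation engine.  Rules declare
# which function converts values from a source-context to a goal-context.
# Context names and the rule format are illustrative assumptions.

HECTARES_PER_ACRE = 0.40468564224  # international acre

ctr_rules = {
    ("area:hectares", "area:acres"): lambda v: v / HECTARES_PER_ACRE,
    ("length:km", "length:miles"):   lambda v: v / 1.609344,
}

def transform(value, source_ctx, goal_ctx):
    """Apply the matching context transformation rule, or raise if no
    rule exists (the caller may then fall back to re-classification)."""
    try:
        rule = ctr_rules[(source_ctx, goal_ctx)]
    except KeyError:
        raise LookupError(f"no CTR from {source_ctx} to {goal_ctx}")
    return rule(value)

print(round(transform(100.0, "area:hectares", "area:acres"), 1))  # -> 247.1
```

A real CTR-Engine would read declarative rules rather than hard-coded lambdas; the sketch only illustrates the look-up-and-apply control flow.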
The functional context transformation can be used, for example, in the transformation of area measures in hectares to area measures in acres, or in the transformation of one coordinate system into another. All context transformation rules can be described with the help of mathematical functions.
Figure 1: The query phase of the BUSTER architecture
In addition to the functional context transformation, Buster also allows the classification of data into another context. This is used to automatically map the concepts of one data source to the concepts of another data source. To be more precise, the context description (i.e. the ontological description of the data) is re-classified. The source-context description, to which the data is annotated, is obtained from the CSD, completed with the data information, and related to goal-context descriptions. After the context re-classification, the data is simply replaced with the data that is annotated with the related goal-context. Context re-classification together with this data replacement is useful for the transformation of catalog terms, e.g. exchanging a term of a source catalog for a term from the goal catalog.
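Context transformation by re-classification can be illustrated with a toy matcher. The feature sets and catalog terms below are hypothetical; BUSTER re-classifies full ontological context descriptions with a reasoner rather than counting shared features:

```python
# Toy re-classifier: match a source concept description against the
# goal catalog by shared defining features.  All terms and features are
# invented for illustration.

goal_catalog = {
    "ATKIS-Wald":      {"vegetated", "trees"},
    "ATKIS-Gruenland": {"vegetated", "grass"},
    "ATKIS-Siedlung":  {"built-up"},
}

def reclassify(source_features: set) -> str:
    """Return the goal-catalog term whose description shares the most
    features with the source description."""
    return max(goal_catalog,
               key=lambda term: len(goal_catalog[term] & source_features))

# A CORINE-style source concept, described in the same feature vocabulary:
print(reclassify({"vegetated", "trees", "coniferous"}))  # -> ATKIS-Wald
```

After the match, the data item annotated with the source term would simply be re-annotated with the matched goal term, as described above.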
A Query Example
We demonstrate the query phase and the interaction of the components with a real-world example. The scenario presents a typical user, for example an environmental engineer in a public administration, who is involved in some kind of urban planning process. The basis for his work is a set of digital maps and a GIS to view, evaluate, and manipulate these maps.
In our example, the engineer uses a set of ATKIS maps in an ArcView (ESRI, 1994) environment. ATKIS stands for "Amtliches Topographisch-Kartographisches Informationssystem", i.e. the official German information system related to maps and topographical information (AdV, 1998). Among others, the ATKIS data source offers detailed information with respect to land-use types in urban and rural areas of Germany.
The ATKIS data sets are generated and maintained by a working group of several public agencies on the federal and state level. The complexity of the task of keeping all data sets up-to-date, together with the underlying administrative structure, causes a certain delay in the production and delivery of new, updated maps. Consequently, the engineer in our application example is likely to work with ATKIS maps that are not quite up-to-date and show discrepancies with respect to features observable in reality.
The engineer needs tools to compare his potentially inconsistent base data with more recent representations of reality, in order to define potential problem areas. In our example, the CORINE land cover (EEA, 1999) database provides satellite images. From 1985 to 1990, the European Commission carried out the CORINE Programme (Co-ordination of Information on the Environment). The results are essentially of three types, which correspond to the three aims of the Programme: (a) An information system on the state of the environment in the European Community has been created (the CORINE system). It is composed of a series of databases describing the environment in the European Community, as well as databases with background information. (b) Nomenclatures and methodologies were developed for carrying out the programme, which are now used as the reference in the areas concerned at the Community level. (c) A systematic effort was made to concert activities with all the bodies involved in the production of environmental information, especially at the international level. As a result of this activity, and indeed of the whole programme, several groups of international scientists have been working together towards agreed targets. They now share a pool of expertise on various themes of environmental information.
The technologies of syntactic, structural, and semantic integration described in the previous sections can be applied to facilitate this task.
The following is a step-by-step example of a typical user interaction with the system in the query phase:
1. The user starts the query from within his native GIS tool (here: ATKIS maps in ArcView). He defines the parameters of the query, such as the properties and formats of the originating system, the specified area of interest (bounding rectangle, coordinate system, etc.), and information about the requested attribute data (here: "land use"). Then he submits the query to the network of integrated data sources.
2. The query is matched against the central network database, and a decision is made about which of the participating data sources a) cover the area of interest and b) hold information on the attribute "land use". A list of all compatible data sources is created and sent back to the user. From this list, the user selects one or more data sources and re-submits the query to the system. In our example, the engineer selects a set of CORINE land-cover satellite images.
3. The system consults the central database and retrieves the basic information needed to access the selected data source(s). This includes information about technical, syntactic, and structural details, as well as the rules needed for the access and exchange of data from these sources.
4. The information is used to select and configure suitable wrappers from a repository of generic wrappers. Once the wrappers are properly installed, a suitable mediator is selected from a repository of generic mediators. Among others, the mediator rules describe the fields that hold the requested information (here: the fields holding land-use information). With the help of wrappers and mediators, a direct connection to the selected data source(s) can be established, and individual instances of data can be retrieved.
5. For the context transformation from the source context into the query context, the mediator queries the CTR-Engine. For example, the CTR-Engine transforms the area measure in hectares to area measures in acres. If the CTR-Engine cannot transform the context, because no appropriate CTR exists, it queries the re-classifier for a context mapping. In our example, the re-classifier is used to re-classify the CORINE land-use attributes of all polygons in the selected area of interest to make them consistent with the ATKIS classification. If no context transformation can be performed, the mediator rejects the data.
6. The result of the whole process is a new map for the selected area that shows CORINE data re-classified to the ATKIS framework. The engineer in our example can overlay the original ATKIS set of maps with the new map. He can then apply regular GIS tools to make immediate decisions about which areas of the ATKIS maps are inconsistent with the CORINE satellite images and consequently need to be updated.
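The steps above can be condensed into a sketch of the query-phase control flow; all names and interfaces are our own illustration, not the BUSTER API:

```python
# Sketch of the query-phase pipeline: wrapper -> mediator -> semantic tools.
# Every function, rule, and mapping here is an illustrative assumption.

def wrapper(records):
    """Syntactic level: yield records in a format-independent form."""
    yield from records

def functional_transform(rec):
    """Semantic level, first attempt: a CTR-style unit conversion."""
    if rec.get("unit") != "ha":
        raise LookupError("no matching CTR")
    return {**rec, "unit": "acres", "area": rec["area"] / 0.40468564224}

def reclassify_term(rec):
    """Semantic level, fallback: exchange the catalog term."""
    mapping = {"corine:forest": "atkis:Wald"}  # hypothetical mapping
    if rec.get("class") not in mapping:
        raise LookupError("no context mapping")
    return {**rec, "class": mapping[rec["class"]]}

def mediator(records):
    """Structural level: integrate records, applying context transformation."""
    for rec in wrapper(records):
        try:
            yield functional_transform(rec)
        except LookupError:
            try:
                yield reclassify_term(rec)
            except LookupError:
                pass  # step 5: reject data that cannot be transformed

data = [{"unit": "ha", "area": 2.0},
        {"class": "corine:forest"},
        {"class": "corine:unknown"}]
print(list(mediator(data)))  # two records survive, the third is rejected
```

The try/except cascade mirrors step 5: the functional transformation is tried first, re-classification is the fallback, and rejection is the last resort.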
Data Acquisition Phase
Before the first query can be submitted, the knowledge, in fact the Comprehensive Source Description (CSD) and the Integration Knowledge (IK), has to be acquired. The first step of the data acquisition phase consists of gathering information about the data source that is to be integrated (Fig. 2). This information is stored in a source-specific database, the Comprehensive Source Description (CSD). A CSD has to be created for each data source that participates in a network of integrated data sources.
Figure 2: The data acquisition phase of the BUSTER architecture
The Comprehensive Source Description
Each CSD consists of meta data that describe technical and administrative details of the data source, as well as its structural and syntactic schema and annotations. In addition, the CSD comprises a source ontology, i.e. a detailed and computer-readable description of the concepts stored in the data source. The CSD is attached to the respective data source. It should be available in a highly interchangeable format (for example XML) that allows easy data exchange over computer networks.
Setting up a CSD is the task of the domain specialist responsible for the creation and maintenance of the specific data source. This tedious task can be supported by specialized tools that use repositories of pre-existing general ontologies and terminologies. These tools examine existing CSD's of other but similar sources and generate hypotheses for similar parts of the new CSD. The domain specialist must verify, and eventually modify, the hypotheses and add them to the CSD of the new source. With these acquisition tools, the creation of new CSD's can be simplified (Wache et al., 1999).
The Integration Knowledge
In a second step of the data acquisition phase, the data source is added to the network of integrated data sources. In order for the new data source to be able to exchange data with the other data sources in the network, Integration Knowledge (IK) must be acquired. The IK is stored in a centralized database that is part of the network of integrated data sources.
The IK consists of several separate parts that provide specific knowledge for the components in the query phase.
For example, the rule generator examines several CSD's and creates rules for the mediator (Wache et al., 1999). The wrapper configurator uses the information about the sources in order to adapt generic wrappers to the heterogeneous sources.
Creating the IK is the task of the person responsible for operating and maintaining the network of integrated data sources. Due to the complexity of the IK needed for the integration of multiple heterogeneous data sources, and due to unavoidable semantic ambiguities, it may not be possible to accomplish this task automatically. However, the acquisition of the IK can be supported by semi-automatic tools. In general, such acquisition tools use the information stored in the CSD's to pre-define parts of the IK and propose them to the human operator, who makes the final decision about whether to accept, edit, or reject them.
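Since the CSD should be available in an interchangeable format such as XML, a minimal sketch might look as follows; the element and attribute names are invented, as no CSD schema is given in the paper:

```python
# Minimal sketch of a CSD serialized as XML and read back with the
# standard library.  The schema is illustrative only; no official CSD
# format is defined in the paper.
import xml.etree.ElementTree as ET

csd_xml = """
<csd source="ATKIS">
  <technical format="shapefile" access="ODBC"/>
  <administrative maintainer="Landesvermessungsamt"/>
  <ontology>
    <concept name="Wald" definition="vegetated AND trees"/>
  </ontology>
</csd>
"""

root = ET.fromstring(csd_xml)
print(root.get("source"))                           # -> ATKIS
print(root.find("./ontology/concept").get("name"))  # -> Wald
```

A CSD in such a form could be exchanged over the network and examined by the acquisition tools described above.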
Conclusion
In order to make GIS interoperable, several problems have to be solved. We argued that these problems can be divided into three levels of integration: the syntactic, the structural, and the semantic level. In our opinion, it is crucial to note that the problem of interoperable GIS can only be solved if solutions (modules) on all three levels of integration work together. We believe that it is not possible to solve the heterogeneity problems separately.
The Buster approach uses different components for different tasks on different levels and provides a conceptual solution for these problems. The components can be any existing systems. We use wrappers for the syntactic level, mediators for the structural level, and both context transformation rule engines (CTR-Engines) and classifiers (mappers) for the semantic level. CORBA is used as low-level middleware for the communication between the components.
At the moment, a few wrappers are available (e.g. ODBC and XML wrappers); a wrapper for shape files will be available soon. We are currently developing the mediator and the CTR-Engine, and we use FaCT (Fast Classification of Terminologies) (Horrocks, 1999) as the reasoner for our prototype system. Buster is a first attempt to solve the heterogeneity problems mentioned in this paper; however, a lot of work remains to be done in various areas.
References
[Aben, 1993] Aben, M. (1993). Formally specifying reusable knowledge model components. Knowledge Acquisition Journal, 5:119–141.
[AdV, 1998] AdV (1998). Amtliches Topographisch-Kartographisches Informationssystem ATKIS. Landesvermessungsamt NRW, Bonn.
[Arens et al., 1996] Arens, Y., Hsu, C.-N., and Knoblock, C. A. (1996). Query processing in the SIMS information mediator. In Advanced Planning Technology, California, USA. AAAI Press.
[Bergamaschi et al., 1999] Bergamaschi, Castano, Vincini, and Beneventano (1999). Intelligent techniques for the extraction and integration of heterogeneous information. In Workshop Intelligent Information Integration, IJCAI 99, Stockholm, Sweden.
[Bishr and Kuhn, 1999] Bishr, Y. and Kuhn, W. (1999). The Role of Ontology in Modelling Geospatial Features, volume 5 of IFGI prints. Institut für Geoinformatik, Universität Münster, Münster.
[Borgida and Patel-Schneider, 1994] Borgida, A. and Patel-Schneider, P. (1994). A semantics and complete algorithm for subsumption in the CLASSIC description logic. JAIR, 1:277–308.
[Brachman, 1977] Brachman, R. (1977). What's in a concept: Structural foundations for semantic nets. International Journal of Man-Machine Studies, 9:127–.
[Brickley and Guha, 2000] Brickley, D. and Guha, R. (2000). Resource Description Framework (RDF) schema specification 1.0. Technical Report PR-rdf-schema.
[Chawathe et al., 1994] Chawathe, S., Garcia-Molina, H., Hammer, J., Ireland, K., Papakonstantinou, Y., Ullman, J., and Widom, J. (1994). The TSIMMIS project: Integration of heterogeneous information sources. In Proceedings of the IPSJ Conference, pages 7–18.
[Collet et al., 1991] Collet, C., Huhns, M. N., and Shen, W.-M. (1991). Resource integration using a large knowledge base in Carnot. IEEE Computer, 24(12):55–62.
[EEA, 1999] EEA (1997–1999). CORINE land cover: technical guide. European Environmental Agency, ETC/LC, European Topic Centre on Land Cover.
[ESRI, 1994] ESRI (1994). Introducing ArcView. Environmental Systems Research Institute (ESRI), Redlands, CA, USA.
[Galhardas et al., 1998] Galhardas, H., Simon, E., and Tomasic, A. (1998). A framework for classifying environmental metadata. In AAAI Workshop on AI and Information Integration, Madison, WI.
[Genesereth and Fikes, 1992] Genesereth, M. and Fikes, R. (1992). Knowledge Interchange Format version 3.0 reference manual. Report KSL 91-1, Knowledge Systems Laboratory, Stanford University.
[Gruber, 1991] Gruber, T. (1991). Ontolingua: A mechanism to support portable ontologies. KSL Report KSL-91-66, Stanford University.
[Gruber, 1993] Gruber, T. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2).
[Guarino, 1998] Guarino, N. (1998). Formal ontology and information systems. In Guarino, N., editor, FOIS 98, Trento, Italy. IOS Press.
[Guarino and Giaretta, 1995] Guarino, N. and Giaretta, P. (1995). Ontologies and knowledge bases: Towards a terminological clarification. In Mars, N., editor, Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, pages 25–32. Amsterdam.
[Horrocks, 1999] Horrocks, I. (1999). FaCT and iFaCT. In (Lambrix et al., 1999), pages 133–135.
[Jasper and Uschold, 1999] Jasper, R. and Uschold, M. (1999). A framework for understanding and classifying ontology applications. In Proceedings of the 12th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. University of Calgary/Stanford University.
[Kashyap and Sheth, 1997] Kashyap, V. and Sheth, A. (1997). Cooperative Information Systems: Current Trends and Directions, chapter Semantic Heterogeneity in Global Information Systems: The Role of Metadata, Context and Ontologies. Academic Press.
[Kim et al., 1995] Kim, W., Choi, I., Gala, S., and Scheevel, M. (1995). Modern Database Systems: The Object Model, Interoperability, and Beyond, chapter On Resolving Schematic Heterogeneity in Multidatabase Systems, pages 521–550. ACM Press / Addison-Wesley Publishing Company.
[Kim and Seo, 1991] Kim, W. and Seo, J. (1991). Classifying schematic and data heterogeneity in multidatabase systems. IEEE Computer, 24(12):12–18.
[Lambrix et al., 1999] Lambrix, P., Borgida, A., Lenzerini, M., Möller, R., and Patel-Schneider, P., editors (1999). Proceedings of the International Workshop on Description Logics (DL'99).
[Landgraf, 1999] Landgraf, G. (1999). Evolution of EO/GIS interoperability towards an integrated application infrastructure. In Vckovski, A., editor, Interop99, volume 1580 of Lecture Notes in Computer Science, Zürich, Switzerland. Springer.
[Lenat, 1998] Lenat, D. (1998). The dimensions of context space. Available on the web site of the Cycorp Corporation (http://www.cyc.com/publications).
[Maguire et al., 1991] Maguire, D. J., Goodchild, M. F., and Rhind, D. W., editors (1991). Geographical Information Systems: Principles and Applications. Longman, London, UK.
[Mena et al., 1996] Mena, E., Kashyap, V., Illarramendi, A., and Sheth, A. (1996). Managing multiple information sources through ontologies: Relationship between vocabulary heterogeneity and loss of information. In Baader, F., Buchheit, M., Jeusfeld, M. A., and Nutt, W., editors, Proceedings of the 3rd Workshop Knowledge Representation Meets Databases (KRDB '96).
[Mitra et al., 1999] Mitra, P., Wiederhold, G., and Jannink, J. (1999). Semi-automatic integration of knowledge sources. In Fusion '99, Sunnyvale, CA.
[Mitra et al., 2000] Mitra, P., Wiederhold, G., and Kersten, M. (2000). A graph-oriented model for articulation of ontology interdependencies. In Proc. Extending Database Technologies (EDBT 2000), Lecture Notes in Computer Science, Konstanz, Germany. Springer Verlag.
[Naiman and Ouksel, 1995] Naiman, C. F. and Ouksel, A. M. (1995). A classification of semantic conflicts in heterogeneous database systems. Journal of Organizational Computing, pages 167–193.
[OMG, 1992] OMG (1992). The Common Object Request Broker: Architecture and specification. OMG Document 91.12.1, The Object Management Group. Revision 1.1.92.
[Papakonstantinou et al., 1996] Papakonstantinou, Y., Garcia-Molina, H., and Ullman, J. (1996). MedMaker: A mediation system based on declarative specifications. In International Conference on Data Engineering, pages 132–141, New Orleans.
[Schreiber et al., 1994] Schreiber, A., Wielinga, B., Akkermans, H., Van de Velde, W., and Anjewierden, A. (1994). CML: The CommonKADS conceptual modeling language. In Steels, L. et al., editors, A Future for Knowledge Acquisition, Proc. 8th European Knowledge Acquisition Workshop (EKAW 94), number 867 in Lecture Notes in Artificial Intelligence. Springer.
[Shepherd, 1991] Shepherd, I. D. H. (1991). Information integration in GIS. In Maguire, D. J., Goodchild, M. F., and Rhind, D. W., editors, Geographical Information Systems: Principles and Applications. Longman, London, UK.
[Stuckenschmidt and Wache, 2000] Stuckenschmidt, H. and Wache, H. (2000). Context modelling and transformation for semantic interoperability. In Knowledge Representation Meets Databases (KRDB 2000). To appear.
[van Harmelen and Fensel, 1999] van Harmelen, F. and Fensel, D. (1999). Practical knowledge representation for the web. In Fensel, D., editor, Proceedings of the IJCAI'99 Workshop on Intelligent Information Integration.
[Vckovski, 1998] Vckovski, A. (1998). Interoperable and Distributed Processing in GIS. Taylor & Francis, London.
[Vckovski et al., 1999] Vckovski, A., Brassel, K., and Schek, H.-J., editors (1999). Proceedings of the 2nd International Conference on Interoperating Geographic Information Systems, volume 1580 of Lecture Notes in Computer Science, Zürich. Springer.
[Visser et al., 2000] Visser, U., Stuckenschmidt, H., Schuster, G., and Vögele, T. (2000). Ontologies for geographic information processing. Computers & Geosciences. Submitted.
[W3C, 1998] W3C (1998). Extensible Markup Language (XML) 1.0. W3C Recommendation.
[W3C, 1999] W3C (1999). Resource Description Framework (RDF) schema specification. W3C Proposed Recommendation.
[Wache et al., 1999] Wache, H., Scholz, T., Stieghahn, H., and König-Ries, B. (1999). An integration method for the specification of rule-oriented mediators. In Kambayashi, Y. and Takakura, H., editors, Proceedings of the International Symposium on Database Applications in Non-Traditional Environments (DANTE'99), pages 109–112, Kyoto, Japan.
[Wiederhold, 1992] Wiederhold, G. (1992). Mediators in the architecture of future information systems. IEEE Computer, 25(3):38–49.
[Wiederhold, 1999] Wiederhold, G. (1999). Mediation to deal with heterogeneous data sources. In Vckovski, A., editor, Interop99, volume 1580 of Lecture Notes in Computer Science, Zürich, Switzerland. Springer.
[Wiener et al., 1996] Wiener, J., Gupta, H., Labio, W., Zhuge, Y., Garcia-Molina, H., and Widom, J. (1996). WHIPS: A system prototype for warehouse view maintenance. In Workshop on Materialized Views, pages 26–33, Montreal, Canada.
[Worboys and Deen, 1991] Worboys, M. F. and Deen, S. M. (1991). Semantic heterogeneity in distributed geographical databases. SIGMOD Record, 20(4).