FCA-M ERGE: Bottom-Up Merging of Ontologies Gerd Stumme and Alexander Maedche Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany stumme,maedche @aifb.uni-karlsruhe.de http://www.aifb.uni-karlsruhe.de/WBS Abstract for comparisons, these approaches do not offer a structural description of the global merging process. Ontologies have been established for knowledge We propose the new method FCA–M ERGE for merging sharing and are widely used as a means for con- ontologies following a bottom-up approach which offers a ceptually structuring domains of interest. With global structural description of the merging process. For both the growing usage of ontologies, the problem of source ontologies, it extracts instances from a given set set of overlapping knowledge in a common domain be- domain-speciﬁc text documents by applying natural language comes critical. We propose the new method processing techniques. Based on the extracted instances we FCA–M ERGE for merging ontologies following a apply mathematically founded techniques taken from Formal bottom-up approach which offers a structural de- Concept Analysis [Wille, 1982; Ganter and Wille, 1999 ] to scription of the merging process. The method derive a lattice of concepts as a structural result of FCA– is guided by application-speciﬁc instances of two M ERGE. The produced result is explored and transformed to given source ontologies, that are to be merged. We the merged ontology by the ontology engineer. The extrac- apply techniques from natural language processing tion of instances from text documents circumvents the prob- and formal concept analysis to derive a lattice of lem that in most applications there are no objects which are concepts as a structural result of FCA–M ERGE. simultaneously instances of both ontologies, and which could The generated result is then explored and trans- be used as a basis for identifying similar concepts. formed into the merged ontology with human in- The remainder of the paper is as follows. We brieﬂy in- teraction. troduce some basic deﬁnitions concentrating on a formal def- inition of what an ontology is and recall the basics of For- mal Concept Analysis in Section 2. Before we present our1 Introduction generic method for ontology merging in Section 4, we giveOntologies have been established for knowledge sharing and an overview over existing and related work in Section 3. Sec-are widely used as a means for conceptually structuring do- tion 5 provides a detailed description of FCA–M ERGE. Sec-mains of interest. With the growing usage of ontologies, the tion 6 summarizes the paper and concludes with an outlookproblem of overlapping knowledge in a common domain oc- on future work.curs more often and becomes critical. Domain-speciﬁc on-tologies are modeled by multiple authors in multiple settings. 2 Ontologies and Formal Concept AnalysisThese ontologies lay the foundation for building new domain- In this section, we brieﬂy introduce some basic deﬁnitions.speciﬁc ontologies in similar domains by assembling and ex- We thereby concentrate on a formal deﬁnition of what an on-tending multiple ontologies from repositories. tology is and recall the basics of Formal Concept Analysis. The process of ontology merging takes as input two sourceontologies and returns a merged ontology based on the given 2.1 Ontologiessource ontologies. Manual ontology merging using con- There is no common formal deﬁnition of what an ontology is.ventional editing tools without intelligent support is difﬁ- However, most approaches share a few core items: concepts,cult, labor intensive and error prone. Therefore, several sys- a hierarchical IS-A-relation, and further relations. For saketems and frameworks for supporting the knowledge engi- of generality, we do not discuss more speciﬁc features likeneer in the ontology merging task have recently been pro- constraints, functions, or axioms here. We formalize the coreposed [Hovy, 1998; Chalupsky, 2000; Noy and Musen, 2000; in the following way.McGuinness et al, 2000 ]. The approaches rely on syntacticand semantic matching heuristics which are derived from the Deﬁnition: A (core) ontology is a tuple Çbehavior of ontology engineers when confronted with the task ´ × Ê µ, where is a set whose elements are calledof merging ontologies, i. e. human behaviour is simulated. concepts, × is a partial order on (i. e., a binary rela-Although some of them locally use different kinds of logics tion × ¢ which is reﬂexive, transitive, and anti-
symmetric), and Ê is a set whose elements are called relation syntactic rewriting supports the translation between two dif-names (or relations for short), and Ê · is a function ferent knowledge representation languages, semantic rewrit-which assigns to each relation name its arity. ing offers means for inference-based transformations. It ex- plicitly allows to violate the preservation of semantics inAs said above, the deﬁnition considers the core elements of trade-off for a more expressive, ﬂexible transformation mech-most languages for ontology representation only. It is possi- anism.ble to map the deﬁnition to most types of ontology represen- In [McGuinness et al, 2000 ] the Chimaera system is de-tation languages. Our implementation, for instance, is based scribed. It provides support for merging of ontological termson Frame Logic [Kifer et al, 1995 ]. Frame Logic has a well- from different sources, for checking the coverage and cor-founded semantics, but we do not refer to it in this paper. rectness of ontologies and for maintaining ontologies over time. Chimaera supports the merging of ontologies by co-2.2 Formal Concept Analysis alescing two semantically identical terms from different on-We recall the basics of Formal Concept Analysis (FCA) as far tologies and by identifying terms that should be related byas they are needed for this paper. A more extensive overview subsumption or disjointness relationships. Chimaera offers ais given in [Ganter and Wille, 1999 ]. To allow a mathematical broad collection of functions, but the underlying assumptionsdescription of concepts as being composed of extensions and about structural properties of the ontologies at hand are notintensions, FCA starts with a formal context deﬁned as a triple made explicit.Ã ´ µ Å Á , where is a set of objects, Å is a set of Prompt [Noy and Musen, 2000 ] is an algorithm for ontol-attributes, and Á is a binary relation between and Å (i. e. ogy merging and alignment embedded in Prot´ g´ 2000. It e eÁ ´ µ ¢ Å ). Ñ ¾ Á is read “object has attribute Ñ”. starts with the identiﬁcation of matching class names. Based Ñ¾Å ¾ on this initial step an iterative approach is carried out for per-Deﬁnition: For , we deﬁne ¼ ´ µ Ñ ¾ Á and, for Å , we deﬁne ¼ ¾ forming automatic updates, ﬁnding resulting conﬂicts, and Ñ¾ ´ µ Ñ ¾Á . making suggestions to remove these conﬂicts. A formal concept of a formal context ´ µ Å Á is deﬁned The tools described above offer extensive merging func-as a pair ´ µ with , Å, ¼ and ¼ . tionalities, most of them based on syntactic and semantic matching heuristics, which are derived from the behaviour ofThe sets and are called the extent and the intent of theformal concept ´ µ . The subconcept–superconcept rela- ontology engineers when confronted with the task of merg-tion is formalized by ½ ½ ´ µ ´ ¾ ¾ ´µ ½ µ ing ontologies. OntoMorph and Chimarea use a descrip- ´ ¾ ´µ ½ µ ¾ The set of all formal concepts of a tion logics based approach that inﬂuences the merging pro- cess locally, e. g. checking subsumption relationships be-context Ã together with the partial order is always a com- tween terms. None of these approaches offers a structural de-plete lattice,1 called the concept lattice of Ã and denoted by ´ µ Ã . scription of the global merging process. FCA–M ERGE can be regarded as complementary to existing work, offering a A possible confusion might arise from the double use of structural description of the overall merging process with anthe word ‘concept’ in FCA and in ontologies. This comes underlying mathematical framework.from the fact that FCA and ontologies are two models for The work closest to our approach is described in [Schmittthe concept of ‘concept’ which arose independently. In order and Saake, 1998 ]. They apply Formal Concept Analysis toto distinguish both notions, we will always refer to the FCA a related problem, namely database schema integration. Asconcepts as ‘formal concepts’. The concepts in ontologies in our approach, a knowledge engineer has to interpret theare referred to just as ‘concepts’ or as ‘ontology concepts’. results in order to make modeling decisions. Our techniqueThere is no direct counter-part of formal concepts in ontolo- differs in two points: There is no need of knowledge acquisi-gies. Ontology concepts are best compared to FCA attributes, tion from a domain expert in the preprocessing phase; and itas both can be considered as unary predicates on the set of ob- additionally suggests new concepts and relations for the tar-jects. get ontology.3 Related Work 4 Bottom-Up Ontology MergingA ﬁrst approach for supporting the merging of ontologies is As said above, we propose a bottom-up approach for ontol-described in [Hovy, 1998 ]. There, several heuristics are de- ogy merging. Our mechanism is based on application-speciﬁcscribed for identifying corresponding concepts in different instances of the two given ontologies Ç ½ and Ç¾ that are toontologies, e.g. comparing the names of two concepts, com- be merged. The overall process of merging two ontologies isparing the natural language deﬁnitions of two concepts by depicted in Figure 1 and consists of three steps, namely (i)linguistic techniques, and checking the closeness of two con- instance extraction and computing of two formal contexts Ã ½cepts in the concept hierarchy. and Ã ¾ , (ii) the FCA-M ERGE core algorithm that derives a The OntoMorph system [Chalupsky, 2000 ] offers two common context and computes a concept lattice, and (iii) thekinds of mechanisms for translating and merging ontologies: generation of the ﬁnal merged ontology based on the concept lattice. 1 Our method takes as input data the two ontologies and a I. e., for each set of formal concepts, there is always a greatestcommon subconcept and a least common superconcept. set of natural language documents. The documents have to
1 1 5 The FCA–M ERGE Method Linguistic Processing 1 In this section, we discuss the three steps of FCA–M ERGE in FCA- Lattice more detail. We illustrate FCA–M ERGE with a small exam- Merge Exploration Linguistic new ple taken from the tourism domain, where we have built sev- Processing 2 eral speciﬁc ontology-based information systems. Our gen- eral experiments are based on tourism ontologies that have 2 2 been modeled in an ontology engineering seminar. Differ- Figure 1: Ontology Merging Method ent ontologies have been modeled for a given text corpus on the web, which is provided by a WWW provider for tourist information. 2 The corpus describes actual objects, like loca- tions, accommodations, furnishings of accommodations, ad-be relevant to both ontologies, so that the documents are de- ministrative information, and cultural events. For the scenarioscribed by the concepts contained in the ontology. The doc- described here, we have selected two ontologies: The ﬁrst on-uments may be taken from the target application which re- tology contains 67 concepts and 31 relations, and the secondquires the ﬁnal merged ontology. From the documents in , ontology contains 51 concepts and 22 relations. The under-we extract instances. The mechanism for instance extraction lying text corpus consists of 233 natural language documentsis further described in Subsection 5.1. It returns, for each on- taken from the WWW provider described above. For demon- stration purposes, we restrict ourselves ﬁrst to two very small subsets Ç½ and Ç¾ of the two ontologies described above;tology, a formal context indicating which ontology conceptsappear in which documents. and to 14 out of the 233 documents. These examples will The extraction of the instances from documents is neces- be translated in English. In Subsection 5.3, we provide somesary because there are usually no instances which are already examples from the merging of the larger ontologies.classiﬁed by both ontologies. However, if this situation isgiven, one can skip the ﬁrst step and use the classiﬁcation of 5.1 Linguistic Analysis and Context Generationthe instances directly as input for the two formal contexts. The aim of this ﬁrst step is to generate, for each ontology The second step of our ontology merging approach com- Ç ¾ ½¾ , a formal context Ã ´ µ Å Á . The set of documents is taken as object set ( ), and the setprises the FCA–M ERGE core algorithm. The core algorithm of concepts is taken as attribute set (Å ). While thesemerges the two contexts and computes a concept lattice from sets come for free, the difﬁcult step is generating the binarythe merged context using FCA techniques. More precisely, itcomputes a pruned concept lattice which has the same degree relation Á . The relation ´ µ Ñ ¾ Á shall hold whenever document contains an instance of Ñ.of detail as the two source ontologies. The techniques ap- The computation uses linguistic techniques as describedplied for generating the pruned concept lattice are described in the sequel. We conceive an information extraction-basedin Subsection 5.2 in more detail. approach for ontology-based extraction, which has been im- Instance extraction and the FCA–M ERGE core algorithm plemented on top of SMES (Saarbr¨ cken Message Extrac- uare fully automatic. The ﬁnal step of deriving the merged tion System), a shallow text processor for German (cf. [Neu-ontology from the concept lattice requires human interaction. mann et al, 1997 ]). The architecture of SMES comprisesBased on the pruned concept lattice and the sets of relation a tokenizer based on regular expressions, a lexical analysisnames Ê½ and Ê¾ , the ontology engineer creates the con- component including a word and a domain lexicon, and acepts and relations of the target ontology. We offer graphical chunk parser. The tokenizer scans the text in order to identifymeans of the ontology engineering environment OntoEdit for boundaries of words and complex expressions like “$20.00”supporting this process. or “Mecklenburg–Vorpommern”, 3 and to expand abbrevia- tions. For obtaining good results, a few assumptions have to be The lexicon contains more than 120,000 stem entries andmet by the input data: Firstly, the documents have to be rel- more than 12,000 subcategorization frames describing infor-evant to each of the source ontologies. A document from mation used for lexical analysis and chunk parsing. Further-which no instance is extracted for each source ontology can more, the domain-speciﬁc part of the lexicon contains lexicalbe neglected for our task. Secondly, the documents have entries that express natural language representations of con-to cover all concepts from the source ontologies. Concepts cepts and relations. Lexical entries may refer to several con-which are not covered have to be treated manually after our cepts or relations, and one concept or relation may be referredmerging procedure (or the set of documents has to be ex- to by several lexical entries.panded). And last but not least, the documents must sepa- Lexical analysis uses the lexicon to perform (1) morpho-rate the concepts well enough. If two concepts which are logical analysis, i. e. the identiﬁcation of the canonical com-considered as different always appear in the same documents, mon stem of a set of related word forms and the analysisFCA-M ERGE will map them to the same concept in the target of compounds, (2) recognition of named entities, (3) part-of-ontology (unless this decision is overruled by the knowledge speech tagging, and (4) retrieval of domain-speciﬁc informa-engineer). When this situation appears too often, the knowl- 2edge engineer might want to add more documents which fur- URL: http://www.all-in-all.com 3ther separate the concepts. a region in the north east of Germany
Accommodation Root_1 Root_2 Vacation Hotel_1 Concert Musical Hotel_2 Hotel Event Hotel Root Root Á½ Á¾ Accommodation_2 Event_1 ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc1 doc1 Concert_1 ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc2 doc2 ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc3 doc3 Musical_2 ¢ ¢ ¢ ¢ ¢ doc4 doc4 ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc5 doc5 ¢ ¢ ¢ ¢ ¢ doc6 doc6 Vacation_1 ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc7 doc7 ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc8 doc8 ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc9 doc9 ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc10 doc10 ¢ ¢ ¢ ¢ ¢ doc11 doc11 ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc12 doc12 ¢ ¢ ¢ ¢ ¢ ¢ ¢ doc13 doc13 doc14 doc14 Figure 3: The pruned concept latticeFigure 2: The two contexts Ã ½ and Ã ¾ as result of the ﬁrststep We will not compute the whole concept lattice of Ã , as it would provide too many too speciﬁc concepts. We restrict the computation to those formal concepts which are abovetion. While steps (1), (2), and (3) can be viewed as standard at least one formal concept generated by an (ontology) con-for information extraction approaches, step (4) is of speciﬁc cept of the source ontologies. This assures that we remaininterest for our instance extraction mechanism. This step as- within the range of speciﬁcity of the source ontologies. Moresociates single words or complex expressions with a concept precisely, the pruned concept lattice is given by Ô Ã ´ µfrom the ontology if a corresponding entry in the domain-speciﬁc part of the lexicon exists. For instance, the expression ´ µ ´ µ ¾ Ã ´ Ñ¾Å Ñ ¼ Ñ ¼¼ µ ´ µ . For“Hotel Schwarzer Adler” is associated with the concept Ho- our example, the pruned concept lattice is shown in Figure 3.tel. If the concept Hotel is in ontology Ç ½ and document It consists of six formal concepts. Two formal concepts of contains the expression “Hotel Schwarzer Adler”, then the the total concept lattice are pruned since they are too speciﬁcrelation ( ,Hotel) ¾Á½ holds. compared to the two source ontologies. Finally, the transitivity of the × -relation is compiled The computation of the pruned concept lattice is done withinto the formal context, i. e. Ñ ¾Á and Ñ × Ò im- ´ µ the algorithm T ITANIC [Stumme et al, 2000 ]. It is slightlyplies ´ µ Ò ¾Á . This means that if ( ,Hotel) ¾Á ½ holds modiﬁed to allow the pruning. Compared to other algorithms for computing concept lattices, T ITANIC has — for our pur-and Hotel × Accommodation, then the documentalso describes an instance of the concept Accommodation: pose — the advantage that it computes the formal concepts( ,Accommodation) ¾Á½ . via their key sets (or minimal generators). A key set is a min- Figure 2 depicts the contexts Ã ½ and Ã ¾ that have been imal set of attributes which generates a given formal concept. I. e., Ã Å is a key set for the formal concept ´ µ if andgenerated from the documents for the small example ontolo-gies. E. g., document doc5 contains instances of the con- ´ only if Ã ¼ Ã ¼¼ µ ´ µ and ´ ¼ ¼¼ µ ´ µ for allcepts Event, Concert, and Root of ontology Ç ½ , and Ã with Ã.Musical and Root of ontology Ç ¾ . All other documents In our application, key sets serve two purposes. Firstly,contain some information on hotels, as they contain instances they indicate if the generated formal concept gives rise to aof the concept Hotel both in ontology Ç ½ and in ontology new concept in the target ontology or not. A concept is newÇ¾ . if and only if it has no key sets of cardinality one. Secondly, the key sets of cardinality two or more can be used as generic5.2 Generating the Pruned Concept Lattice names for new concepts and they indicate the arity of newThe second step takes as input the two formal contexts Ã ½ relations.and Ã ¾ which were generated in the last step, and returnsa pruned concept lattice (see below), which will be used as 5.3 Generating the new Ontology from theinput in the next step. Concept Lattice First we merge the two formal contexts into a new formal While the previous steps (instance extraction, context deriva-context Ã , from which we will derive the pruned concept lat- tion, context merging, and T ITANIC) are fully automatic, thetice. Before merging the two formal contexts, we have to derivation of the merged ontology from the concept latticedisambiguate the attribute sets, since ½ and ¾ may con- requires human interaction, since it heavily relies on back-tain the same concepts: Let Å ¼ Ñ Ñ ¾ Å , ´ µ ground knowledge of the domain expert.for ¾ ½¾ . The disambiguation of the concepts allows the The result from the last step is a pruned concept lattice.possibility that the same concept exists in both ontologies, but From it we have to derive the target ontology. Each of theis treated differently. For instance, a Campground may be formal concepts of the pruned concept lattice is a candidateconsidered as an Accommodation in the ﬁrst ontology, but for a concept, a relation, or a new subsumption in the targetnot in the second one. Then the merged formal context is ob- ontology. In the sequel we describe the strategy which under-tained by Ã Å Á with ´ ,Å µ Å½ Å¾ , ¼ ¼ lies this derivation process. Of course, most of the technicaland ´ ´Ñ µµ ¾Á ¸ Ñ ¾Á. ´ µ details are hidden from the user.
There is a number of queries which may be used to focus for the name of the new concept, or for the concepts whichon the most relevant parts of the pruned concept lattice. We should be linked with the new relation. Only those key setsdiscuss these queries after the description of the general strat- with minimal cardinality are considered, as they provide theegy — which follows now. shortest names for new concepts and minimal arities for new As the documents are not needed for the generation of the relations, resp.target ontology, we restrict our attention to the intents of the For instance, the formal concept in the middle of Fig-formal concepts, which are sets of (ontology) concepts of the ure 3 has Hotel 2, Event 1 , Hotel 1, Event 1 ,source ontologies. For each formal concept of the pruned and Accommodation 2, Event 1 as key sets. The userconcept lattice, we analyze the related key sets. For each for- can now decide if to create a new concept with the defaultmal concept, the following cases can be distinguished: name HotelEvent (which is unlikely in this situation), or 1. It has exactly one key set of cardinality 1. to create a new relation with arity (Hotel, Event), e. g., the 2. It has two or more key sets of cardinality 1. relation organizesEvent. 3. It has no key sets of cardinality 0 or 1. Key sets of cardinality 2 serve yet another purpose: 4. It has the empty set as key set. 4 Ñ½ Ñ¾ being a key set implies that neither Ñ ½ × Ñ¾ nor Ñ¾ × Ñ½ currently hold. Thus when the user does not useThe generation of the target ontology starts with all concepts a key set of cardinality 2 for generating a new concept or re-being in one of the two ﬁrst situations. The ﬁrst case is the lation, she should check if it is reasonable to add one of theeasiest: The formal concept is generated by exactly one on- two subsumptions to the target ontology. This case does nottology concept from one of the source ontologies. It can show up in our small example. An example from the largebe included in the target ontology without interaction of the ontologies is given at the end of the section.knowledge engineer. In our example, these are the two formal There is exactly one formal concept in the fourth case (asconcepts labeled by Vacation 1 and by Event 1. the empty set is always a key set). This formal concept gives In the second case, two or more concepts of the source on- rise to a new largest concept in the target ontology, the Roottologies generate the same formal concept. This indicates concept. It is up to the knowledge engineer to accept or tothat the concepts should be merged into one concept in the reject this concept. Many ontology tools require the existencetarget ontology. The user is asked which of the names to of such a largest concept. In our example, this is the formalretain. In the example, this is the case for two formal con- concept labeled by Root 1 and Root 2.cepts: The key sets Concert 1 and Musical 2 gen- Finally, the is a order on the concepts of the target ontologyerate the same formal concept, and are thus suggested to can be derived automatically from the pruned concept lattice:be merged; and the key sets Hotel 1 , Hotel 2 , and If the concepts ½ and ¾ are derived from the formal concepts Accommodation 2 also generate the same formal con-cept.5 The latter case is interesting, since it includes two con- ´ µ ´ µ ½ ½ and ¾ ¾ , resp., then ½ × ¾ if and only ifcepts of the same ontology. This means that the set of docu- ½ ¾ (or if the user explicitly modeled it based on a key set of cardinality 2).ments does not provide enough details to separate these twoconcepts. Either the knowledge engineer decides to merge Querying the pruned concept lattice. In order to support thethe concepts (for instance because he observes that the dis- knowledge engineer in the different steps, there is a numbertinction is of no importance in the target application), or he of queries for focusing his attention to the signiﬁcant parts ofadds them as separate concepts to the target ontology. If there the pruned concept lattice.are too many suggestions to merge concepts which should be Two queries support the handling of the second case (indistinguished, this is an indication that the set of documents which different ontology concepts generate the same formalwas not large enough. In such a case, the user might want to ´ µ concept). The ﬁrst is a list of all pairs Ñ ½ Ñ¾ ¾ ½ ¢ ¾re-launch FCA–M ERGE with a larger set of documents. with Ñ½ ¼ Ñ¾ ¼ . It indicates which concepts from the When all formal concepts in the ﬁrst two cases are dealt different source ontologies should be merged.with, then all concepts from the source ontologies are in- In our small example, this list contains for instance the paircluded in the target ontology. Now, all relations from the two (Concert 1, Musical 2). In the larger application (whichsource ontologies are copied into the target ontology. Possi- is based on the German language), pairs like (Zoo 1, Tier-ble conﬂicts and duplicats have to be resolved by the ontology park 2) and (Zoo 1, Tiergarten 2) are listed. We de-engineer. cided to merge Zoo [engl.: zoo] and Tierpark [zoo], but In the next step, we deal with all formal concepts covered not Zoo and Tiergarten [zoological garden].by the third case. They are all generated by at least two con- The second query returns, for ontology Ç with ¾ ½¾ ,cepts from the source ontologies, and are candidates for newontology concepts or relations in the target ontology. The de- ´ the list of pairs Ñ Ò ¾ µ ¢ with Ñ ¼ Ò ¼ . It helps checking which concepts out of a single ontology mightcision whether to add a concept or a relation to the target on- be subject to merge. The user might either conclude that sometology (or to discard the suggestion) is a modeling decision, of these concept pairs can be merged because their differen-and is left to the user. The key sets provide suggestions either tiation is not necessary in the target application; or he might 4 This implies (by the deﬁnition of key sets) that the formal con- decide that the set of documents must be extended because itcept does not have another key set. does not differentiate the concepts enough. 5 Root 1 and Root 2 are no key sets, as each of them has In the small example, the list for Ç ½ contains only the paira subset (namely the empty set) generating the same formal concept. (Hotel 1, Accommodation 1). In the larger application,
we had additionally pairs like (R¨umliches, Gebiet) and a two contexts and the computation of the pruned concept lat-(Auto, Fortbewegungsmittel). For the target applica- tice; and the semi-automatic ontology creation phase whichtion, we merged R¨umliches [spatial thing] and Gebiet a supports the user in modeling the target ontology. The pa-[region], but not Auto [car] and Fortbewegungsmittel per described the underlying assumptions and discussed the[means of travel]. methodology. The number of suggestions provided for the third situation Future work includes the closer integration of the FCA–can be quite high. There are three queries which present only M ERGE method in the ontology engineering environmentthe most signiﬁcant formal concepts. These queries can also O NTO E DIT. In particular, we will offer views on the prunedbe combined. concept lattice based on the queries described in Subsec- Firstly, one can ﬁx an upper bound for the cardinality of the tion 5.3.key sets. The lower the bound is, the fewer new concepts are The evaluation of ontology merging is an open issue [Noypresented. A typical value is 2, which allows to retain all con- and Musen, 2000 ]. We plan to use FCA–M ERGE to generatecepts from the two source ontologies (as they are generated independently a set of merged ontologies (based on two givenby key sets of cardinality 1), and to discover new binary rela- source ontologies). Comparing these merged ontologies us-tions between concepts from the different source ontologies, ing the standard information retrieval measures as proposedbut no relations of higher arity. If one is interested in having in [Noy and Musen, 2000 ] will allow us to evaluate the per-exactly the old concepts and relations in the target ontology, formance of FCA–M ERGE.and no suggestions for new concepts and relations, then the On the theoretical side, an interesting open question is theupper bound for the key set size is set to 1. extension of the formalism to features of speciﬁc ontology Secondly, one can ﬁx a minimum support. This prunes all languages, like for instance functions or axioms. The ques-formal concepts where the cardinality of the extent is too low tion is ( ) how they can be exploited for the merging process,(compared to the overall number of documents). The default and ( ) how new functions and axioms describing the inter-is no pruning, i. e., with a minimum support of 0 %. It is also play between the source ontologies can be generated for thepossible to ﬁx different minimum supports for different car- target ontology.dinalities of the key sets. The typical case is to set the min-imum support to 0 % for key sets of cardinality 1, and to a Referenceshigher percentage for key sets of higher cardinality. This way [Chalupsky, 2000 ] H. Chalupsky: OntoMorph: A translationwe retain all concepts from the source ontologies, and gen- system for symbolic knowledge. Proc. KR ’00.erate new concepts and relations only if they have a certain [Ganter and Wille, 1999 ] B. Ganter, R. Wille: Formal Con-(statistical) signiﬁcance. cept Analysis: mathematical foundations. Springer. Thirdly, one can consider only those key sets of cardinal- [Hovy, 1998 ] E. Hovy: Combining and standardizing large-ity 2 in which the two concepts come from one ontology each. scale, practical ontologies for machine translation andThis way, only those formal concepts are presented which other uses. Proc. 1st Intl. Conf. on Language Resourcesgive rise to concepts or relations linking the two source on- and Evaluation, Granada.tologies. This restriction is useful whenever the quality of [Kifer et al, 1995 ] M. Kifer, G. Lausen, J. Wu: Logical foun-each source ontolology per se is known to be high, i. e., when dations of object-oriented and frame-based languages.there is no need to extend each of the source ontologies alone. Journal of the ACM 42. In the small example, there are no key sets with cardinal- [McGuinness et al, 2000 ] D. L. McGuinness, R. Fikes, J.ity 3 or higher. The three key sets with cardinality 2 (asgiven above) all have a support of ½½ ½ ± . In the Rice, and S. Wilder: An environment for merging and testing large Oontologies. Proc. KR ’00.larger application, we ﬁxed 2 as upper bound for the cardinal- [Neumann et al, 1997 ] G. Neumann, R. Backofen, J. Baur,ity of the key sets. We obtained key sets like (Telefon 1 M. Becker, C. Braun: An information extraction core[telephone], ¨ffentliche Einrichtung 2 [public in- O system for real world German text processing. Proc.stitution]) (support = 24.5 %), (Unterkunft 1 [accom- ANLP-97, Washington.modation], Fortbewegungsmittel 2 [means of travel]) [Noy and Musen, 2000 ] N. Fridman Noy, M. A. Musen:(1.7 %), (Schloß 1 [castle], Bauwerk 2 [building]) PROMPT: algorithm and tool for automated ontology(2.1 %), and (Zimmer 1 [room], Bibliothek 2 [library]) merging and alignment. Proc. AAAI ’00.(2.1 %). The ﬁrst gave rise to a new concept Tele- [Schmitt and Saake, 1998 ] I. Schmitt, G. Saake: Merging in-fonzelle [public phone], the second to a new binary rela- heritance hierarchies for database integration. Proc.tion hatVerkehrsanbindung [hasPublicTransportCon- CoopIS’98. IEEE Computer Science Press.nection], the third to a new subsumption Schloß × [Stumme et al, 2000 ] G. Stumme, R. Taouil, Y. Bastide, N.Bauwerk, and the fourth was discarded as meaningless. Pasquier, L. Lakhal: Fast computation of concept lat- tices using data mining Ttechniques. Proc. KRDB ´00,6 Conclusion and Future Work CEUR-Workshop Proc. http://sunsite.informatik.rwth- aachen.de/Publications/CEUR-WS/FCA–M ERGE is a bottom-up technique for merging ontolo- [Wille, 1982] R. Wille: Restructuring lattice theory: an ap-gies based on a set of documents. In this paper, we described proach based on hierarchies of concepts. In: I. Rivalthe three steps of the technique: the linguistic analysis of the (ed.): Ordered sets. Reidel, Dordrecht, 445–470.texts which returns two formal contexts; the merging of the