Dr. Raymond Y.K. ...
commerce, and knowledge management [25]. Since Tim BernersLee, the inventor of the
World Wide Web (Web), coined the vision...
stage, certain linguistic patterns are extracted based on the specific requirements specified
by the ontology engineers. For...
[17, 24] in previous research. Mutual Information is an information theoretic method to
compute the dependency between two...
normalized by the corresponding joint probabilities. Only a feature with an association
weight greater than a threshold µ ...
management. For this experiment, only the "Noun Noun" linguistic pattern was specified
and the parameter α = 0.5 was set. ...
Cohesiveness 4.3         0.46
                               Isolation          4.2   0.60
formal concept analysis. Journal of Artificial Intelligence Research, 24:305–339, 2005.
 [5] The World Wide Web Consortium....
Information Retrieval, pages 206–213. ACM, 1999.
[23] C. Shannon. A mathematical theory of communication. Bell System Tech...
Upcoming SlideShare
Loading in …5



Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. ONTOLOGY EXTRACTION FOR BUSINESS KNOWLEDGE MANAGEMENT Dr. Raymond Y.K. Lau Department of Information Systems City University of Hong Kong Tat Chee Avenue, Kowloon Hong Kong Email: raylau@cityu.edu.hk Phone: +852 2788 8495 FAX: +852 2788 8694 ABSTRACT Ontology plays an important role in capturing and disseminating business information (e.g., products, services, relationships of businesses) for effective human computer interactions. However, engineering of domain ontologies is very labor intensive and time consuming. Some machine learning methods have been explored for automatic or semi-automatic discovery of domain ontologies. Nevertheless, both the accuracy and the computational efficiency of these methods need to be improved to support large scale ontology construction for real-world business applications. This paper illustrates a novel domain ontology discovery method which exploits contextual information of the knowledge sources to construct domain ontology with better quality. The proposed ontology discovery method has been empirically tested in an e-Learning environment and the experimental results are encouraging. Keywords: Domain Ontology, Ontology Discovery, Statistical Learning, Knowledge Management. INTRODUCTION Knowledge has been recognized as the most important corporate asset and it is the key for organizations to achieve sustainable competitive advantage. Knowledge management is a collection of processes that govern the creation, dissemination, and utilization of knowledge [15, 16]. To be able to effectively manage the intellectual capital, businesses need an effective approach to identify and capture information and knowledge about business processes, products, services, markets, customers, suppliers, and competitors, and to share this knowledge to improve the organizations’ goal achievement. Ontologies allow domain knowledge such as products, services, markets, etc. to be captured in an explicit and formal way such that it can be shared among human and computer systems. The notion of ontology is becoming very useful in various fields such as intelligent information extraction and retrieval, cooperative information systems, electronic 1
  2. 2. commerce, and knowledge management [25]. Since Tim BernersLee, the inventor of the World Wide Web (Web), coined the vision of a Semantic Web [1] in which background information of Web resources is stored in the form of machine processable metadata, the proliferation of ontologies has under tremendous growth. The success of Semantic Web relies heavily on formal ontologies to structure data for comprehensive and transportable machine understanding [11]. Although there is not a universal consensus on the definition of ontology, it is generally accepted that ontology is a specification of conceptualization [7]. In other words, ontology is a formal representation of concepts and their interrelationships. It provides a view of the world that we wish to represent for some purposes [18]. Ontology can take the simple form of a taxonomy (i.e., knowledge encoded in a minimal hierarchical structure) or a vocabulary with standardized machine interpretable terminology supplemented with natural language definitions. On the other hand, the notion of ontology can also be used to describe a logical domain theory with very expressive, complex, and meaningful information. Ontology is often specified in a declarative form by using semantic markup languages such as RDF and OWL [5]. Although ontologies are useful in many areas, the engineering of ontologies turns out to be very expensive and time consuming. Therefore, many automatic or semiautomatic ontology engineering techniques have been proposed. Automated ontology discovery is vital for the success of ontology engineering because it deals with the knowledge acquisition bottleneck which is a classical knowledge engineering problem. Although fully automatic construction of perfect domain ontology is beyond the current stateof- the-art, we believe that the automatic ontology extraction method illustrated in this paper can assist ontology engineers to build domain ontology quicker and more accurately. Some learning techniques have been applied to the extraction of domain ontology [2, 6, 22]. Nevertheless, these methods are still subject to further enhancement in terms of computational efficiency and accuracy. One of the ways to improve domain ontology extraction is to exploit contextual information from the knowledge sources such as the Internet news about business products, services, and markets. As domain ontology captures domain (context) dependent information, an effective extraction method should exploit contextual information in order to build relevant ontologies. AN OVERVIEW OF THE ONTOLOGY DISCOVERY METHODOLOGY Figure 1 depicts the proposed methodology of context-sensitive domain ontology extraction. A text corpus is parsed to analyze the lexico-syntactic elements. For instance, stop words such as “a, an, the” are removed from the source documents since these words appear in any contexts and they cannot provide useful information to describe a domain concept. For our implementation, a stop word file is constructed based on the standard stop word file used in the SMART retrieval system [20]. Lexical pattern is identified by applying Part of Speech (POS) tagging to the source documents and then followed by token stemming based on the Porter stemming algorithm [19]. We refer to the WordNet lexicon [13] to tag each word during this process. During the linguistic pattern filtering
  3. 3. stage, certain linguistic patterns are extracted based on the specific requirements specified by the ontology engineers. For example, the ontology engineers may only focus on the “Noun Noun” and “Adjective Noun” patterns instead of all the linguistic patterns. This is in fact a good way to gain computational efficiency by reducing the number of patterns for further statistical analysis. In addition, to extract relevant domain specific concepts, the appearances of concepts across different domains should be taken into account. The basic intuition is that a concept frequently appears in a specific domain (corpus) rather than many different domains is more likely to be a relevant domain concept. The statistical Token Analysis step employs the information theoretic measure to compute the cooccurrence statistics of the targeting linguistic patterns. Finally, taxonomy of domain concepts is developed according to the subsumption based fuzzy computational method. The details of the proposed ontology extraction method will be discussed in Section 4. 0100090000037c00000002001c00000000000400000003010800050000000b0200000000 050000000c02f806170e040000002e0118001c000000fb021000070000000000bc0200000 0860102022253797374656d0006170e0000f8bf1100fce2d430481a310a0c020000170e00 00040000002d0100001c000000fb02a8ff0000000000009001000000000440001254696d6 573204e657720526f6d616e0000000000000000000000000000000000040000002d01010 00400000002010100040000002d010100050000000902000000020d000000320a6000000 00100040000000000100ef50620b64b00040000002d010000030000000000 Figure 1: Context Sensitive Ontology Extraction Process LEXICO-SYNTACTIC AND STATISTICAL ANALYSIS After standard document preprocessing such as stop word removal, POS tagging, and word stemming [21], a windowing process is conducted over the collection of documents. This makes our method quite different from the approach developed by Sanderson and Croft [22] which does not take into account the proximity between tokens. The proximity factor is a key to reduce the number of noisy term relationships. For each document (e.g., Net news, Web page, email, etc.), a virtual window of δ words is moved from left to right one word at a time until the end of a sentence is reached. Within each window, the statistical information among tokens is collected to develop collocational expressions. Such a windowing process has successfully been applied to text mining before [9]. The windowing process is repeated for each document until the entire collection has been processed. According to previous studies, a text window of 5 to 10 terms is effective [8, 17], and so we adopt this range as the basis to perform our windowing process. To improve computational efficiency and filter noisy relations, only the specific linguistic pattern (e.g., Noun Noun, and Adjective Noun) defined by an ontology engineer will be analyzed. For statistical token analysis, Mutual Information (MI) is adopted as the basic computational method. Mutual Information has been applied to collocational analysis 3
  4. 4. [17, 24] in previous research. Mutual Information is an information theoretic method to compute the dependency between two entities and is defined by [23]: Pr( ti , t j ) MI ( ti , t j ) = log 2 Pr( ti ) Pr(t j ) (1) MI (t i , t j ) Pr(ti , t j ) where is the mutual information between term ti and term tj. is the joint probability that both terms appear in a text window, and Pr(t i ) is the probability wt that a term ti appears in a text window. The probability Pr(t i ) is estimated based on w where wt is the number of windows containing the term t and |w| the total number of Pr(ti , t j ) windows constructed from a textual database (i.e., a collection). Similarly, is the fraction of the number of windows containing both terms out of the total number of windows. We propose Balanced Mutual Information (BMI) to compute the association weights among tokens. This method considers both term presence and term absence as the evidence of the implicit term relationships. Ass(t i , t j ) ≈ BMI (t i , t j ) Pr(t i , t j ) = α × Pr(t i , t j ) log 2 ( + 1) + Pr(t i ) Pr(t j ) Pr( ¬t i , ¬t j ) (1 − α ) × Pr( ¬t i , ¬t j ) log 2 ( + 1) Pr( ¬t i ) Pr( ¬t j ) (2) where Ass(ti, tj) is the association weight between term ti and term tj. Such an association Pr(ti , t j ) value is approximated by the BMI score. is the joint probability that both terms Pr( ¬t i , ¬t j ) appear in a text window, and is the joint probability that both terms are absent in a text window. The factor α > 0.5 is a weight assigned to the positively associated mutual information. As it is counterintuitive to have a zero BMI value if two Pr(t i , t j ) Pr(t i ) Pr(t j ) terms always appear Pr(ti,tj ) together in every text window, the faction is adjusted by adding the constant 1 before applying the logarithm. Since our text mining method is applied after removing stop words, such an adjustment is reasonable to capture the intuition of significant term co-occurrence. In Eq.(2), each MI value is then
  5. 5. normalized by the corresponding joint probabilities. Only a feature with an association weight greater than a threshold µ (i.e., Ass(ti, tj) > µ) will be considered a significant feature for representing a concept in a context vector. After computing all the BMI values in a collection, these values are subject to linear scaling such that each term association ∀ Ass(t i , t j ) ∈[0,1] weight is within the unit interval ti ,tj . In should be noted that the constituent terms of a concept are always implicitly included in the underlying context vector with a default association weight of 1. The final stage towards our ontology extraction method is taxonomy generation based on subsumption relations among extracted concepts. Let Spec(cx, cy) denotes that concept cx is a specialization (subclass) of another concept cy . The degree of such a specialization is derived by: ∑ Ass(t tx ∈cx ,ty ∈cy ,tx = ty x , c x ) ⊗ Ass(t y , c y ) Spec( c x , c y ) = ∑ Ass(t x , cx ) tx ∈cx (3) where ⊗ is a standard fuzzy conjunction operator which is equivalent to a minimum function. The above formula states that the degree of subsumption (specificity) of cx to cy is based on the ratio of the sum of the minimal association weights of the common features of the two concepts to the sum of the feature weights of the concept cx. For instance, if every property term of cx is also a property term of cy , a high specificity value will be derived. In general, Spec(cx, cy) takes its values from the unit interval [0, 1] and it is an asymmetric relation. Since it is a fuzzy rather than a crisp relation, Spec(cy , cx) may also be true to certain degree. When the taxonomy graph is developed, we only select the subsumption relation such that Spec(cx, cy) > Spec(cy , cx) and Spec(cx, cy ) > λ where λ is a threshold to distinguish significant subsumption relations. If Spec(cx, cy) = Spec(cy , cx) and Spec(cx, cy) > λ is established, the equivalent relation between cx and cy will be extracted. EVALUATION A small scale experiment of testing the functionality of the prototype system and the accuracy of the aforementioned domain ontology discovery method has been conducted under a e-Learning environment. A group of ten undergraduate students were recruited to try the prototype system; all of these subjects have attended a course in knowledge management before. At the beginning of the experiment, they attended a briefing session of fifteen minutes to learn the objective of this experiment and were instructed to write the most important concepts in knowledge management using concise and precise statements on an on-line discussion board. They were given thirty minutes to write their messages. After the message generation session, the coordinator of the experiment would execute the ontology discovery method to discover the domain knowledge from the on- line discussion messages which are about the writers' perception of knowledge 5
  6. 6. management. For this experiment, only the "Noun Noun" linguistic pattern was specified and the parameter α = 0.5 was set. Each subject could then view the system generated ontology on-line. Following the viewing session, a questionnaire was distributed to each subject to let them assess if the ontology could really reflect their understanding about the chosen topic. Our questionnaire was developed based on the instrument employed by [3]. It I included the assessment of the following factors: Accuracy - Whether the concepts and relationships shown at the taxonomy are correct; Cohesiveness - Whether each concept at the taxonomy is unique and not overlapped with one another; w Isolation - Whether the concepts at the same level are distinguishable and not subsume one another; o Hierarchy - Whether the taxonomy is traversed from broader concepts at the higher levels to narrow concepts at the lower level; l Readability - Whether the concepts at all levels are easy to be comprehended by human; A five point semantic differential scale from very good (5), good (4), average (3), bad (2), to very poor (1) is used to measure these variables. In general, a score close to 5 indicates that the automatically generated ontology is with good quality and it correctly reflects the mental state of the subjects. The average scores pertaining to various factors are shown in Table 1. The overall mean score is 4.3 with a standard deviation of 0.61; the overall mean score is close to the maximum 5. There are 26 key concepts and 97 relationships spreading at 3 levels. From this initial experiment, it is shown that our domain ontology discovery method is promising since it can automatically discover the prominent concepts and their relationships from a set of messages written in natural language. Mean STD Accuracy 4.4 0.66
  7. 7. Cohesiveness 4.3 0.46 Isolation 4.2 0.60 Hierarchy 4.0 0.63 Readability 4.5 0.67 Overall 4.3 0.61 Table 1: Qualitative Evaluation of Automatically Discovered Ontology CONCLUSIONS The manipulation and exchange of semantically enriched business intelligence (e.g., products, services, markets, etc.) can enhance the quality of an eCommerce system and offer a high level of interoperability among different enterprise systems. Ontology certainly plays an important role in the formalization of business knowledge. However, the biggest challenge for the wide spread applications of ontologies is on the construction of these ontologies because it is a very labor intensive and time consuming process. This paper illustrates a novel automatic ontology extraction method to facilitate the ontology engineering process. In particular, contextual information of a domain is exploited so that more reliable domain ontology can be extracted. The proposed extraction method combines lexico-syntactic and statistical learning approaches so as to reduce the chance of generating noisy relations and to improve the computational efficiency. Empirical studies have been performed to evaluate the quality of the domain ontology extracted by the proposed ontology discovery method. Our preliminary experiment shows that the extracted domain ontology can accurately reflect the domain knowledge embedded in on- line discussion messages. Future work involves comparing the accuracy and the computational efficiency of our extraction method with that of the other approaches. In addition, larger scale of quantitative evaluation of our ontology extraction method will be conducted. REFERENCES [1] T. BernersLee, J. Hendler, and O. Lassila. The semantic web. Scientific American, 284(5):34–43, 2001. [2] Shan Chen, Damminda Alahakoon, and Maria Indrawan. Background knowledge driven ontology discovery. In Proceedings of the 2005 IEEE International Conference on e- Technology, eCommerce and eService, pages 202–207, 2005. [3] S. Chuang and L. Chien. Taxonomy generation for text segments: A practical webbased approach. ACM Transactions on Information Systems, 23(4):363–396, 2005. [4] P. Cimiano, A. Hotho, and S. Staab. Learning concept hierarchies from text corpora using 7
  8. 8. formal concept analysis. Journal of Artificial Intelligence Research, 24:305–339, 2005. [5] The World Wide Web Consortium. Web Ontology Language, 2004. Available from http://www.w3.org/2004/OWL/. [6] Michael Dittenbach, Helmut Berger, and Dieter Merkl. Improving domain ontologies by mining semantics from text. In Proceedings of the First AsiaPacific Conference on Conceptual Modelling (APCCM2004), pages 91–100, 2004. [7] T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199–220, 1993. [8] Hongyan Jing and Evelyne Tzoukermann. Information retrieval based on context distance and morphology. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Language Analysis, pages 90–96, 1999. [9] R.Y.K. Lau. ContextSensitive Text Mining and Belief Revision for Intelligent Information Retrieval on the Web. Web Intelligence and Agent Systems An International Journal, 1(3- 4):1–22, 2003. [10] Taehee Lee, Ig hoon Lee, Suekyung Lee, Sang goo Lee, Dongkyu Kim, Jonghoon Chun, Hyunja Lee, and Junho Shim. Building an operational product ontology system. Electronic Commerce Research and Applications, 5(1):16–28, 2006. [11] Alexander Maedche and Steffen Staab. Ontology learning for the semantic web. IEEE Intelligent Systems, 16(2):72–79, 2001. [12] Alexander Maedche and Steffen Staab. Ontology learning. In Handbook on Ontologies, pages 173– 190. 2004. [13] G. A. Miller, Beckwith R., C. Fellbaum, D. Gross, and K. J. Miller. Introduction to wordnet: An online lexical database. Journal of Lexicography, 3(4):234–244, 1990. [14] Roberto Navigli, Paola Velardi, and Aldo Gangemi. Ontology learning and its application to automated terminology translation. IEEE Intelligent Systems, 18(1):22–31, 2003. [15] I. Nonaka. A dynamic theory of organizational knowledge creation. Organization Science, 5(1):14–37, 1994. [16] I. Nonaka and H. Takeuchi. The Knowledge Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press, New York, 1995Oxford University Press. [17] Patrick Perrin and Frederick Petry. Extraction and representation of contextual information for knowledge discovery in texts. Information Sciences, 151:125–152, 2003. [18] H. S. Pinto and J. P. Martins. Ontologies: How can they be built? Knowledge and Information Systems, 6(4):441–464, 2004. [19] M. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980. [20] G. Salton. Full text information processing using the smart system. Database Engineering Bulletin, 13(1):2–9, March 1990. [21] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGrawHill, New York, New York, 1983. [22] M. Sanderson and B. Croft. Deriving concept hierarchies from text. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in
  9. 9. Information Retrieval, pages 206–213. ACM, 1999. [23] C. Shannon. A mathematical theory of communication. Bell System Technology Journal, 27:379–423, 1948. [24] Mark A. Stairmand. Textual context analysis for information retrieval. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 140–147, 1997. [25] Christopher A. Welty. Ontology research. AI Magazine, 24(3):11–12, 2003. [26] Rudolf Wille. Formal concept analysis as mathematical theory of concepts and concept hierarchies. In Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors, Formal Concept Analysis, Foundations and Applications, volume 3626, pages 1–33. Springer, 2005. [27] J. Xu and W.B. Croft. Query expansion using local and global document analysis. In Hans- Peter Frei, Donna Harman, Peter Schauble, and Ross Wilkinson, editors, Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 4–11, Zurich, Switzerland, 1996. 9