Towards an automatic semantic integration of information



With their expanding information assets and the increasing importance of the knowledge factor, organizations are increasingly challenged to efficiently support knowledge management processes with appropriate integration and retrieval technologies. Besides traditional information retrieval approaches, the use of semantic technologies like Topic Maps is also becoming more important. This paper proposes a technology framework for the automatic semantic integration of information. Based on various information repositories, topics and topic associations are created automatically in real time. In addition, the first results from a proof of concept in conjunction with the European company EADS provide further insights into the proposed framework's applicability in practice.

Published in: Technology, Education

  1. Fourth International Conference on Topic Maps – Research and Applications (TMRA 2008)
     Towards an Automatic Semantic Integration of Information
     Dr. Jörg Wurzer, iQser AG
     Prof. Dr. Stefan Smolnik, European Business School (EBS)
     Leipzig, October 16, 2008
  2. Agenda
     • Status quo and motivation
     • New paradigm: information access by context
     • Proof of Concept at EADS
     • Technical architecture
     • Analysis process & queries
     • Further research & questions
  3. Motivation
     • The quantity of digital information is still growing (IDC 2008: 60% per year)
     • Information is dispersed over documents and various applications/databases
     • Growing need for creating knowledge based on available information
     • Profound knowledge for management decisions, completing tasks and business processes, development of new products, sales and marketing campaigns
     • Topic Maps can adopt the new results of research in semantic technologies
  4. Today's solution I: full-text search
     • Advantages: easy to use, generally accepted, users are highly familiar with it
     • Disadvantages:
       • Result quality depends on the keyword selection
       • Results are presented as long document lists, which users have to assess manually
       • The result set does not necessarily reflect the user’s intention
       • Each application has its own search functionality (no standards)
  5. Today's solution II: directory hierarchy
     • Advantages: content such as documents can be organized by meaning, context, and applicability
     • Disadvantages:
       • A manually created hierarchy provides a static view of the content, but in practice users need different views, for example by customer, project, or product
       • Documents are usually needed in several contexts and are then stored redundantly; problem: every copy has to be edited
       • Directory hierarchies often reflect the current state of knowledge; however, some documents cannot be placed appropriately in the hierarchy
  6. New paradigm: access content in any context
     • Automatically created topic maps of all content object types
     • Multiple links between the content objects establish a semantic, non-hierarchical network; links are created semantically
     • The user chooses a focus of interest; a topic map provides the related content. Example: customers are linked to projects, contracts, products, employees, and service calls
     • Exploring the available data by navigating through a topic map
     • The content can be located in heterogeneous sources and stored in different formats or data models; even external content can be included
  7. Proof of Concept of iQser Middleware at EADS
     • Division Defence and Communication Systems
     • Requirements:
       • Analysis of unstructured data of military information
       • Automatically created network of content objects
       • Automatically created network of main concepts
       • All links between documents have to be justified
     • Benchmark: a system with a manually created ontology
  8. Application screenshot (data modified for confidentiality)
  9. Results
     • The created topic map provides transparent relations between documents
     • The terms tree gives users an overview of the document base’s content as well as of related fundamental facts
     • In the PoC for EADS, the concept tree shows that “Biber” is a bridge tank, and it shows the location of the anti-missile defense
     • The quality of the tree’s information and of the topic map is high and can compete with that of a manually created ontology
  10. Uniform Information Layer (UIL)
     • Single point of access for all content object types
     • Connector for each type of structured and unstructured content from any source (document, database, application): transforms data into a semantically typed generic content object and stores modified data back
     • No redundantly stored data
     • Searching across heterogeneous sources, including the web, is possible
     • Users can specify search queries by means of attributes
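The connector idea on the UIL slide can be illustrated with a minimal sketch. Everything named here (`ContentObject`, `CrmCustomerConnector`, the CRM row layout) is a hypothetical illustration and not the iQser API; the slides do not show the actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ContentObject:
    """Generic, semantically typed content object (assumed shape)."""
    source_id: str                     # pointer back to the original source
    semantic_type: str                 # e.g. "Customer", "Email", "Document"
    attributes: Dict[str, str] = field(default_factory=dict)


class CrmCustomerConnector:
    """Hypothetical connector for rows of an assumed CRM customer table."""

    def read(self, row: Dict[str, str]) -> ContentObject:
        # Transform source data into a typed generic content object.
        return ContentObject(
            source_id="crm:" + row["id"],
            semantic_type="Customer",
            attributes={"full_name": row["name"], "city": row["city"]},
        )

    def write(self, obj: ContentObject) -> Dict[str, str]:
        # Map a (possibly modified) content object back to the source format,
        # so the data lives only in its original source.
        return {
            "id": obj.source_id.split(":", 1)[1],
            "name": obj.attributes["full_name"],
            "city": obj.attributes["city"],
        }
```

In this sketch the content object only carries a `source_id` pointing at the original record, which is one way to read the slide's "no redundantly stored data".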
  11. Architecture of iQser Semantic Middleware
  12. Analysis process
     • All content changes (and changes of the topic map) trigger an event
     • All user actions are tracked
     • All changes or specific amounts of user actions trigger the analysis process
     • Combination of three analysis methods: Syntax Analyzer, Pattern Analyzer, Semantic Analyzer
     • More analyzers can be included according to customer needs
     • Pairs of content objects can have n relations with calculated weights
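The triggering rules on the analysis-process slide can be sketched as follows. The class name `AnalysisTrigger`, the callback signature, and the threshold value are assumptions made for illustration; the slides do not state how many user actions start an analysis run.

```python
class AnalysisTrigger:
    """Starts an analysis run on every content change and after every
    `action_threshold` tracked user actions (threshold is an assumed value)."""

    def __init__(self, analyze, action_threshold=3):
        self.analyze = analyze                  # callback: analyze(kind, object_ids)
        self.action_threshold = action_threshold
        self.pending_actions = []

    def on_content_change(self, object_id):
        # Every content change triggers the analysis immediately.
        self.analyze("change", [object_id])

    def on_user_action(self, object_id):
        # User actions are only tracked until enough of them have accumulated.
        self.pending_actions.append(object_id)
        if len(self.pending_actions) >= self.action_threshold:
            batch, self.pending_actions = self.pending_actions, []
            self.analyze("actions", batch)
```

A caller would pass a function that runs the analyzers as the `analyze` callback.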
  13. Syntax Analyzer
     • Each content object can have multiple key attributes defined in the content provider
     • Examples: full name of a person, sender and recipient of an email, project ID
     • The Syntax Analyzer checks whether these key attributes are related to attributes of other content objects in the data pool
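A minimal sketch of the key-attribute matching described on the Syntax Analyzer slide. It assumes that "related" means exact equality of attribute values, which the slides do not confirm; the function name and the data layout are hypothetical.

```python
def syntax_links(objects, key_attributes):
    """Link objects whose key attribute values reappear in other objects.

    objects:        {object_id: {attribute_name: value}}
    key_attributes: {object_id: [names of that object's key attributes]}
    Returns a list of (source_id, target_id, shared_value) tuples.
    """
    links = []
    for src_id, keys in key_attributes.items():
        src_attrs = objects[src_id]
        key_values = {src_attrs[k] for k in keys if k in src_attrs}
        for dst_id, dst_attrs in objects.items():
            if dst_id == src_id:
                continue
            # Assumption: two attributes are "related" when their values are equal.
            for value in sorted(key_values & set(dst_attrs.values())):
                links.append((src_id, dst_id, value))
    return links


objects = {
    "person:1": {"full_name": "Ada Lovelace"},
    "mail:7": {"sender": "Ada Lovelace", "recipient": "Charles Babbage"},
    "doc:3": {"title": "Notes"},
}
links = syntax_links(objects, {"person:1": ["full_name"]})
# links == [("person:1", "mail:7", "Ada Lovelace")]
```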
  14. Pattern Analyzer
     • The Pattern Analyzer extracts the meaningful words of a content object according to their significance
     • It transforms a selected set of words into a data query; the result is a list of similar content objects
     • The similarity is described by a weight between 0 and 1
     • The Pattern Analyzer considers the context of the words used in a text; it therefore reflects the different use of words in different contexts
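The Pattern Analyzer steps (extract significant words, query, weight between 0 and 1) can be approximated with a small sketch. The significance measure (plain term frequency with a stop-word list) and the weight (share of query terms found in the other object) are stand-ins chosen for illustration; the slides do not describe the actual measures, and this sketch does not model word context.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption, not from the slides).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def significant_terms(text, k=3):
    """Return the k most frequent non-stop-words (ties broken alphabetically)."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def similar_objects(query_terms, corpus):
    """Return (object_id, weight) pairs, weight in (0, 1], best match first."""
    if not query_terms:
        return []
    results = []
    for object_id, text in corpus.items():
        words = set(tokenize(text))
        weight = sum(term in words for term in query_terms) / len(query_terms)
        if weight > 0:
            results.append((object_id, weight))
    return sorted(results, key=lambda r: (-r[1], r[0]))


terms = significant_terms("The bridge tank crosses the river; the tank is armoured", k=2)
corpus = {
    "d1": "Armoured tank units",
    "d2": "River logistics",
    "d3": "Tank maintenance",
}
matches = similar_objects(terms, corpus)
```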
  15. Semantic Analyzer
     • Background: the meaning of words and sentences in a language is not defined abstractly but is manifested indirectly in the daily use of language
     • The Semantic Analyzer evaluates the tracked user actions
     • If two content objects are selected, edited, or created in sequence, the Semantic Analyzer creates a link between these objects
     • The weight of such a link grows if the same sequence of content objects occurs again
     • The weights of content object links can shrink if a weight exceeds 1
     • The topic map is self-optimizing with respect to the customers’ interests
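One possible reading of the Semantic Analyzer rules, as a sketch. The increment per repeated sequence (`step`), the use of unordered pairs, and the rescaling rule (divide all weights by the maximum once any weight exceeds 1) are assumptions; the slides only state that weights grow on repetition and can shrink when a weight exceeds 1.

```python
from collections import defaultdict


class SemanticLinks:
    """Links content objects that users touch one after the other."""

    def __init__(self, step=0.5):
        self.step = step                    # assumed increment per occurrence
        self.weights = defaultdict(float)   # {(id_a, id_b): weight}
        self.last = None                    # previously touched object

    def track(self, object_id):
        """Record a user action (select, edit, or create) on an object."""
        if self.last is not None and self.last != object_id:
            pair = tuple(sorted((self.last, object_id)))
            self.weights[pair] += self.step
            self._rescale()
        self.last = object_id

    def _rescale(self):
        # Assumed shrink rule: once any weight exceeds 1, scale all weights
        # so that the strongest link has weight 1 and the others shrink.
        top = max(self.weights.values())
        if top > 1:
            for pair in self.weights:
                self.weights[pair] /= top


links = SemanticLinks(step=0.5)
for object_id in ["a", "b", "c", "b", "c"]:
    links.track(object_id)
```

After this sequence the frequently repeated pair `("b", "c")` holds the top weight, while the single `("a", "b")` transition has shrunk relative to it.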
  16. Querying associated information
     • Users can specify search queries aiming at a precise result by means of
       • attributes
       • semantic types
       • relations (context search)
     • All changes in the data pool and in the topic map can be used to trigger or control a process
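The three query criteria (attributes, semantic types, relations) can be combined as in the following sketch. The function `query` and the in-memory data layout are hypothetical and only illustrate the filtering logic, not the middleware's query interface.

```python
def query(objects, links, semantic_type=None, attributes=None, related_to=None):
    """Filter content objects by type, attribute values, and context.

    objects: {object_id: {"type": str, "attributes": {name: value}}}
    links:   iterable of (id_a, id_b) pairs taken from the topic map
    """
    attributes = attributes or {}
    neighbours = None
    if related_to is not None:
        # Context search: keep only objects linked to the given object.
        neighbours = {b if a == related_to else a
                      for a, b in links if related_to in (a, b)}
    hits = []
    for object_id, obj in objects.items():
        if semantic_type is not None and obj["type"] != semantic_type:
            continue
        if any(obj["attributes"].get(k) != v for k, v in attributes.items()):
            continue
        if neighbours is not None and object_id not in neighbours:
            continue
        hits.append(object_id)
    return hits


objects = {
    "cust:1": {"type": "Customer", "attributes": {"city": "Leipzig"}},
    "proj:1": {"type": "Project", "attributes": {"status": "open"}},
    "proj:2": {"type": "Project", "attributes": {"status": "closed"}},
}
links = [("cust:1", "proj:1"), ("cust:1", "proj:2")]
open_projects = query(objects, links, semantic_type="Project",
                      attributes={"status": "open"}, related_to="cust:1")
```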
  17. Further research
     • Developing more applications as concrete use cases based on the iQser Semantic Middleware
     • Developing and evaluating additional analysis methods
     • Implementing complex queries with multiple contexts
  18. Thank you!
     Dr. Jörg Wurzer
     +49 172 6680073
  19. Technical details
     • Hardware: Pentium(R) Dual Core 3 GHz, 2 GB RAM
     • Software: Windows XP 2002 SP3, JBoss 4.0.4 GA, Sun JDK 1.5_12
     • JBoss JVM heap size configuration: -Xms128m -Xmx512m
     • 3 GB of data (Word, Excel, PowerPoint, plain text, HTML) are indexed and analyzed in 14 hours
     • More than 70% of CPU time is spent on I/O waits
     • Less than 400 MB of memory was needed
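As a rough cross-check of the figures on the technical details slide, the average throughput can be derived from the stated data volume and run time. This assumes decimal gigabytes (1 GB = 10^9 bytes) and that the 14 hours cover both indexing and analysis; neither is stated on the slide.

```latex
\[
\frac{3 \times 10^{9}\ \text{B}}{14 \times 3600\ \text{s}}
= \frac{3 \times 10^{9}\ \text{B}}{50\,400\ \text{s}}
\approx 5.95 \times 10^{4}\ \text{B/s}
\approx 60\ \text{kB/s}
\qquad \left(\approx 0.21\ \text{GB/h}\right)
\]
```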