Intelligent Content Management System Project Presentation
Transcript

  • 1. Intelligent Content Management System Project Presentation April 2002
  • 2. IST-2001-32429 ICONS Intelligent Content Management System www.icons.rodan.pl
    Project Partners: Rodan Systems (PL); The Polish Academy of Sciences (PL); Centro di Ingegneria Economica e Sociale (IT); InfoVide (PL); SchlumbergerSema (BE); University Paris 9 Dauphine (FR); University of Ulster (UK)
    Intelligent Content Management System Project Presentation
    Project name: Intelligent Content Management System
    Acronym: ICONS
    Workpackage: WP9
    Task: T9.1
    Document type: report
    Title: Intelligent Content Management System
    Subtitle: Project Presentation
    Document acronym: D01
    Author(s): Witold Staniszkis, Nicola Leone, Pasquale Rullo, Łukasz Balcerek, Michał Śmiałek, Witold Litwin, Gérard Levy, Jules Georges, Kazimierz Subieta, Mariusz Momotko, Dorota Depowska, Janusz Charczuk, Waldemar Piszczewiat, Yaxin Bi, David Bell
    Reviewer(s): Annette Bleeker, Bartosz Nowicki
    Accepting: Witold Staniszkis
    Location: I:WP9 Project ManagementICONS WP9 T1 D01 0115.doc
    Version: 1.15
    Date: April 2002
    Status: final version
    Distribution: public
  • 3. Intelligent Content Management System 1.15 History of changes April 2002
    History of changes
    Date | Version | Author | Change description
    6.4.02 | 1.14 | Bartosz Nowicki | final packaging
    4.4.02 | 1.12 | Witold Staniszkis | integration of partners’ inputs for chapter 5-9
    30.3.02 | 1.8 | Mariusz Momotko (5.5, 7, 8.3); Witold Litwin, Gérard Levy (8.1, 8.2); David Bell, Yaxin Bi (5.1-5.4); Nicola Leone, Pasquale Rullo (5.1-5.4); Kazimierz Subieta (5.6); Jules Georges (9.1); Łukasz Balcerek, Michał Śmiałek (9.2); Dorota Depowska, Waldemar Piszczewiat (6) | partners’ inputs provided
    30.2.02 | 1.03 | Bartosz Nowicki | ICONS template applied
    30.2.02 | 1.02 | Witold Staniszkis | further elaboration; work distribution among partners
    1.2.02 | 1.01 | Witold Staniszkis | document creation
    IST-2001-32429 ICONS Intelligent Content Management System page 3/86
  • 4. Executive summary
The primary objective of the ICONS Project Presentation report is to provide a baseline platform for all ICONS project stakeholders, representing the consensus of the ICONS consortium members with respect to the ICONS research and development strategy. Much effort has gone into interactions among members of the ICONS research and development community, aimed at reconciling diverse views and specialisations in the relevant research realms. We assume that the ensuing research results may require refinements and modifications of the underlying ICONS assumptions, and we plan to reflect them in subsequent versions of the report. Hence, this report is to be a “live” status document reflecting the current views of the ICONS consortium. The initial effort has gone into the Knowledge Management System (KMS) feature requirements analysis, in order to establish the compatibility of the requirements voiced by the knowledge management community with the prevailing opinions and conclusions of the ongoing research work in the IT field. Our motivation has been to verify the ICONS project goals and objectives and possibly to re-orient some of the principal research and development objectives. Representative results of management science research pertaining to intellectual capital and knowledge management have been examined. We have concentrated on the work of the Knowledge Management Consortium International [Firestone2000, McElroy1999], the seminal work in the area of learning organisations [Garvin1993] and knowledge modelling [Popper1971, Popper1977], as well as the generally accepted views of Nonaka and Takeuchi [Nonaka1995] with respect to knowledge creation and dissemination processes. The principal conclusion is that current KM needs require IT support for KM processes in order to facilitate innovation leading to enhanced competitive advantage.
A mapping between the KM processes and the desirable KMS features has been established. Our findings have been compared with the prevailing views of the IT research and development community with respect to the KMS architecture requirements. We have developed a KMS reference architecture enumerating the desirable KM features to provide a “common denominator” representation of the current IT research and development work. Principal results of the ongoing European KM projects may be found on the European KM Forum web site [KMForum2001]. The principal KMS feature sets include knowledge dissemination features, domain ontology features, content repository features, KMS actor collaboration features, knowledge security features, and content integration features. The role semantics of the KMS features with respect to the KM processes have been specified in order to compare the prevailing views of the IT community with those represented by management scientists. We have established that the reference KMS architecture is sufficiently powerful to provide significant enabling leverage for the KM field. The above complementary views on the KM scene provide a solid reference background for the ICONS architecture specification, which forms the backbone of our research and development work. We concentrate our project work on three key technological areas, namely the Knowledge Management Technologies area, the Human/Computer Interaction (HCI) area, and the Distributed Architecture Technologies area. We further demonstrate that such an approach is fully compatible with the stated ICONS project goal and objectives and that it enables us to provide the required technical support for the KMS reference architecture. The complete view of the ICONS architecture comprises additional technological areas, auxiliary to our project, namely the Content Management Technologies area and the Development Technologies area.
The software modules within the auxiliary areas are input into the project, preferably as “open source” or as modules proprietary to consortium partners, to be subsequently used and/or modified within the ICONS prototype. The cross-reference between the KMS reference architecture and the proposed ICONS architecture, indicating the research and/or development effort needed, shows the completeness of the ICONS features with respect to the established requirements. Knowledge-based features are an important building block of the ICONS architecture; therefore a multi-paradigm approach has been proposed. The research work covers formal aspects of knowledge representation, including rules and uncertainty, the Dempster-Shafer theory, and the extended relational model. A disjunctive Datalog inference engine, to be extended and integrated into the system, provides the principal knowledge-based platform. Procedural knowledge based on workflow specifications is to extend the Workflow Management Coalition (WfMC) model with time modelling features and CPM (Critical Path Method) modelling capabilities. Such extensions allow for enhanced support of knowledge management processes that are usually unsuitable for the WfMC-based process modelling approach. We propose an advanced graphic HCI interface to support visualisation and manipulation of structural knowledge comprising semantic nets, UML relationships, and process graphs.
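The CPM (Critical Path Method) modelling mentioned in the summary can be illustrated with a small, generic sketch. It is not taken from the ICONS deliverables: the function name `critical_path`, the task names, and the durations are assumptions made only for illustration. The sketch runs a forward pass (earliest finish times) and a backward pass (latest finish times) over an acyclic task graph; tasks with zero slack form the critical path.

```python
from collections import defaultdict, deque


def critical_path(durations, predecessors):
    """Return (project_length, critical_tasks) for an acyclic task graph.

    durations:    task -> duration (hypothetical sample data, any time unit)
    predecessors: task -> list of tasks that must finish first
    """
    successors = defaultdict(list)
    indegree = {task: 0 for task in durations}
    for task, preds in predecessors.items():
        for pred in preds:
            successors[pred].append(task)
            indegree[task] += 1

    # Topological order (Kahn's algorithm); detects cycles as a side effect.
    order = []
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(durations):
        raise ValueError("task graph contains a cycle")

    # Forward pass: earliest finish time of every task.
    earliest_finish = {}
    for task in order:
        start = max((earliest_finish[p] for p in predecessors.get(task, [])),
                    default=0)
        earliest_finish[task] = start + durations[task]
    project_length = max(earliest_finish.values())

    # Backward pass: latest finish time that does not delay the project.
    latest_finish = {}
    for task in reversed(order):
        latest_finish[task] = min(
            (latest_finish[s] - durations[s] for s in successors[task]),
            default=project_length,
        )

    # Tasks with zero slack lie on the critical path.
    critical = [t for t in order if earliest_finish[t] == latest_finish[t]]
    return project_length, critical


# Hypothetical workflow: two parallel branches between drafting and publishing.
sample_durations = {"draft": 3, "review": 2, "translate": 4, "publish": 1}
sample_predecessors = {
    "review": ["draft"],
    "translate": ["draft"],
    "publish": ["review", "translate"],
}
print(critical_path(sample_durations, sample_predecessors))
```

In this sample the "review" branch has slack, so it could be delayed without delaying publication; this is the kind of time-related information that a workflow model without time modelling features cannot express.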
  • 5. The knowledge-based capabilities are to be used in the development of the intelligent content integration features that support an open ICONS content repository. The ICONS content management functions are to integrate, under a single knowledge map, information resources stored internally and those stored in Web information sources, as well as in legacy information systems and heterogeneous databases. A wrapper-based architecture is to provide the technological base for content integration. The key features of the ICONS workflow management platform are the dynamic workflow participant assignment functions, the dynamic control flow condition modification capabilities, and the time modelling features. Knowledge-based support for the workflow management engine is to be developed using the disjunctive Datalog inference engine module. Appropriate extensions to the WfMC model will be developed. The ICONS distributed processing organisation, providing for both data and processing distribution, is to be based on the SDDS approach with appropriate extensions to meet the system requirements. Distributed processing will be enabled by load balancing algorithms to be embedded in the ICONS control functions. Workflow process distribution and interoperability are to be based on the distributed workflow communication and synchronisation features to be developed for the ICONS prototype. ICONS capabilities are to be demonstrated by a knowledge management application to be developed by the project team as “The NAS Best Practices Portal”. The application development cycle and techniques are to follow a KMS development methodology to be specified within the ICONS project.
A preliminary analysis of the state of the art in the area of KMS methodologies shows that, although a sound methodological basis exists in the software engineering area, no generally accepted approach exists in the knowledge management realm. The conclusions of the report show that the proposed approach to the ICONS project research and development work is compatible with the stated project objectives. The ICONS project activities cover the following research and development areas: (i) knowledge representation techniques and methodologies for a multimedia content repository, (ii) advanced graphic user interface design and management tools, (iii) design and implementation of efficient algorithms for the management of large, distributed multimedia content repositories, and (iv) an analysis and design methodology for large, knowledge-based content repository systems.
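The wrapper-based content integration described in the summary can be sketched as follows. This is a minimal illustration of the general wrapper pattern, not the ICONS design: every class name (`ContentItem`, `SourceWrapper`, `WebPageWrapper`, `LegacyTableWrapper`, `ContentIntegrator`) and all sample data are hypothetical. Each wrapper hides one kind of source behind a common interface, so the integrating component can query Web pages and legacy tables in the same way.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ContentItem:
    """Uniform record handed to the repository, whatever the source."""
    source: str
    title: str
    body: str


class SourceWrapper(ABC):
    """Hypothetical wrapper interface: one subclass per kind of source."""

    @abstractmethod
    def search(self, keyword: str) -> List[ContentItem]:
        """Return the items of this source that mention the keyword."""


class WebPageWrapper(SourceWrapper):
    def __init__(self, pages: Dict[str, str]) -> None:
        # URL -> page text; stands in for pages fetched from the Web.
        self._pages = pages

    def search(self, keyword: str) -> List[ContentItem]:
        needle = keyword.lower()
        return [ContentItem("web", url, text)
                for url, text in self._pages.items()
                if needle in text.lower()]


class LegacyTableWrapper(SourceWrapper):
    def __init__(self, rows: List[Tuple[int, str, str]]) -> None:
        # (id, title, text) rows; stands in for a legacy database table.
        self._rows = rows

    def search(self, keyword: str) -> List[ContentItem]:
        needle = keyword.lower()
        return [ContentItem("legacy-db", title, text)
                for _id, title, text in self._rows
                if needle in title.lower() or needle in text.lower()]


class ContentIntegrator:
    """Queries every registered wrapper and merges the uniform results."""

    def __init__(self, wrappers: List[SourceWrapper]) -> None:
        self._wrappers = wrappers

    def search(self, keyword: str) -> List[ContentItem]:
        results: List[ContentItem] = []
        for wrapper in self._wrappers:
            results.extend(wrapper.search(keyword))
        return results
```

A caller would register one wrapper per source and issue a single query, for example `ContentIntegrator([WebPageWrapper(pages), LegacyTableWrapper(rows)]).search("workflow")`. Supporting a new source type then means writing one more wrapper class, without changing the integrator.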
  • 6. Table of contents
    History of changes (p. 3)
    Executive summary (p. 4)
    Table of contents (p. 6)
    List of figures (p. 8)
    List of tables (p. 8)
    1. Introduction (p. 9)
    1.1 Objectives (p. 9)
    1.2 Scope (p. 9)
    1.3 Relations to other documents (p. 9)
    1.4 Intended audience (p. 9)
    1.5 Usage guidelines (p. 9)
    1.6 Notation conventions (p. 9)
    2. The ICONS Project Goal and Objectives (p. 10)
    3. Feature Requirements of a Knowledge Management System (p. 12)
    3.1 Knowledge Management: A Framework for User Requirements (p. 12)
    3.2 The KMS Reference Architecture (p. 18)
    3.2.1 Domain Ontology features (p. 19)
    3.2.2 Content Repository features (p. 21)
    3.2.3 Knowledge Dissemination features (p. 21)
    3.2.4 Content Integration features (p. 22)
    3.2.5 Actor Collaboration features (p. 23)
    3.2.6 Knowledge Security features (p. 24)
    4. Architecture of the Intelligent CONtent management System (ICONS) (p. 25)
    4.1 The ICONS architecture specification (p. 25)
    4.1.1 Development Technologies (p. 25)
    4.1.2 Content Management Technologies (p. 26)
    4.1.3 Knowledge Management Technologies (p. 27)
    4.1.4 Human Computer Interaction Technologies (p. 28)
    4.1.5 Distributed Architecture Technologies (p. 29)
    4.2 The ICONS architecture vs. the KMS reference architecture (p. 29)
    5. The ICONS Knowledge Representation Features (p. 33)
    5.1 Requirements for Knowledge Management (KM) (p. 33)
    5.2 Syntax/Semantics (p. 33)
    5.3 Formal foundations of knowledge representation (p. 35)
    5.3.1 Rules and uncertainty (p. 35)
    5.3.2 Data Representation using Dempster-Shafer theory (p. 35)
    5.3.3 Extended relational database model (p. 36)
    5.3.4 Hyperrelations used for representing mined knowledge (p. 36)
    5.3.5 Hyperrelations as knowledge representation (p. 36)
    5.3.6 Metadata (p. 37)
    5.3.7 Sharing data (p. 37)
    5.4 Disjunctive Logic Programming (p. 38)
    5.5 Procedural knowledge representation features (p. 43)
    5.6 Knowledge representation and manipulation in the graphic user interface (p. 45)
    6. The ICONS Intelligent Content Integration Features (p. 50)
    6.1 The ICONS Global Knowledge Schema (p. 50)
    6.2 The ICONS Content Repository (p. 51)
    6.3 Integration of the heterogeneous content sources (p. 51)
    7. The ICONS Intelligent Workflow Features (p. 53)
    7.1 Dynamic workflow participant assignment (p. 53)
    7.2 Dynamic control flow condition definition (p. 53)
    7.3 Time management (p. 53)
    7.4 Task scheduling (p. 54)
    7.5 Extensions with respect to the WfMC's workflow process meta-model (p. 54)
    8. The ICONS Distributed Processing Organisation (p. 55)
    8.1 The ICONS scalable, distributed architecture (p. 55)
    8.2 The ICONS distributed processing optimisation and load balancing (p. 57)
  • 7. Table of contents (continued)
    8.3 The ICONS distributed workflow process communication and synchronisation (p. 58)
    9. Demonstration of ICONS prototype capabilities (p. 60)
    9.1 The “Newly-associated States Best Practices” Portal (p. 60)
    9.1.1 Introduction (p. 60)
    9.1.2 Key Issues for Application Development (p. 64)
    9.1.3 Key Success Factors (p. 66)
    9.1.4 Remarks (p. 66)
    9.2 The Knowledge Management System Design Methodology (p. 67)
    9.2.1 Approaches to Knowledge Management methodologies (p. 67)
    9.2.2 Requirements for defining a comprehensive KMS development methodology (p. 67)
    9.2.3 The ICONS Development Methodology (p. 70)
    10. Conclusions (p. 72)
    10.1 Compatibility with the stated ICONS project goals and objectives (p. 72)
    10.2 Overview of the ICONS project development plan (p. 72)
    Appendix A. List of workpackages and deliverables (p. 76)
    Workpackages (p. 76)
    Deliverables list (p. 77)
    Bibliography (p. 78)
    External references (p. 78)
    ICONS references (p. 84)
    Dictionary (p. 85)
  • 8. List of figures
    Figure 1. The scope of KM activities in 423 corporations surveyed by KPMG [KPMG1999] (p. 12)
    Figure 2. The Knowledge Life Cycle (KLC) (p. 13)
    Figure 3. Four processes of knowledge conversion [Nonaka1995] (p. 15)
    Figure 4. ICONS taxonomy of knowledge (p. 18)
    Figure 5. The Knowledge Management System reference architecture (p. 18)
    Figure 6. The ICONS architecture schematic model (p. 25)
    Figure 7. Treatment relation (p. 36)
    Figure 8. A hyperrelation (p. 37)
    Figure 9. Architecture of the GUI module (p. 45)
    Figure 10. ICONS GUI module with interfaces to databases (p. 47)
    Figure 11. A graph of objects (p. 48)
    Figure 12. The idea of the user basket (p. 48)
    Figure 13. Models of workflow co-operation (p. 58)
    Figure 14. Main Concept of ICONS portal for NAS Best Practice (p. 63)
    Figure 15. The Knowledge life cycle of the NAS Best Practices Portal (p. 65)
    List of tables
    Table 1. Cross-reference between the KM processes and the KMS features (p. 16)
    Table 2. Feature roles within the knowledge management processes (p. 17)
    Table 3. Feature requirements of a Knowledge Management System (p. 19)
    Table 4. The ICONS focus technological area modules and the Domain Ontology features cross reference (p. 30)
    Table 5. The ICONS focus technological area modules and the Content Repository features cross reference (p. 30)
    Table 6. The ICONS focus technological area modules and the Knowledge Dissemination features cross reference (p. 31)
    Table 7. The ICONS focus technological area modules and the Content Integration features cross reference (p. 32)
    Table 8. The ICONS focus technological area modules and the Actor Collaboration features cross reference (p. 32)
    Table 9. Checklist of the acquis (chapters in Regular Reports) (p. 61)
    Table 10. Overview of Phare (p. 62)
    Table 11. Best practice taxonomy (p. 63)
    Table 12. Key technological issues for development of the NAS Best Practices Portal (p. 66)
    Table 13. The ICONS project focus technological areas and the project objectives cross-reference (p. 72)
    Table 14. The ICONS focus technological area modules and the research stream workpackages (p. 75)
  • 9. 1. Introduction
1.1 Objectives
The ICONS project presentation represents a refinement of the technical project specification contained in the ICONS project proposal and the ensuing Work Description [ICONS CONTRACT] document developed as the addendum to the research contract with the European Commission. It also reflects the commitments of the project partners represented in the Consortium Agreement. The primary objective is to present the current ICONS consortium views on the scope and directions of the research and development work specified in the project work description, as well as on the methods and techniques to reach the stated project objectives. It is assumed that the project presentation document reconciles the diverse approaches to attaining the project objectives proposed by the project consortium partners, and harmonises the initial research work on standards, research and technological terms of reference of the ICONS project. Although the preliminary ICONS architecture representing the functional scope of the project has been defined in the Work Description document [ICONS CONTRACT], a flexible approach is adopted to allow for changing views of the project team members, influenced by the ongoing research and development activities in the knowledge management field. Hence, the ICONS Project Presentation is to evolve, under the constraints of the project change management procedure [ICONS D2], and to be published as new versions of the document. Each new version of the project presentation is to highlight the important changes with respect to the previous technical approach and the scope of work. The principal project change management rule states that the scope of the project and the corresponding ICONS architecture may not be changed without the written consent of the ICONS Project Officer representing the European Commission.
1.2 Scope

The scope of this report covers the entire research and development work currently under way in the ICONS project.

1.3 Relations to other documents

This report provides a baseline specification of the principal directions of the research and development work to be carried out within the ICONS project. In this sense the report represents the consensus of the ICONS consortium members regarding the ICONS architecture and principal features, as well as the responsibilities and development tasks comprised in the project development plan. No ensuing technical document produced within the ICONS project should contradict the design decisions and research assumptions comprised in this report. Should there arise a need to modify the underlying assumptions of the ICONS project development philosophy, appropriate changes will be applied to this report, to be published as the succeeding version.

1.4 Intended audience

The intended audience comprises all members of the ICONS project consortium as well as the representatives of the European Commission monitoring and evaluating the progress of the project research and development work.

1.5 Usage guidelines

The contents of the ICONS Project Presentation must be known to and evaluated by all members of the project team. Since the document is to represent the current consensus of the ICONS consortium, no important deviations from the presented ICONS architecture and the principal technical directions, as represented in the current version of this document, are allowed.

1.6 Notation conventions

No special notation conventions are used in this report.
  • 10. 2. The ICONS Project Goal and Objectives

Turning information into knowledge has been one of the principal goals of advanced information systems developed in all realms of social and economic life of modern societies. Terms like “knowledge management”, “knowledge engineering” and “knowledge bases” have become ubiquitous in corporate board rooms as well as IT departments. Easy access to information, enabled by the explosion of Internet technologies, has created new problems related to the exponentially growing wealth of information sources flooding the information system users. Many advanced information systems are focused on knowledge bases comprising large collections of facts, rules, and heuristics pertaining to a specific application domain. Such knowledge bases are typically divided into two principal parts, namely the content base comprising repositories of multimedia information objects, and ontologies representing formal knowledge pertaining to the corresponding application domain.

Our goal is to develop a prototype of an Intelligent CONtent management System (ICONS) supporting a uniform, knowledge-based access to distributed information resources available in the form of web pages, pre-existing heterogeneous databases (formatted, text, and multimedia), business process specifications and operational information, as well as legacy information processing systems. The principal objectives of our research and development project are to obtain and present novel results in the areas of knowledge representation and inference, heterogeneous information integration, and user-friendly interfaces based on advanced information architecture techniques.

The overall approach of the ICONS project is to: (a) provide effective methods for analysing and modelling, (b) develop practical tools for exploiting and using, (c) assess in a pilot system the usefulness of ...
an intelligent content management system with advanced knowledge management capabilities integrating internal content repositories with external heterogeneous information sources. To achieve these overall objectives four streams of technical work can be identified comprising the above operational goals:

Objective 1: Development of knowledge representation techniques and methodologies for a multimedia content repository.

The following specific research problems must be addressed in order to develop the knowledge representation capabilities of ICONS:
(a) Application of semantic data models (UML) and deductive database mechanisms as the domain ontology specification tool.
(b) Extraction of knowledge embedded in XML documents and in the associated RDF specifications.
(c) Representing knowledge embedded in the schemata of pre-existing heterogeneous databases and legacy information processing system outputs.
(d) Design and implementation of an efficient, non-procedural content management framework providing content and knowledge model definition and query capabilities.
(e) Development of mechanisms for procedural knowledge definition and its further exploitation in the area of effective knowledge and business processes management.

Results obtained in the above research areas will be embedded in the ICONS prototype and they will be verified in the pilot application environment. The principal research approach is to create synergies by integrating known research results in novel configurations and contexts, as well as extending known results in order to meet the identified new requirements.

Objective 2: Development of user interface design and management tools meeting the requirements of the information architecture methodology.

The user interface requirements fall into three distinct areas, namely the user tool set and dialogue model, the content presentation model, and the graphical knowledge presentation and manipulation model.
All of the above presentation models must incorporate personalisation capabilities in order to enable dynamic adjustments to changing user preferences discerned from the system usage patterns.
  • 11. The information architecture methodologies and techniques are considered to be the prime requirements for design and implementation of the ICONS user interface management functionality. The multi-disciplinary research involves skills of industrial designers, psychologists, and computer scientists. The ICONS prototype and pilot application work is to provide a realistic test-bed for the proposed user interface management techniques.

Objective 3: Design and implementation of efficient algorithms for management of large, distributed multimedia content repositories.

There are two dimensions of the ICONS content distribution. The first pertains to distribution of the system content repository, comprising the Content Base and the Ontology Base, and of the hierarchical storage management processes among the ICONS servers. The second concerns integration of external information sources, such as pre-existing heterogeneous databases, legacy information processing systems, and web information resources.

Distribution of the ICONS components among the system servers requires efficient load balancing algorithms interoperable with the selective content and ontology replication mechanism. Research will also concentrate on adaptive data caching techniques and on multi-criteria data distribution optimisation.

Integration of the external information resources is to be performed with the use of the XML wrapper technology. Wrapper programs producing the required XML envelopes for extracted data are to be enriched with RDF specifications resulting from extracting semantics from database schemata, in the case of the external databases, or representing semantics, in the case of the legacy information processing system outputs. The wrapper programs will be generated in the form of Enterprise Java Bean modules comprising the necessary query statements.
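The wrapper idea described above can be illustrated with a minimal sketch. It is written in Python rather than as an Enterprise Java Bean module, and it is not the ICONS wrapper design: the envelope layout, the `wrap_table` function, and the `EX_NS` namespace with its `columnOf` property are hypothetical names chosen for illustration. The sketch queries a relational table, emits one RDF description per column derived from the schema, and wraps the extracted rows in an XML envelope.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Standard RDF namespace; EX_NS is a hypothetical namespace for this sketch.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/icons-sketch#"


def wrap_table(conn, table):
    """Return an XML envelope (as a string) for all rows of `table`."""
    # The table name is assumed to be trusted; it is not user input here.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]

    envelope = ET.Element("envelope", {"source": table})

    # Schema-derived semantics: one rdf:Description per column.
    rdf = ET.SubElement(envelope, f"{{{RDF_NS}}}RDF")
    for col in columns:
        desc = ET.SubElement(
            rdf,
            f"{{{RDF_NS}}}Description",
            {f"{{{RDF_NS}}}about": f"{EX_NS}{table}.{col}"},
        )
        ET.SubElement(desc, f"{{{EX_NS}}}columnOf").text = table

    # Extracted data: one <record> per row, one child element per column.
    data = ET.SubElement(envelope, "data")
    for row in cur:
        rec = ET.SubElement(data, "record")
        for col, value in zip(columns, row):
            ET.SubElement(rec, col).text = "" if value is None else str(value)

    return ET.tostring(envelope, encoding="unicode")


def demo():
    """Build a tiny in-memory database and wrap its single table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE project (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO project VALUES (1, 'Best practices portal')")
    return wrap_table(conn, "project")
```

Calling `demo()` returns the envelope as a string. A production wrapper would additionally need to map column names that are not valid XML element names and to derive richer semantics than the column-to-table relation shown here.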
Objective 4: Develop an analysis and design methodology for large, knowledge-based content repository systems.

The multimedia content repositories with knowledge representation capabilities require a novel approach to the analysis and design methodology. An application development life-cycle and the associated methods and techniques will be specified and a pilot application of ICONS will be developed. The pilot application is to be the “Best practices of PHARE, SAPARD, and ISPA projects developed within the Newly Associated States” content repository accessible on the Internet. The aim is to present the viability of the proposed methodology and to provide a starting point for the clearly needed knowledge source.
  • 12. 3. Feature Requirements of a Knowledge Management System

Our objective is to confront the contemporary requirements of the fast growing knowledge management field with the current views on the KMS feature architectures, as well as with the already existing IT technology pertaining to the KM realm.

3.1 Knowledge Management: A Framework for User Requirements

The knowledge management field has been growing dynamically, fuelled by intensification of global competition in all principal areas of the world economy. The state of the KM field at the turn of the century is illustrated by a study of 423 corporations performed by KPMG [KPMG1999]. The scope of the KM activities in the study sample is presented in Figure 1.

KM is currently in operation: 34%
KM is currently being implemented: 29%
KM is currently considered: 17%
KM is not currently planned: 19%
KM has been abolished: 1%

Figure 1. The scope of KM activities in 423 corporations surveyed by KPMG [KPMG1999].

High interest in the field was evident at the time of the study: 80% of corporations were in some stage of KM activities (34% in operation, 29% being implemented, and 17% under consideration). Judging by the increasing number of trade conferences and exhibitions pertaining to the KM field, the discipline has reached maturity. The principal questions from our point of view, to be discussed in this section, are: (i) what is the role of IT as the enabling technology, and (ii) what extension of the currently available information management platforms is required in order to meet the growing requirements of the KM field? The second question has been the root of the ICONS project proposal, so the proper identification of the added value for the KM field emerging from the project is of paramount importance to the project consortium.
A critical appraisal of the state of the art in the content management system area, which widely claims to provide direct support for KM, should provide the initial vantage point for evaluation of the ICONS project contribution. We commence with a brief overview of the requirements of the KM field identified in a number of research studies performed in the realm of the European KM Forum [KMForum2001]. We also consider views of the US knowledge management research community comprised in the research papers representing the current views of the Knowledge Management Consortium International (KMCI) [Firestone2000, McElroy1999] and focusing on the KM research and practice in the USA [Garvin1993, Quinn1996, Baek1999, Becker1999, Coleman1999, Davenport1999, Huntington1999].

The common fallacy of the IT side of the KM scene is to focus on the purely technological view of the field, with the tendency to highlight features that are already available in advanced content management systems. Such systems are commonly referred to as corporate portal platforms or, more to the point, as knowledge portal platforms. From the KM perspective, as discussed in [McElroy1999], such claims may be justified only with respect to a narrow view of the field focusing on distribution of existing knowledge throughout the organisation. The above views, called by some authors the “First Generation Knowledge Management (FGKM)” or “Supply-side KM”, provide a natural link into the realm of currently used content management
  • 13. techniques, such as groupware, information indexing and retrieval systems, knowledge repositories, data warehousing, document management, and imaging systems. We shall briefly refer to existing content management technologies in the ensuing sections of the report to show that, within the above narrow view, the existing commercial technologies meet most of the user requirements.

With the growing maturity of the KM field, the emerging opinion is that IT support for accelerating the production of new knowledge is a much more attractive proposition from the point of view of gaining competitive advantage. Such focus, exemplified in the stated feature requirements for the so-called “Second Generation Knowledge Management (SGKM)”, is on enhancing the conditions in which innovation and creativity naturally occur. This does not mean that such FGKM features as systems support for knowledge preservation and sharing are to be ignored. A host of new KM concepts, such as the knowledge life cycle, knowledge processes, organisational learning and complex adaptive systems (CAS), provide the underlying conceptual base for the SGKM, thus challenging the architects of the new generation Knowledge Management Systems (KMS).

The Knowledge Life Cycle (KLC), developed within the KMCI sponsored research [Firestone2000], provides us with the high-level feature requirements abstraction to be used as the starting point for evaluation of the ICONS architecture. The KLC as proposed by KMCI is presented in Figure 2.
[Figure 2 diagram: Knowledge Production, Knowledge Claims, Knowledge Validation, Organizational Knowledge, Knowledge Integration, connected by an experiential feedback loop.]

Knowledge Production: individual and group interaction; data/info acquisition; new knowledge claims; initial knowledge codification.
Knowledge Validation: knowledge claim peer review; application of validation criteria; weighting of value in practice; formal knowledge codification.
Knowledge Integration: knowledge sharing and transfer; teaching and training; operationalizing new knowledge; production of knowledge artifacts.

Figure 2. The Knowledge Life Cycle (KLC).

The concepts underlying the KLC model of knowledge management comprise the notion of a Natural Knowledge Management System (NKMS) defined in [Firestone2000] as “the on-going, conceptually distinct, persistent, adaptive interaction among intelligent agents: (a) whose interaction properties are not determined by design, but instead emerge from the dynamics of the enterprise interaction process itself, (b) that produces, maintains, and enhances the knowledge base produced by the interaction”. The above definition of the knowledge management system fits the notion of a complex adaptive system (CAS) defined as “a goal-directed open system attempting to fit itself to its environment and composed of interacting adaptive agents described in terms of rules applicable with respect to some specified class of environmental inputs” [Holland1995].

In order to keep compatibility with our project terminology we shall distinguish two classes of actors interacting within the KM environment: human beings, called employees or knowledge workers, and knowledge-based computer programs, called intelligent agents. A thorough discussion of the intelligent agent technology may be found in [Baek1999], while a taxonomy of intelligent agent knowledge-based features is presented in [Huntington1999].
The Knowledge Base (KB) of the system is “the set of remembered data, validated propositions and models (along with metadata related to their testing), refuted propositions and models (along with metadata related to their refutation), metamodels, and (if the system produces such an artifact) software used for manipulating these, pertaining to the system and produced by it” [Firestone2000].
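The knowledge base definition quoted above, read together with the KLC, can be sketched as a small data structure. This is an illustrative Python sketch under stated assumptions, not a design taken from [Firestone2000] or from the ICONS project: the class names, the `Status` values, and the `validate` signature are hypothetical. It keeps both validated and refuted claims together with the metadata of their testing, and treats only validated claims as organisational knowledge.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    CLAIMED = "claimed"      # output of knowledge production
    VALIDATED = "validated"  # passed knowledge validation
    REFUTED = "refuted"      # failed knowledge validation


@dataclass
class KnowledgeClaim:
    statement: str
    status: Status = Status.CLAIMED
    # One record per validation attempt (metadata related to testing).
    test_metadata: list = field(default_factory=list)


@dataclass
class KnowledgeBase:
    claims: list = field(default_factory=list)

    def produce(self, statement):
        """Register a new knowledge claim (knowledge production)."""
        claim = KnowledgeClaim(statement)
        self.claims.append(claim)
        return claim

    def validate(self, claim, criterion, note):
        """Apply an organisation-specific criterion and keep the test record."""
        passed = bool(criterion(claim.statement))
        claim.status = Status.VALIDATED if passed else Status.REFUTED
        claim.test_metadata.append({"note": note, "passed": passed})
        return passed

    def organisational_knowledge(self):
        """Only validated claims count as organisational knowledge."""
        return [c.statement for c in self.claims
                if c.status is Status.VALIDATED]
```

Refuted claims are deliberately retained in `claims` rather than deleted, mirroring the quoted definition, in which refuted propositions and the metadata of their refutation are part of the knowledge base.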
  • 14. A knowledge base, not necessarily meant as an IT-related concept, constitutes the principal element of any knowledge management system and therefore requires a more detailed consideration. There are emerging schools of thought deviating from the popular definition of knowledge as the “justified, true belief” [Goldman1991] in several important aspects. First of all, the knowledge base is to comprise justified knowledge, where justification is specific to the validation criteria used by the system (note that such validation criteria may vary from organisation to organisation). Secondly, although the definition is consistent with the idea that individual knowledge is a particular kind of belief, the notion of belief extends beyond cognition alone to evaluation.

The concept of the learning organization, defined in [Garvin1993] as “an organization skilled at creating, acquiring, and transferring knowledge, and at modifying its behaviour to reflect new knowledge and insights”, provides an important context for the KMS feature analysis. Garvin introduces five main activities acting as the building blocks of a learning organization, namely: “systematic problem solving, experimentation with new approaches, learning from one’s own experience and past history, learning from experiences and best practices of others, transferring knowledge quickly and efficiently throughout the organization”. Attributes of a learning organization, important for management of professional intellect, have been identified in [Quinn1996].
The intellectual capital of an organization comprises such elements as: cognitive knowledge (know what) – the basic mastery of a discipline that professionals achieve through extensive training and certification, advanced skills (know how) – the ability to apply the rules of a discipline to complex real-world problems, systems understanding (know why) – deep knowledge of the web of cause-and-effect relationships underlying a discipline, and self-motivated creativity (care why) – the will, motivation and adaptability for success. An important notion discriminating between the content management systems and the knowledge management systems is that of the domain ontology defined in [Becker1999] as “an explicit conceptualization model comprising objects, their definitions, and relationships among objects”. A well-defined terminology, called taxonomy [Letson2001], is used within a particular ontology to describe the classes of objects, their properties, and relationships. Domain ontologies are important elements of knowledge management systems, quite similar to the conceptual schema of the database management model, serving to organize the knowledge of an organization. Thus, the domain ontology management features of a knowledge management system directly pertain to modelling of knowledge. We concentrate on two distinct, but compatible, views pertaining to modelling of knowledge, represented by the seminal work of Popper [Popper1971, Popper1977], and by the generally accepted views of Nonaka and Takeuchi [Nonaka1995]. The above results directly relate to the KLC model, thus providing a base for the ensuing discussion of feature requirements for a knowledge management system. 
Popper views the body of knowledge existing in an organisation as three distinct worlds, namely:
(a) the first world (World 1) made of material entities: things, oceans, towns etc.,
(b) the second world (World 2) made of psychological objects and emergent predispositional attributes of intelligent systems: minds, cognitions, beliefs, perceptions, intentions, evaluations, emotions etc.,
(c) the third world (World 3) made of abstractions created by the second world acting upon the first world objects.

This approach provides us with a two-tier view of knowledge:

1. Knowledge viewed as a belief is a second world predispositional object. This pertains to situations where individuals, groups of individuals, and organizations hold beliefs (subjectively considered to be true) that are immediate precursors of their decisions and actions. The predispositional knowledge is “personal” in the sense that other individuals have no direct access to one’s own knowledge in full detail and therefore can neither “know it” as their own belief, nor validate it.

2. Knowledge viewed as validated models, theories, arguments, descriptions, problem statements, etc., is a third world linguistic object. One can talk about the truth, or nearness to the truth, of such knowledge, defined as the above third world objects, in terms of being closer to the truth than those held by the competitors. This kind of knowledge is not an immediate precursor of decisions and actions; it rather impacts the second world beliefs and these, in turn, impact the behaviour of the KMS actors. Such knowledge is objective, in the sense that it is not agent specific and is shared among agents. The above characteristics bring to the forefront the issue of community validation of the shared knowledge.

Looking at the above two distinct categories of knowledge, we may conclude that the third world knowledge is the principal product of a knowledge management system.
The knowledge of the individuals in a social organisation, by contrast, is not produced by the system alone, although it may be strongly influenced by interaction with the objective knowledge represented by the third world abstractions.
  • 15. The importance of the widely recognized distinction between tacit and explicit knowledge, first introduced by Polanyi [Polonyi1966], is emphasized by the work of Nonaka and Takeuchi [Nonaka1995]. The principal idea is that knowledge is created by interaction between tacit and explicit knowledge, presented schematically in Figure 3. Note that the above two knowledge base models are compatible, since the tacit vs. explicit knowledge distinction corresponds closely to Popper’s subjective (World 2) vs. objective knowledge (World 3) distinction.

From / To | Tacit knowledge | Explicit knowledge
Tacit knowledge | Socialisation | Externalisation
Explicit knowledge | Internalisation | Combination

Figure 3. Four processes of knowledge conversion [Nonaka1995].

Considering the knowledge categorisations and transformations from the organizational knowledge point of view, constituting the principal knowledge management perspective, we view the following aspects of the model as crucial from the knowledge creation process perspective:

1. Transformation from tacit to explicit knowledge. The process corresponds to the externalisation transformation of Nonaka and Takeuchi and to that of abstracting the objective knowledge, or transformation of World 2 beliefs into the World 3 objective knowledge, in Popper’s model. The process corresponds to the knowledge claim formulation in the KLC. However, in view of the KLC model, knowledge claims do not constitute the “objective knowledge” until they successfully pass the knowledge validation process. Only then do the validated knowledge claims become the organisational knowledge, after having been formalised and edited in the knowledge integration process of the KLC.

2. Transformation from tacit to tacit knowledge.
The process corresponds to the socialisation transformation of Nonaka and Takeuchi as well as to sharing of “personal” knowledge by intelligent agent interactions implied in Popper’s approach. Although the process does not create “new” organisational knowledge, it may be crucial to maintaining and enhancing the competitive advantage of many creative organisations (e.g. a software company). This transformation fits into the knowledge production process of the KLC.

3. Transformation from explicit to tacit knowledge. The process corresponds to the internalisation transformation of Nonaka and Takeuchi and to the “impact” of the objective knowledge on the World 2 beliefs, and consequently on the organizational decision making process, presented in Popper’s model. This transformation matches closely the knowledge operationalization step of the knowledge integration process of the KLC. Although no new knowledge is produced at this stage, the transformation may be very important for highly innovative organizations.

We do not consider the explicit knowledge combination to be relevant to knowledge management, since either a mechanical processing of external knowledge takes place through some mechanism of information categorisation, or an intelligent agent must be involved in inferring new knowledge from a combination of external knowledge artifacts. In the latter case, other transformations, namely the internalisation-externalisation path, would have to be followed.

A distinction must be made at this stage between knowledge management, dealing with the above classes of structural and procedural knowledge, and information derived from information systems supporting the daily operation of an organisation. Data and results of such information systems are considered, for the sake of our
  • 16. KMS feature requirement analysis, to be representations of Popper’s World 1 entities and their relationships and are therefore considered merely objects of the KMS actors’ activities and decisions. A similar view is taken with respect to ad hoc or unstructured business processes with flows determined by the subjective knowledge of an intelligent agent, rather than by a validated artifact of objective knowledge. An artifact of the objective procedural knowledge may be, for example, a formal workflow definition controlling execution of all processes belonging to a given class.

The above discussion sets the stage for an analysis of the principal feature requirements pertaining to the distinct knowledge management processes of the KLC and to the characteristics of the knowledge transformations underlying the knowledge production process. Note that the KMS features are technological categories providing a taxonomy for user functions viewed collectively as the KMS architecture and, as such, they should be discussed in the context of the knowledge management processes present in the KLC. We relate the KMS features to the knowledge management processes in Table 1.

KMS features | Knowledge Production (KP) | Knowledge Validation (KV) | Knowledge Integration (KI)
Domain Ontology (DO) | DO-KP | DO-KV | DO-KI
Content Repository (CR) | CR-KP | CR-KV | -
Knowledge Dissemination (KD) | KD-KP | KD-KV | KD-KI
Content Integration (CI) | CI-KP | CI-KV | -
Knowledge Security (KS) | - | - | KS-KI
Actor Collaboration (AC) | AC-KP | AC-KV | AC-KI

Table 1. Cross-reference between the KM processes and the KMS features.

The user functions clustered in the principal KMS features may play varying support roles within the knowledge management processes.
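The cross-reference of Table 1 can be held as a small lookup structure, which makes it easy to list the roles of a feature or the features involved in a process. The following Python sketch is only an illustration: the names `CROSS_REFERENCE`, `role_codes`, and `features_for` are hypothetical, and the entries follow Table 1 as printed.

```python
# KLC processes in the column order of Table 1.
PROCESSES = ("KP", "KV", "KI")

# Feature code -> processes in which Table 1 lists a role for that feature.
CROSS_REFERENCE = {
    "DO": {"KP", "KV", "KI"},  # Domain Ontology
    "CR": {"KP", "KV"},        # Content Repository
    "KD": {"KP", "KV", "KI"},  # Knowledge Dissemination
    "CI": {"KP", "KV"},        # Content Integration
    "KS": {"KI"},              # Knowledge Security
    "AC": {"KP", "KV", "KI"},  # Actor Collaboration
}


def role_codes(feature):
    """Return the role codes of a feature, e.g. 'CR' -> ['CR-KP', 'CR-KV']."""
    return [f"{feature}-{p}" for p in PROCESSES
            if p in CROSS_REFERENCE[feature]]


def features_for(process):
    """Return the features that play a role in the given KLC process."""
    return [f for f, procs in CROSS_REFERENCE.items() if process in procs]
```

For example, `role_codes("KS")` yields only `KS-KI`, reflecting that knowledge security is related solely to the knowledge integration process in Table 1.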
Collectively, the sum of user requirements for a given principal feature, defined within the distinct knowledge management processes, represents the user requirement set for a given principal KMS feature. We discuss the support role semantics corresponding to the principal KMS features in Table 2. The principal KMS features serve as the basic building blocks for the reference KMS architecture presented in the ensuing section.

Feature role | Feature role semantics

DO-KP: The domain ontology functionality supports:
1. The externalisation transformation by providing the KMS actor with the means for the initial knowledge codification during formulation of knowledge claims. Codification is performed on both declarative and procedural knowledge.
2. Referencing the content artifacts providing supporting evidence or providing the fact base for knowledge inference. The reference information provides a knowledge map serving as the principal access path to the content repository.

DO-KV: The domain ontology functionality supports:
1. The formal knowledge codification pertaining to the validated knowledge claims.
2. The formal specification of the models and rules supporting the knowledge claim screening and validation activities, in particular those involving complex networks of experts.

DO-KI: The domain ontology functionality supports:
1. The internalization transformation by providing means to interpret and learn from objective knowledge as well as to find reference to supporting evidence exemplified in the real world cases comprised in the content repository.
2. The socialization transformation by providing means to find reference to peer expertise and work results, including formulation of knowledge claims, thus fostering interaction between the KMS actors.

CR-KP: The content repository comprises all content artifacts, actual and virtual, that support the daily operation of an organization.
In this sense, the content repository provides the principal platform of information processing support for the knowledge worker (a KMS actor that uses and/or produces knowledge) activities. The knowledge map, provided by the KMS domain ontology, defines the structure and scope of the content repository.
  • 17. CR-KV: The content repository provides the body of supporting evidence as well as the documentation means for the knowledge claim validation activities. Information comprised in the content repository may be used and processed during the normal activities of knowledge workers and it may be the basis for new knowledge claim formulations.

CR-KI: N/A

KD-KP: The body of organisational knowledge, formally codified in the domain ontology, and supported by information comprised in the content repository, must be accessible to the knowledge workers in order to influence their subjective beliefs and predispositions (tacit knowledge) and thus to impact their activities and decisions. The quality of systems support for this process determines the efficiency of the knowledge externalization transformation fundamental for the knowledge production process.

KD-KV: The knowledge claim validation process may heavily depend on the existing body of information, accessible through the content repository, as well as on the already validated and integrated objective knowledge pertaining to the subject domain. The validation process typically involves complex, and variable, interactions among experts drawing upon declarative as well as procedural knowledge. The quality of systems support, as in the case above, is of paramount importance to the efficiency of the validation process, which, additionally, must be supported by complex and flexible workflow procedures representing the procedural knowledge.

KD-KI: The dissemination functionality supports the principal facets of the knowledge integration process, namely the knowledge sharing and transfer, as well as teaching and training. Both the codified objective knowledge and the supporting information must be made available.
CI-KP: Information represented in content artifacts may either be created and retained in the content repository, or be derived from heterogeneous information sources, usually maintained by external information systems. The derived content artifacts may be stored in the repository or they may be materialized on demand by the appropriate interaction with the external source. The content integration functionality entails selection and retrieval of structured and semi-structured information, homogenization into a common content model, and derivation of semantics into the domain ontology representations.

CI-KV: Same semantics as above.

CI-KI: Same semantics as above.

KS-KP: N/A

KS-KV: N/A

KS-KI: The organisational knowledge comprised in the KMS, both in the form of the codified objective knowledge artifacts and of the supporting information artifacts, represents an important part of the intellectual capital. Hence the system integrity and privacy must be maintained.

AC-KP: Interaction of knowledge workers is the basis of socialization processes. Interaction may be spontaneous, or it may result from a more or less formally specified and supported procedure. Automatic support for such interactions may vary from typical groupware functions, such as chat rooms and messaging, to advanced ontology-based workflow procedures. An important by-product of automatic support may be the possibility to capture operational metrics characterising the knowledge production process.

AC-KV: Knowledge claim validation may entail interactions within a complex network of experts, both internal and external to the organisation, using a variety of information processing environments. As in the case above, supporting expert interaction, possibly also involving intelligent agents, may be a critical success factor of the knowledge claim validation processes.
AC-KI: Production of the objective knowledge artifacts and of the supporting content, inherent in the knowledge integration process, may require well-defined editorial procedures. Such procedures may typically be supported by automatic workflow management functionality. The requirements may vary from simple groupware-like support to complex, ontology-based workflow management environments.

Table 2. Feature roles within the knowledge management processes.

Further analysis of the KMS feature requirements in the context of the knowledge life-cycle, leading to development of Use Case models [Rumbaugh1999] to be used for design and validation of the ICONS architecture, is to be performed in the succeeding phases of the ICONS project. We believe that the above discussion provides sufficient user requirements context for the ensuing presentation of the KMS reference architecture. The reference architecture is to provide a beacon for the further unfolding of the research and development work of the ICONS project.

Within the document we use several types of knowledge. Figure 4 presents the ICONS knowledge taxonomy, while the Dictionary presents their meaning.

Figure 4. ICONS taxonomy of knowledge. [Diagram node labels: knowledge, declarative knowledge, procedural knowledge, structural knowledge, knowledge-based reasoning, knowledge maps.]

3.2 The KMS Reference Architecture

The European KM Forum [KMForum2001, KMForum2001_D11, KMForum2001_D11a, KMForum2001_D12] is an IST project whose goal is to collect current KM practices and to create an almost complete overview of the KM domain in Europe. The KMS reference architecture presented in Figure 5 has been developed on the basis of the current KM technologies discussed in the EKMF project reports, as well as on the KMS feature requirements identified in the preceding section.

Figure 5. The Knowledge Management System reference architecture. [Diagram labels recoverable from the extraction: Knowledge Management System, Domain Ontology, Content Repository, Knowledge Dissemination, Content Integration, KMS Actor Collaboration, Security, Internet, Intranet, together with the individual features listed in Table 3.]

Table 3 presents the feature requirements of the KMS reference architecture in tabular form.
IST-2001-32429 ICONS Intelligent Content Management System page 18/86
Feature requirements of a Knowledge Management System
Domain Ontology: Semantic nets, Conceptual trees, Semantic data models, Process graphs, Hyper text, Knowledge-based reasoning, Time modelling, Taxonomies
Content repository: XML, RDF, File systems, Version control, DBMS, HSM, Rendering
Knowledge Dissemination: Push technology, Content object repository, Knowledge map graphs, Full text, Semantic data models net, Semantic nets
Content integration: Files, Data bases, Business Intelligence, Web pages, Legacy information systems, Intelligent agents, Document management
Actor Collaboration: Message exchange, Discussion forums, Knowledge engineering, Workflow management, Internet/Intranet
Security: Encryption, Access Control, Authentication, Electronic signature
Table 3. Feature requirements of a Knowledge Management System.

The KMS features, grouped into six principal feature sets, represent our current views pertaining to the KM technology requirements. Some of the features are already common in the advanced content management systems referred to as corporate portal platforms; others are the subject of ongoing KMS research efforts. We discuss each of the principal feature sets in more detail in order to define reference feature requirements for the ICONS architecture presented in the succeeding section.

3.2.1 Domain Ontology features

The Domain Ontology features pertain primarily to knowledge representation, including the declarative knowledge representation features, such as taxonomies, conceptual trees, semantic nets, and semantic data models, as well as the procedural knowledge representation features exemplified by the process graphs. Time modelling and knowledge-based reasoning features pertain both to the declarative and the procedural knowledge representations.
Hyper-text links are considered as a mechanism to create ad hoc relationships between content artifacts comprised in the repository.

Taxonomies
Taxonomies provide means to categorize information objects stored in the content repository. Categorisation classes may be arbitrary hierarchical structures grouping information objects selected by the class predicates. Class predicates are defined in the form of queries comprising information object property values or as full text queries comprising key words and/or phrases. Categorisation classes are not necessarily disjoint. Dictionaries are a special class of taxonomies, also organized into hierarchical structures, which may comprise any number of categories, usually corresponding to the occurring values of an information object property (e.g. a name directory), with the maximum number of categories equal to the cardinality of the property value domain. Automatic categorisation of information objects may also be based on arbitrary functions defined on object property values and/or content and implemented as an arbitrary analytical algorithm or a knowledge-based reasoning function. In the latter case, an inference engine provides for the actual categorisation of information objects. Analytical algorithms provide for automatic categorisation of formatted data objects, textual objects, as well as multimedia objects, such as audio, images and video frames. Taxonomies provide a powerful navigation device for browsing the content repositories, since they usually represent intuitive semantics of the user information requirements.

Conceptual trees
Conceptual trees are also a categorisation device, used in conjunction with full text queries, providing means to define concepts on the basis of their hierarchical relationships with other concepts, key words, and phrases. Usually conceptual trees allow for full text query relevance ranking. This technique allows for easy extension of the domain ontology terminology with the use of, usually abstract, concepts with arbitrarily rich semantics.

Semantic Nets
Semantic networks provide means to represent binary 1:1 relationships, expressed usually as named arcs of a directed graph, where vertices are information objects belonging to any of the information object classes. Normally, the linked object classes are determined by the binary relationship semantics of the corresponding named arc. An example of a simple semantic net may be a binary relation Descendants defined as a subset of the Cartesian product of the set of Persons with itself. Semantic nets may be constructed over an arbitrary number of information object classes and binary relationships.

Semantic Data Models
The Unified Modelling Language (UML) [Rumbaugh1999] is the currently prevailing specification platform for semantic data models, allowing for definition of structural as well as behavioural semantics. Class Association Diagrams provide easy to read, intuitive semantics closely matching the mental models of the KMS users. The UML-based knowledge representation, in order to be useful, must be supplemented with a navigation facility allowing the user to traverse the network of specified object associations and to view/retrieve the corresponding object sets.

Hyper-text links
The hyper-text links support referential link semantics that may exist among the information objects belonging to arbitrary object classes existing in the content repository. The ad hoc character of hyper-text links (usually no schema level information exists) limits their usefulness as a knowledge representation feature.
However, they are a useful annotation tool to express, possibly transient, referential relationships of information objects stored in the content repository.

Time modelling
Time represented in domain ontologies, as well as in the content repository, conveys important information. Time valued properties may be important elements of search and automatic categorisation operations. Hence, formal representation of time is of paramount importance for knowledge descriptions and content characterization. Problems that exist today are related to the lack of a standard representation of time instances and periods, and to incompatible time scales, granularities and periodicity definitions. Precise rules must be established as to the representation and treatment of temporal properties to be comprised in a knowledge management system. Time modelling is also an important element of the procedural knowledge representation. CPM-like (Critical Path Method) techniques have been proposed for representation of time constraints and for optimisation of process execution times in advanced workflow management systems.

Knowledge-based reasoning
Knowledge-based (k-b) reasoning systems may be built for a wide range of decision-making problems. The reasoning is based on a collection of facts, usually represented by content property values, and heuristics represented as rules. The prevailing paradigms are production rules (forward and backward chaining), logic programming, and neural nets (reasoning about quantitative data). The k-b reasoning may be used for expert knowledge representation, knowledge and content categorisation and distribution, as well as for the intelligent agent implementation. Intelligent workflow management is a new application area for k-b reasoning, both for process routing and for dynamic role modification.

Process graphs
Business processes are usually represented by process graphs, typically by the Event-Condition Petri Nets or by directed graphs.
Petri Net representation allows for expressing richer process semantics, in particular the pre- and post-conditions for process activities. The process specification must also be supplemented by the set of role definitions, one definition for each process activity, to enable the workflow management engine to properly assign tasks to KMS actors. The process graph representation should comprise a set of process metrics and, possibly, performance constraints and exception conditions.
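The process graph ideas above (activities with pre- and post-conditions, and one role definition per activity) can be illustrated with a small Petri-net-style structure. This is only a minimal sketch under stated assumptions, not the ICONS design: the names `Activity` and `ProcessGraph`, and the sample places and roles, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Activity:
    """A process activity (a Petri net transition). All names are hypothetical."""
    name: str
    role: str                    # role whose members may execute the activity
    preconditions: frozenset     # places that must hold a token before firing
    postconditions: frozenset    # places that receive a token after firing


@dataclass
class ProcessGraph:
    activities: list
    marking: set = field(default_factory=set)  # places currently holding a token

    def enabled(self):
        """Activities whose pre-conditions are all satisfied by the marking."""
        return [a for a in self.activities if a.preconditions <= self.marking]

    def task_list(self, role):
        """Enabled activities that a workflow engine could offer to a role."""
        return [a.name for a in self.enabled() if a.role == role]

    def fire(self, name):
        """Execute an enabled activity: consume pre-conditions, produce post-conditions."""
        activity = next((a for a in self.enabled() if a.name == name), None)
        if activity is None:
            raise ValueError(f"activity {name!r} is not enabled")
        self.marking -= activity.preconditions
        self.marking |= activity.postconditions


graph = ProcessGraph(
    activities=[
        Activity("draft", "author", frozenset({"claim_submitted"}), frozenset({"draft_ready"})),
        Activity("review", "expert", frozenset({"draft_ready"}), frozenset({"claim_validated"})),
    ],
    marking={"claim_submitted"},
)
print(graph.task_list("author"))  # tasks offered to authors before any firing
graph.fire("draft")
print(graph.task_list("expert"))  # tasks offered to experts once the draft exists
```

Process metrics, performance constraints and exception conditions are left out of the sketch; they could be attached to `Activity` as further fields.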
3.2.2 Content Repository features

Extensible Markup Language (XML)
XML is a lightweight, tag-oriented meta-language derived from the SGML standard and adapted to the web; it provides facilities to describe and distribute structured documents over the Internet. It is also used as the emerging industry standard for exchange of data between information systems, as well as for storage and retrieval of complex, multimedia objects in content repositories.

Resource Description Framework (RDF)
RDF is an XML-based framework used to define complex relationships between documents or data. It is popular as the target data structure for mapping UML semantics into the content repository data models. RDF schema is used as a template to define annotations in RDF syntax.

File Systems
File systems are commonly used in multimedia content repositories to serve as containers for large content objects represented as files. The use of file systems is a convenient technique for mapping content onto diverse hardware storage devices in order to exploit their inherent characteristics. For example, an optical storage device may be used for permanent, non-modifiable storage of electronic documents. File systems are composed into storage hierarchies usually controlled by the content repository management software.

Hierarchical Storage Management
The hierarchical storage management (HSM) functions control allocation of storage space available in a hierarchy of storage devices to large content object files. Such systems are based on a directory of all content objects, including information pertaining to storage allocation rules and migration predicates. Content objects are automatically migrated up and down the storage hierarchy, where the top layer is the object-relational database management system, and the bottom layer may be an optical storage jukebox or a mass storage tape system.
Migration predicates usually determine content object residence time at any given storage hierarchy level and serve to fire the storage allocation rules controlling the file migration operations.

Database Management System (DBMS)
Object-relational database management systems serve as an implementation platform for the domain ontology management functions and the content management functions. Solution architectures vary, yet a typical use would be for storage of all KMS directories and control blocks, for representation of the domain ontology data model, and for storage of content object files and attributes. Main memory relational database management systems may also be used to store frequently used ontology structures, as well as to provide a platform for the data structures representing facts in knowledge-based reasoning algorithms.

Version control
Content evolves over time. In some cases the history of content change is as important as the content itself. The versioning mechanism allows for transparent identification (incremental revision number) and storage (either full version or increments) of particular versions of content and content object properties. Access schemes addressing multi-user access problems are a neighbouring subject.

Rendering
Content is held within the repository in a variety of native formats, so it can be viewed or edited in the tool that originally created it. However, a uniform web-based browser requires rendering that presents all of these formats in a consistent way. Renditions include HTML and XML, as well as PDF and other well-known formats.

3.2.3 Knowledge Dissemination features

Push Technology
Push technologies providing facilities for automatic supply of selected content objects to a predefined group of recipients (a role), who are usually the KMS actors (knowledge workers, intelligent agents), are the best approach to combat the information glut.
The push technologies are strongly correlated with such knowledge representation features as the automatic content categorisation and knowledge-based reasoning.
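The link between push delivery and automatic categorisation can be sketched as follows: content objects are assigned to (not necessarily disjoint) categories by class predicates over their properties, and each role receives the objects falling into the categories it follows. This is a hedged illustration only; the predicate table, the role subscriptions and the sample objects are hypothetical and do not come from the ICONS specification.

```python
from collections import defaultdict

# Hypothetical taxonomy: category name -> class predicate over object properties.
CATEGORY_PREDICATES = {
    "workflow": lambda obj: "workflow" in obj["keywords"],
    "recent": lambda obj: obj["year"] >= 2002,
}

# Hypothetical subscriptions: role -> categories pushed to the members of that role.
ROLE_SUBSCRIPTIONS = {
    "process_analysts": {"workflow"},
    "editors": {"recent", "workflow"},
}


def categorise(obj):
    """Return every category whose predicate selects the object."""
    return {name for name, predicate in CATEGORY_PREDICATES.items() if predicate(obj)}


def push(objects):
    """Build per-role delivery lists from the automatic categorisation."""
    deliveries = defaultdict(list)
    for obj in objects:
        categories = categorise(obj)
        for role, wanted in ROLE_SUBSCRIPTIONS.items():
            if categories & wanted:  # at least one followed category matches
                deliveries[role].append(obj["id"])
    return dict(deliveries)


objects = [
    {"id": "doc-1", "keywords": ["workflow", "ontology"], "year": 2001},
    {"id": "doc-2", "keywords": ["security"], "year": 2002},
    {"id": "doc-3", "keywords": ["history"], "year": 1999},
]
deliveries = push(objects)
print(deliveries)
```

In a knowledge-based variant, the lambda predicates would be replaced by calls to an inference engine; the delivery step would stay the same.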
Content Object Properties
Content object properties characterize the principal object properties, such as object identifier, origin, author(s), date, etc., as well as provide information, usually in the form of key words, characterizing the content. The latter type of properties are usually obtained at the time of object creation (storage) through automatic content analysis and categorisation, or through a manual content object description process (e.g. description of an ancient manuscript image). Either way, the content object properties provide a convenient access path for content repository queries, taxonomy structure allocations, and for materialisation of content object relationships.

Full Text
Full text indexing and retrieval is a classical approach to content management. The full text retrieval techniques, used in conjunction with conceptual trees, are commonly used in automatic categorisation features. Often content object property values are automatically obtained through a full text search-based categorisation process.

Knowledge Map Graphs
Multi-level taxonomy trees, semantic nets and content object associations are usually represented as graphs on the user interface level. This fits nicely with the user mental model of the domain ontology structure and its relationships with the underlying content object model. Because of the substantial scope and complexity of knowledge maps, advanced graph construction and manipulation techniques must be employed to provide the required ergonomic level of the KMS user interface. The knowledge map graphs are used, usually in a query mode, for navigation within the semantically meaningful structures and for browsing the associated content.
Semantic Nets
Graphic representation of semantic nets (SN-graphs), although quite straightforward, must be supplemented by manipulation functions supporting traversal, SN-graph node visualisation/retrieval, and SN-graph selection (entry). SN-graphs, representing a given semantic net class implementation, may either be materialised dynamically or, usually in the case of complex association functions and large scope, may be cached as persistent ontology structures. Transient storage and off-line semantic net materialisation techniques may be used to achieve the required KMS performance levels. Note that the SN-graph navigation typically occurs at the content object instance level, where the SN-graph arc represents a 1:1 content object relationship.

Semantic Data Model Nets
SDM net graphs (SDM-graphs) are envisaged as a representation of the UML graphic conceptual model notation. Hence, content object classes represent subsets of the corresponding content object instances, constrained by the class association used for navigational selection. Hence, navigation, list manipulation, visualisation/retrieval, and SDM structure entry functions are necessary to exploit the rich semantic potential of navigation on the content object class level. Note that, as opposed to the SN-graph navigation presented above, the SDM-graph navigation yields subsets of content object instances on each visit to the corresponding SDM-graph node. The only similarity is the SDM-graph selection, effected as selection of the entry content object instance (e.g. a particular Person occurrence).

3.2.4 Content Integration features

All entities, regardless of their character (structural, procedural), participating in the content integration process must be accessible via the knowledge map graph, or via other existing access paths to the content repository.
Any of the integrated content objects, constrained by the corresponding descriptions of the content repository schema, may either be physically stored in the repository as a content object (snapshot, refreshable), or may be dynamically materialised at the reference time. Usage of the above integration modes should be entirely transparent to the KMS user.

Files
Files feature among candidates for content integration, due to the widely diffused usage of file systems as repositories of large, multimedia content objects. Little or no analysis of the multimedia object content, apart from the automatic categorisation analysis, is performed during the integration process.

Data Bases
Heterogeneous databases are a typical source of data for content integration. Multi-database query and integration techniques, as well as the homogenization of heterogeneous data models, are the underlying technologies. The most straightforward cases entail querying a single database to materialise the required content to be further exploited in the KMS context, either as an element of a content object stored in the repository, or as a virtual content object materialised on-the-fly.
Business Intelligence Systems
Data warehouses and OLAP systems deliver relevant knowledge content that should be integrated into the KMS environment. The BIS-generated content may be integrated into repositories as elements of content objects or may be delivered dynamically.

Legacy Information Systems
Similarly, the legacy information systems are a source of content that may be relevant to the KMS users. Selected legacy system reports may be accessible as content objects, or their elements, via the KMS content repository.

Intelligent Agents
Intelligent agent (IA) technology is a rapidly growing area of research and new application development. Applications of IA technologies in the KMS context are discussed in [Baek1999]. The definition of an intelligent agent proposed by IBM [IBM1995] states that an intelligent agent is “a software entity that carries out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employs some knowledge or representation of the user’s goals or desires”. The IA technologies are clearly useful and applicable in the KMS context, meeting two broad functionalities: that of a personal assistant and that of a communicating/collaborating agent. In both roles the intelligent agents are relevant as knowledge-based support for the content integration features.

Document Management Systems
Document management systems are a particular class of legacy information systems providing a rich content infrastructure directly relevant to the KMS users. Electronic documents and image-based information are typically integrated into the KMS content repositories as principal factual knowledge artifacts. In some KMS architectures the document management functionalities are subsumed by the KMS features.
Web Pages
Paradoxically, genuine knowledge is perfectly hidden in the enormous volume of data available on web pages. Therefore even more intelligent and flexible mechanisms are to be developed in the area of external knowledge acquisition and, what is even more important, of keeping it up-to-date. Interoperability of systems and the ability to choose the best offered content are of primary importance.

3.2.5 Actor Collaboration features

Message Exchange
Instant messaging relevant to the socialisation process (tacit to tacit knowledge transformation) is an important vehicle supporting the knowledge production process. Hence, the KMS functionality should provide a platform for a semi-disciplined exchange of electronic messages that may subsequently be categorised and stored in the content repository. Some collaboration metrics, similar to activity measures used in e-learning systems, may also be usefully applied for management of the knowledge production process.

Discussion Forums
Discussion forums are the electronic equivalent of the water cooler or cafeteria discussions, which have long been recognised as vital knowledge production activities. Again, relevant and valuable statements and comments should be categorised, stored in the content repository and measured (e.g. attributed to the originating sources).

Knowledge Engineering
Knowledge-based reasoning applications and intelligent agents require analytical support to glean the expert knowledge out of individuals (outstanding knowledge workers). The process of obtaining expert knowledge, required to build knowledge-based (or expert) applications and traditionally called knowledge engineering, requires specific methodologies and tools for the formal knowledge representation. Such tools may coincide with the knowledge representation paradigms used, both for declarative and procedural knowledge, within a specific KMS environment.
Workflow Management
The workflow management technology is an important platform supporting both the knowledge management processes of the KLC and the business processes of organizations. In the latter case, application of the workflow technology provides insight into the organization's operations, which is an important feedback into the knowledge production process. In fact, it may be argued that, in the case of organizations where knowledge management is an explicit management function, the KLC process may be considered to belong to the realm of business processes. We believe that keeping the above distinction may be advantageous in the evaluation of alternative KMS architectures viewed as the enabling platforms for KLC-driven knowledge management processes.

Distinct workflow management paradigms have been discussed in [Swenson2001, Eder2001, Stader2001]. It has been pointed out that substantially different application requirements pertain to production business processes, which today represent the principal realm of workflow management applications, than to the knowledge worker (also called information worker) processes and to project-oriented activities such as development of a new product. In the two latter cases, pertaining directly to the knowledge production processes, a workflow management paradigm substantially different from that of the Workflow Management Coalition [WfMC1994] is desirable. Indeed, it has been shown in [Stader2001] that an intelligent, ontology-based workflow management platform is required to support development of complex new industrial products. It is an open question what degree of interaction should be present between the KMS workflow management processes and the classical workflow management supporting the business processes of an organisation. It may very well be that, as in the case of the document management technology, the diverse workflow management paradigms will be reconciled and consequently integrated into the KMS environment.

Internet/Intranet
The web technologies already prevailing in advanced content management systems are paramount to the KMS architectures due to several important factors. First of all, application of the web paradigm removes an important initial barrier between the user and the KMS functions (premise: all educated people use the Internet).
Secondly, the cost of ownership, particularly high in large, distributed organizations in the context of complex KMS architectures, may be kept under control. Since any useful KMS must constantly scout the content resources available on the Net that are to be integrated, as well as publish information relevant to the organization's partners and customers, the Internet orientation of the system architecture is a must.

3.2.6 Knowledge Security features

The relevance of the knowledge security features is as obvious in the case of a KMS as in the case of any information system with an architecture open to the Internet. As a result, any practical KMS must integrate such security features as electronic signature, encryption, access control and user authentication. Our research is not oriented towards adding value in this particular field and, in fact, the use of security features is identical to that in other information systems. Hence, we shall not elaborate on the subject of knowledge security any further.
  • 25. Intelligent Content Management System 1.15 Architecture of the Intelligent CONtent management System (ICONS) April 2002

4. Architecture of the Intelligent CONtent management System (ICONS)

4.1 The ICONS architecture specification

The ICONS schematic architecture model is presented in Figure 6. Consistently with the ICONS project goal and objectives, we are aiming at developing a complete ICONS prototype to be demonstrated and verified in a realistic application environment. We propose to adopt an integration strategy combining existing modules, existing modules to be extended, and newly developed modules to provide building blocks for the ICONS architecture. Such an approach allows us to keep the ICONS project scope under control and to obtain research and development results adding value to the selected technological fields representing the project focus (marked with the thick border lines).

Figure 6. The ICONS architecture schematic model.

The project technological areas are discussed in more detail below. We concentrate on the primary technological areas of the ICONS project and provide only cursory information on the secondary technological areas, representing our view of the technological environment prerequisites of the project. We assume that our research efforts will concentrate on ICONS modules that are planned to be developed from scratch, whereas the specification and development work will also comprise the extension efforts planned for the existing functional modules to be adopted.

4.1.1 Development Technologies

Development technologies comprise modules providing basic functionalities and development tools required for web-oriented software development. All of the modules comprised in this technological area are to be adopted “as is” into the ICONS project.
Since no budget has been planned for acquisition of development software licences, preference will be given to “open source” software tools. Detailed specification of the technological requirements with respect to the ICONS modules comprised in the DT area will be provided in deliverable [ICONS D5].

4.1.2 Content Management Technologies

The premise of the ICONS project is not to develop solutions in technological areas where a mature commercial technology already exists. Such an approach allows us to realistically plan to achieve the project results on time and within budget. The detailed specification of technological prerequisites will be presented in deliverable [ICONS D5]. We present our current views on technological requirements with respect to the CM technological area in order to allow for a complete overview of the ICONS architecture to be presented in this section. One principal requirement, due to the necessity of developing extensions of the Content Management modules, is that all software is to be available in the source version and with the appropriate licence to modify it.

Content Repository Manager
The Content Repository Manager (CRM) provides an implementation platform for an XML-based object oriented content repository, controlled by an enhanced RDF schema, and comprising complex XML objects with embedded multimedia objects. The structure of the repository objects is determined and controlled by the DTD statements comprised in the RDF schema. Objects respond to methods implemented in Java classes, each principal class corresponding to an XML object class. Object class inheritance is supported. The embedded multimedia objects are stored as files and their location is managed by the hierarchical storage management functions.
Content Semantic Model Manager
Selected fields of XML objects, as well as the contents of text-oriented multimedia object types, are used for construction of auxiliary data structures comprising relational database tables, relational database indices, as well as full text search engine indices. These auxiliary structures serve to support the representation of content semantics with the use of such structural constructs as binary N:M relationships, N-ary N:M relationships with attributes, taxonomy hierarchical trees, and dictionaries. Operations on the auxiliary storage structures are available to application programmers creating new content repository objects as the CSMM application programmer interface (API). All structural semantic constructs are named and are used to reflect the application semantics to be implemented in the Content Repository. The auxiliary data structures are also used to support property-based selection operations as well as full text search operations.

Workflow Manager
The Workflow Manager supports the web-oriented business processes providing standard access to task lists and process execution information via Internet browsers. The process semantics meet the Workflow Management Coalition [WfMC1994] requirements with some enhancements in the area of the dynamic role modification (roles are sets of potential candidates to execute a specified task within a business process).

Hierarchical Storage Manager
The Hierarchical Storage Manager (HSM) provides functionality to manage allocation of storage space, and the subsequent tracking functions, for the multimedia objects stored in the Content Repository. Hence, the Content Repository storage space extends from the object-relational database (objects stored as BLOBs), through an arbitrary path of file systems, to optical or tape mass storage devices.
Object migration is performed automatically, triggered by pre-specified events, according to migration predicates defined by the Content Repository administrator. External Content Integrator The external content integration functions accept any schema-compliant XML input, as well as results of predefined parametric queries and procedures developed as the Content Manager applications. Such objects are called “integration objects” and they are treated as first class objects with respect to taxonomies and structural semantics constructs. Integration objects may be materialised and subsequently stored in the repository, usually taking a form of a report file, or they may be created dynamically as transient objects in response to the user specified parameter values. IST-2001-32429 ICONS Intelligent Content Management System page 26/86
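The migration predicates mentioned for the Hierarchical Storage Manager can be pictured with a small sketch. Everything in it is an assumption made for illustration: the tier names, the `StoredObject` fields, the rule format, and the `on_event` function are hypothetical and are not taken from the ICONS specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class StoredObject:
    name: str
    size_mb: float
    days_since_access: int
    tier: str = "database"  # hypothetical tiers: database -> filesystem -> tape

# A migration rule pairs an administrator-defined predicate with a target tier.
Rule = Tuple[Callable[[StoredObject], bool], str]

RULES: List[Rule] = [
    # Objects untouched for more than a year go to tape.
    (lambda o: o.tier != "tape" and o.days_since_access > 365, "tape"),
    # Large BLOBs leave the database for the file system.
    (lambda o: o.tier == "database" and o.size_mb > 100, "filesystem"),
]

def on_event(objects, rules=RULES):
    """Called when a pre-specified event fires; applies the first matching
    rule to each object and returns the list of moves performed."""
    moves = []
    for obj in objects:
        for predicate, target in rules:
            if predicate(obj):
                moves.append((obj.name, obj.tier, target))
                obj.tier = target
                break
    return moves

objects = [
    StoredObject("scan.tif", 250, 10),
    StoredObject("old.avi", 50, 400),
    StoredObject("memo.xml", 1, 3),
]
moves = on_event(objects)
print(moves)
```

In this sketch the order of the rules decides which migration wins when several predicates hold; a real HSM would presumably need an explicit policy for that.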
  • 27. Intelligent Content Management System 1.15 Architecture of the Intelligent CONtent management System (ICONS) April 2002
Role Manager
Roles are subsets of the Content Management System users defined by common access rights and operation permissions, as well as by execution rights within specified workflow processes. Roles may be defined by role predicates or by enumeration, and they may be modified on the basis of the processing history.
Content Schema Definition Environment
The Content Schema defines the data model of the Content Repository, including both the XML object structure and the auxiliary data model created to represent the content structural semantics and to facilitate the selection operations. The RDF schema is additionally annotated with system-defined tags or tag parameters to assign internal significance to the selected XML document fields. The XML schema also provides the structural information for generation of Electronic Form processing functions.
4.1.3 Knowledge Management Technologies
Ontology Model Manager
The Ontology Model (OM) comprises a formal knowledge representation pertaining to a particular application domain (hence we use the term domain ontology interchangeably), expressed as declarative knowledge or procedural knowledge. The declarative knowledge may formally be represented by the structural knowledge representation constructs, such as SDM relationships or Semantic Net links, or by rules supported by an inference engine. The OM Manager is to provide functions to create, maintain, and use the knowledge representation structures and to make those functions available to other KMS modules.
Structural Knowledge Navigator
The Structural Knowledge Navigator (SKN) is to provide an ontology structure manipulation language, available in the form of an API to developers of other pertinent ICONS modules, to provide navigation and selection facilities supporting the graphic object selection and graph navigation features available to ICONS users on the HCI level. The relationship and object link structures are to be defined in terms of link predicates, so the actual navigation is based on dynamically materialised object sets.
Content Categorisation Engine
Content categorisation of text files and other multimedia objects is gaining increasing importance in knowledge management systems. The current automatic categorisation methods are based on evaluation of property values with straightforward SQL-like queries, or on full text queries supported by appropriate full text indices constructed on-the-fly by full text search engines. In general, the content categorisation engines processing formatted (electronic form) or text data address the problem using algorithms to: (i) select words from text that should be used for indexing, (ii) look for close matches to personal names, company names, product names, or places, (iii) extract data from formatted tables or forms, and (iv) search for words that regularly appear in the same context and therefore may be related. In the case of image data, algorithms already exist that search image catalogues or provide face matching, fingerprint identification, or medical image analysis. We are looking at candidate algorithms, open source solutions, or software products to be potentially integrated into the ICONS architecture.
Datalog Inference Engine
The Datalog Inference Engine is to be based on the DLV system developed by CIES, which is to be modified accordingly and interfaced to the ICONS architecture.
DLV is a deductive database system, based on disjunctive logic programming, which offers front-ends to several advanced KR formalisms. Disjunctive Datalog combines databases and logic programming. For this reason, DLV can be seen as a logic programming system or as a deductive database system. In order to be consistent with deductive database terminology, the input is separated into the extensional database (EDB), which is a collection of facts, and the intensional database (IDB), which is used to deduce facts. An In-Core relational DBMS is to be used to host the extensional database, to be materialized as a persistent or transient Content Object comprising the corresponding disjunctive logic programme as one of its methods. The result of executing the logic programme on the basis of the EDB structure comprised in the Content Object will be materialized as the In-Core relational database structure.
Intelligent Workflow Manager
Extending workflow applications beyond the realm of classical production-level business process support into the realm of knowledge workers’ activities and large project control requires extension of the current workflow engine capabilities. The possible directions point at such WfMC architecture enhancements as application of knowledge-based techniques, in conjunction with advanced time modelling capabilities, to process routing problems and to optimal workload allocation problems.
  • 28. Intelligent Content Management System 1.15 Architecture of the Intelligent CONtent management System (ICONS) April 2002
Workload allocation problems, in conjunction with development and maintenance of reliable process metrics, may be solved with the use of knowledge-based techniques.
Semi-structured Content Integrator
Semi-structured information, such as XML (possibly with RDF annotations) and HTML pages and the associated multimedia objects usually downloadable as files, represents a wealth of content that may be directly relevant to a knowledge management system. Such information as competitive content, financial and commercial data, news reports, etc., should be directly accessible via the KMS content repository. Such objects should be associated with the repository content via the structural knowledge representations (relationships, links) as well as through taxonomy trees. Mapping the semi-structured objects into a predefined schema representing the corresponding content repository object classes may present a serious structure homogenization problem, in particular in view of the variety of representations used for the same entities in different, highly volatile Web information sources. The knowledge-based wrapper technology may provide one of the promising areas for developing solutions to this problem. We propose development of a class of intelligent agents, to be called intelligent content integrators, to solve it.
Intelligent Agent Development Environment
Intelligent agents serving as personal assistants and/or communicating/collaborating agents are an important KMS technology. A framework for specification and development of knowledge-based agents is to constitute an integral part of the ICONS knowledge representation architecture. At this point we expect that the logic programming reasoning features will provide an important ingredient of the knowledge-based IA solution.
4.1.4 Human Computer Interaction Technologies HCI Personalisation Engine Sound personalisation facilities already exist in advanced Web content management systems, called corporate portal platforms, with some of the software already available in the “open source” form. We plan further enhancements of the existing technology principally based on two technical areas: (i) advanced logging facilities of the KMS user activities, and (ii) knowledge-based analysis of user activities in conjunction to dynamic profiling of the user preferences. Personalisation should focus, apart from the preferred layout of the user interface frames (pages), on assisting the user in exploiting the complex ontology structures. Electronic Form Manager Electronic forms (EF) are ubiquitous in content management systems, in particular in the Web content management area, as means to create, update and search content objects. An outstanding problem, in particular pertaining to the Web-oriented solutions, is specification and enforcement of complex integrity constraints that may be enforced on the HCI level. At this point this is the potential area of the EF enhancement research and development to be undertaken in the ICONS project. Content Presentation Manager Content presentation pertains to displaying, usually in the Internet browser, of multimedia content objects comprised in the KMS content repository. Standard viewer technologies exist, with most of the current content management systems using products of few global suppliers of the viewer technology. An appropriate interface is to be developed to accommodate selected viewer technologies, and no further enhancements are planned within the scope of the project. Knowledge Map Graph Manager The Knowledge Map graphs are primarily composed of multi-level taxonomy trees providing navigational, entry level access, to the complex ontology structures combining the taxonomy trees and the structural knowledge graphs. 
The problem lies in representing large, nested tree structures in a user-friendly graphical way and in providing easy-to-use navigational facilities. The HCI level navigation is to be implemented with the use of the Structural Knowledge Navigator API.
Structural Knowledge Graph Manager
The structural knowledge graphs, representing the SDM nets and the Semantic Nets, also represent a hard problem from the point of view of the HCI level presentation and manipulation. The structural knowledge graph navigation is considered of paramount importance in communicating semantics of, and in providing the navigational access to, the content repository objects. Intuitive, user-friendly structure navigation, and the result list manipulation, is the cornerstone of the successful ICONS HCI environment. The HCI level navigation is to be implemented with the use of the Structural Knowledge Navigator API.
  • 29. Intelligent Content Management System 1.15 Architecture of the Intelligent CONtent management System (ICONS) April 2002
Process Graph Manager
The Process Graph Manager is to provide the following principal functionalities: (i) graphical design and consistency checking of the intelligent workflow process graphs, and (ii) a graphic interface for monitoring the state of a particular process instance. All principal process parameters and control data should be accessible via the graphic interface.
4.1.5 Distributed Architecture Technologies
Load Balancing Algorithms
Load balancing algorithms should control device/media allocation to the active, i.e. process, elements of the distributed ICONS architecture. Distribution may include the ICONS functional modules, or selected processes of such modules, as well as the application object classes. The object-oriented architecture of the system, both on the ICONS and on the application software levels, lends itself well to distribution in the peer-to-peer as well as the hierarchical computer system architectures. Load balancing is important to system performance, due to the diffused use of the processor-intensive knowledge-based techniques in ICONS modules.
Distribution Optimisation Algorithms
Distribution optimisation pertains to the static elements of the ICONS architecture, i.e. to control data structures, domain ontology structures, and to the content object structures. Optimisation of the device/media allocation, with the possible replication of the above data structures, may be an important technique for the efficient system implementation.
Scalable Distributed Data Structure (SDDS)
An SDDS system should provide the principal distribution platform for the selected static elements of the ICONS architecture.
The system must be adapted to the ICONS requirements, possibly to support distribution of the In-Core relational DBMS module.
Distributed Workflow Communication
Communication among workflow processes, managed by a common or by different workflow platforms, is currently subject to research and standardisation work of the Workflow Management Coalition task groups [Hayes2001]. XML-based messaging protocols are proposed as the means to transfer process information among heterogeneous platforms. Messaging standards are to be implemented in the ICONS Intelligent Workflow Manager and experimented with in the ICONS distributed architecture environment.
4.2 The ICONS architecture vs. the KMS reference architecture
The goal of the ICONS project is to develop and demonstrate a KMS prototype meeting most of the feature requirements generally accepted for such systems. We have discussed the KMS reference architecture in section 3 in the context of user requirements identified within the principal streams of the knowledge management research. We shall now show that the proposed ICONS architecture addresses most of the feature requirements defined in the KMS reference architecture. We relate the ICONS modules to the KMS reference architecture features in cross-reference tables Table 4 through Table 8, one for each principal feature of the KMS reference architecture. We do not discuss the Knowledge Security principal feature, since it clearly lies outside the project terms of reference and is considered a ready-to-use development technology. We only consider the ICONS focus technology modules, assuming that all the auxiliary technologies will be used as required and appropriately modified or enhanced as indicated in the ICONS architecture discussed in the preceding section.
ICONS functional modules Conc. Semant Taxono Time K-B Hyper- Process SDM Trees Nets mies Model. reason. text graphs (Focus Tech.
Areas) Knowledge Management Ontology Model Manager D R D R D R R Structural Knowledge Navigator R D R Content Categorisation Engine D D Datalog Inference Engine R Intelligent Workflow Manager R R Semi-structured Content Integrator Intelligent Agent Development Environment R R R IST-2001-32429 ICONS Intelligent Content Management System page 29/86
  • 30. Intelligent Content Management System 1.15 Architecture of the Intelligent CONtent management System (ICONS) April 2002 Human Computer Interaction (HCI) HCI Personalisation Engine R R R R Electronic Form Manager Content Presentation Manager Knowledge Map Graph Manager R Structural Knowledge Graph Manager R R Process Graph Manager R Distributed Architecture Load Balancing Algorithms R Distribution Optimisation Algorithms R R R Scalable Distributed Data Structure (SDDS) R Distributed Workflow Communication R
R – research work, S – specification work, D – development work
Note that the work type notations imply the starting point of the project effort, i.e. R means that research work is necessary and that it will naturally be followed, if successful, by the specification (S) and development (D) efforts.
Table 4. The ICONS focus technological area modules and the Domain Ontology features cross reference
All Domain Ontology features are addressed, with most of the work starting at the research level. Development pertains to enhancements of the adopted content management functionality to be utilized in the project.
ICONS functional modules XML RDF DBMS File HSM Vers. Render System Contr. ing (Focus Tech.
Areas) Knowledge Management Ontology Model Manager S Structural Knowledge Navigator Content Categorisation Engine Datalog Inference Engine Intelligent Workflow Manager Semi-structured Content Integrator D D Intelligent Agent Development Environment Human Computer Interaction (HCI) HCI Personalisation Engine Electronic Form Manager D D D Content Presentation Manager D D D D Knowledge Map Graph Manager Structural Knowledge Graph Manager Process Graph Manager Distributed Architecture Load Balancing Algorithms Distribution Optimisation Algorithms R Scalable Distributed Data Structure (SDDS) Distributed Workflow Communication
R – research work, S – specification work, D – development work
Note that the work type notations imply the starting point of the project effort, i.e. R means that research work is necessary and that it will naturally be followed, if successful, by the specification (S) and development (D) efforts.
Table 5. The ICONS focus technological area modules and the Content Repository features cross reference
Most of the Content Repository features are outside the ICONS project focus technological area and they are to be supported by the content management platform to be selected as the baseline development environment.
  • 31. Intelligent Content Management System 1.15 Architecture of the Intelligent CONtent management System (ICONS) April 2002
There is some adaptation work to be performed with respect to the existing electronic form management, content presentation functions, and version control functions to meet the emerging new requirements of the XML and RDF standards. Research will be performed in the area of hierarchical storage management, where distribution optimisation algorithms could substantially enhance the HSM functionality and performance.
ICONS functional modules Seman. SDM K. Map Full C.O. Push Nets Nets Graphs Text Prop. Techn. (Focus Tech. Areas) Knowledge Management Ontology Model Manager R D D Structural Knowledge Navigator R R Content Categorisation Engine S S Datalog Inference Engine S Intelligent Workflow Manager Semi-structured Content Integrator Intelligent Agent Development Environment R Human Computer Interaction (HCI) HCI Personalisation Engine Electronic Form Manager Content Presentation Manager Knowledge Map Graph Manager R Structural Knowledge Graph Manager R R Process Graph Manager Distributed Architecture Load Balancing Algorithms Distribution Optimisation Algorithms Scalable Distributed Data Structure (SDDS) Distributed Workflow Communication
R – research work, S – specification work, D – development work
Note that the work type notations imply the starting point of the project effort, i.e. R means that research work is necessary and that it will naturally be followed, if successful, by the specification (S) and development (D) efforts.
Table 6. The ICONS focus technological area modules and the Knowledge Dissemination features cross reference.
The main thrust of the research effort to be undertaken in the area of Knowledge Dissemination will be directed towards advanced graphic user interfaces to represent the knowledge map nested taxonomical trees and the structural knowledge graphs.
The remaining work will focus on adaptation of the existing content management functions. ICONS functional modules Data Files Doc. Intell. Legacy Web Busin. Bases Manag. Agents IS Pages Intell. (Focus Tech. Areas) Syst. Knowledge Management Ontology Model Manager Structural Knowledge Navigator Content Categorisation Engine Datalog Inference Engine Intelligent Workflow Manager R Semi-structured Content Integrator R R Intelligent Agent Development Environment S S S R S S Human Computer Interaction (HCI) HCI Personalisation Engine Electronic Form Manager IST-2001-32429 ICONS Intelligent Content Management System page 31/86
  • 32. Intelligent Content Management System 1.15 Architecture of the Intelligent CONtent management System (ICONS) April 2002 Content Presentation Manager Knowledge Map Graph Manager Structural Knowledge Graph Manager Process Graph Manager Distributed Architecture Load Balancing Algorithms R Distribution Optimisation Algorithms Scalable Distributed Data Structure (SDDS) Distributed Workflow Communication
R – research work, S – specification work, D – development work
Note that the work type notations imply the starting point of the project effort, i.e. R means that research work is necessary and that it will naturally be followed, if successful, by the specification (S) and development (D) efforts.
Table 7. The ICONS focus technological area modules and the Content Integration features cross reference
The research and specification work in the area of Content Integration will pertain to semi-structured content integration, which may be used to extract information out of the Web, and possibly document management, information resources. Intelligent agent technologies are candidates for the formatted data integration, mainly from pre-existing databases and files, and from legacy information systems or business intelligence systems.
ICONS functional modules Know. Wfk. Inter. Messag Discuss Eng. Manag Intra Exchan Forum (Focus Tech.
Areas) net Knowledge Management Ontology Model Manager Structural Knowledge Navigator Content Categorisation Engine R S S Datalog Inference Engine R Intelligent Workflow Manager R Semi-structured Content Integrator R R Intelligent Agent Development Environment R Human Computer Interaction (HCI) HCI Personalisation Engine S Electronic Form Manager S Content Presentation Manager Knowledge Map Graph Manager Structural Knowledge Graph Manager Process Graph Manager Distributed Architecture Load Balancing Algorithms R Distribution Optimisation Algorithms Scalable Distributed Data Structure (SDDS) Distributed Workflow Communication R
R – research work, S – specification work, D – development work
Note that the work type notations imply the starting point of the project effort, i.e. R means that research work is necessary and that it will naturally be followed, if successful, by the specification (S) and development (D) efforts.
Table 8. The ICONS focus technological area modules and the Actor Collaboration features cross reference.
The major research interests of the ICONS project in the area of KMS agent collaboration pertain to the intelligent workflow management and to the intelligent agent (IA) technologies. Some enhancement of the existing content management technologies is planned to provide support for the knowledge engineering features.
  • 33. Intelligent Content Management System 1.15 The ICONS Knowledge Representation Features April 2002 5. The ICONS Knowledge Representation Features 5.1 Requirements for Knowledge Management (KM) Current syntactic approaches to search for information and, in its broadest sense, knowledge, over networks have proved useful for many applications – most conspicuously in applications using the Internet. However they do not retrieve the semantic content of documents. Semantics are needed if we wish to retrieve facts and other knowledge. They can be used for shared practical problem solving by several agents (computers or people). They support concatenation of knowledge with that from elsewhere and are therefore poorly suited to automated access and analysis. Knowledge representation (KR) and extraction techniques are at the centre of these knowledge management requirements, in particular the value of shared domain definitions and the conceptual reasoning approach, have been convincingly presented in [O’Leary 1998]. The activities of acquisition (including content-based retrieval of multi-media knowledge and information, such as images), indexing, filtering, linking, distribution and application of knowledge must be supported in ICONS. To match these requirements, the technical skeleton of ICONS is based on ontologies. Ontologies are “specifications of shared conceptualizations of particular domains”. They support knowledge access, integration and mediation. The present focus is upon structured and unstructured textually-represented information. But part of our research will be seeking to widen this scope with the ultimate goal being to represent and access multimedia information using semantic methods. The basic vision is of a representation and inference superstructure [Fensel, et al, 2000], based on ontologies, over distributed repositories. Three components of the structure of this layer can be discerned. 
The first is the provision of a formal semantics and efficient reasoning support sub-structure. At this level knowledge is described in terms of concepts, interrelationships and roles. The specific ICONS mechanisms for this will be detailed below. The second sub-structure supplies a rich set of primitives for modelling the Universe of Discernment. No single technique is adequate for this. The main ICONS technique for this is Datalog, and the other methods outlined in the following sections will be invoked to supplement it where necessary. The third sub-component of the ICONS KM superstructure supports the sharing and co-operative usage of knowledge. In practice, within ICONS the first two components are combined to allow knowledge to be described in a disciplined manner that supports rich modelling of application domains through the use of ontologies. From this it will be possible to derive classification taxonomies. In the research stream attention will be paid to the need for grounding of concepts and knowledge (especially that derived by data mining). The idea is to be able to support explanation of the answers supplied to users. However, our initial concern is to provide back-bone modelling capability, and for this reason we focus on Datalog, although other techniques will be used when needed for specific functionality. As suggested earlier, (mined) knowledge has to be shared among compliant applications via an “ontology base” and used as a “content base”. Hence a commonly accepted representation is required.
5.2 Syntax/Semantics
One thing above any other distinguishes syntactic and semantic manipulation of information (including conventional web searching), and it applies to next-generation knowledge management in general. Syntactic manipulation is geared up to people rather than computers, while semantic manipulation is intended to bring structure to the meaningful content of pages of information [Berners-Lee 1999].
If suitably represented, it can be invoked and exploited by application programs. Building a semantic web, for example, requires
• access to structured collections of information
• sets of inference rules with which to reason automatically
Sophisticated KR is required. Now, first-generation KR was centralized, although work has been done on distributed heterogeneous expert systems, for example [Zhang and Bell 1990]. Early systems were also ‘shallow’, in that compiled hindsight was recorded rather than deeper principles.
  • 34. Intelligent Content Management System 1.15 The ICONS Knowledge Representation Features April 2002
A third conspicuous failing of first-generation KR was the absence of an explicit, well-understood representation of uncertainty of knowledge. All three of these deficiencies will be met to varying extents in the ICONS system. A language that expresses both data and rules is required, and this makes it possible to export rules from any KR system to the web. The task of developing such a language has been simplified because much of the information we need is of the form
• “An ancestor of a parent is an ancestor”; or
• “A truck is a kind of land vehicle which is a kind of vehicle”
Datalog is the obvious choice for this. Three important technologies are already in existence to help in the endeavour of providing a Data/Rules language in a web context:
• XML – tags (hidden labels) can be created to arbitrarily annotate parts of pages, and thus structure them. However, this gives no meaning, although scripts (programs) can use these in sophisticated ways.
• RDF – expresses meaning via a triple: things, their properties, and their values. For example, “this web page was authored by D. Bell”. Things and values are each identified by Universal Resource Identifiers (URIs), like URLs, and so are their properties. New properties can be added to the syntax by just defining a URI for them somewhere on the Web.
• Ontologies – as has long been recognised by DDB designers, DBs may use different identifiers, names, and structures for a single concept, so there is a need to discover common meanings. An ontology can be a document or file that formally defines relations among terms, or more commonly, a taxonomy plus a set of inference rules. An ontology base is a collection of ontologies.
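To make the two example rules quoted above concrete, the following minimal sketch evaluates them bottom-up in plain Python, using the EDB/IDB split described for the Datalog Inference Engine. It is not DLV and it ignores disjunction; the relation names, the sample facts, and the helper `closure_of` are assumptions introduced only for illustration.

```python
def closure_of(base):
    """Naive bottom-up evaluation of the rule pair
         rel(X, Y) :- base(X, Y).
         rel(X, Z) :- rel(X, Y), base(Y, Z).
    Iterates until no new fact is derived (the least fixpoint)."""
    derived = set(base)
    while True:
        new = {(x, z) for (x, y) in derived for (y2, z) in base if y == y2}
        if new <= derived:
            return derived
        derived |= new

# Extensional database (EDB): hypothetical sample facts.
parent = {("ann", "bob"), ("bob", "cid")}  # parent(P, C): P is a parent of C
kind_of = {("truck", "land_vehicle"), ("land_vehicle", "vehicle")}

# Intensional database (IDB): facts deduced by the rules.
ancestor = closure_of(parent)  # "an ancestor of a parent is an ancestor"
is_a = closure_of(kind_of)     # "a truck is a kind of land vehicle which is a kind of vehicle"

print(sorted(ancestor))
print(("truck", "vehicle") in is_a)
```

With these sample facts the derived `ancestor` relation contains the extra pair `("ann", "cid")`, and `("truck", "vehicle")` is in `is_a`. A real deductive database would use a more efficient strategy (for example semi-naive evaluation) than this repeated full join.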
A research stream will be carried out on the content base and how it can cooperate with the ontology base to support a range of inference functionality in ICONS. The goal of this work will be to explore how to capture XML objects (metadata) out of data from external data sources using content models, which will be stored in content repositories, and transfer essential metadata as facts to the ontology bases for storage, using the formal knowledge representation and manipulation methods. An ontology base holds domain ontologies, each of which provides a declarative knowledge representation (Datalog and see below) including concepts and semantics which can be exemplified by hierarchical relationships (semantics nets). It is not normally directly associated with specific applications. The theories and technologies described below will be utilized to implement the ontology bases. In relation to a particular content model, it may directly pertain to particular applications, which provides a generic way to represent a range of data sources as XML objects which are metadata information for storage and retrieval of complex multimedia objects in external data sources. In ICONS, a mechanism will be developed to specify all aspects of the data transfer from content bases to ontology bases as required by the ontology base, including the different kind of metadata, such as orders and locations and relational structures. All of these can be represented as facts, rules, and semantics. In the ICONS context, a content base is assumed to hold a variety of content models, each model can be represented as an XML DTD which is associated with external data sources. The content model determines what data is extracted and how it is ultimately represented in the XML object. A content model contains several pieces of information: • The original data structure, in the form of a data element. For example, if we take a data source as a relational table, in the form of an SQL statement. 
In this way, we can use the content model to specify that data should be drawn from more than one relational table. • The overall structure of the XML DTD. This is in the form of the root element, which, through attributes, specifies the name of the destination root element and the name of the elements that are to represent tuples. IST-2001-32429 ICONS Intelligent Content Management System page 34/86
  • 35. Intelligent Content Management System 1.15 The ICONS Knowledge Representation Features April 2002 • The names and contents of data elements. These are contained in a series of elements. The elements include the name and attribute or content elements. These two elements designate the data that should be added, and, in the case of attributes, what it should be called. The meaning of XML codes used on web pages can be defined by pointers from the pages to an appropriate ontology. More complex applications use ontologies to relate the information on a page to associated knowledge structures and information rules. The semantic web, in naming every concept simply by a URI, lets users express new concepts with minimum effort. Its unifying language also enables these concepts to be progressively linked into a universal web. 5.3 Formal foundations of knowledge representation The prevailing approach to representing knowledge embodied in existing information resources, in particular in the web information resources, is by using metadata representing the complex information object relationships and in some cases inference rules. A summary and comparative analysis of knowledge management frameworks is presented in [Holsapple 1999]. A knowledge representation approach based on separately defined semantic schemes, usually based on special purpose knowledge representation languages, is increasingly gaining importance. An approach based on conceptual graphs has been proposed in [Martin 2000]. Representation of procedural knowledge and specified domain knowledge is proposed in [Fensel 1998]. Two separate knowledge representation language for procedure (P-Karl) and logic-based inference knowledge (L-Karl) are proposed. The use of logic as a knowledge representation scheme has also been postulated in [Lambrix 1999]. Conceptual reasoning and the semantic net approach have been proposed in [Lassila 1998, Martin 2000]. 
Prototype system implementations and knowledge management application frameworks have been discussed in [Bassiliades 2000, Bouguetaya 2000, Chang 2001, Goeschka 2001, Hammer 1997, Knoblock 1998, Lawrence 2001]. A novel approach of integrating data mining results into the knowledge representation framework was presented in [Buchner 2000]. We now present the main formalisms to be used for KR in ICONS. The research stream of the project will seek to harmonise the use of these with Datalog methods, both for knowledge acquisition and for knowledge use.

5.3.1 Rules and uncertainty

In recent years, much emphasis has been placed on the “softness” required to model our imperfect world. One aspect of this on which the University of Ulster has been working for many years (since the ideas of Second Generation knowledge representation, e.g. distribution, deep and shallow reasoning (grounding), and uncertainty, first appeared) is reasoning under uncertainty, and the implications this has for knowledge representation. This work has been based on the Dempster-Shafer theory of evidence, which we have extended to general Boolean algebras (instead of merely applying it to subsets or propositions). The hypothesis is that the disjunctive nature of DLP matches well the disjunction inherent in the relational representation outlined in Section 3.2/3.3. One aim of the ICONS project is to include uncertainty in data representations (e.g. relations) and (ultimately, after research) to include uncertainty in Datalog representation and use, and in multi-media knowledge representation.

5.3.2 Data Representation using Dempster-Shafer theory

The Dempster-Shafer Theory of Evidence [Guan and Bell 1991] is a well-accepted basis for reasoning under uncertainty. It has been applied to reasoning using both uncertain rules and uncertain evidence. A domain (frame of discernment) is a finite set of mutually exclusive and exhaustive values.
Let t be a data object, ai be an attribute of t, and Dj be the domain of ai (i and j do not have to be equal). An attribute ai is a mapping from a set of data objects to a domain Dj ∪ {⊥}, where ⊥ represents an undefined value and t.ai represents the mapped value in Dj ∪ {⊥}. The inclusion of ⊥ in the range of ai allows us to handle the special case where applying an attribute ai to a data object does not make sense. One major feature of the conventional relational database model is that every attribute value is atomic. In order to represent imprecise and uncertain information, we should relax this feature. Instead of a single attribute value, a set of values should be allowed for the representation of imprecise data, and a probability distribution should be allowed for the representation of uncertain data.

Definition 3.1. For any attribute aj of a data object ti, let Dk denote the domain the attribute maps into, and let mij represent the mass function for attribute aj of data object ti. Then, the attribute value
ti.aj = {<d, mij(d)> | d ⊆ Dk ∪ {⊥}, mij(d) > 0}.

This definition says that a probability distribution on the power set of a domain is allowed in every attribute value (see the example in Figure 7). Note that |ti.aj| > 1 implies that ti.aj is uncertain.

Patient # | Disease
006 | Heart disease (0.90), Stomach upset (0.10)
175 | Flu (0.25), Pneumonia (0.64), δ (0.11)
… | …
Note: δ represents the full domain of disease; the implication is that 11% of our belief is assigned to ignorance.

Figure 7. Treatment relation.

This mechanism also provides a solution to the traditional problem of handling null values in databases. A null value can be naturally handled using a set. The null value is subdivided into three different cases, unknown, inapplicable, and unknown or inapplicable, denoted by the special strings nk, na, and nka, respectively. The string nk represents the corresponding domain D itself for an attribute. Similarly, na and nka represent {⊥} and D ∪ {⊥}, respectively. Refer to [Bell, et al, 1996] for details.

5.3.3 Extended relational database model

In the conventional relational model, information is represented by set-theoretic relations, which are subsets of the Cartesian product of a list of domains D1 × D2 × … × Dn. With the data representation in Definition 3.1, which is a probability distribution on the power set of a domain (a mass function), the definition of a relation is changed to the following.

Definition 3.2. A relation (or table) T based on D1, D2, …, Dn is defined as T ⊆ G1 × G2 × … × Gn × CL, where Gi is the set of all probability distributions on the power set of domain Di and CL = {[b, p] | b, p ∈ [0, 1]; b ≤ p}. Each Gi corresponds to a domain, each element of which can be interpreted as a set of pairs, each being a focal element and its value for some mass function m.
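As an illustration of Definition 3.1, the attribute value of patient 175 in Figure 7 can be written down as a mass function and queried for belief and plausibility. The following is a minimal Python sketch, not part of the ICONS design: the four-value disease domain and the names `DOMAIN`, `bel` and `pls` are assumptions made for illustration, and the undefined value ⊥ is left out. The two formulas are the standard Dempster-Shafer ones: bel(A) sums the mass of all focal elements contained in A, and pls(A) sums the mass of all focal elements that intersect A.

```python
# Hypothetical disease domain (frame of discernment); Figure 7 does not list it in full.
DOMAIN = frozenset({"flu", "pneumonia", "heart disease", "stomach upset"})

# Attribute value of patient 175: pairs <d, m(d)> with d a subset of the domain.
# The last entry plays the role of delta: mass assigned to the whole domain (ignorance).
disease_175 = {
    frozenset({"flu"}): 0.25,
    frozenset({"pneumonia"}): 0.64,
    DOMAIN: 0.11,
}

def bel(mass, hypothesis):
    """Belief: total mass of focal elements that are subsets of the hypothesis."""
    return sum(m for d, m in mass.items() if d <= hypothesis)

def pls(mass, hypothesis):
    """Plausibility: total mass of focal elements that intersect the hypothesis."""
    return sum(m for d, m in mass.items() if d & hypothesis)

flu = frozenset({"flu"})
print(round(bel(disease_175, flu), 2), round(pls(disease_175, flu), 2))
```

For the hypothesis "flu", belief is 0.25 and plausibility is 0.36, because the 0.11 assigned to the whole domain does not rule flu out. Belief never exceeds plausibility, which is the same shape as the condition b ≤ p imposed on the pairs in CL.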
In the set CL, a pair of values [b, p] is used to represent the confidence level for each tuple in a relation T. CL will also be used as a system attribute name included in every relation. Specifically, b and p represent the bel and pls functions, respectively. For example, in the Treatment relation of Figure 7, this could represent a doctor’s opinion, which could, for example, be valued less strongly for a newly qualified consultant than for an experienced practitioner. It should be noted at this point that the CL (Confidence Level) value is not, in any way, derived from the attribute value uncertainties. It is an independent measure of the strength of the predicate represented by the tuple. In ICONS, uncertainty will be expressed, again, using “special cases” of conventional relations in a standard DBMS, and can be manipulated by supplementary (application) programs.

5.3.4 Hyperrelations used for representing mined knowledge

Hyperrelations generalise the database concept of relation, and are particularly useful for representing rules derived from data mining exercises. There exists a semilattice structure (with a “more inclusive / less inclusive” ordering) in the set of all hypertuples of a domain, where hypertuples generalise traditional tuples from value-based to set-based. Hyperrelations can represent rules just as decision trees can represent rules. We hypothesise that hyperrelations can also represent semantic nets, and this will be investigated in the ICONS research stream.

5.3.5 Hyperrelations as knowledge representation

The semilattice structure in hypertuples can be used as a base for a hypothesis space. We take a hypothesis to be a hyperrelation, i.e., a set of hypertuples. A hyperrelation can be interpreted as a disjunction of conjunctions of disjunctions of attribute-value pairs. Such a hypothesis space is much more expressive than the conjunction of attribute-value pairs and the disjunction of conjunctions of attribute-value pairs.
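The set-based view of hypertuples can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the ICONS implementation: a hypertuple is modelled as a tuple of frozensets, the “more inclusive / less inclusive” ordering is assumed to be field-wise set inclusion, and the semilattice operation is assumed to be field-wise union. The function names and the sample values are hypothetical.

```python
def hypertuple(*fields):
    """Build a hypertuple: one set of admissible values per attribute."""
    return tuple(frozenset(f) for f in fields)

def less_inclusive(h1, h2):
    """Ordering assumed here: h1 <= h2 if every field of h1 is contained in that of h2."""
    return all(a <= b for a, b in zip(h1, h2))

def merge(h1, h2):
    """Assumed semilattice operation: field-wise union (least upper bound)."""
    return tuple(a | b for a, b in zip(h1, h2))

def covers(h, row):
    """A hypertuple covers a conventional tuple if each value lies in the matching field."""
    return all(v in field for v, field in zip(row, h))

def hyperrelation_covers(hr, row):
    """A hyperrelation is read as a disjunction of its hypertuples."""
    return any(covers(h, row) for h in hr)

# Two conventional tuples (symptom, disease), lifted to singleton hypertuples.
t1 = hypertuple({"sore throat"}, {"flu"})
t2 = hypertuple({"high temperature"}, {"flu"})
merged = merge(t1, t2)  # symptom field becomes {"sore throat", "high temperature"}

print(less_inclusive(t1, merged), covers(merged, ("high temperature", "flu")))
```

In this reading, `covers` is a conjunction over fields of a disjunction over the values in each field, and `hyperrelation_covers` adds the outer disjunction, which is the three-level structure described above.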
For a dataset there is a large number of hypertuples which are consistent with the data, some of which can be merged (through the semilattice
operation) to form a different consistent hypertuple. By definition, each field in a hypertuple is a set of values. For example, the following table is a hyperrelation where, in the first row, the symptom field consists of the two alternative values “sore throat” and “high temperature”.

Symptom | Disease
Sore throat ∨ high temperature | Flu
High blood pressure ∨ high cholesterol | Heart disease
… | …

Figure 8. A hyperrelation.

In ICONS, we propose to focus on those hypertuples which are consistent with the given data and cannot be merged further; they are said to be maximal. The version space is defined as the set of all these hypertuples, which is clearly a subset of the semilattice. An algorithm exists which is able to construct the version space. Implementation for the ICONS system will include use of conventional relational systems to represent hyperrelations, again, as “special cases”. These represent knowledge mined from databases.

5.3.6 Metadata

Additional expressiveness for data content is supplied by relationally-specified metadata. We can store such useful information in relational format as a series of tables; e.g., in ADDSIA / MISSION [McClean 2002, McClean 2000] we used categorical tables, numerical tables, note tables, correspondence tables, etc. to tackle the heterogeneity inherent in multiple data sources. Each table can be represented as a conventional relation, and a selection from these table types will be available in ICONS. Metadata is often described as “data about data”. It has increasingly become recognised over the last few decades that such metadata must be encoded alongside data in databases so that it may be used in both a passive and an active role. We consider metadata as providing contextual and operational knowledge about the data in a broad sense, and widen the scope to cover the encoding of general knowledge.
[Grossman, 1996] defines metadata as formatted, structured description elements. Metadata may be used for: (1) documentation (passive), (2) automated support (active). Metadata may contain relevant contextual information concerning issues of comparability or elaboration, even interoperability. More generally, we categorise metadata into the following roles (using database examples again for illustration):
1. for data processing, e.g., schema information
2. for data access, e.g., locational information
3. for data harmonisation and integration in distributed, heterogeneous environments, e.g., schema matching
4. providing rules concerning the data integrity constraints
5. providing contextual information to aid interpretation
6. providing information on quality
7. providing information on costs.

Agents collaborate within an agency, using metadata concerning processing, access, fusion, rules and context. These can be regarded as forms of knowledge that are utilised by the various agents. Agents compete using metadata on quality and costs. Thus rival agencies may offer higher quality, or lower cost, services to the user. Time representation issues are an important ingredient of the knowledge representation schemes of a wide class of content repositories. Current results in the area of temporal aspects of knowledge management are presented in [Dyreson 2000, Gregersen 1999]. A pragmatic representation of time will be included in ICONS.

5.3.7 Sharing data

Modelling primitives and their semantics together form a very important aspect of an ontology-based information/knowledge exchange language. The syntax of such a language must of course be formulated using existing web standards for information representation.
The knowledge representation approach based on introducing tags in HTML and/or XML objects to represent the content semantics has been presented in [Dieng 2000, Ginsburg 1999, Shim 2000]. Prototype system solutions based on this approach have been presented in [Corby 1999, Raborijaona 2000]. The disadvantages of the tag-based knowledge representation approach have been discussed in [Martin 2000]. In ICONS, XML will be used as a serial syntax definition language for ontology-based information exchange. RDF/RDFS can also be used for this purpose (encoding, exchange and reuse of metadata). The Resource Description Framework (RDF) is the emerging semantic interoperability and knowledge management standard for web information resources. The RDF standard has been exhaustively discussed in [Decker 2000a, Decker 2000b, Lassila 2000]. It provides a means of adding semantics to a document without making assumptions about its structure. RDF has the advantage of providing a standard syntax for writing ontologies, and a standard set of modelling primitives. RDF schemas (RDFS) provide a basic type schema for RDF, in which object-oriented concepts such as objects, classes, and properties can be described. Therefore, ICONS may offer two syntactical variants: one based on XML schemas and one based on RDF schemas.

5.4 Disjunctive Logic Programming

Disjunctive Logic Programming (DLP) is nowadays widely recognized as a valuable tool for knowledge representation and common sense reasoning. DLP is, just like Datalog [Ullman 1989], a deductive database language, but, as is explained below, it extends Datalog's expressivity by allowing disjunction in the head of rules.
In this way, the conclusion of an implication can be indefinite, which creates different possible models of reality, as is shown in the examples below. In general, according to the stable model semantics, a DLP program may have several alternative models (possibly none), each corresponding to a possible view of reality. In [Eiter et al. 1997f] it has been shown that, under stable model semantics, DLP has a very high expressive power: it captures the complexity class Σ^P_2. This is strictly higher than Datalog's, as it is not always possible to emulate disjunction through (non-stratified) negative rules. The use of both disjunction and constraints makes DLP a language well-suited to represent and solve a wide class of knowledge-based problems, including deductive database queries, incomplete knowledge, classical optimisation problems, planning, abduction, etc., in a very simple and natural way. For the ICONS project we have selected the DLV system as an implementation of DLP. In the following, we will briefly discuss the characteristics of knowledge representation with DLP, and the kind of problems that it is suited for. Considering the advantages and disadvantages of this approach, we will propose a way of incorporating the DLV system into the ICONS architecture, and we will address the questions and research issues that arise from this choice.

Syntax and semantics

In this section, we provide a formal definition of the syntax of Disjunctive Logic Programming (DLP). For further background, see [Lobo et al. 1992, Eiter et al. 1997f, Gelfond and Lifschitz 1991]. We also provide a short informal description of the semantics; for the formal definition, see [Gelfond and Lifschitz 1991]. The main notion in DLP is the rule, which is built from variables, constants, atoms, and literals as follows. An atom is an expression p(t1, …, tn), where p is a predicate of arity n and t1, …, tn are constants or variables.
For example, supervisor(barbara, george) is an atom consisting of the 2-ary predicate supervisor and the two constants barbara and george. Similarly, one can use variables X, Y to form atoms like supervisor(X, george) and supervisor(X, Y). Strings starting with lower case letters denote constants and predicates, while strings starting with upper case letters denote variables. Such atoms or their negated versions (as in ¬supervisor(X, Y)) are called literals. Finally, a rule is a formula of the form

  a1 v … v an :- b1, …, bk, not bk+1, …, not bm

where a1, …, an, b1, …, bm are literals and n ≥ 0, m ≥ k ≥ 0. This rule can be read as "the disjunction of a1, …, an is implied by the conjunction of b1, …, bk and not bk+1, …, not bm". Note that the D in DLP stands for the possible disjunction (i.e. logical "or") in the rules. Furthermore, we call the disjunction a1 v … v an the head of the rule and the conjunction b1, …, bk, not bk+1, …, not bm the body.
For example, the rule employee(X) :- supervisor(X, Y) can be read as "if X is a supervisor of some Y, then X is an employee", and the rule female(X) :- person(X), not male(X) can be read as "if X is a person and not male, then X is female". Finally, as an example with a disjunction and with an empty body, the rule female(X) v male(X) can be read as "X is male or female". Note that, when the body is empty, we leave out the implication sign ":-" at the end. A disjunctive datalog program is a finite set of such rules. From the definition it can be seen that many kinds of rules are possible, each representing a different kind of knowledge. When the body is empty and the rule contains no variables, we call it a fact. Facts are the representation of the extensional database, and there exists a correspondence between e.g. rows of a relational table and DLP facts. The rules person(barbara) and supervisor(barbara, george) are examples of facts. In the ICONS project, the translation of relational and other external database data into DLP facts is of great importance. When the head of a rule is empty, the rule is called an (integrity) constraint, as it expresses a condition that should not occur in the model of reality. For example, the constraint :- male(X), female(X) expresses that no X can be both male and female. Integrity constraints play an important role in database systems. When the head of a rule is either empty or contains only one literal, the rule is called normal. Normal rules express the definite knowledge of implications, where instantiations of the conditions of the body lead either to a contradiction (in the case of a constraint) or to an instantiation of one literal, in other words: to a fact.
Consider the example rule employee(X) :- supervisor(X, Y), which, using the knowledge supervisor(barbara, george), leads to the sure conclusion employee(barbara), which is a fact. A rule that is not normal is called disjunctive, and it expresses indefinite knowledge. The rule boss(X, Y) v boss(Y, X) v equal_worker(X,Y) :- same_team(X, Y) expresses that if X and Y are in the same team, then X is Y's boss, or vice versa, or they are equal co-workers. That means that given the knowledge same_team(tony, beth) we cannot conclude a new fact, but have only so-called incomplete knowledge that boss(tony, beth) or boss(beth, tony) or equal_worker(tony, beth). The DLV system gives for every DLP program (which includes the facts, i.e. the data) zero or more possible models of reality, called answer sets. Informally, a model can be seen as a consistent set of facts, which are interpreted to be true in that model. An answer set of a DLP program is built up from the constants which appear in the program and is closed under that program, that is: applying program rules to the facts in the set only leads to facts that are already in that set. Furthermore, answer sets of a DLP program are minimal with respect to set inclusion: that is, no proper subset of an answer set is itself a model closed under the program. (Note that these descriptions are informal: more precise definitions can be found in [Gelfond and Lifschitz 1991].) As an example, the program consisting of the two rules female(X) v male(X) and person(beth) has exactly two answer sets: the model {person(beth), female(beth)} and the model {person(beth), male(beth)}. Note that there are no answer sets introducing new constants (like {person(beth), female(beth), male(tony)}), because tony has not been mentioned in the program.
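For programs without negation, such as the two-rule example above, the informal description (closed under the program, minimal with respect to set inclusion) can be checked by brute force. The sketch below is a hedged illustration in Python, not a description of how the DLV system works: it assumes a ground, negation-free program, represents each rule as a hypothetical (head, body) pair of atom sets, and simply enumerates all subsets of the atoms that occur in the program.

```python
from itertools import combinations

# Ground version of the example: person(beth).  female(beth) v male(beth).
# Each rule is (head atoms, body atoms); an empty body means the rule always fires.
program = [
    ({"person(beth)"}, set()),
    ({"female(beth)", "male(beth)"}, set()),
]

def is_model(candidate, rules):
    """Closed under the rules: whenever a body holds, some head atom holds."""
    return all(not body <= candidate or head & candidate for head, body in rules)

def minimal_models(rules):
    atoms = sorted(set().union(*(head | body for head, body in rules)))
    models = [
        frozenset(c)
        for size in range(len(atoms) + 1)
        for c in combinations(atoms, size)
        if is_model(set(c), rules)
    ]
    # Keep only models with no proper subset that is also a model.
    return [m for m in models if not any(other < m for other in models)]

for answer_set in minimal_models(program):
    print(sorted(answer_set))
```

A constraint can be passed as a rule with an empty head, which `is_model` rejects whenever its body holds. Once negation appears in rule bodies, minimal models and answer sets no longer coincide, so this shortcut does not apply there.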
Also note that the model {person(beth), female(beth), male(beth)} is not an answer set, even though it is consistent and closed under the program (recall that the disjunction female(X) v male(X) is not exclusive!), because it is a superset of one (in this case both) of the answer sets.

Applications

Note that the language of DLP programs is declarative: there is no need to provide the DLV system with a procedure for finding the matching answer sets; it suffices to tell the system which rules it should obey. Combined with DLP's high expressivity, this allows for a human-understandable description of complex problems. Compared to standard query languages like SQL, which only handle so-called local queries, DLP gives a much more powerful mechanism, with which it is possible to answer questions about the structure of relations, like reachability and 3-colorability, of which we will discuss examples below. In this section we will show how Disjunctive Logic Programming (DLP) allows us to represent and solve a large variety of problems in a simple and highly declarative way. In particular, we will concentrate on the following three classes of problems: deductive database, incomplete knowledge and search problems.

Deductive database

A typical deductive database query (inexpressible in non-recursive SQL) is the transitive closure of a (binary) relation. As an example, consider the classical reachability problem: given a directed graph G, determine all pairs of nodes (a,
b) of G such that there is a (directed) path from a to b. When we use edge(a,b) to denote the fact that there is an edge from node a to node b, the encoding of this problem is the following recursive program:

  edge(a,b)
  edge(a,c)
  ...
  reach(X,Y) :- edge(X,Y)
  reach(X,Y) :- edge(X,Z), reach(Z,Y)

In other words, one can reach Y from X if there is an edge from X to Y or if there is an edge from X to another node Z, from where Y can be reached. Finding relatives in a family relation defined by the predicate parent is an example of the usage of the reachability query.

Incomplete Knowledge

Besides database queries, DLP is suitable for representing common sense reasoning. The following is a simple example of how DLP enables the treatment of incomplete knowledge. Consider this situation:
• We have seen Michael with a broken arm, but we do not remember which one.
• We know that Michael usually writes with his left hand, so Michael is able to write if his left arm is not broken.

The problem is to decide whether Michael can or cannot write. Because of the uncertainty due to our incomplete knowledge about Michael’s arms, we cannot give a definite answer. We can, however, trace two different scenarios:
• “Michael’s left arm is broken, so he cannot write”.
• “Michael’s right arm is broken, so he can write”.

This situation can briefly be represented by the following disjunctive logic program PMichael:

  r1: left_arm_broken v right_arm_broken.
  r2: can_write :- not left_arm_broken.

What is represented by PMichael is very intuitive. It has two models: M1 = {left_arm_broken, not right_arm_broken, not can_write} and M2 = {not left_arm_broken, right_arm_broken, can_write}. M1 and M2 are the two possible meanings of the problem, and match the scenarios we wanted to represent. Note that it is possible to represent this situation even through a normal logic program (i.e.
without disjunction), simply by replacing the rule r1 with the two rules

  r’1: left_arm_broken :- not right_arm_broken
  r’’1: right_arm_broken :- not left_arm_broken

It is easy to see that this second variant (with so-called unstratified negation instead of the disjunction) makes the program less intuitive.

Search Problems

Another class of problems that can naturally be represented and solved by DLP is that of search problems. To this end, we show how the Guess&Check paradigm is a suitable technique which supports a highly declarative problem representation. The power of disjunctive rules allows one to uniformly express problems which are even more complex than NP over varying instances of the problem using a fixed program (i.e., a fixed program containing variables that work on any possible input). Given a set FI of facts that specify an instance I of some problem P, a Guess&Check program Π for P consists of the following two parts:

Guessing Part: The guessing part G ⊆ Π of the program defines the search space, in such a way that the answer sets of G ∪ FI represent “solution candidates” for I.

Checking Part: The checking part C ⊆ Π of the program tests whether a solution candidate is in fact a solution, such that the answer sets of G ∪ C ∪ FI represent the solutions for the problem instance I.
In general, we may allow both G and C to be arbitrary collections of rules in the program, and it may depend on the complexity of the problem which kind of rules are needed to realize these parts (in particular, the checking part). Without imposing restrictions on which rules G and C may contain, in the extreme case we might set G to the full program and let C be empty, i.e., all checking is moved to the guessing part such that solution candidates are always solutions. This is certainly not intended. However, in general the generation of the search space may be guarded by some rules, and such rules might be considered more appropriately placed in the guessing part than in the checking part. We do not pursue this issue any further here, and thus also refrain from giving a formal definition of how to separate a program into a guessing and a checking part. In order to solve a number of problems, however, it is possible to design a natural Guess&Check program in which the two parts are clearly identifiable and have a simple structure:
• The guessing part G consists of a disjunctive rule which “guesses” a solution candidate S.
• The checking part C consists of integrity constraints which check the admissibility of S, possibly using auxiliary predicates which are defined by normal stratified¹ rules.

Thus, the disjunctive rule defines the search space², in which rule applications are branching points, while the integrity constraints prune illegal branches. As an example which matches this scheme, let us consider the well-known 3-Colorability problem.

3COL: Given a graph G=(V,E) in the input, assign each node one of three colors (say, red, green, or blue) such that adjacent nodes always have different colors.

3-Colorability is a classical NP-complete problem.
Assuming that the set of nodes V and the set of edges E are specified by means of predicates node (which is unary) and edge (binary), respectively, it can be encoded by the following Guess&Check program:

  r: col(X,r) v col(X,g) v col(X,b) :- node(X).   } Guess
  c: :- edge(X,Y), col(X,C), col(Y,C).            } Check

The rule r nondeterministically guesses color assignments for the nodes in the graph, and the constraint c checks that these choices are legal, i.e., that no two nodes which are connected by an edge have the same color³. More precisely, let us suppose that the nodes and edges of the graph G are represented by a set F of facts with predicates node and edge. Then the (“guessing”) rule r above states that every node is colored either red or green or blue, while the (“checking”) constraint c forbids the assignment of the same color to two adjacent nodes. The answer sets of F ∪ {r} are all possible ways of coloring the graph. Note that minimality of answer sets guarantees that every node has only one color. If an answer set of F ∪ {r} satisfies the constraint c, then it represents an admissible 3-coloring of the graph. There is in fact a one-to-one correspondence between the solutions of the 3-coloring problem and the answer sets of F ∪ {r,c}. The graph is thus 3-colorable if and only if F ∪ {r,c} has some answer set, and each of the answer sets of F ∪ {r,c} represents a (different) legal 3-coloring of G. The problem 3COL is a popular example of an NP-complete problem. We next show that even a harder problem, which is located at the second level of the polynomial hierarchy, can be encoded in a straightforward way in DLP. To this end, we consider the following problem, Strategic Companies.

¹ For a definition of stratification, see [Apt et al. 1988].
² In some cases it would be possible to replace the disjunctive guessing rule by rules with unstratified negation. However, this is not possible in general. Disjunctive rules also have the advantage of being more compact and usually also more natural.
³ In this example, we assume that G contains no loops, i.e., edges from a node to itself. Such loops can be easily handled by adding X<>Y to the constraint.
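The division of labour in the 3COL program can be imitated procedurally. The following Python sketch is only an illustration of the Guess&Check idea under stated assumptions, not a description of how DLV evaluates the program: the guessing rule r is replaced by exhaustive enumeration of colour assignments, the constraint c by a filter, and the four-node graph and the function name `three_colorings` are hypothetical.

```python
from itertools import product

def three_colorings(nodes, edges, colors=("r", "g", "b")):
    """Yield every assignment node -> color that passes the check."""
    for guess in product(colors, repeat=len(nodes)):       # guess: plays the role of rule r
        col = dict(zip(nodes, guess))
        if all(col[x] != col[y] for x, y in edges):        # check: plays the role of constraint c
            yield col

# Hypothetical instance F: a 4-cycle a-b-c-d-a plus the chord a-c.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]

solutions = list(three_colorings(nodes, edges))
print(len(solutions), solutions[0])
```

An empty result corresponds to the program F ∪ {r,c} having no answer set. The enumeration is exponential in the number of nodes, so the sketch is only meant to make the two roles visible on small graphs.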
STRATCOMP: Given the collection C of companies owned by a holding, together with information about the products each company produces and company control, compute the set of the strategic companies in the holding.

Let us recall from [Cadoli et al. 1997] what a “strategic company” is in this context. Each company in the holding produces a collection of goods, such that the holding produces a collection of goods G which consists of all goods produced by its companies. Company control information models that a set of companies D ⊆ C may jointly have control (e.g., by majority in shares) over another company c ∈ C. (Companies not in C, which we do not model here, might have shares in companies as well.) The company control information in STRATCOMP lists records of such control information in terms of “controlling sets” D for “controlled” companies c. Note that, in general, a company might have more than one controlling set, and only non-redundant controlling sets (i.e., no proper subset is a controlling set) are recorded. Now, some companies should be sold by the holding, while the following two conditions have to be maintained:
1. After the transaction, the remaining set of companies C’ ⊂ C still allows one to produce all goods.
2. No company is sold which would still be controlled by the holding after the transaction, i.e., if D is a controlling set for c ∈ C and D ⊆ C’ holds, then c ∈ C’ also holds.

A set C’ ⊆ C is called a strategic set if it is minimal with respect to inclusion, that is, it satisfies both (1) and (2), and no proper subset of C’ satisfies both (1) and (2). In general, the strategic set is not unique, and multiple solutions for C’ exist. A company c ∈ C is called strategic if it belongs to at least one of these strategic sets.
Computing the set of all strategic companies is relevant when companies should be sold, as selling any company which is not strategic is guaranteed not to violate either of the conditions (1) and (2). This problem is Σ^P_2-hard in general [Cadoli et al. 1997]; reformulated as a decision problem (“Given a particular company c in the input, is c strategic?”), it is Σ^P_2-complete. To our knowledge, it is the only KR problem from the business domain of this complexity that has been considered so far. We next present a program which solves the complex problem STRATCOMP in a surprisingly elegant way with a few rules:

  r: strat(Y) v strat(Z) :- produced_by(X,Y,Z).                            } Guess
  s: strat(W) :- controlled_by(W,X,Y,Z), strat(X), strat(Y), strat(Z).     } Constraint

Here strat(X) means that X is strategic, produced_by(X,Y,Z) that product X is produced by companies Y and Z, and controlled_by(W,X,Y,Z) that W is jointly controlled by X, Y and Z. We assume that a set of facts for company, controlled_by and produced_by is part of the input, and have adopted the setting from [Cadoli et al. 1997], where each product is produced by at most two companies and each company is jointly controlled by at most three other companies (in this case, the problem is still Σ^P_2-hard). The answer sets of the program together with the encoded facts correspond one-to-one to the strategic sets of the holding. Thus, the set of all strategic companies is given by the set of all companies c for which the fact strat(c) is true under brave reasoning. In fact, it is possible to encode the same problem with the Guess&Check paradigm, in the same shape as the previous example. For details, see [Eiter et al. 2000]. Strategic Companies is a good example of the kind of complex knowledge a user of a knowledge management system may want to extract from the repository. Along the same lines, one could think of personnel allocation and management problems, which could be solved by similarly straightforward programs.
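The definition of strategic sets can also be checked directly on small instances. The Python sketch below is a brute-force illustration under assumptions, not the DLV encoding and not an efficient algorithm: it enumerates all subsets of companies, keeps those satisfying conditions (1) and (2), and then retains the minimal ones. The instance data, the company and product names, and the helper names are hypothetical; the two inputs play the role of the produced_by and controlled_by facts.

```python
from itertools import combinations

def strategic_sets(companies, produced_by, controlled_by):
    """produced_by: product -> set of producing companies.
    controlled_by: list of (controlled company, controlling set)."""
    def admissible(s):
        all_goods = all(producers & s for producers in produced_by.values())  # condition (1)
        closed = all(c in s for c, ctrl in controlled_by if ctrl <= s)       # condition (2)
        return all_goods and closed

    candidates = [
        frozenset(c)
        for size in range(len(companies) + 1)
        for c in combinations(sorted(companies), size)
        if admissible(frozenset(c))
    ]
    # Strategic sets are the admissible sets that are minimal under inclusion.
    return [s for s in candidates if not any(t < s for t in candidates)]

# Hypothetical holding with four companies and two products.
companies = {"c1", "c2", "c3", "c4"}
produced_by = {"pasta": {"c1", "c2"}, "wine": {"c3", "c4"}}
controlled_by = [("c2", {"c1"})]   # c2 is controlled by c1 alone

sets_ = strategic_sets(companies, produced_by, controlled_by)
strategic = set().union(*sets_)   # companies in at least one strategic set
print(sorted(sorted(s) for s in sets_), sorted(strategic))
```

In this instance the strategic sets are {c2, c3} and {c2, c4}. Company c1 is not strategic: any admissible set containing c1 must also contain c2, and c2 alone already covers pasta, so c1 never appears in a minimal set.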
Further examples can be found in [Eiter et al. 1997f].

DLV system in the ICONS architecture

As is clear from the above examples, enhancing a knowledge management system with DLP techniques is a major innovation, as it allows a number of complex problems to be solved that cannot be solved (or even expressed) within existing traditional systems. Data from the repository can be used within the DLV system by translating the relational data into data as modelled by DLP (that is, facts). The DLV system already includes tools that use SQL queries to extract the needed data. Still, the incorporation of the DLP techniques within the ICONS project is not entirely
straightforward, as some complexity issues have to be taken into account. Tests of the DLV system show that its strength lies in solving complex problems on reasonable amounts of data. Because the system does not have its own internal DBMS, it does not deal effectively with larger amounts of data. However, within the ICONS system the amounts of available and accessible data will be large. For that reason, we intend to couple the DLV system with a main-memory database system (MMDB), which would take over data management for DLV. To this end, a mapper is to be developed, which selects the needed data from the data repository and stores it in the MMDB before invoking the DLV system. Research issues are:
- For what kinds of problems is it necessary to speed up DLV processing by pre-selecting data?
- How can the needed data be selected, given a DLP program plus, optionally, a query on the answer sets?
- Given a particular program, can we develop a mapper which selects data with the actual query (or the constants in the query, as focal points) as parameters?
- How can we prove that the selected data give the correct answer, i.e. that they yield in all cases the same answer sets as the full data would have given?
- Is it possible to decrease the amount of selected data considerably (and hence increase efficiency) without losing correctness, or will we have to pay for a small gain in efficiency with a large loss of correctness?
- Is correctness a discrete notion for all problems, or can we think of applications where a scale of correctness would make sense? Think of optimization problems such as the travelling salesperson problem, where we may not be interested in exactly the optimal solution (which would take a lot of time to compute), but rather in a fast solution which is, say, at least 90% optimal. 
Are there straightforward ways to decrease the selected data considerably, while retaining a level of correctness (or optimization) that is "good enough"?
- Would it be possible to have the user choose a level of correctness (e.g. 100% or 90%) with a semantics that is easy to understand, also for the user who is not a specialist?

Part of the selection can be done in a quite straightforward way, by selecting only the relational tables which are mentioned in the program, or by calculating the maximum needed "distance" from focal points for relations that are defined in a non-recursive manner. Another part is addressed by ongoing research, such as the research on so-called magic sets.

Usage

The integration of the DLV system as described above will allow for many different user applications. As the complexity of the problems that can be solved with DLP makes the language somewhat complicated, it may be difficult for the occasional user to use it to its full power. This is not a problem in itself; it is in the nature of any computer system that different users will use different capabilities of the system. In the ICONS system, less experienced users can still be offered the possibility of querying via DLV, by means of available help schemas or pre-defined queries. We can thus distinguish the following three ways of accessing the DLV engine, in decreasing order of required familiarity with the system.

First, there will be a direct user interface to DLV, where users can construct their own DLP programs and queries, possibly enhanced with options for keeping track of the individual search history and for sharing often-used programs with others.

Secondly, one can think of a shared library of expert programs that can serve as schemas to be edited for individual use by experts or other users. Experts could maintain this library, possibly in co-operation with a database expert. 
Thirdly, there may be often-used queries that could be ready for use without knowledge of DLP. Such queries could be implemented at the system installation phase and maintained by local database experts. As an example, one can think of regular dependency checks, as in fraud checks or testing, which could be executed at regular times (once a week, overnight), or of individual instances of problems such as dividing a set of persons into groups subject to several constraints. These kinds of settings can be generalized and made available to people who do not (yet) have much knowledge of DLP.

On the other hand, this sliding scale can also be seen as a natural means of education: after having used the standard queries several times, one may try to edit an expert query, and after having dealt with several expert queries, one could be ready to write one's own programs.

5.5 Procedural knowledge representation features

As was stated earlier, there are several types of knowledge representation. One of them is procedural knowledge, which defines algorithms for achieving a given goal. In the context of organisations, such algorithms are called business processes. A business process defines what units of work should be performed, when, and by whom in order to achieve a given goal, that is, to produce a product or to provide a service. Innovative, efficient and flexible business processes help an organisation to be competitive and to play a leading role in the market.
From the point of view of repeatability there are two types of business processes: repeatable and non-repeatable processes. The former are well-defined, mass processes. Management rarely intervenes in the control of such processes, and changes to them occur seldom and are evolutionary. The latter require a high degree of flexibility and can be well defined only at a high level of abstraction. They are unique: usually they are executed only once. Changes to such processes occur frequently and can be revolutionary.

Business processes can be supported by computer automation (partially or fully). Among the most popular and effective tools to support business processes are workflow management systems (WFMSs). In a WFMS, the automatable part of a business process is represented as a workflow definition. According to the WfMC’s meta-model defined in [WfMC2001], the main elements of a workflow definition are:
• activities: pieces of work that form logical steps within a workflow process. An activity is performed by one or more workflow participants;
• transitions: points during the execution of a process instance where one activity completes and the thread of control passes to another, which starts. A transition can have a condition, which may be evaluated in order to decide the sequence of activity execution within a workflow process;
• workflow participant: a resource set, resource (specific resource agent), organisational unit (within an organisational model), role (a function of a human within an organisation), human (a WFMS user) or system (an automatic agent) that performs activities;
• control data: data representing the dynamic state of workflow instances and the WFMS (e.g. workflow definitions);
• audit data: data representing the history of workflow instance execution;
• relevant data: data used for the evaluation of conditional expressions, for instance those expressing transitions or participant assignments.

WFMSs enable workflows to be designed, executed, monitored and optimised. When a workflow process is executed for a given case, it is called a workflow process instance. Other elements of a workflow definition are fully described in the WfMC’s workflow glossary [WfMC1999]. In the ICONS project, workflow definitions will be stored as ordinary information objects and treated as a part of organisational knowledge.

Usually, in order to increase the readability of the defined workflows, a workflow definition is modelled in a graphical tool. Such a tool helps users to understand the defined processes and, during execution, to check which activity or activities of a given process are being performed. In addition, such a tool is used to simulate and test workflow processes before they are deployed at customers. In the ICONS project we are going to use well-known commercial workflow modelling tools such as Aris Toolset and iGrafx.

Organisations expect that implementing their business processes as workflow processes in a WFMS can help them to produce a product or to provide a service:
• of optimal quality,
• within an optimal period of time,
• with optimal resource effort,
• at optimal cost.

In this context, optimal means that something is done at the expected level, or at the best level at which it can be done with respect to the other factors of the workflow process. In order to satisfy the above factors, WFMSs should support:
• flexibility: a WFMS should be able to adapt to dynamic changes that are required during the execution of one or more workflow process instances in order to satisfy the expected criteria. Dynamic changes can apply to all aspects of a workflow definition, such as control flow, workflow participant assignments, and time management. Dynamic workflow modifications, depending on their durability, concern workflow definitions or workflow instances. 
A WFMS should use statistical, heuristic and artificial intelligence methods to modify workflow definitions or workflow process instances. Adaptation to dynamic changes should be done on the basis of relevant data as well as control and audit data. Flexibility is especially important for non-repeatable processes, since these processes cannot be fully specified a priori, at the workflow definition stage. In the ICONS project we would like to implement the method of dynamic modification of control flow presented in [Aalst1999], and to extend a language for dynamic workflow participant assignments as well as control flow conditions (referred to below as WPAs and CFCs respectively). In order to increase the flexibility of the
defined WPAs and CFCs we would like to use Datalog rules as WPA and CFC functions. The extension to the WfMC’s definition of WPA has been described in [Momotko2002].
• risk management: the main aim of risk management is to avoid undesirable situations as well as to minimise the negative results of those that have already occurred. In our opinion, risk management should at least take into consideration such aspects of workflow as time management and task scheduling. The former is described in detail in section 7.3 and the latter in section 7.4.

As stated in [Koloupulos1995] and [Stader2001], the above requirements are not fully supported by current WFMSs and should be developed in a knowledge-based, or intelligent, WFMS. Moreover, it seems that at the moment the above features of an intelligent WFMS are not well defined in the relevant WfMC standards. In the ICONS project we will suggest some extensions to the WfMC standards and develop a prototype to test their usefulness in practice.

5.6 Knowledge representation and manipulation in the graphic user interface

The ICONS Graphic User Interface (ICONS GUI) is a tool to be used by a Web application developer for the visualisation of user requests and of outputs from the ICONS data/knowledge base. The ICONS GUI cannot be separated from other issues related to the general data/knowledge base architecture. Its main role is the visualisation of data stored in the data/knowledge base. More precisely, it has to deal with the visualisation of user requests to a data/knowledge base, together with the visualisation of data retrieved from the database as the result of those requests. The interface should also allow some manipulation of the data/knowledge base, for instance altering, creating or deleting data. 
Hence during ICONS GUI design we must deal with the following issues:
• A data model of the data/knowledge base that the graphic user interface will operate on.
• Stored data structures (presented at the proper level of data independence) that will be searched or manipulated during requests. The data structures must be designed at the level of algorithmic precision, as their semantic properties will be directly used by the ICONS GUI.
• A user language for data description that will allow the user to see what the data/knowledge base contains. This language can be designed at the level of a database schema (cf. CORBA IDL or ODMG ODL) or at the level of a business ontology that describes not only structural properties of the data/knowledge base, but also some metadata related to the business domain.
• Some universal API (a query language) that will allow one to perform retrievals and manipulations on the database. The API must contain not only the specification of a retrieval/manipulation language, but also the specification of the formats that will be returned by retrieval requests. These formats may be (and usually are, cf. ODMG) different from the stored data structures (but based on the same notions). Since the results of requests will be an input to the GUI module, they must also be specified at the level of algorithmic precision.
• The ICONS GUI should contain features that will allow an application developer to customise the package to a particular application. The customisation can concern the graphical icons that will be presented to the end user, navigation or browsing paradigms (i.e. additional actions connected with a single navigation act), as well as database views that will simplify the conceptual model of the application.

[Figure 9. Architecture of the GUI module. Components shown: application program; graphic API; customization; GUI module; GUI DB; API to a database (queries and manipulation requests); API to a database (results of requests).]
The general GUI architecture, including the context of its use, is presented in Figure 9. The following elements must be considered during the development:
• GUI module: generic software used by a developer of a Web application to prepare programs that handle the interaction of Web end users with the data/knowledge base. To this end the developer has to use the following interfaces:
− Customisation: parameterization of the entire GUI according to the wishes of the developer, e.g. fonts, colours, kinds of icons to be displayed, etc. The customisation may also require some (virtual or materialized) views on the database, which will be used by the developer for the conceptualisation of an application.
− Graphic API: the interface makes it possible to activate and deactivate particular graphical widgets (buttons, menus, pictures, input/output text fields, tables, etc.) on the Web end user's screen. The graphic API should enable the presentation of various forms of graphs at many levels of detail and with some possibilities for manipulation, e.g. changing colours to show the user's navigation in the graph.
− API to database: it is used by a developer to write scripts associated with events that can occur on particular widgets. For example, clicking a button named GetCompanies means issuing the request “select * from Company” to a database. The API includes facilities to process the results of requests received from the database. These facilities are used within an application program prepared by the application developer. The results of the requests are the input to the GUI module.
• GUI DB: a database or a file storing customisation information (e.g. a palette of icons) and the current state of the interaction with a particular user (e.g. the history of operations, current results of search, views, etc.). 
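The relationship between widget events, database requests and the GUI DB can be sketched as follows. This is only an illustration under stated assumptions: the class GuiModule, its methods bind and click, and the in-memory SQLite table are hypothetical stand-ins, not part of the ICONS design; only the GetCompanies button and its request text come from the description above.

```python
import sqlite3

class GuiModule:
    """Hypothetical stand-in for the GUI module: event scripts plus interaction state."""

    def __init__(self, connection):
        self.connection = connection
        self.scripts = {}    # widget name -> request issued when the widget is clicked
        self.history = []    # interaction state of the kind the GUI DB would store

    def bind(self, widget, request):
        """Associate a database request with a click event on a widget."""
        self.scripts[widget] = request

    def click(self, widget):
        """Run the script bound to the widget and return the result rows."""
        rows = self.connection.execute(self.scripts[widget]).fetchall()
        self.history.append(widget)
        return rows

# A throwaway database with hypothetical content.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (name TEXT)")
conn.executemany("INSERT INTO Company VALUES (?)",
                 [("Brainstorm Ltd.",), ("Example Holding",)])

gui = GuiModule(conn)
gui.bind("GetCompanies", "select * from Company")
rows = gui.click("GetCompanies")   # the rows become the input to the GUI module
```

Keeping the request text in a table keyed by widget name means the same generic module can serve different applications, which is the kind of customisation the description asks for.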
An important feature of the whole interface is genericity, which means flexibility, robustness and independence of any particular application domain.

The ICONS project architecture assumes multi-paradigm data and knowledge representation and processing. In particular, the architecture assumes (more or less explicitly) the following data models and corresponding paradigms:
• “pure” object-oriented model,
• relational or object-relational model,
• XML model including typing facilities such as DTD and XML Schema and mapping facilities such as XSL and XSLT,
• RDF model,
• Rodan Portal model,
• Datalog model, semantic network,
• temporal model,
• model for process knowledge such as a workflow model assumed by WfMC,
• perhaps other models that will appear as results of contributions of ICONS participants.

This variety of considered and potential models makes it necessary to establish and develop a kind of canonical data model that will present a “common denominator” of the various other models. As a candidate canonical model we have chosen a variant of an object-oriented database model in the spirit of ODMG, with significant improvements: enhancing it with dynamic object roles, cleaning up its semantics, and observing principles such as object relativism, total internal identification and orthogonal persistence. The model will be equipped with a data query/manipulation API based on the query language SBQL, built in the spirit of ODMG OQL but based on fundamentally new semantic principles known as the Stack-Based Approach (SBA).

In Figure 10 we present the architecture of the wider context of the ICONS GUI interface, which includes the interface to the canonical model through SBQL and wrappers to databases proprietary to particular data/knowledge representation paradigms. We plan that more sophisticated mapping of source data structures into canonical objects will be possible through object-oriented virtual views built on top of SBQL queries. 
In effect, the ICONS GUI will be conceptually and physically isolated from particular solutions concerning representation of data, thus allowing
the developer and user of Web applications to have a unified view of the heterogeneous data resources that the ICONS architecture will deal with. This idea is much influenced by the CORBA IIOP bus, but shifted to a higher conceptual level (i.e. the level of a query language).

[Figure 10. ICONS GUI module with interfaces to databases. Components shown: Web client; user requests; HTML page; HTML page generator; user requests processor; application program; other APIs; graphic API; customization; GUI module; GUI DB; SBQL queries; API (canonical model); object view processor; object query processor; SBQL DB; CRUD API; Rodan Portal wrapper; XML/RDF DB wrapper; another wrapper; Rodan Portal API; XML/RDF API; another API; RODAN Portal DB; XML/RDF files; another DB/file.]

Navigation in a graph of inter-linked objects is a very attractive searching paradigm, which has so far not been sufficiently explored in the context of Web applications. We can distinguish two kinds of such navigation:
• Direct manual browsing in a graph of objects that are explicitly presented graphically. For instance, we can present on the screen the graph of connected objects, and the user is allowed to move along named edges of this graph according to his/her wishes. Other examples of this kind of searching are navigation in a network of concepts (semantic network), navigation in a network of HTML pages, etc.
• Manual browsing and searching in a graph presenting some data description or conceptual model of stored data. In contrast to the previous case, where the navigation concerns explicitly visualised objects, in this case only some description of objects is visualised, e.g. a UML schema. The user navigates in this schema; the effect of navigation is the retrieval of objects that are of interest to the user. 
There are several problems connected with this kind of interface:
• Size of the end-user screen: usually it is impossible to present a very big and complex graph, hence it must be presented partly, with zooming facilities, perhaps with 3D views, and with details of objects hidden depending on the mode or stage of searching.
• User awareness: the user can very quickly lose orientation during navigation in a complex graph, thus special graphic facilities are necessary to keep him/her aware of the current sub-goals or results of the search.
• Combining manual and predicate-based automatic navigation.
• Elliptic queries: for some kinds of navigation it would be useful for the user to omit some details of navigation.

In the graph navigation facility we would like to combine manual browsing in a graph of associated objects, selecting objects by predicates, and collecting results in user baskets. The idea is that during navigation the user collects interesting information within his/her personal baskets. This metaphor is illustrated in Figure 11 and Figure 12.
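A minimal Python sketch of this combination (selection by predicate, navigation along named edges, collection in a basket) might look as follows. The graph, the object identifiers and the "city" attribute are hypothetical and chosen only for illustration; the functions select and follow are names introduced here, not part of the ICONS design.

```python
# Hypothetical graph: object id -> (name, attributes). Neither object names
# nor edge labels are required to be unique.
objects = {
    "a1": ("A", {"city": "Warsaw"}),
    "a2": ("A", {"city": "Vienna"}),
    "a3": ("A", {"city": "Warsaw"}),
    "b1": ("B", {}), "b2": ("B", {}), "b3": ("B", {}), "b4": ("B", {}),
    "c1": ("C", {}), "c2": ("C", {}),
    "d1": ("D", {}), "d2": ("D", {}),
}
# Directed, labelled edges: (source id, label, target id).
edges = [
    ("a1", "y", "b2"), ("a2", "x", "b1"), ("a3", "y", "b4"),
    ("b2", "z", "c2"), ("b4", "z", "c2"), ("b3", "w", "c1"),
    ("c2", "v", "d1"), ("c2", "v", "d2"),
]

def select(name, condition=lambda attrs: True):
    """Choose starting objects by name and a condition on their contents."""
    return {oid for oid, (n, attrs) in objects.items()
            if n == name and condition(attrs)}

def follow(marked, label):
    """Move from every marked object along the edges carrying the given label."""
    return {dst for src, lab, dst in edges if src in marked and lab == label}

# Baskets would be persistent in a real system; a plain dict stands in here.
baskets = {"My today search": set()}

marked = select("A", lambda attrs: attrs.get("city") == "Warsaw")
step_b = follow(marked, "y")
step_c = follow(step_b, "z")
step_d = follow(step_c, "v")
baskets["My today search"] |= step_d   # keep references to the objects found
```

Treating the marked objects as a set makes each navigation step a simple set-to-set mapping, so manual steps and predicate-based steps can be mixed freely.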
[Figure 11. A graph of objects.]

In Figure 11 we present a graph where objects (named A, B, C and D) are connected by directed edges x, y, z, t, w, v. As can be seen, we do not require the names of objects or the names of edges to be unique. Objects can store information (attributes and their values) which can be displayed to the user. The user can select starting objects for navigation through the following actions:
• Manual choice, by clicking and marking the proper objects on the screen (e.g. on the basis of their content, which can optionally be displayed).
• Entering a name of objects and a condition on their contents.
• Taking the proper objects from her basket (which has been filled during a previous search).

After selecting the initial objects she can navigate in the graph through named edges (selected from a menu). Suppose the user initially selects the 1st and the 3rd object A, and then uses edge y. This means that she moves to the 2nd and 4th object B. If she then uses edge z, she moves to the 2nd object C. If she then uses edge v, she moves to both objects D. Objects that are selected during this search are called marked; other objects are unmarked. During this process the user is allowed to perform any actions, such as marking/unmarking objects, displaying an object, moving references to objects to her private basket, etc.

The idea of the basket corresponds directly to the virtual shop metaphor. It has to support user awareness. A basket is a graphical element with icons representing selected objects. A basket has a unique name. Baskets can be organized hierarchically (similarly to operating system catalogs). They are persistent structures, i.e. they are stored in the database. In this way a single search can be subdivided into many user sessions.

[Figure 12. The idea of the user basket. The example shows a basket named "My today search" containing icons of objects A, B, C and D.]

Each basket has a name. The user can also assign a longer description or comment to the basket. The content of the basket can be presented in 3D graphics. Icons representing particular kinds of objects can be different (they could be subject to customisation). The content of each object in the basket can be displayed. Each object in the basket should be accompanied by the following information (e.g. presented as a table): an icon representing the object, the object identifier and name, a representation of the object content, the date/time of finding the object, and any string comment (annotation) entered by the user. An example of the content of a basket is shown in the following table.
Icon   | Id      | Object name | Retrieval date | Object content  | Comment
[icon] | 23156   | Person      | 02.01.03       | John Smith      | I checked him yesterday.
[icon] | 23456   | Person      | 02.04.05       | Mike Brown      | Smart client!
[icon] | 766585  | Document    | 02.08.19       | Order 234527    | Currently processed, ready in 2 days
[icon] | 3453453 | Company     | 02.07.19       | Brainstorm Ltd. | Our best supplier.

Navigation in a graph of objects could be connected with additional options, in particular with calling applications. For instance, if the navigation concerns a semantic network used as an intelligent searching index, then after the search within the network the user can display the corresponding objects from the database, display a Word file, go to the Web through a URL, etc.

A similar idea applies to navigation in a database schema, except that a schema graph is displayed rather than the graph of objects. The schema graph should correspond to the canonical data model and to the description of the stored data according to that model. The graph will be presented as an improved subset of UML class diagrams (or ODMG ODL), to make it relevant to the data description language assumed for the canonical object model. All other rules for marking, collecting references to objects within baskets and calling applications should be similar to those for navigation within a network of objects, described previously.

An unexplored area in graphical querying concerns the paradigm known as Query By Example or Query By Forms (or simply Forms). This paradigm was extremely successful for relational databases. We can consider applying it to object/XML bases. The basic idea of this paradigm is that the system displays to the user an empty form based on a data description statement. For instance, it can present a form based on a DTD, where the corresponding XML values are initially empty. 
The user fills in an empty field A in the form with a string value V (and possibly with some additional mark determining the kind of comparison). The system then fills in the rest of the form with values stored in the database for which field A has the value V. This paradigm can easily be adapted for object or XML databases.
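The form-filling idea can be sketched in a few lines of Python. This is a simplified illustration, not a proposal for the ICONS interface: the form is a plain dictionary, an empty field is None, a filled field is a pair of a comparison mark and a value, and the function query_by_form as well as the set of marks are assumptions made for this sketch. The sample records reuse values from the basket table above.

```python
import operator

# Comparison marks a user may attach to a filled-in field (assumed set).
COMPARISONS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def query_by_form(records, form):
    """Return one completed form per stored record that matches the filled fields.

    form maps each field name to None (left empty by the user) or to a
    (mark, value) pair, where mark selects the kind of comparison.
    """
    filled = {f: entry for f, entry in form.items() if entry is not None}
    completed = []
    for record in records:
        if all(COMPARISONS[mark](record[f], value)
               for f, (mark, value) in filled.items()):
            # The system fills in the rest of the form from the stored record.
            completed.append({f: record[f] for f in form})
    return completed

records = [
    {"id": 23156, "kind": "Person", "name": "John Smith"},
    {"id": 23456, "kind": "Person", "name": "Mike Brown"},
    {"id": 3453453, "kind": "Company", "name": "Brainstorm Ltd."},
]

# The user fills in only the field "kind" with the value "Person".
persons = query_by_form(records, {"id": None, "kind": ("=", "Person"), "name": None})
# The user fills in "id" with a "greater than" mark.
later_ids = query_by_form(records, {"id": (">", 23200), "kind": None, "name": None})
```

For object or XML databases the flat dictionary would be replaced by a nested form derived from the data description, but the matching step stays the same.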
6. The ICONS Intelligent Content Integration Features

The ICONS Content Repository (ICR) is to comprise content objects (COs) representing knowledge artifacts stored and manipulated by the ICONS content management functions. The knowledge artifacts directly represent results of intellectual work or may be derived from external information sources, such as information systems, databases and web sites. The ICONS Global Knowledge Schema (IGKS) is to include partial definitions of the content object data structures, content object methods, definitions of the content object relationships, as well as the content object taxonomies.

The ICONS content objects are stored as XML documents conforming to the corresponding XML schema and comprising un-interpreted binary elements stored as files in a hierarchical memory system. The IGKS comprises meta-information pertaining to all CO classes represented in the repository, regardless of the storage and access modes used to materialize their values.

An important characteristic of the ICR is its flexible data structure, only partially defined in the repository schema, which escapes the traditional database requirement of a consistent and complete correspondence between database schema and database instance. Rather, the IGKS may be treated as a guide for interpreting the structured parts of the content objects and for navigation in the CO relationship structures. On the other hand, all CO methods must be defined and implemented with the support of the object model inheritance structure, in order to provide facilities to manipulate the content object values.

There are two dimensions of the ICONS content distribution. The first pertains to the distribution of the system content repository, comprising the Content Base and the Ontology Base, and of the hierarchical storage management processes among the ICONS servers. 
The second concerns integration of external information sources, such as pre-existing heterogeneous databases, legacy information processing systems, and web information resources. The first case is addressed in chapter 8.

Integration of the external information resources is to be performed with the use of XML-based wrapper technology. Wrapper programs produce the required XML documents for the extracted data, which serve as containers for file elements. These documents are to be enriched with RDF specifications resulting from extracting semantics from the database schemata of the external databases, or from appending semantic information in the case of legacy information processing system outputs. The wrapper programs will be generated in the form of Enterprise Java Bean modules including the necessary query statements.

Due to the open nature of the ICR, the content integration features are envisaged as natural extensions of the ICR management features, and they are discussed below in the context of the repository schema as well as in the context of the repository data structure. The content integration support to be developed within the ICONS project is outlined in the final section of this chapter.

6.1 The ICONS Global Knowledge Schema

The ICONS Global Knowledge Schema (IGKS) is to comprise the structural knowledge representation features, including partial specification of CO data structures, definition of CO methods and inheritance hierarchies, and CO relationship bindings, as well as the knowledge map representation features to be developed as multi-level taxonomic trees.

The XML schema is to provide the partial specification of the CO data structure representing an arbitrary XML document tree. The leaf nodes may represent unstructured binary objects stored as files in the ICONS hierarchical storage structure. The CO class methods are to be defined within Java classes, where a Java class corresponds to a CO class defined in the XML schema. 
The inheritance structure is to be specified within the Java classes. We propose that special CO methods called inference methods are defined as a triple <R, M, F>, where R is a set of Datalog rules, M is the materialization algorithm to dynamically create F, and F is the relational data structure representing facts. The inference methods are to be executed by the ICONS inference engine based on DLV [Buccafurri1998].

The CO relationship bindings, which represent relationships implementing the structural knowledge meta-information, are to be specified as relationship predicates. The relationship predicates are logical expressions defined on CO properties. The CO relationship bindings are to specify binary and n-ary object relationships with arbitrary relationship cardinalities (1:1, 1:N, N:M). It is proposed that all CO relationships represented in the ICR
are materialized dynamically during the corresponding query execution. Appropriate data structures are to be developed to support efficient materialization of CO relationships.

The knowledge map consists of multi-level taxonomic trees representing either closed taxonomies, based on a specified list of categories, or open taxonomies, based on an arbitrary value (or values) of CO properties. Taxonomies are defined as logical expressions on CO properties and are to be materialized dynamically. We introduce a special class of implicit taxonomies grouping content objects by CO class and CO identifier. Thus, content objects are always accessible by navigation via some taxonomy.

6.2 The ICONS Content Repository

The ICONS Content Repository consists of two distinct, strongly inter-related parts: the Content Base and the Ontology Base. The Content Base, organized as a hierarchical storage configuration, is to store Content Objects in the form of XML documents including binary file elements. The Ontology Base is to be organized as a relational database including tables comprising selected XML object properties represented by table attributes. Appropriate relational tables are to be created for each CO class. The table attributes are to be used for attribute-based CO selection, or as arguments of relationship binding and taxonomy expressions.

The XML document properties will typically represent meta-information pertaining to the contents of the included file elements. Such property values are to be either defined manually or extracted automatically from the contents of the file elements. Properties representing structural or taxonomic knowledge will be replicated in the Ontology Base. Data redundancy is introduced in order to enable efficient manipulation of meta-information and to avoid complex data mappings during XML object manipulation operations. 
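The replication of selected XML properties into a relational table per CO class can be sketched as follows. This is an illustrative sketch only: the function replicate, the XML layout (a root element named after the CO class, an id attribute, child elements as properties) and the use of SQLite are assumptions made here, not the ICONS storage design.

```python
import sqlite3
import xml.etree.ElementTree as ET

def replicate(conn, xml_text, properties):
    """Copy selected properties of one XML content object into the table of its CO class."""
    root = ET.fromstring(xml_text)
    # Table and column names cannot be bound as SQL parameters, so they are
    # restricted to plain identifiers before being placed in the statement.
    if not all(name.isidentifier() for name in [root.tag, *properties]):
        raise ValueError("CO class and property names must be plain identifiers")
    columns = ", ".join(['"id" TEXT PRIMARY KEY'] + [f'"{p}" TEXT' for p in properties])
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{root.tag}" ({columns})')
    values = [root.get("id")] + [root.findtext(p) for p in properties]
    marks = ", ".join("?" * len(values))
    conn.execute(f'INSERT OR REPLACE INTO "{root.tag}" VALUES ({marks})', values)

# Two hypothetical content objects of the CO class "Document"; the file
# elements stay in the XML document and are not replicated.
order = ('<Document id="766585"><title>Order 234527</title>'
         '<status>processed</status><file href="order.pdf"/></Document>')
memo = ('<Document id="766586"><title>Memo</title>'
        '<status>archived</status><file href="memo.doc"/></Document>')

conn = sqlite3.connect(":memory:")
for content_object in (order, memo):
    replicate(conn, content_object, ["title", "status"])

# Attribute-based CO selection runs on the relational copy only.
rows = conn.execute('SELECT id, title FROM "Document" WHERE status = ?',
                    ("processed",)).fetchall()
```

The selection never touches the XML documents, which is the efficiency argument behind the deliberate redundancy; the price is that the copy has to be refreshed whenever a replicated property changes.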
Content objects may either be persistent in the ICONS repository or be materialized on request during a repository user session. The life cycle of a persistent content object starts with the object create operation and ends with an explicit destroy operation. Content objects, as well as their file elements, may be organized in the form of version trees reflecting content modifications taking place during the object life cycle. The transient content object classes are to be represented by class templates providing means to specify properties of objects to be dynamically materialized during the user session. The object materialization algorithms must be implemented in the object class methods. Transient objects may be stored in the repository for a specified period between user sessions, either as frames comprising the desired content materialization parameters, or as complete content objects. In the latter case, the content object property values and elements may be refreshed at specified intervals.

6.3 Integration of the heterogeneous content sources

Integration of heterogeneous, pre-existing databases was an active research field in the 1980s and early 1990s. The collection of papers in [Hurson1994] provides a good insight into the state of the art in the area of multidatabase systems. Current research and development efforts have moved towards integrating Web information resources, as shown in [Goeschka2001, Hammer1997, Knoblock1998], integrating object-oriented and multimedia databases [Chang2001], and extracting database semantics into a global dictionary [Lawrence2001]. Extracting semantic information from text-based information sources has been presented in [Soderland1997]. Integrating information from legacy information processing systems, in particular dealing with results of data mining queries, has been discussed in [Buchner2000].
The emerging approach is to represent a common schema of integrated information resources as an XML repository. The technique for extracting and representing the underlying semantics is based on the construction of wrappers that encapsulate the heterogeneity in accessing the diverse information sources. Wrappers are software modules that can transform data from a less structured representation into a more structured one. Examples of wrapper-based solutions may be found in [Hammer1997, Kushmerick1997, Sahuguet1999]. The ICONS architecture provides facilities in the form of standard interfaces to accommodate diverse wrapper technologies, ranging from Java beans including database queries and the required data mapping algorithms, to intelligent agents scanning predefined information sources for the required information. In all cases, we assume that the required data integration and mapping rules must be specified manually at ICONS application development time. Typically the integrated data will be stored as the XML content object file element with
semantics determined by the integration and mapping rules. The element meta-information may be extracted automatically and stored as XML content object properties. The bulk of our research effort will be directed towards the development of knowledge-based wrappers supporting integration of semi-structured information comprised in XML documents, possibly enhanced with RDF semantic information. The XML technologies are the emerging information exchange standard, facilitating information interchange and interoperability of web-based as well as legacy information systems. The knowledge-based wrappers will be developed as Datalog programs to be executed by the ICONS DLV module. A similar approach to the integration of semi-structured data has been reported in [Baumgartner2001].
7. The ICONS Intelligent Workflow Features

7.1 Dynamic workflow participant assignment

As reported in [Momotko2002], a modern WFMS needs to adapt to dynamic changes, and dynamic changes in workflow participant assignment (WPA) are especially important. Some of the main requirements for WPA declared by WFMS customers are:
• Control and audit data: data on finished or currently executed workflows, for example:
  − a person that has the lightest workload or the minimal number of tasks to perform,
  − a workflow participant that started the workflow,
  − a workflow participant that performed the previous/preceding activity,
  − a worker that does not have activities that have to be executed by Friday,
  − a salesman that performed more than 30 workflows in the last week.
• Relevant data: processed data, organisational structure or other data, for example:
  − a user participant that is defined as a tester of a given system bug,
  − an employee that is the supervisor of Mr John Bean,
  − a manager that is the chief of the sales department,
  − a person that knows Java and XML,
  − a workflow participant that has the ‘knows English’ role,
  − a salesman who is responsible for the region of the customer who sent the claim;
• A WPA should be able to express the situation when workflow participants assigned to a given activity are selected ad hoc, manually during workflow execution;
• A WPA should be able to express organisational and functional structures, in particular user groups that exist in an organisation;
• A WPA should be able to express the situation when exactly one workflow participant from a selected group should perform an activity;
• A WPA should be able to define a workflow participant who will perform an activity if workflow participant assignments return an inadequate set of workflow participants (e.g. an empty set).
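Three of the requirements listed above (selection by role, choosing exactly one participant by lightest workload, and a fallback participant for an empty result) can be combined in a short sketch. This is not WPAL, whose syntax is defined in [Momotko2002]; the `Participant` class and the `assign` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    roles: set = field(default_factory=set)
    open_tasks: int = 0  # control and audit data: current workload


def assign(participants, role, fallback):
    """Pick exactly one participant holding `role`, preferring the lightest workload.

    Returns `fallback` when the role-based selection yields an empty set.
    """
    candidates = [p for p in participants if role in p.roles]
    if not candidates:
        return fallback
    # Ties are broken by name so that the result is deterministic.
    return min(candidates, key=lambda p: (p.open_tasks, p.name))


staff = [
    Participant("anna", {"tester", "knows English"}, open_tasks=4),
    Participant("piotr", {"tester"}, open_tasks=1),
    Participant("maria", {"manager"}, open_tasks=2),
]
supervisor = staff[2]

print(assign(staff, "tester", supervisor).name)      # piotr
print(assign(staff, "translator", supervisor).name)  # maria
```

The role test uses relevant data, the workload comparison uses control and audit data, and the fallback covers the inadequate-set case from the last requirement.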
In order to satisfy the above requirements and to assure a high level of WPA flexibility, in the ICONS project we will use the WPAL language presented in [Momotko2002] to define dynamic WPA. This approach proposes an extension of the WfMC’s definition of WPA. Moreover, we consider using Datalog rules as WPA functions, and an approach of assigning intelligent agents to activities on the basis of knowledge available from ontologies. The latter approach has been described in [Jarvis1999].

7.2 Dynamic control flow condition definition

Similarly to the notation for WPA, we suggest defining a procedural language to express control flow conditions (CFCs). Control flow conditions comprise pre- and post-activity conditions and transition conditions. A flow condition should be built on relevant data as well as control and audit data. It should also use logical operators (AND, OR, NOT) and predefined functions, for example a function to check whether the activity of testing a repaired car is necessary or can be omitted. We consider using Datalog rules as such functions. In addition, it should be possible to keep a library of already defined flow conditions in order to reuse them. Such a feature could reduce the cost of implementing a new workflow process. Moreover, such an approach can express optional activities. The same idea, with a different implementation, is presented in [Klingemann2000].

7.3 Time management

In the ICONS project we would like to extend the idea of time management presented by Eder and Panagos in [Eder2001], [Eder1999], and [Eder1997]. In order to represent time information, they defined two basic temporal types, namely durations and deadlines. Both durations and deadlines can be defined for individual activities and for the whole workflow process. A duration is the time needed to perform a given activity or process. A duration can either be calculated from past workflow executions or be assigned by specialists based on their experience and expectations.
The most common duration values are minimum, maximum, and average. A deadline corresponds to the maximum allowable execution time for an activity or process. Deadlines do not have to be assigned to every activity of a workflow process, but it is beneficial to assign deadlines to all activities. In our opinion, the above approach to managing time in WFMSs seems promising. However, on the basis of our experience we think that in real workflows waiting time also has to be considered. Waiting time is the time between placing an activity in a given workflow participant’s task list and the moment when the participant begins to perform the activity. Especially for workflow participants that have many activities to perform, such time can be significant. Waiting time depends at least on the type of the performed activity, the workflow participant assigned to the activity, and the number of activities that have to be performed by the participant. Moreover, since in distributed WFMSs the time to transfer control flow between two consecutive activities (i.e. between the workflow participants that perform those activities) can also be significant, we suggest considering transfer time as well as waiting time. Transfer time depends mainly on the quality of the communication links between workflow engines. Users who define a workflow process cannot assign waiting and transfer times. These should be calculated from past and current workflow process executions.

7.4 Task scheduling

In order to reduce waiting time we will adapt well-known task scheduling algorithms to WFMS requirements. In our opinion, the function used to prioritise activities should be flexible and defined in the context of a given workflow process. Such a function could use relevant application data as well as control and audit data, for example information about deadlines and durations, the cost of resources that have to be used to perform a given activity, the significance of the activity, etc.
For each type of data, an administrator of the workflow process would be able to define its importance, for example: duration violation 10%, deadline violation 30%, overdraft of the activity cost 60%.

7.5 Extensions with respect to the WfMC's workflow process meta-model

In order to disseminate the described features of an intelligent WFMS, the following extensions to the WfMC’s standards are needed:
• introducing the WPAL language to express dynamic workflow participant assignments,
• defining the language to represent CFCs, and introducing CFC functions and the CFC reuse mechanism,
• representing a complete model for time management.
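The administrator-defined importance weights from Section 7.4 suggest a weighted-sum prioritisation function. The sketch below uses the example percentages from the text. How each violation is measured is not specified in the source, so the normalised values in [0, 1] and the names `priority` and `order_task_list` are assumptions.

```python
# Importance weights in percent, taken from the example in Section 7.4.
WEIGHTS = {"duration": 10, "deadline": 30, "cost": 60}


def priority(violations, weights=WEIGHTS):
    """Weighted sum of violation measures, each assumed to be normalised to [0, 1]."""
    return sum(w * violations.get(kind, 0.0) for kind, w in weights.items()) / 100


def order_task_list(tasks):
    """Order a participant's task list so that the highest priority comes first."""
    return sorted(tasks, key=lambda t: priority(t["violations"]), reverse=True)


tasks = [
    {"name": "approve claim", "violations": {"deadline": 1.0}},   # score 0.30
    {"name": "repair car", "violations": {"cost": 0.6}},          # score 0.36
    {"name": "archive file", "violations": {"duration": 1.0}},    # score 0.10
]

for task in order_task_list(tasks):
    print(task["name"])
```

Because the weights are data rather than code, they can be defined per workflow process, which matches the requirement that the prioritisation function be defined in the context of a given process.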
8. The ICONS Distributed Processing Organisation

8.1 The ICONS scalable, distributed architecture

To reach practical acceptance, the ICONS goals require an especially efficient data storage and processing architecture. This condition is difficult to meet but crucial. It has prevented most ambitious projects with similar goals from becoming more widely used (or used at all). The main prerequisites can be listed as follows:
1. Permanent data volume is large (many GBs). It is continuously growing because of new knowledge. Current practice shows that the growth rate could easily reach 100% per year.
2. Temporary data can have a largely unpredictable volume. Joins, transitive closures, and more complex recursive computations often lead to an explosion in the number of tuples. The selectivity of these operations may be impossible to evaluate in practice. Even large temporary files nevertheless have to be accommodated in real time and without performance deterioration.
3. Queries have to be processed in a way where the response time is as independent of the data size as possible. In particular, this time cannot be a linear function of the file size.
4. Permanent data are highly valuable. They have to be reliably protected against loss and corruption. They also have to be highly available. With the Web available anytime and everywhere, 24/7 access is today a must.

It is becoming well accepted that no traditional centralized architecture can meet such goals [CACM97], [Gray1996]. The CPU capacity of a single server, even a multi-CPU one or an expensive supercomputer, must eventually be exhausted. Likewise, the available RAM quickly suffices for only a fraction of the data. Access to the data on disk easily degrades the response time by two orders of magnitude. For data sets of many GBs, the disk may in turn overflow to the next level of the storage hierarchy, with a similar performance deterioration ratio.
The limit on the number of disk units that can be connected is also often reached rapidly in practice, and must be reached in any case when a scaling data collection is to be managed. Sophisticated data operations often use scans, whose response time at any single server depends at least linearly on the data size. These were the constraints that basically no research or industrial system has successfully overcome until now. Finally, failure of the data server may entirely prohibit access to the data at best, or may cause data destruction at worst. Many people at the World Trade Center had a bitter experience of this kind on 9/11/2001.

This state of the art, together with technological progress, brought about a new type of architecture, often termed a scalable distributed architecture (SD-architecture). Today, this framework seems to be the only one able to fulfil the ICONS goals and constraints. Our goal is to base ICONS on an SD-architecture.

The keyword distributed in an SD-architecture is basically quite classical. It means that both data and processing are supported by multiple interconnected nodes. It seems reasonable to assume, and in our case it is necessary, that most of the processing nodes are linked by a high-speed network. This is typically assumed to be a local network, most often a 1 Gb/s Ethernet these days. An important new twist is that the nodes to be linked by the network are mass-produced. They can be cheaply available computers, workstations, or PCs, and are therefore available in large numbers. They also often pre-exist the distributed system to be built. Finally, the role of a node can vary: data server, client, or application tier. Altogether, such configurations, proposed some time ago by prominent US researchers, e.g., at UC Berkeley [Culler1994], seem today the most efficient practical approach, if not the only realistic one for most users, owing to their unbeatable price-performance ratio.
Needless to say, they have triggered growing interest, especially in recent years, at the highest decision-making levels [President1998]. The literature designates such configurations as multicomputers, or as networks of workstations (NOW) [Culler1994]. More and more often, one also hears about the peer-to-peer architecture and, most recently, about grid computing. Finally, IBM is pushing the concept of an autonomic architecture [Gibbs2002].

The distributed architecture also potentially meets the goals of data reliability and high availability much better. Data can be mirrored or partitioned over multiple nodes. With mirroring, the unavailability of a node still leaves all the data values available through access to the mirror, i.e., it provides high availability of the data, perhaps at the expense of some throughput deterioration if both mirrors were regularly in use. In the case of partitioning, the unavailability of a storage node does not block access to other parts of the collection. Redundant partitioning with parity data may further provide the same high availability as mirroring, with much smaller storage overhead [Litwin2000].

The keyword scalable in an SD-architecture is more novel. It appeared in the early 1990s and basically means that the performance of data unit access should be independent of the data volume. This is often called flat scaleup. For the time to build or scan a relation or file, it means that this time should be a linear function of the
size at worst. Likewise, this property is often termed linear scaleup. If the scan time or, more generally, an operation time becomes too long because of the size of the data collection it operates upon, the speed-up resulting from partitioning the collection over more nodes should be linear as well. While all these goals are clearly wishful thinking in theory, research has proven that they are often reachable in practice.

The goal of scalability puts new requirements on distribution management with respect to more traditional architectures. Traditionally, the distribution was designed for some fixed collection of data server nodes, often called a cluster [Gray1996]. Any cluster, at some level of scale-up, must progressively exhaust its storage and CPU capabilities and start presenting the limitations of a centralized system. This must adversely affect the goal of scalability. The new and only way out is for the data and processing capabilities to be dynamically distributed over an appropriate collection of nodes. The collection may need to scale up in the number of nodes or, less often, scale down. Research is currently active in investigating the underlying technical issues.

Probably the most advanced trend for building an SD-architecture is the family of techniques known as scalable distributed data structures (SDDSs). This concept appeared in the early 1990s [Litwin1993] and has been actively investigated since. Dozens of references are available at the CERIA Web site [CERIA]. An SDDS is a new type of data structure that dynamically partitions the application data over a collection of available server nodes. The number of servers increases with the data size, and the distribution itself is transparent to the application. The data may remain entirely in distributed RAM or on local disks.
The partitioned data can also be mirrored for high availability or provided with parity data for this purpose.

The CERIA team has widely recognized competence in SD-architectures based on SDDSs. A number of technical papers are available at [CERIA]. Research co-operation with HP Labs in Palo Alto and IBM Almaden Research led to three US patents (see the IBM Patent Repository through http://www.ibm.com/). Recently, in March 2002, CERIA hosted an international workshop on Distributed Data & Structures (WDAS-2002). The first known prototype of an SDDS manager was also developed by CERIA. A version is available for public non-commercial download at the CERIA Web site. It allows for very large data sets in distributed RAM, with demonstrated data unit access a hundred times faster than access to disk.

This know-how and performance should be crucial to the efficiency of ICONS. It will be used by CERIA to develop the ICONS SD-architecture, which is planned to be based on SDDSs. More precisely, it should obey the principles we now overview.

The ICONS SD-architecture should be multi-tier. The ICONS private and permanent data should be stored at SDDS server nodes, servers in short. The application agents, whether dealing with knowledge or database management, should interact with SDDS client nodes, clients in short. The servers manage data storage (data buckets) and scalable distributed partitioning. More precisely, an overloaded server may split its bucket, evacuating a part of it, usually half of the data, to another node allocated dynamically. The main goal of this process is to keep the data for processing in the distributed RAM. The corresponding performance gain with respect to disk storage (and to centralized or cluster processing of scaling data) should give ICONS application data processing the leverage that previous attempts in the domain crucially lacked. The clients are not made aware of the splitting process.
Each client has an image of the data distribution, which is not necessarily the actual one. The client uses the image to issue key queries. Such queries address (search, insert, update, delete) data units with identifiers (keys), such as records or tuples. Since its image can differ from the actual distribution, the client can send the query to an incorrect ICONS server. All servers therefore have the capability to recognize such a query and forward it towards the server that could be the correct one. This process should ultimately lead, in at most a few hops, to the correct server, which processes the query. It also sends a specific message to the client, termed the Image Adjustment Message (IAM). The client uses this message to adjust its image. The adjusted image may still not be the actual one. However, at least the same addressing error should not happen twice.

In addition to key queries, the ICONS SD-architecture should support scan queries. A scan addresses in parallel all servers in some data range or, ultimately, all the servers. The processing time is then basically bound by the size of the data collection at each server, instead of the entire data size. As this size remains fixed, scalability should be largely attained. The RAM processing speed should add new levels of performance in the processing of complex operations. One new problem with scans is that the client may not know all the servers it should address. Hence, it may send the query to only some of them. The servers should forward the query to those that did not get it. The process should guarantee that each server gets the query once and only once. The client then receives the replies. There are several policies for organizing that reception so as to avoid overloading the client. Furthermore, the client then has the choice between probabilistic and deterministic termination protocols. The former means that the client terminates
when no further reply comes after some time-out. The latter corresponds to a subsumption algorithm that guarantees that all replies were received.

The servers should also guarantee high availability. In ICONS this should be done by providing parity data to groups of servers. A group with parity should then be able to transparently tolerate k ≥ 1 unavailable servers. The degree of protection k should scale up transparently with the collection size. These properties will be provided by a variant of erasure correcting codes derived from the well-known Reed-Solomon error correcting codes [Litwin2000a].

There are various choices for message passing between clients and servers, as well as for the system architecture at each node. Those will be analyzed during further work. As an overall assumption, standard and popular components will be used whenever possible. Hence, for communication, one should use the TCP/IP stack, and faster UDP messaging, unicast and multicast, for service messages, with dedicated flow control when needed. Likewise, multithreaded processing at each node seems the best basis as well [Diene2000].

Summing up, the ICONS SD-architecture should offer a number of novel features to accommodate stringent performance requirements. These features should allow for the practical acceptance of the project results, as performance is the key need for such acceptance.

8.2 The ICONS distributed processing optimisation and load balancing

Distributed processing optimization for a data management system at the gross architecture level traditionally relies on load balancing among the nodes and on inter-query optimization on the clients and servers [Ozsu1999]. The main reason is that at this level the semantics of a query are unknown, hence intra-query optimization can only be quite general.
Inter-query optimization then consists of executing one query while another query is waiting for a resource, especially a network transfer. The most widely accepted approach is to organize the client and server processing as threads manipulating queues. We adopt this approach as the basis for the ICONS SD-architecture as well, for both clients and servers.

In more depth, there should be a query queue at the client where an application leaves its request. The request consists of the query and the data, or a local pointer to the data. This queue should be read by a number of threads, which remains to be determined for a given client. Each thread processes a query, finds the addressed server(s) and places the query in some internal send queue. Its role temporarily ends with the request(s) to the sockets to send out the query, using UDP or TCP/IP messaging depending on the case. Other threads may continue the data processing during this time, hence realizing the client side intra-query optimization.

At the server, all incoming requests are to be placed in the listen queue. Several threads process this queue and search or update the storage. Any data to return, as well as IAMs if any, are sent out. A thread working in pipeline mode can then be blocked while its current reply is being sent. The other threads may continue the data processing during this time, hence realizing the server side intra-query optimization.

Several threads at the client listen to the network buffers and transfer the incoming replies into a reply queue as soon as possible. In the case of an SDDS this approach is particularly useful, as a key query may be sent to one server while the replies come from another one. Other threads explore the reply queue, match it to the query queue and finally reply to the applications. Some processing may be in pipeline mode, leaving a thread blocked while it waits for the next data item.
Other replies can be processed in the meantime, hence realizing the other facet of the client side intra-query optimization.

Likewise, the servers in the ICONS SD-architecture should, where possible, carry a load commensurate with the processing capability of each server. Numerous research results, especially on load balancing in parallel DBMSs, show that the processing load balancing usually follows the data load balancing [Vocking2002]. Sophisticated and complex research attempts at processing load balancing by analyzing query frequency, resource consumption, etc. have not yet led to any practical acceptance. An SDDS may then allow for load balancing in at least two ways. Those follow similar ideas for parallel DBMSs, which offer hash partitioning of the application data, e.g., DB2, or range partitioning, e.g., SQL Server, or both, e.g., Oracle.

The most used one is hash partitioning. A well performing hashing scheme randomizes the data location and renders the server load naturally uniform. One can expand it into double or triple hashing with a symmetric or asymmetric record placement scheme [Vocking2002]. In our case, this type of balancing translates to a scalable distributed hash partitioning scheme. An LH* type of scheme appears to be the best candidate [Litwin1996], especially since variants of this scheme are known that also provide high availability [Litwin2000], among others (see [CERIA]).
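The client image, server-side forwarding, and image adjustment described in Section 8.1 can be sketched for an LH*-style hash partitioning. The sketch follows our reading of the published LH* algorithms, not the CERIA prototype. It assumes the hash family h_i(c) = c mod 2^i and a file that starts from one bucket, and it simulates servers as plain function calls. All names (`File`, `client_address`, `server_check`, `adjust_image`, `key_query`) are hypothetical.

```python
def h(level, key):
    """Assumed linear-hashing family: h_level(key) = key mod 2**level."""
    return key % (2 ** level)


class File:
    """Actual file state (i, n); known to the servers, not to the clients."""

    def __init__(self, i, n):
        self.i, self.n = i, n

    def bucket_count(self):
        return 2 ** self.i + self.n

    def level(self, a):
        # Buckets already split in this round, and their new siblings, are one level ahead.
        return self.i + 1 if a < self.n or a >= 2 ** self.i else self.i

    def correct_address(self, key):
        a = h(self.i, key)
        return h(self.i + 1, key) if a < self.n else a


def client_address(image, key):
    """The client computes an address from its possibly outdated image (i', n')."""
    i, n = image
    a = h(i, key)
    return h(i + 1, key) if a < n else a


def server_check(f, a, key):
    """Server `a` either accepts the key (returns a) or names the server to forward to."""
    j = f.level(a)
    target = h(j, key)
    if target == a:
        return a
    lower = h(j - 1, key)
    if a < lower < target:
        target = lower
    return target


def adjust_image(image, a, j):
    """Image adjustment from the address `a` and level `j` of the first server addressed."""
    i, n = image
    if j > i:
        i, n = j - 1, a + 1
        if n >= 2 ** i:
            i, n = i + 1, 0
    return (i, n)


def key_query(f, image, key):
    """Route one key query; return (serving address, number of hops, adjusted image)."""
    first = a = client_address(image, key)
    hops = 0
    while True:
        nxt = server_check(f, a, key)
        if nxt == a:
            break
        a, hops = nxt, hops + 1
    if hops:  # an IAM is sent only after an addressing error
        image = adjust_image(image, first, f.level(first))
    return a, hops, image


f = File(3, 2)       # 10 buckets: 0..9
image = (0, 0)       # a new client believes the file has a single bucket
for _ in range(3):
    served, hops, image = key_query(f, image, 9)
    print(served, hops, image)
```

In this run the same key needs two hops, then one, then none, as the image converges towards the actual state (3, 2). The image never runs ahead of the actual file, so a stale client is merely slower, never wrong.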
Range partitioning is another common type of partitioning. It leads to an ordered collection of data. In our case, an RP* scheme appears to be the best candidate. Such schemes at present provide ranges such that each server stores about the same number of data items. As with hashing, this property usually provides good load balancing. However, imbalance is naturally more frequent than with hashing. Consider for instance that the range partitioning concerns a phone book of a region, with the partitioning key being the city, and that some cities have important administrative centres whose phone numbers are therefore retrieved much more often. That would lead to a higher processing load on the servers whose ranges include those cities.

The solution we plan for the ICONS SD-architecture consists of a modification of the RP* schemes [Diene2000], to be selected later for the ICONS needs, so that the ranges on overloaded servers are made dynamically smaller. For instance, they are halved. Such a decision can be made locally by each server, on the basis of its own statistics compared with those from other servers. Through the splits triggered by the range change, the data items of an overloaded server spread over several servers. The processing load re-balances accordingly. Likewise, under-loaded servers could merge.

Summing up, distributed processing optimization and load balancing are complex matters. At the SD-architecture level in particular, the query semantics are unknown. One should concentrate on inter-query optimization and data load balancing [Ozsu1999], [Vocking2002]. The ICONS solution for the former should then rely on threads co-operating through queues, at both servers and clients. It will also be based on load balancing that generalizes the more traditional, widely used techniques of data partitioning to the scalable distributed environment.
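The planned modification of range partitioning, where an overloaded server halves its range, can be sketched as follows. This is an illustration only, not an RP* implementation: integer keys stand in for the partitioning key, the overload test (load above `factor` times the mean) is an assumed policy, and the `RangePartition` class is hypothetical.

```python
import bisect


class RangePartition:
    """Ordered partition of the key space; server k holds keys in [bounds[k], bounds[k+1])."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        self.hits = [0] * (len(self.bounds) - 1)  # per-server access statistics

    def server_for(self, key):
        if not self.bounds[0] <= key < self.bounds[-1]:
            raise KeyError(key)
        return bisect.bisect_right(self.bounds, key) - 1

    def access(self, key):
        k = self.server_for(key)
        self.hits[k] += 1
        return k

    def rebalance(self, factor=2.0):
        """Halve every range whose load exceeds `factor` times the mean load."""
        mean = sum(self.hits) / len(self.hits)
        k = 0
        while k < len(self.hits):
            lo, hi = self.bounds[k], self.bounds[k + 1]
            if self.hits[k] > factor * mean and hi - lo > 1:
                self.bounds.insert(k + 1, (lo + hi) // 2)
                # Assume the observed load divides evenly until new statistics arrive.
                half = self.hits[k] / 2
                self.hits[k:k + 1] = [half, half]
                k += 2  # skip the server just created
            else:
                k += 1


part = RangePartition([0, 25, 50, 75, 100])
for _ in range(90):
    part.access(10)          # a "hot" key, like a city with many lookups
for key in (30, 31, 32, 33, 34, 60, 61, 62, 80, 81):
    part.access(key)

part.rebalance()
print(part.bounds)
```

Only the hot range is split, so the other servers are untouched. A merge of under-loaded neighbours, which the text also mentions, would be the inverse operation and is omitted here.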
8.3 The ICONS distributed workflow process communication and synchronisation

One of the most challenging features of WFMSs is workflow interoperability. Such interoperability enables two or more workflow engines to communicate and co-ordinate their work. There are several different models of workflow co-operation, namely the chained process model, the nested subprocess model, and the parallel synchronised model.

Figure 13. Models of workflow co-operation.

In the chained process model, after one workflow process is completed, another workflow process inherits the processing and starts. This is the most basic model. In the nested subprocess model, one workflow process has a part of its processing done by another workflow process. In the parallel synchronised model, two workflow processes that are proceeding independently become synchronised at some point and exchange information, and
then continue independently. When one process reaches the synchronisation point, it waits for the other to arrive there, and then they exchange information.

On the basis of the WfMC’s reference model and the Interface 4 standard described in [WfMC1996], the Object Management Group (OMG) developed the JointFlow specification. JointFlow defines a framework for distributed workflow applications in the world of business objects [OMG1998]. This specification enables interoperability of workflow process components, monitoring and workflow execution, and association of workflow components with the resources involved in a workflow process. In the next step, the simple workflow access protocol (SWAP) was developed. SWAP was envisioned as a binding of the JointFlow object model and related WfMC standards to an HTTP-based interaction protocol. Finally, in 1999, the WfMC presented the Wf-XML specification. This specification enhances some of its predecessors’ capabilities, providing:
• a structured and well-formed XML body protocol that consists of messages containing headers and data,
• a logical interaction model with synchronous, asynchronous, and batch capabilities,
• independence from transport mechanisms,
• easy extensibility through the use of XML and dynamic workflow context data.

In synchronous messaging, a process may wish to initiate a sub-process and suspend its normal processing until that sub-process completes. In asynchronous messaging, the initiating process sends a request to the enacting process. The enacting process then sends only an acknowledgement back to the initiator, informing it that the request has been received. At some later point in time, the enacting process sends a response to the initiating process. The initiating process then sends an acknowledgement back to the enacting process, informing it that the response has been received.
In the batch messaging it is possible to place multiple Wf-XML interaction in a single message. In the ICONS project we will implement Wf-XML specification and used e-mail to transport XML workflow messages. IST-2001-32429 ICONS Intelligent Content Management System page 59/86
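The asynchronous pattern and the e-mail transport described above can be illustrated with a minimal sketch. It is not the Wf-XML schema: the element names (WorkflowMessage, Header, Body, ContextData), the function names, and the addresses are assumptions made for illustration only. The sketch builds an XML message with a header and a body, wraps it in an e-mail message, and lists the four messages exchanged in one asynchronous interaction.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage

# NOTE: all element names are illustrative placeholders, not the Wf-XML schema.


def build_message(kind, operation, context):
    """Return an XML string with a header (message kind) and a body (operation + context data)."""
    root = ET.Element("WorkflowMessage")
    header = ET.SubElement(root, "Header")
    ET.SubElement(header, kind)          # e.g. Request, Acknowledgement, Response
    body = ET.SubElement(root, "Body")
    op = ET.SubElement(body, operation)  # hypothetical operation name
    for name, value in context.items():
        item = ET.SubElement(op, "ContextData", {"name": name})
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


def wrap_in_email(xml_text, sender, recipient):
    """Carry the XML text as the body of an e-mail message (the transport)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Workflow message"
    msg.set_content(xml_text, subtype="xml")
    return msg


def read_message(msg):
    """Extract (kind, operation, context) from a received e-mail message."""
    root = ET.fromstring(msg.get_content())
    kind = root.find("Header")[0].tag
    op = root.find("Body")[0]
    context = {c.get("name"): c.text for c in op.findall("ContextData")}
    return kind, op.tag, context


def asynchronous_exchange(initiator, enactor, operation, context):
    """Return the four messages of one asynchronous interaction, in order."""
    return [
        wrap_in_email(build_message("Request", operation, context), initiator, enactor),
        wrap_in_email(build_message("Acknowledgement", operation, {}), enactor, initiator),
        wrap_in_email(build_message("Response", operation, context), enactor, initiator),
        wrap_in_email(build_message("Acknowledgement", operation, {}), initiator, enactor),
    ]
```

Calling asynchronous_exchange("a@example.org", "b@example.org", "CreateProcessInstance", {"document": "D35"}) yields the request, its acknowledgement, the later response, and the final acknowledgement sent by the initiator to the enacting process. Keeping the message construction separate from the e-mail wrapper reflects the transport independence listed among the Wf-XML capabilities.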
9. Demonstration of ICONS prototype capabilities

9.1 The “Newly-associated States Best Practices” Portal

9.1.1 Introduction

There is a proliferation of Web content management systems in various application realms. The integration of internal information repositories with external data sources is the current trend in the architecture of management information systems. Examples of active development in the areas of government, energy industry, and general B2B systems are presented in [Ambite2001, Bouguettaya2001, Elmagarmid2001, Mecella2001, Shim2000]. Although the current systems are designed according to disciplined life-cycles based on various design methodologies, there is a clear need to formulate a life-cycle and the underlying methodology for the development of large-scale, knowledge-based content management systems. Such a methodology must be substantiated by at least a pilot development of an application based on an intelligent content management system. The novelty of the ICONS project within the realm of this objective is exemplified by the following solution characteristics:
1. Specification of a prototype life-cycle and the underlying methodology for the design and development of intelligent content management system applications.
2. Demonstration of the viability of the ICONS architecture and application development methodology by developing a pilot knowledge-based content management application.

In terms of project organisation, all this corresponds to Objective 4 of the ICONS project, i.e. to develop an analysis and design methodology for large, knowledge-based content repository systems. ICONS research results, especially those related to the ten technologies identified in Section 4 as ‘to be developed’, will be demonstrated both at application level and at methodological level.
The planned work (WP7) includes three tasks and corresponding deliverables: T1 -> D35, T2 -> D24, and T3 -> D25. T1 has already been started: D35 “Conceptual analysis of ‘NAS Best Practices’ portal” will be developed first. D35 will contain the essential requirements for the ICONS prototype. These requirements will provide relevant “attraction” points for technology developers active in other WPs. During the last semester of the project, these same requirements, possibly updated, will serve as the basis for the development of the prototype (D25). Another basic input will be provided by D24 “The knowledge-based content management application design methodology”.

For the sake of being specific, the pilot application is described first. The specific objectives of the ICONS prototype portal and its pilot application ‘NAS Best Practices’ are:

Development and publishing for general use over Internet of a knowledge repository concerning procedures, management practices, and “best practice” projects funded by PHARE, ISPA, and SAPARD funds. The knowledge repository is to contain public information to be made available to all interested parties over the Internet. [ICONS D02]

9.1.1.1 NAS

“Newly Associated States (NAS)” here means the ten candidates for EU membership from Central and Eastern Europe (CEE); see [Enlarg-Report-2001]. These candidates are: Bulgaria, Czech Republic, Estonia, Latvia, Hungary, Lithuania, Poland, Romania, Slovakia and Slovenia.

“This year’s Regular Reports and the present stage of the accession negotiations do not yet allow the Commission to conclude that the conditions for accession are fulfilled by any of the candidate countries. Among the twelve negotiating⁴ countries, ten have target dates of accession compatible with the Göteborg timeframe. The Union should therefore be prepared to conclude accession negotiations by the end of the Danish Presidency in 2002, in view of accession in 2004, with all countries meeting the necessary conditions. Necessary administrative preparations inside the Institutions are already under way and should be continued.” (Conclusion/§4)

“The 2002 Regular Reports will examine whether the candidate countries will have, by accession, adequate administrative capacity to implement and enforce the acquis.” (Conclusion/§5)

⁴ “The Copenhagen political criteria continue to be met by all presently negotiating candidate countries. Turkey still does not meet these criteria.” (Political criteria/Conclusions of [Enlargement-Rep2001]).

If we look at such a regular report, e.g. for Poland, we see that progress towards the adoption of the acquis is examined in 29 chapters. Here is the list of examined topics:

1: Free movement of goods
2: Free movement of persons
3: Freedom to provide services
4: Free movement of capital
5: Company law
6: Competition policy
7: Agriculture
8: Fisheries
9: Transport policy
10: Taxation
11: Economic and monetary union
12: Statistics
13: Social policy and employment
14: Energy
15: Industrial policy
16: Small and medium-sized enterprises
17: Science and research
18: Education and training
19: Telecommunications and information technologies
20: Culture and audio-visual policy
21: Regional policy and co-ordination of structural instruments
22: Environment
23: Consumers and health protection
24: Co-operation in the field of justice and home affairs
25: Customs union
26: External relations
27: Common foreign and security policy
28: Financial control
29: Financial and budgetary provisions
Plus: Translation of the acquis into the national languages

Table 9. Checklist of the acquis (chapters in Regular Reports).

In [Enlargement-Rep2001-A] it can be seen that for Poland 11 chapters were still in negotiation in 2001.
9.1.1.2 Phare, ISPA, and Sapard

“During the period 2000-2006 financial assistance from the European Communities to the candidate countries of Central and Eastern Europe will be provided through three instruments: the Phare programme (Council Regulation 3906/89), ISPA (Council Regulation 1267/99) and Sapard (Council Regulation 1268/99)...”

The ten countries are listed above. Turkey, Cyprus, and Malta have access to other funds (namely MEDA). Note that Phare funds have existed since 1989. A synthetic, very simplified view of Phare is given in the following table. We skip the other two instruments in order to remain focused on the ICONS project, and because Phare is the most important and best documented instrument.

Item: Phare
Aim/name: To assist the candidate countries of central Europe in their preparations for joining the European Union.
Budget: For the period 1995-99, funding under Phare totalled roughly EUR 6.7 billion and covered fifteen sectors, the main five of which were: infrastructure; development of the private sector; education, training and research; environmental protection and nuclear safety; agricultural restructuring. The revamped Phare programme, with a budget of over EUR 10 billion for the period 2000-2006, now has two specific priorities, namely: institution building; financing investments. [EU-Glossary]
Instrument(s): Accession Partnerships; National Programmes for the Adoption of the Acquis (NPAAs); Regular Reports.
Reforms: Phare has existed since 1989. In 1997, important reforms were introduced (decentralisation / deconcentration). An Extended Decentralised Implementation System (EDIS) is currently being prepared. New approaches should help the countries to prepare for a smooth transition from pre-accession assistance to Structural Funds.
Web sites: Phare: http://europa.eu.int/comm/enlargement/pas/phare/index.htm; Tenders (EuropeAid): http://europa.eu.int/comm/europeaid/cgi/frame12.pl
Control of the EC: ex-ante
Main actors: On the national side: National Aid Co-ordinator (NAC); National Authorising Officer (NAO); National Fund; Implementation Agencies (IAs); Central Financing and Contracting Unit (CFCU); final beneficiaries (institutions, municipalities, ministries). On the EC side: EC Delegations; DG for Enlargement; EuropeAid Cooperation Office (formerly the SCR).
Practical Guide: The main features of the Practical Guide fall into three categories (simplification and harmonisation; increased transparency and more rights to companies participating in tenders; eligibility criteria and other essentials); for Sapard it is applicable only to procurement. Three types of contracts: services, supplies, works. Procedures (more or less complex) vary according to the type of contract and its value. See: Practical Guide to Phare, ISPA and SAPARD Contract Procedures at http://europa.eu.int/comm/enlargement/pas/phare/procedures.htm#6.1

Table 10. Overview of Phare.

9.1.1.3 “Best Practices”

To become members of the EU, candidate countries have to implement a large number of reforms. To help them, funds are made available by the present Member States through the central services of the EC, located mainly in Brussels, and through ‘deconcentrated’ services, i.e. the EC delegations. Since Phare came into existence, thousands of projects have been tendered, contracted, implemented, assessed, and audited. The fact that the organisational and procedural context of these projects evolves is an indication that lessons learned by various actors have been, explicitly or not, transformed into knowledge, and eventually into changes in rules and procedures.

Phare programming (see [Phare_Review_2000]) is complex: it spans at least four years and it concerns multiple institutions and responsibility functions. Even if we limit ourselves to two phases:
1. “Implementation -- Tenders -- Contracts and Management”
2. “Monitoring and Assessment Reports”
it is clear that large amounts of information, structured or not, quantitative or not, could be considered as prime material for our prototype application of ICONS. Here are some examples of best practice which could be supported by our prototype:

Elements of context: step in project | Elements of context: main actor/unit | Relevant knowledge (adapted to context!): elements/questions for best practice
Project design | IA and Beneficiary | (See [Phare_Review_2000, p.44] for main requirements.) Which chapters/sections of the acquis are relevant? Criteria for mixing or separating supply with/from services. How to estimate necessary budget and duration (study, tendering, implementation)? Technical specification or terms of reference by ad hoc tender expert or by Beneficiary. Big projects or smaller numerous ones?
Tender preparation | CFCU | Variants allowed in tenders? Clarification meeting desirable? Which sections of Practical Guide apply? Visit of premises by tenderers to be organised (instead of more detailed specifications). How to formulate evaluation criteria for service contracts?
Tender evaluation | CFCU | Composition of evaluation committee, duration of evaluation.
Prequalification | CFCU & Beneficiary | Optimal length of the “shortlist”.
Tendering | Tenderer | How to evaluate own strong and weak points, compare with other shortlisted firms?
Contracting | CFCU | Addenda if any, payment schedules; guarantees, certificates of origin: sources of administrative problems?
Project realisation | Beneficiary | Demanded reporting (frequency, details, languages).
Financial control | CFCU | Budget was correct?
Assessment of results | Beneficiary and EC | Assessment costs, duration, and results.
Assessment of results | Contractor | Added value for “goodwill”, individual experts; need for developing other/new skills.

Table 11. Best practice taxonomy.

[Figure 14 labels: ICONS Portal; DATA SOURCES; END USERS; DATA collection processes; INFO presentation & distribution; Knowledge Base; Metadata / Ontologies; Other KBases; Query Interface; Original Documents; DB Extracts; Reports; INFO quality control & Knowledge enhancements; EC representatives (central / local); National coordination & deciders; CFCU; IAs; Beneficiaries; Tenderers; Contractors; Experts; Knowledge Workers; System Manager.]

Figure 14. Main Concept of ICONS portal for NAS Best Practice.

The main functional requirements are outlined hereafter, by considering the different ‘actors’ in succession. It must be underlined that during the elaboration of D35 these requirements will be made more precise and also adapted to the data which will actually be made available (see Remarks below).

9.1.1.4 Actors (1): End Users

These end users will be, or belong to, the institutions listed in Table 10. End users will be identified personally and by their role, if the role is not unique. Possible outputs of the system are relevant parts of:
1. NPAA, NPD, Regular Report & Negotiations
2. Funding Programmes
3. Community legislation in force, National legislation
4. Fund request procedures
5. Forms / templates, Contact points
6. Success stories
7. Call for Tenders (including technical specification or terms of reference (ToRs))
8. Contracts and addenda, if any
9. Project implementation reports (from contractors)
10. Project assessment reports (from independent auditors/assessors)...
These outputs are selected on the basis of the (assumed) interests of the end user combined with his or her own current indications. A second kind of output consists of advanced queries on the DBs describing projects and their progress: EC DBs like DESIREE and PERSEUS, or their successors, plus national DBs like the newly developed PENELOPA in Poland.

9.1.1.5 Actors (2): Knowledge Workers

Application Developers and Maintainers

On the basis of the common structure of the text documents (e.g. ToR, Tender forecast), they will establish the links between classes of document representations and classes of relevant and queriable DBs (when permitted). The management and maintenance of the necessary ontologies is an essential part of their work.
In particular, how end users’ own knowledge (or experience) in their domain of interest can be amplified by the system along the successive interactions, and how this knowledge can be made reusable by other users facing similar problems, is of the utmost importance.

Data Collectors

Data sources will evolve: URLs are modified and decision centres can change (centrally, locally). Therefore, means to detect these changes have to be established. The need and permission to make cache copies of original documents (to guarantee permanent access) have to be established. Co-operation protocols therefore need to be defined, especially in case classified information has to be accessed, either as such by authorised end users, and/or only through statistical queries (reductions).

9.1.1.6 Actors (3): System Administrators

Their functions are classical. They will be responsible for giving authenticated actors permission to access specific knowledge/data. The management and monitoring of all security and availability aspects of the system will be in their hands.

9.1.2 Key Issues for Application Development

Reminder 1: ICONS => Knowledge-based access to pre-existing, distributed information in various forms (web pages, databases, legacy information systems, etc.)

Reminder 2: in general, Knowledge = “understanding gained from experience” [Weidner2002, p. 18].

Hence: a working ICONS has to be developed and put into operation in a progressive manner: knowledge needs knowledge to grow, and this growth will be more sustainable if the right information is effectively identified and made accessible in the most efficient way.

9.1.2.1 The Idea

Knowledge growth can be viewed as a spiral made of successive Knowledge Life Cycles (KLCs). Initially a basic ontology is selected to seed the system, together with a minimal set of information sources and reference documents. The following cycles will develop and consolidate what has been integrated in the previous cycles:
• Ontology cycle: enrich the initial ontology (domains of interest such as chapters of the acquis, technologies (IT, environment, civil engineering, agro-bio-technologies, etc.), time models (dates), space (countries, regions, borders, rivers...), programmes, actors, projects, etc.); add connections between these main concepts.⁵ The ontology is subject to validation, hence the status of ontological objects has to be managed too.
• Knowledge extraction: establish mechanisms (intelligent agents) (i) to identify existing and accessible sources of information, (ii) to extract knowledge from these sources using standardised RDF, and (iii) to populate an ‘extensional’ base of “facts” (EDB).
• Knowledge derivation: establish knowledge production rules (an ‘intensional’ DB) to derive additional knowledge from the base of facts (EDB + already derived and integrated facts).
• Intelligent access: by combining informational goals expressed by end users, deliver relevant facts and supporting information (original documents or parts thereof).

Finally, current achievements will be assessed, and extensions or improvements proposed. It must be underlined that the underlying workflow and co-operation mechanisms between human knowledge workers and automated agents (hence also their developers) constitute a ubiquitous challenge for this project. Schematically, a prototype development cycle is shown in Figure 15.

⁵ See references given in Section 9.2.1, e.g. [Holsapple2002]; initial concepts can be drawn from EU and Phare glossaries published on the Web (e.g. [EU-Glossary], [Phare-Glossary], and [PG-Glossary]).
[Figure 15 labels. Initial configuration (INIT): contents sources: well-known URLs (<50) + reference documents; related Intelligent Agents: none; ontology: main topics; K production rules: empty; K workers: none (only the administrator); Wf: to manage K workers. Cycle stages: Knowledge PRODUCTION; Knowledge CLAIMS; Knowledge VALIDATION; RELIABLE Knowledge; Knowledge INTEGRATION. Feedback (adapt/change/extend...): ontology: refine, relate, cut; more contents sources; Intelligent Agents: K extraction; K production rules; K workers: more roles; Wf to support change processes: ontology, agents, rules, sources.]

Figure 15. The Knowledge life cycle of the NAS Best Practices Portal.

N.B. After its validation, knowledge is ready for integration, i.e. use and dissemination within the user community. It is not restricted to one organisation, nor is it required to be of an “organisational” nature.

Four cycles will be necessary to reach a situation where “intelligent access” will be really meaningful in the prototype application.

Key Technological Issues

The key issues are further identified in Table 12. Technologies in grey background are those “to be developed” in the ICONS project.

Each entry below names a technology, followed by the remarks made for each development cycle focus (Ontology; Knowledge extraction; Knowledge production; Intelligent access). Where the focus of a remark is not stated, the remark applies to the technology in general.

• Ontology Model Manager (OMM) = functions to create, maintain, and use knowledge representation structures + formal knowledge representation pertaining to a particular application domain.
  Ontology: Time modelling has to be available from the outset (application domain time, to be distinguished from system time). Example: “concept of NPAA exists since yyyy.mm.dd”.
• Structural Knowledge Navigator (SKN), makes OMM available to other KMS modules.
  Link predicates can be time dependent.
• Content Categorisation Engine (potentially integrated into the ICONS architecture).
  Essential when metadata are missing, or to assess quality of original information.
• Datalog Inference Engine.
  Knowledge extraction: Seamless interface with ontology = the metadata of the EDB, the extensional DB which is used to collect facts.
  Knowledge production: Seamless interface with ontology = the metadata of the IDB, the intensional database, which is used to deduce facts.
  Intelligent access: Fast queries imply in-core operations on the extensional database (EDB).
• Intelligent Workflow Manager.
  All cycles: For each cycle a specific workflow has to be developed and implemented. Cooperation between concurrent workflow processes has to be foreseen. Interactions between knowledge workers and Intelligent Agents will depend on ICONS system time.
• Semi-structured Content Integrator (knowledge-based wrapper technology?).
  Agents, to be called intelligent content integrators, will provide a uniform view on relevant documents; they also need a seamless interface with the ontology.
  A uniform view on a large quantity of various documents is a precondition for “intelligent access”.
• Intelligent Agent Development Environment (a different specific agent will be central during early KLCs).
  Ontology: OMM-A: when the ontology model becomes complex, care must be taken not to undermine integrated K.
  Knowledge extraction: Ground-A: especially relevant to populate and maintain the EDB⁶ (search for relevant information sources) and generation of uniform descriptors.
  Knowledge production: Flying-A: to produce derived facts (‘on the fly’, when needed).
  Intelligent access: IA-A (Intelligent Access Agent): corner-stone of prototype HCI; time dependent, possibly recurrent and/or recursive queries are to be captured and managed.
• HCI Personalisation Engine.
  Dialogue with IA-A essential to end users (not ICONS experts).
• Electronic Form Manager.
  Same relevance as above.
• Content Presentation Manager.
  Ancillary service.
• Knowledge Map Graph Manager.
  Visible interface to OMM.
• Structural Knowledge Graph Manager.
  Visible interface to SKN.
• Process Graph Manager: graphical design and monitoring of the state of a particular process instance.
  Again a challenge, because many users will need to cooperate, and will possibly be involved in several KM related processes at the same time.
• Load Balancing Algorithms; Distribution Optimisation Algorithms; Scalable Distributed Data Structure (SDDS).
  These three technologies should be totally transparent to end users and to knowledge workers, except that when the usual performance of the (distributed or not) system is not possible, human users (and possibly Intelligent Agents) should be duly informed.
• Distributed Workflow Communication.
  To support the Intelligent Workflow Manager. During the early cycles of the ICONS prototype, it is better to ensure first good cooperation between workflow processes monitored by a common workflow engine than to tackle heterogeneous workflow engines.

Table 12. Key technological issues for development of the NAS Best Practices Portal.
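The division of labour between the extensional database (EDB, collected facts) and the intensional database (IDB, production rules) mentioned for the Datalog Inference Engine can be illustrated with a minimal sketch. It is not the ICONS engine: it uses naive bottom-up evaluation, and the predicates (located_in, within), the place names, and the "?x" convention for variables are assumptions chosen for illustration, loosely following the spatial concepts listed for the ontology cycle.

```python
# EDB: ground facts as tuples (predicate, arg1, arg2, ...). Illustrative data only.
EDB = {
    ("located_in", "Gdansk", "Pomerania"),
    ("located_in", "Pomerania", "Poland"),
}

# IDB: rules as (head, [body atoms]); terms starting with "?" are variables.
# Rules are assumed range-restricted (every head variable occurs in the body).
IDB = [
    (("within", "?a", "?b"), [("located_in", "?a", "?b")]),
    (("within", "?a", "?c"), [("located_in", "?a", "?b"), ("within", "?b", "?c")]),
]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(atom, fact, binding):
    """Unify one body atom with one fact; return the extended binding or None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different constant
        elif term != value:
            return None
    return extended


def derive(facts, rules):
    """Naive bottom-up evaluation: apply all rules until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                next_bindings = []
                for binding in bindings:
                    for fact in known:
                        extended = match(atom, fact, binding)
                        if extended is not None:
                            next_bindings.append(extended)
                bindings = next_bindings
            for binding in bindings:
                new_fact = tuple(binding[t] if is_var(t) else t for t in head)
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known
```

With these definitions, derive(EDB, IDB) contains the derived fact ("within", "Gdansk", "Poland") in addition to the two collected facts. The recursive second rule is why evaluation is repeated until a fixpoint is reached; a production engine would more likely use semi-naive evaluation and indexing to support the fast in-core queries mentioned in the table.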
9.1.3 Key Success Factors

The main one is the capability to provide quality information (or the right document), without duplication or omission. For filled-in documents (e.g. Tender (forecast), ToRs, Final Reports) it is of course critical to link them together, to the corresponding assessments, and to records in the DBs.

As an example, imagine that a Customs officer (Beneficiary expert) has to create the terms of reference for an MIS development and the corresponding training to introduce Taric (European Customs code) in his administration. With very high probability there are similar developments completed or in progress in other candidate countries. Accessing documents related to similar projects, if permitted, would be very helpful. If, in addition, he can access the assessment report(s), he can deduce that a similar project X was well “tendered” and realised, except maybe for the duration of the project (manifestly too short, if it can be observed that the final duration was 3 months longer than in the tender documents). If nothing similar is found and the Customs officer knows that a similar (Phare-funded) project exists, then it should be possible for him to signal this to “knowledge workers” in such a way that, next time, the system will be more helpful. In case the system is not allowed to access some known information sources, it should signal this to the user, maybe with hints about the way to reach this information.

9.1.4 Remarks

If the external data are limited to “public data”, it is likely that some of the outlined functionality has to be reoriented. The project team has to establish contacts with EC officers (in DG Enlargement and EC delegations) to assess the interest and feasibility of the concept outlined above. It is assumed that, for the prototype, the language at the user interface and for textual contents is English only.

⁶ Records of the EDB are also known as “ground axioms”.
9.2 The Knowledge Management System Design Methodology

9.2.1 Approaches to Knowledge Management methodologies

Knowledge Management is a very broad area of human interest. It covers various activities, from those of a business and managerial nature to very technical problems associated with building software systems. The result of these activities is a knowledge management system (KMS). Such a system can be treated as a group (corporate) memory [Kuhn1997] that involves not only a software system, but also all the associated business processes and methodologies [Abecker1999].

As presented in previous chapters, the software part of a KMS involves many novel technologies and a complex architecture in which six main groups of functions are identified: domain ontology, knowledge dissemination, content integration, knowledge security, KMS actor collaboration, and content repository management. Each group has its own requirements, typically referring to a particular technology. Each of these features or technologies needs a specific approach to its analysis, design and implementation. On top of that, we need to organise appropriately all the business processes (workflows) associated with acquiring, storing and retrieving knowledge. In fact IDM, our development methodology to be introduced below, will consider best practices and specific methodologies, and place them within a common framework (methodology architecture).

Design, or rather development, of KM systems is thus a sophisticated process containing many activities and tasks that result in many complex products. This raises an obvious need to introduce a systematic arrangement of these activities and to create guidelines for creating the products, i.e. to specify a KM system development methodology.
Such a methodology can be seen as a Knowledge Management metaprocess [Staab2001], in contrast to the Knowledge Management process, which is one of the elements defined by it. So far, most of the research in this area has been limited to design methodologies for specific KMS features, such as Domain Ontologies, Content Repositories or Knowledge Dissemination. Other approaches have focused only on the managerial issues associated with introducing a knowledge management system to business organisations [Tiwana2000, Dieng1999]. Especially broad research has been done in the area of methodologies for ontology construction, where various methodologies have been proposed [Uschold1995, Sure2001, Maedche2001, Holsapple2002]. An interesting overview of ontology methodologies and their analysis against the IEEE Standard for Developing Software Life Cycle Processes (IEEE Std 1074-1995) can be found in [Lopez1999]. With such a variety of methods, there also exist efforts toward unification [Uschold1996].

A very interesting work presented in [Firestone2001] is an exception to the rule of concentrating on one feature of Knowledge Management. It describes a full lifecycle methodology based on an iterative process, with a definition of all the system development disciplines (business modelling, requirements, analysis and design, implementation, project management, etc.). This process can also be seen as a modelling process, where models are constructed incrementally by refining previously built ones (see [Studer1998]). This proposition seems to be a good start for constructing a comprehensive methodology for ICONS-based KMS development projects.

9.2.2 Requirements for defining a comprehensive KMS development methodology

While developing any system, we perform certain tasks, use appropriate techniques, and produce specific deliverables that conform to the technologies used. All these activities are carried out by people playing various roles in a system development project.
A Knowledge Management System (KMS) is no exception here. Such a system contains a software system and a business system surrounding it. To define a methodology for building a KMS we should specify elements from the following three groups (see [Henderson-Sellers1999]):
• technical process
• techniques
• notation (a modelling language).

At a larger granularity, the process includes not only a methodology but also consideration of the people (organisational culture) and the tools (technology) which are available. The KMS development methodology should thus provide an integrating framework. All the projects based on it would use this framework by instantiating a particular part of the framework for their own circumstances. This means that “tailoring of the process” should be part of every KMS development project.

9.2.2.1 Technical process

A very important feature of the technical process is incremental (iterative) delivery (see [Firestone2001, Studer1998]). This provides immediate feedback from the users and instantaneous verification of the employed architecture and technologies. Such a feature of the software lifecycle is very important when developing systems with the use of new and untested technologies (as is the case for ICONS-based systems).
A project using iterative development has a lifecycle consisting of several iterations. An iteration incorporates a loosely sequential set of activities in business modelling, requirements, analysis and design, implementation, test, and deployment, in various proportions depending on where in the development cycle the iteration is located. Iterations in the inception and elaboration phases focus on management, requirements, and design activities; iterations in the construction phase focus on design, implementation, and test; and iterations in the transition phase focus on test and deployment. Iterations should be managed in a timeboxed fashion, that is, the schedule for an iteration should be regarded as fixed, and the scope of the iteration's content actively managed to meet that schedule.

An iterative approach is generally superior to a linear or waterfall approach for many different reasons:
• Risks are mitigated earlier, because elements are integrated progressively.
• Changing requirements and tactics are accommodated.
• Improving and refining the product is facilitated, resulting in a more robust product.
• Organisations can learn from this approach and improve their process.
• Reusability is increased.

Another feature of the process is requirements management, treated as a systematic approach to finding, documenting, organising, and tracking a system's changing requirements. Requirements management can be formally defined as a systematic approach to both:
• eliciting, organising, and documenting the requirements of the system,
• establishing and maintaining agreement between the customer and the project team on the system's changing requirements.

The employment of requirements management allows for a clear distinction between requirements and for the establishment of clear traces between the requirements and their realisations (design models, components).
This helps projects cope with the following difficulties:
- Requirements are not always obvious, and can come from many sources.
- Requirements are not always easily or clearly expressed in words.
- There are many different types of requirements at different levels of detail.
- The number of requirements can become unmanageable if they are not controlled.
- Requirements are related to one another and also to other deliverables of the software engineering process.
- Requirements have unique properties or property values. For example, they are not necessarily equally important or equally easy to meet.
- There are many interested parties, which means requirements need to be managed by cross-functional groups of people.
- Requirements change.
Managing functional requirements is especially important for KM-type systems, where the value to the user lies not only in the knowledge itself but also in the way this knowledge is used.
The process should also stress continuous verification of quality. It is important that the quality of all artifacts is assessed at several points in the project's lifecycle as they mature. Artifacts should be evaluated as the activities that produce them are completed and at the conclusion of each iteration. In particular, as executable software is produced, it should be subjected to demonstration and test of important scenarios in each iteration, which provides a more tangible understanding of design trade-offs and earlier elimination of architectural defects. This is in contrast to a more traditional approach that leaves the testing of integrated software until late in the project's lifecycle.
Finally, the process should also incorporate change management. This is very important in projects (like those based on ICONS) which produce many products that change throughout the lifecycle. Co-ordinating iterations and releases involves establishing and releasing a tested baseline at the completion of each iteration.
Maintaining traceability among the elements of each release, and among elements across multiple parallel releases, is essential for assessing and actively managing the impact of change. Controlling changes to software offers a number of solutions to the root causes of software development problems:
- The workflow of requirements change is defined and repeatable.
- Change requests facilitate clear communications.
- Isolated workspaces reduce interference among team members working in parallel.
- Change rate statistics provide good metrics for objectively assessing project status.
- Workspaces contain all artifacts, which facilitates consistency.
IST-2001-32429 ICONS Intelligent Content Management System page 68/86
  • 69.
- Change propagation is assessable and controlled.
- Changes can be maintained in a robust, customisable system.
9.2.2.2 Techniques
To define a development methodology we need to specify “who” does “what” and “how” it should be performed. This leads us to the definition of roles, activities, and, most importantly, techniques.
The most central concept in any technical process is that of a role. A role defines the behaviour and responsibilities of an individual, or a set of individuals working together as a team, within the context of a software engineering organisation. The roles are not individuals; instead, they describe how individuals should behave. The mapping from individual to role is performed by the project manager when planning and staffing the project.
All the roles should have associated activities that define the work they perform. An activity is something that a role does that provides a meaningful result in the context of the project. An activity is a unit of work that an individual playing the described role may be asked to perform. The activity has a clear purpose, usually expressed in terms of creating or updating some product, such as a model, a class, or a plan. Every activity is assigned to a specific role. The granularity of an activity is generally a few hours to a few days; it usually involves one role and affects one or only a small number of artifacts. Activities may be repeated several times on the same artifact by the same role (though not necessarily the same individual), especially when going from one iteration to another, refining and expanding the system.
Activities are broken down into tasks. Tasks fall into three main categories:
- Thinking tasks: where the individual performing the role understands the nature of the task, gathers and examines the input artifacts, and formulates the outcome.
- Performing tasks: where the individual performing the role creates or updates some artifacts.
- Reviewing tasks: where the individual performing the role inspects the results against some criteria.
Tasks have associated techniques, which present practical advice that is useful to the role performing the activity. Techniques range from project management through to detailed theories and practices for requirements engineering and system modelling. An interesting overview of techniques can be found in [Henderson-Sellers1998].
9.2.2.3 Notation
The third important component of a methodology is the notation for the products produced. This gives the development team a common language for communication. The existence of such a common language is very important for unambiguous communication when developing a very complex system, as is the case for KMS. In a typical project that involves software development and business modelling we need to produce various artifacts that describe all the aspects of the system. The best way to present these aspects is to use a graphical modelling language (visual modelling [Simons1994]). The groups of artifacts that can be represented graphically in a knowledge management system include (for reference, see sections 3.2 and 4):
- Description of the KM business processes and workflows
- Overall architecture of the system
- Definition of the ontology
- Structure of the knowledge base
- Detailed analysis and design models for the software system
- Requirements for human-computer interaction
- Design of a suitable Human Computer Interface (HCI).
Some products cannot be represented only graphically. These include:
- Vision - defines the stakeholders' view of the product to be developed, specified in terms of the stakeholders' key needs and features. Containing an outline of the envisioned core requirements, it provides the contractual basis for the more detailed technical requirements.
- Glossary - defines important terms used by the project.
- Supplementary specification - captures the non-functional system requirements that are not readily captured in a graphical form. Such requirements include: legal and regulatory requirements, and application standards; quality attributes of the system to be built, including usability, reliability, performance, and supportability requirements; other requirements such as operating systems and environments, compatibility requirements, and design constraints.
- Software architecture description - provides a comprehensive architectural overview of the system, using a number of different architectural views to depict different aspects of the system.
  • 70.
- Change request - Changes to development artifacts are proposed through Change Requests (CRs). Change Requests are used to document and track defects, enhancement requests and any other type of request for a change to the product. The benefit of CRs is that they provide a record of decisions and, due to their assessment process, ensure that change impacts are understood across the project.
- Software development plan - a comprehensive, composite artifact that gathers all information required to manage the project. It encloses a number of products developed during the inception phase and is maintained throughout the project.
- Development case - describes the development process chosen for the specific project. This product includes all the decisions associated with tailoring a methodology to a project.
All the above products can be represented in many different forms and notations. It is very important for a development methodology to define a common form of expressing them. The basic role of the methodology here is thus to prevent different teams from using different notations, which could lead to many communication problems. A methodology should provide us with precise guidelines for producing coherent and unambiguous products.
9.2.3 The ICONS Development Methodology
One of the important tasks of the ICONS project is to propose a development methodology for systems being produced on the basis of its results. We shall call this methodology the ICONS Development Methodology (IDM). Unfortunately, we cannot directly use any of the existing methodologies described in the first section of this chapter. Although none of them fulfils all the requirements presented above, their best features will be reused whenever possible.
The complexity and technological scope of ICONS-based systems force us to develop a methodology that is very comprehensive and, at the same time, common to all the paths of the development process. That said, we are convinced that the starting point for developing the IDM should be an existing methodology in the area of software development (see also: [Firestone2001]). During the course of the ICONS project, specific decisions have to be made about the development process to be followed, the notations used to represent products, and the techniques used to create the products. These decisions, in many cases, mean the creation of completely new approaches to these three aspects of the development methodology.
We would postulate that the technical process and techniques of IDM be based on best practices of software development (see e.g. [DoD1997]). Some of them were presented in the previous section. This reinforces our postulate of basing the methodology on an existing one. We should, however, look for methodologies that make it possible to apply these best practices. This criterion is certainly met by several existing methodologies like ISO 12207 [ISO1995], RUP [RUP2002, Kruchten2000], OPEN [Henderson-Sellers1997], and Adaptive Software Development (ASD) [Highsmith2000]. The main challenge in this area is to choose those practices that are applicable to KMS construction projects and possibly to describe new ones that might be based on experience from the ICONS project. Some of them are very general, like incremental development or requirements management. Others are specific to developing knowledge management systems and need a certain amount of research effort. We think that a good starting point would be RUP, OPEN and ASD. These three methodologies introduce an iterative, incremental software construction process that seems to be crucial for the construction of complex systems like KMS.
A promising direction we plan to pursue is to verify the applicability of the adaptive process [Highsmith2000] to knowledge management applications.
The IDM also needs a comprehensive and common notation. We have to remember that we need to create a notation for very disparate models associated with various features of the knowledge management system’s reference architecture. However, before building a unified notation it is necessary to baseline the information contents of all the possible models that can be created when developing an ICONS-based system. The notation should also take into account all the technologies used to develop the software system. An example of such an approach can be found in [Connalen2000], where a notation for systems using different Web technologies is defined.
Here again we think that a good approach is to start with an existing notation language. The language should already be well known and widely used throughout the software development and business modelling community. It should also cover broad aspects of system modelling (static structure diagrams, dynamic system behaviour diagrams, temporal diagrams, diagrams and models for describing requirements, diagrams for modelling business processes and workflows). It is also very important for the language to have extension mechanisms for adding notational elements specific to the KM domain.
Currently we postulate that the best choice for the starting point in this area is UML [Booch1998]. The analysis of all the features in the KMS reference architecture (see section 3.2) shows that this language is already applicable for modelling many of them. UML is also widely known, comprehensive in its definition of various models, and has flexible extension mechanisms (like stereotyping). However, it has to be stressed that the applicability of UML in the area of KM is not yet fully explored (for an attempt, see [Aksit2001]).
  • 71. It can already be seen that UML alone is not capable of representing all the aspects of KM systems. Agent-oriented notations and methods (as opposed to object-oriented ones, as in UML) show some promise [Iglesias1998]. One path of our research would thus be to explore the applicability of different UML (and non-UML) models to KM. Another path would be to create an extension of UML for building knowledge management systems. This second path seems very interesting and challenging, given that KM uses a very broad range of technologies that often need very specific approaches to their modelling. It is also very promising, as this new extended language could serve as a common means of communication for the KM community (and specifically the ICONS community).
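As a rough illustration of the stereotype-based extension path discussed above, the following Python sketch models how a stereotype can tag a base model element with a KM-specific meaning. It is only a sketch under stated assumptions: the class names (`Stereotype`, `ModelElement`) and the example stereotypes (`OntologyConcept`, `KnowledgeProcess`) are hypothetical and are not taken from UML itself or from any ICONS deliverable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stereotype:
    """A named extension that may only be applied to one kind of model element."""
    name: str
    base_kind: str  # kind of element the stereotype extends, e.g. "Class"


@dataclass
class ModelElement:
    """A minimal stand-in for a UML-like model element (class, activity, ...)."""
    name: str
    kind: str
    stereotypes: list = field(default_factory=list)

    def apply(self, stereotype: Stereotype) -> None:
        # A stereotype is only valid on the element kind it was defined for.
        if stereotype.base_kind != self.kind:
            raise ValueError(
                f"<<{stereotype.name}>> extends {stereotype.base_kind}, "
                f"not {self.kind}"
            )
        self.stereotypes.append(stereotype)

    def label(self) -> str:
        # Render the element with its stereotypes in guillemet-like brackets.
        tags = " ".join(f"<<{s.name}>>" for s in self.stereotypes)
        return f"{tags} {self.name}".strip()


# Hypothetical KM-specific stereotypes (assumed names, for illustration only).
ONTOLOGY_CONCEPT = Stereotype("OntologyConcept", base_kind="Class")
KNOWLEDGE_PROCESS = Stereotype("KnowledgeProcess", base_kind="Activity")

document = ModelElement("Document", kind="Class")
document.apply(ONTOLOGY_CONCEPT)
print(document.label())
```

The design point is that the base modelling language stays untouched: domain meaning is attached to existing element kinds, and an attempt to apply a stereotype to the wrong kind of element is rejected.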
  • 72. Intelligent Content Management System 1.15 Conclusions April 2002
10. Conclusions
10.1 Compatibility with the stated ICONS project goals and objectives
The relationships between the ICONS functional modules, grouped into the project's focus technological areas (see Figure 6), and the project objectives are shown in a cross-reference table (Table 13). Clearly, demonstration of the ICONS prototype capabilities entails usage of all developed system features; therefore only the focus technological areas are marked for objective 4. All ICONS project objectives are met by the proposed system architecture.
We attribute more weight than in the initial project proposal to procedural knowledge representation and the corresponding intelligent workflow functionality. This is the result of the on-going research performed by the consortium members, who agree that the procedural knowledge pertaining to business processes is an important element of the intellectual capital of a learning organisation. Stringent new requirements are also formulated for the workflow management platform that is to support knowledge-creating processes.
ICONS functional modules (Focus Tech. Areas) versus the ICONS project objectives (columns: Objective 1, Objective 2, Objective 3, Objective 4):
Knowledge Management  X X
  Ontology Model Manager  X
  Structural Knowledge Navigator  X
  Content Categorisation Engine  X
  Datalog Inference Engine  X X X
  Intelligent Workflow Manager  X X
  Semi-structured Content Integrator  X X
  Intelligent Agent Development Environment  X X X
Human Computer Interaction (HCI)  X
  HCI Personalisation Engine  X
  Electronic Form Manager  X
  Content Presentation Manager  X
  Knowledge Map Graph Manager  X
  Structural Knowledge Graph Manager  X
  Process Graph Manager
Distributed Architecture  X
  Load Balancing Algorithms  X
  Distribution Optimisation Algorithms  X
  Scalable Distributed Data Structure (SDDS)  X
  Distributed Workflow Communication  X
Objective 1: Development of knowledge representation techniques and methodologies for a multimedia content repository.
Objective 2: Development of user interface design and management tools meeting the requirements of the information architecture methodology.
Objective 3: Design and implementation of efficient algorithms for management of large, distributed multimedia content repositories.
Objective 4: Develop an analysis and design methodology for large, knowledge-based content repository systems.
Table 13. The ICONS project focus technological areas and the project objectives cross-reference
While demonstrating that the proposed ICONS architecture meets the stated project objectives, we concentrated on the project focus technological areas. Since the enhancements that must be developed for the adopted content management functions also represent a substantial specification and development effort, all such modules will be shown below, cross-referenced with workpackages and their respective tasks.
10.2 Overview of the ICONS project development plan
The overall project objective is to develop a mature prototype of an intelligent content management system (ICONS) supported by an application design and development methodology and a realistic pilot application providing the final verification platform.
  • 73. The project plan comprises three principal phases, namely the theoretical research phase, the ICONS prototype construction phase, and the methodology and pilot application development phase.
The theoretical research phase, comprising workpackages WP1, WP2, WP3 and WP5, aims at integrating and extending existing research results relevant to the overall project objective. The research results will be presented in a series of reports and external publications. The objective of this phase is to provide a sound theoretical base for the ensuing phases of the project. The principal research directions aim at extending existing research results in the area of knowledge representation based on logic (disjunctive Datalog) integrated with the semantic data model approach represented by the RDF standard, as well as in the area of integration of distributed heterogeneous information resources. Additionally, the consortium plans to work in the area of advanced graphic user interfaces, providing novel tools and techniques in the fields of information architecture and graphic representation of knowledge.
The ICONS prototype construction phase, comprising workpackages WP4 and WP6, aims at developing a fully functional system prototype exploiting the research results achieved during the preceding phase and providing a test bed for their evaluation and presentation. The ICONS prototype will be developed as an extension of existing software platforms to be selected during WP1. This approach will allow the consortium partners to concentrate on the novel aspects of knowledge-based content management without the need to create the required software environment from scratch.
The methodology and pilot application development phase, comprising workpackage WP7, has two objectives: (i) to create an Internet portal comprising content and the corresponding ontologies of high interest to a large community of potential users, thus attracting attention and consequently securing growth of the selected application realm, (ii) to develop a design methodology for knowledge-based content management systems. The proposed “NAS Best Practices” portal is to provide much needed information and practical examples of projects and procedures required by the EC accession process. The second objective has general applicability to the fast growing knowledge management field.
The technical track of the project is divided into nine workpackages consisting of several tasks each. Each workpackage describes a coherent objective; the task structure details the steps necessary to reach it. Close collaboration between industrial and academic partners is secured by the fact that partners of both types participate in all the workpackages.
The work starts with the assessment of tools, standards and methods (WP1) relevant to the project objectives. Since the aim of the project is to integrate and extend the relevant research results in the area of knowledge management and information integration, stress is laid upon selecting the leading edge research results to provide the starting point for the project. The principal approach of the project is to extend the state-of-the-art technology in the realm of multimedia content management with powerful knowledge representation and information integration capabilities by providing a fully functional software platform. Thus, in order to contain the size and cost of the project, the consortium plans to select an eligible software platform to provide the development environment and the test bed for the novel research results produced by the project team.
The principal research streams of the project, namely the multi-paradigm knowledge representation and the distributed content repository, are represented by workpackages WP2 and WP5 respectively. The knowledge representation research aims at integration of two distinct knowledge representation schemes, namely the logic approach (disjunctive Datalog) and the semantic data model approach (UML and the RDF standard), into a consistent, multi-paradigm knowledge model. The distributed content repository research is to approach two distinct problem areas, namely the distribution of the content repository data structure and control process among an arbitrary number of servers and the data storage hierarchy, and the integration of pre-existing, heterogeneous information sources accessible over the Inter/Intranet.
The graphic user interface research (WP3) aims at providing advanced solutions within two distinct information architecture problem areas: presentation and manipulation of content maps and multimedia information objects, and graphic knowledge representation. The technical standards to be utilised as the GUI platform are XML/XSL and the corresponding software tools.
The ICONS prototype construction phase entails definition of the system architecture (WP4) and subsequently development of the prototype (WP6). Both workpackages involve advanced technological issues related to the novelty of the underlying research results, hence close integration of both types of consortium partners (academic and industrial) is planned.
  • 74. In order to ensure a stable, high quality software prototype, seamlessly integrated with the selected content management software environment providing an initial platform to be extended with new functionality, the component-based software development methodology and tools are to be used throughout the entire software development process. The ICONS prototype stability is to be ensured by a well defined, disciplined quality assurance process.
Demonstrating the new, advanced functionality requires selection of an application area that promises potentially high attraction to the target user community. Additionally, a facility to publish useful knowledge, dealing with a problem-ridden area such as efficient execution of complex technological and organisational projects in newly associated states funded by the EC aid programmes, is a desirable by-product. The “NAS Best Practices” portal (WP7) amply meets the above requirements.
Knowledge-based content management applications will surely benefit from a disciplined design methodology as the new, Internet-based information processing systems gain popularity. The methodology should be sufficiently general to prove useful for a wide class of knowledge-based systems featuring advanced knowledge representation capabilities.
Exploitation and dissemination of project results (WP8) aims at taking advantage of the project potential in the area of KM technology development within partners’ organisations and ensuring that the project results are capable of industrialisation as soon as possible. The primary activities include co-ordination with other relevant projects, publication of the achieved results, planning further implementation of the technology developed, assessment of the ICONS prototype by end users, and workshop organisation.
Project management (WP9), necessary for such a challenging project, covers both administrative and technical management and is carried out on both a strategic and a day-to-day basis.
Project milestones
The following milestones are employed to measure the progress of the project:
M1: By the 6th month of the project, the technological base for the ICONS project will be selected and accepted by the consortium partners. The technological base will comprise the standards and software tools, as well as the content management platform to be extended, underlying the ICONS architecture.
M2: By the 6th month of the project, the feasible research base for the project will be defined and the integration and extension work will commence.
M3: By the 12th month of the project, the multi-paradigm knowledge representation scheme will be specified and accepted as the principal platform for knowledge management for the ICONS prototype.
M4: By the 12th month of the project, the ICONS architecture will be defined and accepted by the consortium partners.
M5: By the 20th month of the project, the ICONS prototype will be developed, tested and installed at selected partners’ sites, as well as made available on the Internet for the consortium partners.
M6: By the 24th month of the project, the “NAS Best Practices” portal will be operational, and the project final report will be accepted by the Commission.
By the end of the project the ICONS software and methods will be fine-tuned with feedback from the pilot knowledge-based portal application.
In order to focus project management attention on the principal objectives of the ICONS project, we cross-reference the modules of the project focus technological areas with the workpackages, and their respective tasks, where the actual research work is being carried out. Note that some of the modules may not be assigned, meaning that the specification and/or development work will be performed in the prototype development workpackages.
Since the ICONS prototype development concentrates on the prototype specification (Workpackage 4) and the prototype implementation (Workpackage 6) and, by definition, concerns all system modules, there is no point in presenting cross-reference information for these workpackages. It is worth noting at this point that the preliminary analysis shows that the enhancement work on the adopted technology modules may represent a substantial specification and development effort, and possibly also some research work. Note that the distributed workflow communication module has not entered into the principal research stream. This is because the on-going standardisation work co-ordinated by WfMC [Hayes2001] will probably result in an industry standard to be implemented by all workflow engine developers. Most of the pre-existing database integration work has been re-focused towards the intelligent agent development environment.
  • 75. We believe that the IA technology may be a reasonable answer to problems related to extracting information from heterogeneous sources, although it may not lead to solutions providing a general answer to multidatabase management problems.
ICONS functional modules (Focus Tech. Areas) versus research workpackage tasks (columns: Workpackage 2, Tasks 1 2 3 4; Workpackage 3, Tasks 1 2 3; Workpackage 5, Tasks 1 2 3):
Knowledge Management
  Ontology Model Manager  X
  Structural Knowledge Navigator  X
  Content Categorisation Engine  X
  Datalog Inference Engine  X
  Intelligent Workflow Manager  X
  Semi-structured Content Integrator  X
  Intelligent Agent Development Environment  X X
Human Computer Interaction (HCI)
  HCI Personalisation Engine  X
  Electronic Form Manager
  Content Presentation Manager
  Knowledge Map Graph Manager  X X
  Structural Knowledge Graph Manager  X X
  Process Graph Manager  X X
Distributed Architecture
  Load Balancing Algorithms  X
  Distribution Optimisation Algorithms  X
  Scalable Distributed Data Structure (SDDS)  X
  Distributed Workflow Communication  X
Workpackage 2: Multi-paradigm knowledge representation (WP leader Ulster)
- Task 1: Representing knowledge about complex content objects in an ontology base with disjunctive Datalog and the underlying relational data model (RDM) (Task leader CIES)
- Task 2: Mapping UML semantic data model (SDM) into the Resource Description Framework (RDF) specification (Task leader Ulster)
- Task 3: Representing procedural knowledge (WfMC compliant workflow specifications) in a RDM ontology base (Task leader ICS)
- Task 4: Specification of the ICONS multi-paradigm integrated knowledge schema and query language (Task leader Ulster)
Workpackage 3: Advanced graphic user interface (WP leader ICS)
- Task 1: Methodology and tools for information architecture design (Task leader ICS)
- Task 2: Representing knowledge in the graphic user interface (Task leader ICS)
- Task 3: Design and implementation environment for the ICONS GUI (Task leader RODAN)
Workpackage 5: Distributed content repository (WP leader CERIA)
- Task 1: Access algorithms and data structure supporting the ICONS ontology base (Task leader ICS)
- Task 2: Distribution of ICONS processes and data structures (Task leader CERIA)
- Task 3: Design of a system architecture for integration of pre-existing, heterogeneous information sources (Task leader CERIA)
Table 14. The ICONS focus technological area modules and the research stream workpackages
  • 76. Intelligent Content Management System 1.15 Appendix A. List of workpackages and deliverables April 2002
Appendix A. List of workpackages and deliverables
Workpackages
Workpackage No | Workpackage title | Leader | Deliverable No
WP1 | Assessment of tools, standards, and methods | ICS | D4, D5, D6
WP2 | Multi-paradigm knowledge representation | UU | D7, D8, D9, D10
WP3 | Advanced graphic user interface | ICS | D11, D12, D13
WP4 | ICONS Architecture | Rodan | D14, D15, D16, D17
WP5 | Distributed Content Repository | CERIA | D18, D19, D20
WP6 | Development of the ICONS prototype | Rodan | D21, D22, D23
WP7 | Design and development of the “NAS Best Practices” Portal | SEMA | D24, D25, D35
WP8 | Exploitation and dissemination of project results | Rodan | D34
WP9 | Project Management | Rodan | D1, D2, D3, D26, D27, D28, D29
  • 77. Deliverables list
Deliverable No | Deliverable title
D1 | Project presentation
D2 | Consortium agreement
D3 | Evaluation criteria
D4 | Standards base for the ICONS project
D5 | Technological base for the ICONS project
D6 | Research base for the ICONS project
D7 | Extracting knowledge from complex content objects into an ontology base with logic inference capabilities
D8 | Equivalence of UML semantic data model and the RDF content model
D9 | Capturing procedural knowledge from process class definitions and from process instance execution measures
D10 | A multi-paradigm ontology base schema
D11 | Information architecture: Evaluation of tools and methods
D12 | Visualisation of domain knowledge: methods and techniques
D13 | The ICONS graphic interface – design specification
D14 | Specification of the ICONS software development platform
D15 | Installation of the integrated ICONS software development platform
D16 | Specification of the ICONS architecture
D17 | The ICONS prototype implementation plan
D18 | Access algorithms and data structures underlying a distributed knowledge base
D19 | Optimization of a distributed knowledge-based system architecture
D20 | Integration of pre-existing, heterogeneous information sources
D21 | The ICONS software technical design manual
D22 | ICONS installed at selected consortium partners’ sites
D23 | Software test cases and acceptance protocol
D24 | The “NAS Best Practices” portal accessible via Internet
D25 | The knowledge-based content management application design methodology
D26 | 1st Progress Report
D27 | 2nd Progress Report
D28 | 3rd Progress Report
D29 | 1st Management Report
D30 | 2nd Management Report
D31 | 3rd Management Report
D32 | 4th Management Report
D33 | Final Report
D34 | Technology Implementation Plan
D35 | Conceptual analysis of the ‘NAS Best Practices’ portal
  • 78. Bibliography
    External references
    Aalst1999  van der Aalst, W.M.P., Flexible Workflow Management Systems: An Approach Based on Generic Process Models, Proceedings of the 10th International Conference on Database and Expert Systems Applications (DEXA'99), volume 1677 of Lecture Notes in Computer Science, pages 186-195. Springer-Verlag, Berlin, 1999.
    Abecker1999  Abecker, A., Decker, S., Organizational memory. Knowledge acquisition, integration and retrieval issues, Knowledge-Based Systems, p. 113-124, 1999.
    Aksit2001  Aksit, M., Marcelloni, F., Tekinerdogan, B., Developing object-oriented frameworks using domain models, http://wwwhome.cs.utwente.nl/~bedir/papers/FrameworkDomainModels.ps, 2001.
    Ambite2001  Ambite, J.L., Arens, Y., Philpot, A., Gravano, L., Hatzivassiloglou, Klavans, J., Simplifying Data Access: The Energy Data Collection Project, IEEE Computer, February 2001.
    Apt1988  Apt, K.R., Blair, H.A., and Walker, A., Towards a Theory of Declarative Knowledge. In Minker, J., editor, Foundations of Deductive Databases and Logic Programming, pages 89-148. Morgan Kaufmann Publishers, Inc., Los Altos, California, USA, 1988.
    Baek1999  Baek, S., Liebowitz, J., Prasad, S.Y., and Granger, M., Intelligent Agents for Knowledge Management – Toward Intelligent Web-Based Collaboration within Virtual Teams, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.
    Baral1994  Baral, C., Gelfond, M., Logic Programming and Knowledge Representation, J. Logic Programming, Vols. 19/20, 1994.
    Bassiliades2000  Bassiliades, N., Vlahavas, I., Elmagarmid, A.K., E-DEVICE: An Extensible Active Knowledge Base System with Multiple Rule Type Support, IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 5, September/October 2000.
    Baumgartner2001  Baumgartner, R., Flesca, S., Gottlob, G., Visual Web Information Extraction with Lixto, in Proceedings of the 27th VLDB Conference, Rome, Italy, 2001.
    Becker1999  Becker, G., Knowledge Discovery, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.
    Becker2001  Becker, S.A., Mottay, F.E., A Global Perspective on Web Site Usability, IEEE Software, January/February 2001.
    Bell1996  Bell, D., Guan, J., and Lee, S., Generalized union and project operations for pooling uncertain and imprecision information. Data & Knowledge Engineering, 18, pp. 89-117, 1996.
    Ben-Eliyahu1994  Ben-Eliyahu, R., and Dechter, R., Propositional Semantics for Disjunctive Logic Programs. In Annals of Mathematics and Artificial Intelligence, 12:53-87, 1994.
    Berners-Lee1999  Berners-Lee, T.J., Syntax/Semantics, W3C, 1999.
    Booch1998  Booch, G., Rumbaugh, J., Jacobson, I., The Unified Modelling Language User Guide, Addison Wesley, 1998.
    Bouguettaya2000  Bouguettaya, A., Benatallah, B., Hendra, L., Ouzzani, M., Beard, J., Supporting Dynamic Interactions among Web-Based Information Sources.
    Bouguettaya2001  Bouguettaya, A., Ouzzani, M., Medjahed, B., Cameron, J., Managing Government Databases, IEEE Computer, February 2001.
    Buccafurri1998  Buccafurri, F., Leone, N., Rullo, P., Disjunctive Ordered Logics: Semantics and Expressiveness, Proceedings of International Conference on Principles of Knowledge Representation and Reasoning (KR ’98), 1998.
    Buchner2000  Buchner, A.G., Baumgarten, M., Mulvenna, M.D., Bohm, R., Anand, S.S., Data Mining and XML: Current and Future Issues, Proc. of the International Conference on Web Information System Engineering (WISE’00).
    CACM97  Comm. of ACM. Special Issue on high-performance Computing (Oct. 1997).
    CERIA  Centre des Etudes et de Recherches en Informatique Appliquée. U. Paris 9 Dauphine, France. http://ceria.dauphine.fr/
    Culler1994  Culler, D. & al. NOW: Towards Everyday Supercomputing on a Network of Workstations. EECS Tech. Rep. UC Berkeley.
    Cadoli1997  Cadoli, M., Eiter, T., and Gottlob, G., Default Logic as a Query Language. In IEEE Transactions on Knowledge and Data Engineering, 9(3):448-463, 1997.
  • 79. Bibliography
    Chang2001  Chang, S-K., Znati, T., Adlet: An Active Document Abstraction for Multimedia Information Fusion, IEEE Transactions on Knowledge and Data Engineering, Vol. 13, No. 1, January/February 2001.
    Chen1999  Chen, C., Information Visualisation and Virtual Environments, Springer-Verlag, London, 1999.
    Chen2001  Chen, C., Paul, R.J., Visualizing a Knowledge Domain’s Intellectual Structure, IEEE Computer, March 2001.
    Coleman1999  Coleman, D., Groupware: Collaboration and Knowledge Sharing, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.
    Connalen2000  Conallen, J., Building Web Applications with UML, Addison Wesley, 2000.
    Corby1999  Corby, O., Dieng, R., The WebCokace Knowledge Server, IEEE Internet Computing, November/December 1999.
    Davenport1999  Davenport, T.H., Knowledge Management and the Broader Firm: Strategy, Advantage, and Performance, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.
    Decker2000a  Decker, S., Melnik, S., Van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., Horrocks, I., The Semantic Web: The Roles of XML and RDF, IEEE Internet Computing, September/October 2000.
    Decker2000b  Decker, S., Mitra, P., Melnik, S., Framework for the Semantic Web: An RDF Tutorial, IEEE Internet Computing, November/December 2000.
    Deutsch2000  Deutsch, A., et al., XML-QL: A Query Language for XML, WWW Consortium, www.w3.org/TR/NOTE-xml-ql (current May 2000).
    Diene2000  Diène, A.W., Litwin, W., Performance Measurements of RP*: A Scalable Distributed Data Structure for Range Partitioning. 2000 Intl. Conf. on Information Society in the 21st Century: Emerging Techn. and New Challenges. Aizu City, Japan, 2000.
    Dieng1999  Dieng, R., Corby, O., Giboin, A., Ribiere, M., Methods and tools for corporate knowledge management, Int. Journal of Human-Computer Studies, vol. 51, no. 3, pp. 567-598, 1999.
    Dieng2000  Dieng, R., Knowledge Management and the Internet, IEEE Intelligent Systems, May/June 2000.
    DoD1997  The Program Manager’s Guide to Software Acquisition Best Practices, ver. 2.1, U.S. Department of Defense, 1997.
    Düntsch1997  Düntsch, I. and Gediga, G., Statistical evaluation of rough set dependency analysis. International Journal of Human-Computer Studies, 46:589-604, 1997.
    Düntsch1998  Düntsch, I. and Gediga, G., Simple data filtering in rough set systems. International Journal of Approximate Reasoning, 8(1-2):93-106, 1998.
    Dyreson2000  Dyreson, C.E., Evans, W.S., Lin, H., Snodgrass, R.T., Efficiently Supporting Temporal Granularities, IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 4, July/August 2000.
    Eder1997  Eder, J., Pozewaunig, H., Liebhart, W., ePERT: Extending PERT for Workflow Management Systems, Proceedings of the 1st East-European Conference on Advances in Databases and Information Systems (ADBIS’97), 1997.
    Eder1999  Eder, J., Panagos, E., Pozewaunig, H., Rabinovich, M., Time Management in Workflow Systems, Proceedings of the 3rd International Conference on Business Information System (BIS’99), p. 265-280, 1999.
    Eder2001  Eder, J., Panagos, E., Managing Time in Workflow Systems, in Workflow Handbook 2001, Layna Fischer (Ed.), Future Strategies Inc., Book Division, 2001, USA.
    Eiter1994a  Eiter, T., Gottlob, G., and Mannila, H., Adding Disjunction to Datalog. In Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-94), pages 267-278. ACM Press, 1994.
    Eiter1997  Eiter, T., Gottlob, G., and Mannila, H., Disjunctive Datalog. ACM Transactions on Database Systems, 22(3):315-363, 1997.
    Eiter2000  Eiter, T., Faber, W., Leone, N., and Pfeifer, G., Declarative Problem-Solving Using the DLV System. In Jack Minker, editor, Logic-Based Artificial Intelligence, pages 79-103. Kluwer Academic Publishers, 2000.
    Elmagarmid2001  Elmagarmid, A.K., McIver, W.J., The Ongoing March Toward Digital Government, IEEE Computer, February 2001.
    Enlarg-Report2001  Strategy Paper 2001 in Key Documents related to the Enlargement Process, http://europa.eu.int/comm/enlargement/report2001/index.htm
  • 80. Bibliography
    Enlarg-Report2001 - Annexes  http://europa.eu.int/comm/enlargement/report2001/annexes_en.pdf
    EU-Glossary  http://europa.eu.int/scadplus/leg/en/cig/g4000.htm
    Faber1996  Faber, W., and Pfeifer, G., DLV homepage. URL: http://www.dbai.tuwien.ac.at/proj/dlv/, since 1996.
    Fairchild1988  Fairchild, K., Poltrock, S., Furnas, G., SemNet: Three-Dimensional Graphic Representations of Large Knowledge Bases, in Cognitive Science and Its Applications for Human Computer Interaction, R. Guindon (Ed.), Lawrence Erlbaum Associates, Hillsdale, N.J., 1988.
    Fensel2000  Fensel, D., van Harmelen, F., Klein, M., Akkermans, H., On-To-Knowledge: Ontology-based Tools for Knowledge Management. Report of EU-IST project No. 10132, 2000. http://www.ontoknowledge.org.
    Fensel1998  Fensel, D., Angele, J., Studer, R., The Knowledge Acquisition and Representation Language, KARL, IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 4, July/August 1998.
    Firestone2000  Firestone, J.M., Knowledge Management: A Framework for Analysis And Measurement, White Paper No 17, Executive Information Systems, Inc, October 1, 2000, www.dkms.com.
    Firestone2001  Firestone, J.M., Knowledge Management Process Methodology: An Overview, Knowledge and Innovation: Journal of the KMCI, vol. 1, no. 2, 2001.
    Garvin1993  Garvin, D.A., Building a Learning Organization, Harvard Business Review, July-August, 1993.
    Gelfond1991  Gelfond, M. and Lifschitz, V., Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365-385, 1991.
    Gibbs2002  Gibbs, W., Explore Autonomic Computing. Scientific American, May 06, 2002.
    Ginsburg1999  Ginsburg, M., Kambil, A., Annotate: A Web-based Knowledge Management Support System for Document Collections, Proc. of the 32nd Hawaii International Conference on System Sciences, IEEE 1999.
    Goeschka2001  Goeschka, K.M., Schranz, M.W., Client and Legacy Integration in Object-Oriented Web Engineering, IEEE Multimedia, January/March 2001.
    Goldman1991  Goldman, A.H., Empirical Knowledge, 1991, Berkeley University, USA.
    Gray1996  Gray, J., Super-Servers: Commodity Computer Clusters Pose a Software Challenge. Microsoft, 1996. http://www.research.microsoft.com/
    Gregersen1999  Gregersen, H., Jensen, Ch., Temporal Entity-Relationship Models – A Survey, IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 3, May/June 1999.
    Grossman1996  Grossman, W., Metadata. In Proceedings of New Technologies and Techniques in Statistics, pages 183-185.
    Hammer1997  Hammer, J., Garcia-Molina, H., Cho, J., Aranha, R., Crespo, A., Extracting semi-structured data from the web, Proc. of Workshop on Management of Semi-structured Data, IEEE 1997.
    Hayes2001  Hayes, J.G., Peyorovian, E., Sarin, S., Schmidt, M-T., Swenson, K.D., Weber, R., Workflow Interoperability Standards in the Internet, in Workflow Handbook 2001, Layna Fischer (Ed.), Future Strategies Inc., Book Division, 2001, USA.
    Henderson-Sellers1997  Henderson-Sellers, B., Younessi, H., Graham, I.S., The OPEN Process Specification, Addison Wesley, 1997.
    Henderson-Sellers1998  Henderson-Sellers, B., Simons, T., Younessi, H., The OPEN Toolbox of Techniques, Addison Wesley, 1998.
    Highsmith2000  Highsmith III, J.A., Adaptive Software Development, A Collaborative Approach to Managing Complex Systems, Dorset House, 2000.
    Holsapple1999  Holsapple, C.W., Joshi, K.D., Description and Analysis of Existing Knowledge Management Frameworks, Proc. of the 32nd Hawaii International Conference on System Sciences, IEEE 1999.
    Holsapple2002  Holsapple, C.W., Joshi, K.D., A collaborative approach to ontology design, Communications of the ACM, vol. 45, no. 2, pp. 42-47, 2002.
    Huntington1999  Huntington, D., Knowledge-Based Systems: A Look at Rule-Based Systems, in Knowledge Management Handbook, J. Liebowitz (Ed.), CRC Press LLC, 1999, USA.
    Hurson1994  Hurson, A.R., Bright, M.W., Pakzad, S.H., (Editors), Multidatabase Systems: An Advanced Solution for Global Information Sharing, IEEE Computer Society Press, 1994.
    IBM1995  IBM, Intelligent Agent Strategy, White Paper, http://activist.gpl.ibm.com:81/WhitePaper/ptc2.htm, 1995.
  • 81. Bibliography
    Iglesias1998  Iglesias, C.A., Garijo, M., Gonzales, J.C., A survey of agent oriented methodologies, in: M. P. Singh, J. P. Muller and A. S. Rao, editors, Intelligent Agents V. Agent Theories, Architectures, and Languages - 5th International Workshop, number 1555 in Lecture Notes in Artificial Intelligence, Paris, France, Springer Verlag, 1998.
    ISO1995  ISO/IEC 12207, Information Technology – Software lifecycle processes, 1995-2001.
    Jarvis1999  Jarvis, P., Stader, J., Macintosh, A., Moore, J., and Chung, P., "What Right Do You Have to Do That? Infusing Adaptive Workflow Technology with Knowledge about the Organisational and Authority Context of a Task", in Proceedings of the First International Conference on Enterprise Information Systems (ICEIS-99), Setubal, Portugal, 1999. (Paper at http://www.aiai.ed.ac.uk/~jussi/pubs.html)
    Kahn2001  Kahn, P., Lenk, K., Mapping Web Sites, Rotovision SA, 2001.
    Kirda2001  Kirda, E., Jazayeri, M., Kerer, C., Schranz, M., Experiences in Engineering Flexible Web Services, IEEE Multimedia, January/March 2001.
    Klingemann2000  Klingemann, J., Controlled Flexibility in Workflow Management. In Proceedings of the 12th International Conference on Advanced Information Systems Engineering (CAiSE'00), Stockholm, Sweden, June 5-9, 2000, pp. 126-141. (Copyright Springer-Verlag).
    KMForum2001  Weber, F., Kemp, J., Common Approaches and Standardisation in KM, EKMF Workshop on Standardisation, Brussels, June, 2001, www.knowledgeboard.com.
    KMForum2001_D11  Kemp, J., Pudlatz, M., Perez, P., Ortega A.M., KM Technologies and Tools, European KM Forum, IST Project No 2000-26393, March, 2000, www.knowledgeboard.com.
    KMForum2001_D11  Kemp, J., Pudlatz, M., Perez, P., Ortega A.M., KM Terminology and Approaches, a European KM Forum, IST Project No 2000-26393, March, 2000, www.knowledgeboard.com.
    KMForum2001_D12  Simpson, J., Aucland, M., Kemp, J., Pudlatz, M., Jenzowsky, S., Brederhorst, B., Toerek, E., Trends and visions in KM, European KM Forum, IST Project No 2000-26393, April, 2000, www.knowledgeboard.com.
    Knoblock1998  Knoblock, C.A., Minton, S., Ambite, J.L., Ashish, N., Modi, P.J., Muslea, I., Philipot, A., Tejada, S., Modeling web sources for information integration, Proc. of AAAI Conference, 1998.
    Koulopoulos1995  Koulopoulos, T.M., The Workflow Imperative, Van Nostrand Reinhold, 1995.
    KPMG1999  KPMG Consulting, Knowledge Management Research Report 2000, November, 1999, www.kpmg.co.uk.
    Kruchten2000  Kruchten, P., The Rational Unified Process, An Introduction, Addison Wesley Longman, 2000.
    Kuhn1997  Kuhn, O., Abecker, A., Corporate Memories for Knowledge Management in Industrial Practice: Prospects and Challenges, Journal of Universal Computer Science, vol. 3, no. 8, pp. 929-954, 1997.
    Kushmerick1997  Kushmerick, N., Weil, D., Doorenbos, R., Wrapper induction for information extraction, in Proc. of the Int. Joint Conference on Artificial Intelligence, 1997.
    Lambrix1997  Lambrix, P., Shamehri, N., Aberg, J., Towards Creating a Knowledge Base for World-Wide Web Documents, Proc. of the 1997 IASTED International Conference on Intelligent Information Systems (IIS ’97), IEEE 1997.
    Lassila1998  Lassila, O., Web Metadata: A Matter of Semantics, IEEE Internet Computing, July/August 1998.
    Lassila2000  Lassila, O., Swick, R.R., Resource Description Framework (RDF) Model and Syntax Specification, WWW Consortium, www.w3.org/TR/REC-rdf-syntax (current May 2000).
    Lawrence2001  Lawrence, R., Barker, K., Integrating Data Sources Using a Standardized Global Dictionary, in Knowledge Discovery for Business Information Systems, W. Abramowicz and J. Zurada (Eds.), Kluwer Academic Publishers, 2001.
    Leone1997  Leone, N., Rullo, P., Scarcello, F., "Unfounded Sets, Fixpoint Semantics and Computation of Disjunctive Stable Models", Information and Computation, Academic Press, Vol. 135, No. 2, 1997, pp. 69-112.
    Letson2001  Letson, R., Find A Match. Taxonomies Put Content in Context, Transform Magazine, December 2001.
    Lifschitz1994  Lifschitz, V. and Turner, H., Splitting a Logic Program. In Van Hentenryck, P., editor, Proceedings of the 11th International Conference on Logic Programming (ICLP'94), pages 23-37, Santa Margherita Ligure, Italy. MIT Press, 1994.
    Lifschitz1996  Lifschitz, V., Foundations of logic programming. In Brewka, G., editor, Principles of Knowledge Representation, pages 69-127. CSLI Publications, Stanford, 1996.
  • 82. Bibliography
    Lin2002  Lin, H., Risch, T., Katchanounov, T., Adaptive data mediation over XML data. To be published in special issue on "Web Information Systems Applications" of Journal of Applied System Studies (JASS), Cambridge International Science Publishing, 2002.
    Litwin1993  Litwin, W., Neimat, M-A., Schneider, D., LH*: Linear Hashing for Distributed Files. ACM-SIGMOD Intl. Conf. On Management of Data, 1993.
    Litwin1996  Litwin, W., Menon, J., Risch, T., Schwarz, Th., Design Issues For Scalable Availability LH* Schemes with Record Grouping. Distributed Data and Structures. Carleton Scientific, (publ.) 2000.
    Litwin2000  Litwin, W., Menon, J., Risch, T., Schwarz, Th., Design Issues For Scalable Availability LH* Schemes with Record Grouping. Distributed Data and Structures. Carleton Scientific, (publ.) 2000.
    Litwin2000a  Litwin, W., Schwarz, T., LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes. ACM-SIGMOD-2000 Intl. Conf. On Management of Data.
    Lobo1992  Lobo, J., Minker, J., and Rajasekar, A., Foundations of Disjunctive Logic Programming. The MIT Press, Cambridge, Massachusetts, 1992.
    Lopez1999  Fernandez M. Lopez, A., Overview of methodologies for building ontologies, in: Proceedings of the IJCAI Workshop on Ontologies and Problem-Solving Methods, Stockholm, Sweden, 1999.
    Maedche2001  Maedche, A., Staab, S., Strojanovic, N., et al., SEmantic portAL - The SEAL approach, in: Creating the Semantic Web, D. Fensel, J. Hendler, H. Lieberman, W. Wahlster (eds.), MIT Press, MA, Cambridge, 2001.
    Martin2000  Martin, Ph., Eklund, P.W., Knowledge Retrieval and the World Wide Web, IEEE Intelligent Systems, May/June 2000.
    McClean2002  McClean, S., Páircéir, R., Scotney, B., and Greer, K., A Negotiation Agent for Distributed Heterogeneous Statistical Databases. SSDBM, 2002.
    McClean2000  McClean, Páircéir, Scotney and Zhang, Adding Context to the Retrieval of Aggregate Data, 2000.
    McElroy1999  McElroy, M.W., Second-Generation KM, Knowledge Management, October 1999.
    Mecella2001  Mecella, M., Batini, C., Enabling Italian E-Government through a Cooperative Architecture, IEEE Computer, February 2001.
    Mitchell1979  Mitchell, T.M., Version spaces: An approach to concept learning. PhD thesis, Electrical Engineering Dept., Stanford University, Stanford, CA, 1979.
    Mitchell1997  Mitchell, T.M., Machine Learning. The McGraw-Hill Companies, Inc., 1997.
    Momotko2002  Momotko, M., Subieta, K., Dynamic change of Workflow Participant Assignment. Paper accepted to Advances in Database Information Systems, ADBIS'2002, Bratislava, 2002.
    Nguyen1998  Nguyen, S.H., Skowron, A., and Synak, P., Discovery of data patterns with applications to decomposition and classification problems. In Polkowski, L. and Skowron, A., editors, Rough sets in knowledge discovery, Vol. 2, pages 55-97, Heidelberg. Physica-Verlag, 1998.
    Nonaka1995  Nonaka, I., Takeuchi, H., The Knowledge Creating Company, Oxford University Press, 1995, New York, USA.
    O’Leary1998  O’Leary, D., Enterprise Knowledge Management, IEEE Computer, March 1998.
    OMG1998  Object Management Group, Workflow Management Facility, 1998.
    Ozsu1999  Ozsu, T., Valduriez, P., Principles of Distributed Database Systems. 2nd Ed. Prentice Hall, 1999.
    PG-Glossary  Annexes to Practical Guide: Glossary of Terms, http://europa.eu.int/comm/europeaid/tender/gestion/pg/a01_en.pdf
    PG-2000  Practical Guide to Phare, Ispa & Sapard contract procedures, http://europa.eu.int/comm/europeaid/tender/gestion/pg/pg_phare_en.pdf (December 2000), 170 pages.
    PG-2001  Practical Guide to EC external aid contract procedures, http://europa.eu.int/comm/europeaid/tender/gestion/pg/pg_en.pdf (January 2001), 176 pages.
    Phare_Review_2000  Phare 2000 Review, 27.10.2000, C(2000)3103/2, http://europa.eu.int/comm/enlargement/pas/phare/pdf/review_2000.pdf
    Phare-Glossary2001  http://europa.eu.int/comm/enlargement/pas/phare/glossary.htm
  • 83. Bibliography
    Phare-ISPA-Sapard-2001  The Enlargement Process and the three pre-accession instruments: Phare, ISPA, Sapard, Proceedings of a conference, 5th March 2001, http://europa.eu.int/comm/enlargement/pas/phare/pdf/bro-phare-ispa-sapard-2.pdf
    PL-Reg-Rep-2000  2001 Regular Report on Poland’s Progress Towards Accession, 122 pages.
    Plumtree2001  Plumtree Software Inc., A Framework for Assessing Return on Investment for a Corporate Portal Deployment, White Paper, 2001, www.plumtree.com.
    Polanyi1966  Polanyi, Michael, The Tacit Dimension, Routledge and Kegan Paul, 1966, London, England.
    Popper1972  Popper, Karl R., Objective Knowledge, Oxford University Press, 1972, London, England.
    Popper1977  Popper, Karl R., Eccles, J., The Self and Its Brain, Springer Verlag, 1977, Berlin, Germany.
    President1998  President’s Information Technology Advisory Committee, Interim Report to the President of the United States, August 1998.
    Quinn1996  Quinn, J.B., Anderson, P., and Finkelstein, S., Managing Professional Intellect, Harvard Business Review, March-April, 1996.
    Rabarijaona2000  Rabarijaona, A., Dieng, R., Corby, O., Ouddari, R., Building and Searching an XML-Based Corporate Memory, IEEE Intelligent Systems, May 2000.
    Ramakrishnan2000  Ramakrishnan, N., PIPE: Web Personalization by Partial Evaluation, IEEE Internet Computing, November/December 2000.
    Rumbaugh1999  Rumbaugh, J., Jacobson, I., Booch, G., The Unified Modeling Language Reference Manual, Addison Wesley, 1999.
    RUP2002  Rational Unified Process, ver. 2002.05.00, Rational Software Corporation, 2002.
    Sahuguet1999  Sahuguet, A., Azavant, F., WysiWyg Web Wrapper Factory (W4F), Proceedings of the WWW Conference, 1999.
    Shim2000  Shim, S.Y., Pendyala, V.S., Sundaram, M., Gao, J.Z., Business-to-Business E-commerce Frameworks, IEEE Computer, October 2000.
    Simons1994  Simons, G.F., Conceptual modelling versus visual modelling: a technological key to building consensus, Consensus ex Machina, Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computing and the Humanities, Paris, France, 1994.
    Soderland1997  Soderland, S., Learning to extract text-based information from the world wide web, Proc. of Knowledge Discovery and Data Mining.
    Staab2001  Staab, S., Studer, R., Schnurr, H-P., Sure, Y., Knowledge processes and ontologies, Intelligent Systems, vol. 16, no. 1, pp. 26-34, 2001.
    Stader2001  Stader, J., Moore, J., Chung, P., McBriar, I., Ravinranathan, M., Macintosh, A., Applying Intelligent Workflow Management in the Chemicals Industries, in Workflow Handbook 2001, Layna Fischer (Ed.), Future Strategies Inc., Book Division, 2001, USA.
    Struder1998  Studer, R., Benjamins, V.R., Fensel, D., Knowledge Engineering: principles and methods, DKE vol. 25, no. 1-2, 1998.
    Sure2001  Sure, Y., A tool-supported methodology for ontology-based knowledge management, http://www.aifb.uni-karlsruhe.de/WBS/ysu/publications/2001_fgwm_ontokick.pdf, 2001.
    Swenson1998  Swenson, K., Simple Workflow Access Protocol (SWAP), 1998.
    Swenson2001  Swenson, K., Workflow for the Information Worker, in Workflow Handbook 2001, Layna Fischer (Ed.), Future Strategies Inc., Book Division, 2001, USA.
    Tiwana2000  Tiwana, A., The Knowledge Management Toolkit, Prentice Hall PTR, Upper Saddle River, 2000.
    Ullman1989  Ullman, J.D., Principles of Database and Knowledge-Base Systems, Rockville, Md: Computer Science Press, 1989.
    Uschold1995  Uschold, M., King, M., Towards a methodology for building ontologies, in: Workshop on Basic Ontological Issues in Knowledge Sharing, held in conjunction with IJCAI-95, Montreal, Canada, 1995.
    Uschold1996  Uschold, M., Building Ontologies: Towards a Unified Methodology, Proceedings of Expert Systems, 16th Annual Conference of the British Computer Society Specialist Group on Expert Systems, 1996.
    Vocking2002  Vocking, B., Symmetric vs. Asymmetric Multiple-Choice Algorithms. Invited Paper. Aracne 2001. Carleton Scientific, 2002.
  • 84. Bibliography
    Wang1998  Wang, H., Düntsch, I., and Bell, D., Data reduction based on hyper relations. In Agrawal, R., Stolorz, P., and Piatetsky-Shapiro, G., editors, Proceedings of KDD'98, pages 349-353, New York, 1998.
    Wang2000  Wang, H., Düntsch, I., and Gediga, G., Classificatory filtering in decision systems. International Journal of Approximate Reasoning, 23:111-136, 2000.
    Weidner2002  Weidner, D., Using Connect and Collect to Achieve the KM Endgame, IEEE IT Professional, Jan-Feb 2002, pp. 18-24.
    WfMC1994  Workflow Management Coalition, Information Pack, Grenoble, France, July 1994.
    WfMC1996  Workflow Management Coalition, Workflow standard, Interoperability abstract specification, WfMC-TC-1012 version 1.0, Oct 1996.
    WfMC1999  Workflow Management Coalition, Workflow standard, Workflow terminology & glossary, WfMC-TC-1011 issue 3.0, Feb 1999.
    WfMC2001_A  Workflow Management Coalition, Workflow standard, Workflow process definition language – XML process definition language, WfMC-TC-1025 draft 0.03a, May 2001.
    WfMC2001_B  Workflow Management Coalition, Workflow standard, Wf-XML Binding, WfMC-TC-1023 version 1.1, Nov 2001.
    WfMC2002  Workflow Management Coalition, Workflow standard, Wf-XML Binding, WfMC-TC-1023, Final draft, Nov 2001, Version 1.1.
    Zhang1996  Zhang, M., Zheng, C., Analysis Methodologies of Synthesis of Solutions in Distributed Expert Systems. Proc. ICMAS Kyoto, AAAI Press, pp. 417-421, 1996.
    ICONS references
    [ICONS CONTRACT]  ICONS Consortium, Intelligent Content Management System Contract Number IST-2001-32429. Annex I – Description of work, October 2001.
    [ICONS D02]  ICONS Consortium, The ICONS project consortium agreement, April 2002.
    [ICONS D05]  ICONS Consortium, Technological Base for the ICONS project, under development.
  • 85. Dictionary
    Notion | Meaning
    actor | intelligent agent or knowledge worker
    CAS | a goal-directed open system attempting to fit itself to its environment and composed of interacting adaptive agents described in terms of rules applicable with respect to some specified class of environmental inputs [Holland1995]
    content | any type of multimedia object
    corporate portal | uniform web-based access point to all the organisation’s data, applications and processes regardless of geographical and temporal limitations
    declarative knowledge | knowledge pertaining to static entities like objects, relationships, taxonomies, rules etc.; non-procedural knowledge
    FGKM | First Generation Knowledge Management; approach focusing mainly on distribution of existing knowledge
    intelligent agent | a software entity that carries out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employs some knowledge or representation of the user’s goals or desires [IBM1995]
    KLC | Knowledge Life Cycle; a cyclic activity of production, validation and integration of knowledge
    KM | Knowledge Management; a set of compounded activities aiming at increasing an organisation’s effectiveness and efficiency through better exploitation of information resources
    KMS | Knowledge Management System; an IT platform supporting knowledge management processes
    knowledge base | the set of remembered data, validated propositions and models (along with metadata related to their testing), refuted propositions and models (along with metadata related to their refutation), metamodels, and (if the system produces such an artifact) software used for manipulating these, pertaining to the system and produced by it [Firestone2000]
    knowledge engineering | a set of methodologies and tools for expert knowledge acquisition and formal representation necessary for the process of capturing experts’ knowledge
    knowledge manager | knowledge worker responsible for knowledge production, maintenance and dissemination
    knowledge map | a visual facility allowing for navigation over complex taxonomy and object relationship structures of a knowledge base
    knowledge worker | individual whose overall outstanding performance relies on unique knowledge in a particular application domain
    KR | knowledge representation
    learning organization | an organization skilled at creating, acquiring, and transferring knowledge, and at modifying its behaviour to reflect new knowledge and insights [Garvin1993]
    mediator | intermediate virtual database between the integrated data sources and the application using them for retrieval and update [Lin2002]
    NAS | Newly Associated States
    NAS Best Practices Portal | an ICONS architecture compliant portal comprising best practices of PHARE, SAPARD, and ISPA projects developed within the Newly Associated States
    ontology | an explicit conceptualization model comprising objects, their definitions, and relationships among objects [Becker1999]
    procedural knowledge | knowledge defined in a prescriptive way, i.e. by a step-by-step procedure
    RDF | Resource Description Framework; emerging standard for Web metadata that syntactically consists of nodes and attached attribute/value pairs [Lassila1998]
    SGKM | Second Generation Knowledge Management; approach adding to the FGKM aspects related to accelerating the production of new knowledge
    structural knowledge | knowledge incorporated in the ontology-based structure of objects and relations among them
  • 86. Dictionary
    taxonomy | 1. a set of means – topics, headings, categories – into which content can be sorted; 2. a well-defined terminology used within a particular ontology to describe the classes of objects, their properties, and relationships
    UML | Unified Modelling Language
    workflow management | the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules [WfMC1994]
    wrapper | an interface to data sources that translates data into a common data model used by the mediator [Lin2002]
    XML | Extensible Markup Language
