This course presents the steps taken to elicit the requirements of a framework that aims to enhance traditional content management systems (CMSs) with semantic capabilities. It first describes the common steps carried out between the system designers (the group delivering the work) and the target group (CMS vendors in this case) to determine the actual needs. After going over the agreements reached between the two groups, the focus moves to the elicited high level requirements. The course emphasizes the importance of these high level requirements from a semantic perspective.
In the first phases of the requirements elicitation process, bilateral meetings are held with the target group. These meetings can be workshops, interviews, brainstorming sessions, etc. Apart from bilateral meetings, questionnaires and surveys can also be used to obtain and understand the needs of target groups in the requirements analysis phase. Since the subject here is content management systems and the goal is to provide services on top of existing systems, it would not be realistic to skip examining the existing systems. The better the needs of CMS vendors are understood, the more smoothly the later design and implementation phases of the project advance. As the needs and requirements of the target groups are collected, the collected material is categorized into topics. For example, the statement “Extract RDF statements from XML or HTML document” goes into the “Enrichment of Content” topic, and the statement “Possibility to create pages by queries” goes into the “Support for content creation” topic. The identified topics lead to high level requirements and, through the refinement process, to use cases. To ensure that the needs of different CMS providers are considered, the resulting use cases are cross-validated against the requirements of the CMS providers.
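The categorization step described above can be illustrated with a small sketch. The two statements and topic names are taken from the notes; the keyword rules, the function name `categorize` and the "Uncategorized" bucket are assumptions made for illustration, and in practice this sorting was presumably done by people rather than by code.

```python
from collections import defaultdict

# Topic names come from the notes; the keyword lists are hypothetical.
TOPIC_KEYWORDS = {
    "Enrichment of Content": ["extract", "rdf", "annotat"],
    "Support for content creation": ["create", "pages", "authoring"],
}


def categorize(statements):
    """Group vendor statements under the first topic whose keyword matches."""
    topics = defaultdict(list)
    for statement in statements:
        lowered = statement.lower()
        for topic, keywords in TOPIC_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                topics[topic].append(statement)
                break
        else:
            # No rule matched: keep the statement for manual review.
            topics["Uncategorized"].append(statement)
    return dict(topics)


statements = [
    "Extract RDF statements from XML or HTML document",
    "Possibility to create pages by queries",
]
grouped = categorize(statements)
```

Each resulting topic is a candidate for a high level requirement, which is then refined into use cases.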
The elicited high level requirements are refined into use cases. All requirements are based on use cases that use a common actors model for CMSs, because this model is the basis for communication within the consortium, which contains producer and consumer groups, about the different use cases and the actors involved in them. Easy adoption and technology independence are major concerns of the CMS industry, because CMS providers want to spend minimal effort integrating semantic services into their existing systems. Accessing semantic services through RESTful services over HTTP, independently of the underlying technology, is a widely accepted way to provide easy integration. The functionality provided by the system should therefore be accessible to adopters through RESTful services. These services should be applicable to any domain, e.g. the tourism domain or the health care domain. The services can be reused to create new higher-order services, the system can be extended with new services for semantic features, and services can be replaced. This setup allows modular development of the IKS and leaves enough flexibility to experiment with different implementations of semantic services. Additionally, each service is required to define further extension points to allow fine-grained customization of all semantic features.
Since the semantic functionality is meant to be usable by any kind of content management system, the parts that existing content management systems have in common should be identified. Several kinds of input are considered, namely product descriptions, expectations from industry, and product web sites. Furthermore, existing CMSs can be analyzed by running them directly and investigating their features.
These are the possible content types managed by content management systems. Not every system supports all of these types, but a system that aims to be generic across existing systems should be capable of supporting them.
The figure in the slide shows the generic content workflow within a content management system. In terms of semantic enhancement, the important steps of this workflow are the last three. The following three slides list the topics identified for each step.
The figure and explanation are adapted from the study: Fabian Christ, Benjamin Nagel: A Reference Architecture for Semantic Content Management Systems.
Starting from the user interface layer, the CMS User Interface at the top of the figure presents the content and offers editorial features to create, modify, and manage content within its lifecycle. Access to the content itself is provided by a Content Access layer. This layer is used by the User Interface to reach the content and the content management features of the CMS. Additionally, the Content Access layer can be used by third party software that wants to integrate the CMS into other applications. The core management features are implemented in the Content Management layer. This layer provides functionality for defining the domain or application specific Content Data Model. The Content Data Model layer is conceptually placed below the Content Management layer, which has the necessary features to manipulate the model. The Content Data Model is the application specific model based on the underlying Content Repository. The Content Repository defines the fundamental concepts and persistence mechanisms for any Content Data Model that is defined on top of it. The Content Management features are tightly related to the Content Administration layer, which is used to administer the CMS stack.
The question was how the new functionality provided by the semantic services could be integrated into this architecture. The idea is to offer a set of semantic services that can easily be used through a standardized communication protocol. This approach is agreed on and supported by the CMS vendors, who would like to see simple RESTful interfaces to these semantic services. The new situation is depicted in the next slide.
The figure and explanation are adapted from the study: Fabian Christ, Benjamin Nagel: A Reference Architecture for Semantic Content Management Systems.
The figure shows the architecture that enables traditional CMSs to enhance their systems with semantic capabilities without major changes to the existing system. The adaptation can be examined in the four layers defined by an SCMS: Presentation & Interaction, Semantic Lifting, Knowledge Representation and Reasoning, and Persistence.
In a traditional CMS, the user is able to edit and consume content through a user interface. When dealing with knowledge in a Semantic CMS (SCMS), an additional layer is needed at the user interface level that allows a user to interact with content, called Semantic User Interaction. For example, a user writes an article and the SCMS recognizes the name of a person in that article. The SCMS includes a reference to an object representing that person, not only the person’s name. The user can interact with the person object and see, e.g., its birthday.
In the Semantic Lifting layer, the SCMS provides algorithms for extracting semantic metadata from the stored content, a capability that traditional content management systems lack.
After the content has been lifted to a semantic level, the extracted information may be used as input for reasoning techniques in the Reasoning layer. To handle knowledge within the system, Knowledge (representation) Models are used that define the semantic metadata used to express knowledge. These metadata are often defined along some ontology that specifies so-called concepts and their semantic relations.
In the Persistence layer, triple stores are used to store knowledge, which is represented by triples (subject, predicate, object) indicating a relation between subject and object. To give a semantic meaning to a triple, there should be Knowledge Models on top of the knowledge repository that specify the semantic meaning of a given predicate.
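The person example and the Persistence layer can be made concrete with a toy in-memory triple store. This is only a sketch: the class `TripleStore`, the `ex:` identifiers, the date value and the `KNOWLEDGE_MODEL` dictionary are hypothetical and are not part of the reference architecture or of any real triple store API.

```python
class TripleStore:
    """A toy store for (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical knowledge model: it states what each predicate means.
KNOWLEDGE_MODEL = {
    "ex:mentions": "the content item refers to the entity",
    "ex:birthDate": "the date on which the person was born",
}

store = TripleStore()
# The article references a person object, not just the person's name.
store.add("ex:article-1", "ex:mentions", "ex:person-42")
store.add("ex:person-42", "ex:birthDate", "1970-01-01")

# Interacting with the person object: look up the birthday.
birthday = store.match(subject="ex:person-42", predicate="ex:birthDate")
```

Without the knowledge model the predicate `ex:birthDate` would be an opaque string; the model on top of the repository is what gives it a meaning.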
After the high level requirements (HLRs) have been defined, each HLR is refined using the following refinement process. The process starts with the HLR, produces use cases (UC), and results in lists of testable software requirements (R) for the system to be developed. The figure in the slide depicts the refinement graph that emerges from this process as a directed acyclic graph (DAG).
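A refinement graph of this kind can be sketched as a plain adjacency mapping. The node identifiers below are invented for illustration, and the helper names are assumptions; `graphlib` is part of the Python standard library from version 3.9 on.

```python
from graphlib import CycleError, TopologicalSorter

# Edges point from a node to the nodes it is refined into.
# All identifiers are hypothetical.
REFINES_INTO = {
    "HLR-1": ["UC-1.1", "UC-1.2"],
    "UC-1.1": ["R-1", "R-2"],
    "UC-1.2": ["R-2", "R-3"],  # R-2 is shared, so this is a DAG, not a tree
}


def is_acyclic(graph):
    """Return True if the refinement graph contains no cycle."""
    try:
        # static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def requirements_for(node, graph):
    """Collect the leaf nodes (software requirements) reachable from a node."""
    children = graph.get(node, [])
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= requirements_for(child, graph)
    return found
```

Calling `requirements_for("HLR-1", REFINES_INTO)` gives the set of requirements that trace back to that HLR, which is the traceability the refinement graph is meant to provide.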
The requirements refinement process iterates over all HLRs. For each HLR, two refinement steps are performed. The process is depicted in the next figure.
The first refinement step is to specify scenarios and to extract and consolidate use cases from these scenarios. The result is a set of scenarios and use cases for each HLR. Use case consolidation is important to identify relationships between use cases and to keep them consistent with each other.
The second refinement step is to extract and consolidate the resulting requirements. The software requirements result from the use cases, so that each use case relates to one or more software requirements. A key characteristic of these requirements is their testability. For this reason the requirements are formulated as simple statements like "The system shall be able to...". This formulation is based on key words according to [RFC2119] (see section 4).
The refinement process is implemented as an open participation process that supports constant input from the involved target groups. Process coordination and consolidation of the input were done by the research partners, who also made proposals for the requirements based on the input of the industrial partners. To achieve this in a distributed setup of partners, the documented results were available online at all times, with the opportunity for the partners to add comments and make further suggestions.
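The key word based formulation can be checked mechanically. The sketch below looks for the key words defined by RFC 2119 in a requirement statement; the function name is hypothetical, and matching case-insensitively is an assumption made here because the example statements in this course use lowercase "shall", whereas RFC 2119 itself writes the key words in capitals.

```python
import re

# Key words defined by RFC 2119. Two-word phrases come first so that
# "SHALL NOT" is matched before its prefix "SHALL".
RFC2119_KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD",
    "RECOMMENDED", "MAY", "OPTIONAL",
]
_PATTERN = re.compile(
    r"\b(" + "|".join(RFC2119_KEYWORDS) + r")\b", re.IGNORECASE
)


def requirement_keyword(text):
    """Return the first RFC 2119 key word in the statement, or None."""
    match = _PATTERN.search(text)
    return match.group(1).upper() if match else None
```

A statement for which `requirement_keyword` returns `None` carries no obligation level and is a candidate for rewording before it enters the requirements list.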
To support semantic services on top of the CMS, there needs to be support for common vocabularies, which establish a common understanding for users by relating a content item to clear and precise vocabulary items. These vocabularies can be external ontologies, taxonomies or thesauri, and they can provide horizontal or domain knowledge. Therefore, services for engineering such vocabularies within the system are a key requirement. These vocabularies will be used by the system services to provide semantic capabilities.
The figure on this slide shows the use cases refined from a high level requirement. This is the first step in the refinement process.
In the second step of the refinement process, detailed requirements of different kinds are extracted from the use cases and scenarios that were consolidated in the first step.
To allow easy integration of system functionality into heterogeneous system environments, all provided functions should be accessible through RESTful service interfaces, so the architecture should be based on a service approach. The implementation should be as technology independent as possible, while also providing technology-specific access to the services to guarantee the best performance.
The mantra behind providing all functionality through RESTful services is that everything (data, functions, etc.) inside the system stack can be accessed by a URI. The system services need access to information that is inside the data repository of the CMS. Therefore, the system defines data access interfaces that must be supported by the CMS that integrates the system. The communication is based on standardized text-based data formats, e.g. XML.
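As a rough illustration of "everything has a URI" combined with a text-based exchange format, the sketch below derives a URI for a resource and serializes it as XML using only the Python standard library. The base address, the path layout and the element names are assumptions; the notes do not prescribe any of them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical base address of the semantic services.
BASE_URI = "http://cms.example.org/semantic"


def resource_uri(kind, identifier):
    """Build the URI under which a piece of data or a function is reachable."""
    return f"{BASE_URI}/{quote(kind)}/{quote(identifier, safe='')}"


def to_xml(kind, identifier, properties):
    """Serialize a resource as XML, carrying its own URI as an attribute."""
    root = ET.Element(kind, {"uri": resource_uri(kind, identifier)})
    for name, value in properties.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


document = to_xml("content", "article 1", {"title": "Hello"})
```

Because the representation is plain text and addressed by URI, a CMS written in any technology can consume it over HTTP, which is the integration property the requirement asks for.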
The system to be developed should provide services that enable semantic tagging of content items with semantic technologies such as ontological classes, RDF properties and microformats. The system attaches importance to providing horizontal services that extract semantics from structured and unstructured data automatically or semi-automatically, make suggestions about annotations, and allow navigating the content items in a semantic fashion.
One of the key outcomes of semantic enhancement of CMSs can be observed in the semantic query and search functionality of the system. Faceted search mechanisms on top of semantic query language support form the key requirements of this perspective. Semantic information about content should be used to improve the search capabilities. With semantic data, the system should extend the traditional search functionality to allow new ways of formulating search criteria and to provide "better" search results.
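Faceted search over semantically annotated content can be sketched in a few lines. The items, the facet names `type` and `topic`, and both function names are made up for illustration; a real system would evaluate such filters through a semantic query language rather than over a Python list.

```python
from collections import Counter

# Hypothetical content items with semantic annotations.
ITEMS = [
    {"id": "a1", "type": "Article", "topic": "Health"},
    {"id": "a2", "type": "Article", "topic": "Tourism"},
    {"id": "i1", "type": "Image", "topic": "Tourism"},
]


def filter_items(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many of the given items fall under each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


tourism = filter_items(ITEMS, topic="Tourism")
remaining_types = facet_counts(tourism, "type")
```

After the user narrows the result set by one facet, the counts for the remaining facets show how the result can be narrowed further, which is one of the new ways of formulating search criteria mentioned above.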
Extracting implicit information from the explicit information residing in the content repositories is a key requirement for the horizontal services of the system to be developed. Reasoning on content managed by CMSs may reveal implicit relations and similarities between different content items that may be of interest to users. Furthermore, reasoning can be used in processes such as consistency checking and automatic categorization.
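One very simple form of such reasoning is following a subclass hierarchy to derive types that were never stated explicitly. The hierarchy and the function name below are hypothetical, and each class is assumed to have at most one direct superclass; actual reasoners handle far richer ontologies.

```python
# Hypothetical ontology fragment: each class maps to its direct superclass.
SUBCLASS_OF = {
    "NewsArticle": "Article",
    "Article": "Document",
}


def inferred_types(explicit_type, subclass_of=SUBCLASS_OF):
    """Return the explicit type followed by all implicit supertypes."""
    types = [explicit_type]
    current = explicit_type
    # Stop at the top of the hierarchy, or if a cycle would repeat a type.
    while current in subclass_of and subclass_of[current] not in types:
        current = subclass_of[current]
        types.append(current)
    return types


types = inferred_types("NewsArticle")
```

An item tagged only as `NewsArticle` can then be found by a search for documents, and the same derived types can feed automatic categorization.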
Besides tagging in combination with ontological means, content entities can be (statically) linked. This process can be automated by algorithms that reason on the provided tags and ontologies. Content items are linked to each other during their lifecycles with the help of the relevant services inside a CMS. These links and relations need to be handled by the semantic services of the system: along with the semantic annotations of the content items, the semantic relations among them should also be considered. As linking is already a standard technique in CMSs, the system to be developed should focus on automatic link creation using semantic algorithms and data.
Most CMSs have their own workflow management system to control the flow and lifecycle of content. The system should offer services that can be used to implement or extend workflow management as part of the CMS. Additionally, the system should provide workflows for semantic actions, similar to workflows for content. With these, the user can describe a workflow that defines the semantic reasoning algorithms and semantic extraction algorithms to be applied to a new content entity.
Just as traditional CMSs provide content versioning and audit functionality, the system must provide these concepts for semantic information. All services provided by the system should log their actions in a way that is comprehensible to a user (transparency), and each service should make it possible to undo an action. The system should also be aware of changing content and provide solutions to invalidate semantic data, e.g. previously extracted semantic information might become invalid as the content changes. The problem of content evolution thus becomes a problem of semantic data evolution. The functionality mentioned here is not specific to the application domain of a CMS, so these services should be provided horizontally.
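One possible way to detect that extracted semantic information has gone stale is to store a fingerprint of the content alongside the annotations. This is only a sketch under that assumption; the class `AnnotationStore`, its methods and the log wording are hypothetical and not taken from the course material.

```python
import hashlib


def fingerprint(content):
    """Return a hash that changes whenever the content text changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class AnnotationStore:
    """Keeps annotations together with the content version they came from."""

    def __init__(self):
        self._records = {}  # content id -> (fingerprint, annotations)
        self.log = []       # human-readable audit trail (transparency)

    def annotate(self, content_id, content, annotations):
        self._records[content_id] = (fingerprint(content), list(annotations))
        self.log.append(
            f"annotated {content_id} with {len(annotations)} annotation(s)"
        )

    def valid_annotations(self, content_id, current_content):
        """Return the annotations only if the content is unchanged."""
        record = self._records.get(content_id)
        if record is None:
            return []
        stored_fingerprint, annotations = record
        if stored_fingerprint != fingerprint(current_content):
            # Content evolved, so the semantic data is invalidated.
            del self._records[content_id]
            self.log.append(
                f"invalidated annotations of {content_id}: content changed"
            )
            return []
        return annotations
```

The audit trail in `log` covers the transparency part of the requirement; an undo operation would additionally need the previous records to be kept rather than deleted.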
The semantic services to be provided by the system should be aware of content in different languages and provide functions to reason about information even if it is in different languages. Furthermore, the services provided by the system need to support multilingualism so that users of different nationalities can use the system. Multilingualism is a requirement for the horizontal services of the system, because language support is independent of the CMS application domain unless the CMS is designed for a specific language.
In a CMS, content access can be configured using more or less fine-grained access controls. When using semantic algorithms, the system must respect these existing access control restrictions. Additionally, the services may consider new kinds of restrictions that reflect semantic data access, e.g. for algorithms that reason on existing data. The system needs a concept for integrating the permission, role and group models that normally exist as part of a CMS.
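A minimal way to honor existing access restrictions is to filter the content before any semantic algorithm sees it. The role names, the item structure and the function name below are assumptions for illustration; a real CMS would supply its own permission, role and group model.

```python
# Hypothetical content items carrying the roles allowed to read them.
ITEMS = [
    {"id": "public-1", "allowed_roles": {"editor", "reader"}},
    {"id": "draft-7", "allowed_roles": {"editor"}},
]


def readable_ids(items, user_roles):
    """Return the ids of items the user may read, given the user's roles."""
    return [
        item["id"] for item in items
        if item["allowed_roles"] & set(user_roles)
    ]


# A reasoning or search algorithm run on behalf of a reader
# should only ever receive this subset.
visible_to_reader = readable_ids(ITEMS, {"reader"})
```

Filtering at the input keeps restricted content from leaking into derived semantic data, which is one of the new kinds of restriction the requirement mentions.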
Page: 1 Requirements Engineering for Semantic CMS
Semantic CMS Community
Lecturer, Organization, Date of presentation
Co-funded by the European Union
Copyright IKS Consortium
Page:
Part I: Foundations
  (1) Introduction of Content Management
  (2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
  (3) Knowledge Interaction and Presentation
  (4) Knowledge Representation and Reasoning
  (5) Semantic Lifting
  (6) Storing and Accessing Semantic Data
Part III: Methodologies
  (7) Requirements Engineering for Semantic CMS
  (8) Designing Semantic CMS
  (9) Semantifying your CMS
  (10) Designing Interactive Ubiquitous IS
www.iks-project.eu Copyright IKS Consortium
Page: 3 What is this Lecture about?
We have seen ...
  ... existing technologies of the Semantic Web
  ... how these technologies can be used for semantic content management
What is missing?
  Methodologies for the development of semantic CMS
  First, requirements for semantic CMS need to be specified
Part III: Methodologies
  (7) Requirements Engineering for Semantic CMS
  (8) Designing Semantic CMS
  (9) Semantifying your CMS
  (10) Designing Interactive Ubiquitous IS
Page: 4 Outline
What is the course about?
Methodology
  Understand industry needs/expectations
  Analysis of Traditional CMSs
  Identify business scenarios
  Identification of High Level Requirements (HLRs)
High Level Requirements
  Use cases
  Resulting requirements
Summary
Page: 5 What is the course about?
This course aims to give the details of the domain-independent requirements elicitation process for the semantic enhancement of any Content Management System
Page: 6 Methodology
Bilateral meetings with CMS vendors
  Workshops
  Interviews
  Brainstorming sessions
Gathered requirements
Categorization under major topics
High Level Requirements
Use cases
Validate the resulting use cases against the requirements of different CMS vendors
Page: 7 Results
Requirements Engineering Process
  Refine HLRs into specific software requirements using scenario and use case descriptors
Actors model
  All requirements are based on use cases which use a common actors model for CMS
Integration of semantic services into existing CMSs
  Easy to use and technology independent mechanisms
RESTful HTTP services
  All features are expressed in terms of services
  Applicable to and accessible by “any” CMS
  Mash-up to create new higher-order services
Page: 8 Analysis of Traditional CMSs
GOAL: Identify common parts that all CMSs have
INPUT:
  Product descriptions
  Expectations from industry
  Product web sites
  Running CMS itself
Page: 9 Analysis of Traditional CMSs
Analysis of
  Content Types
  Content Workflow
  Content Services
  Architectural Styles
Page: 19 Merge All Inputs
Workshops
Brainstorming sessions
Collected list of statements from CMS vendors
  Representing their view on a semantic CMS
  e.g. legacy data, how to semantify them?
  e.g. tagging, different for each person, rules for personalized tagging
Examination of existing systems
Focus on industrial needs rather than theoretical thinking
Merge all input and come up with High Level Requirements
Page: 20 High Level Requirements
HLR-1: Common Vocabulary
HLR-2: Architecture and integration
HLR-3: Semantic lifting & tagging
HLR-4: Semantic search & semantic query
HLR-5: Reasoning on content items
HLR-6: Links/relations among content items
HLR-7: Workflows
HLR-8: Change management, versions and audit
HLR-9: Multilingualism
HLR-10: Security
Page: 21 The refinement process
Starts with HLRs and ends with testable software requirements
Page: 22 The refinement process
Page: 23 HLR 1 Common Vocabulary
For a common understanding for users
  Relating a content item with clear and precise vocabulary items
Services and engineering of
  External ontologies, taxonomies, thesauri
4 scenarios based on the collected information
  e.g. statements from CMS vendors
    “Agree on a set of categories and relations, attributes as the default set”
    “Help in finding good vocabularies”
http://lsdis.cs.uga.edu
Page: 25 HLR 1 Common Vocabulary
Resulting Requirements
Functional requirements
  The Vocabulary shall be navigable …
Data requirements
  Vocabulary shall be in one of the standard formats which …
Integration requirements
  Vocabulary shall be in an accepted standard format …
Interface requirements
  An interface shall be implemented for presenting the list of Vocabularies …
Non functional requirements
  Vocabularies shall always be accessible …
Page: 26 HLR 2 Architecture and integration
Easy integration of the services to be developed into different heterogeneous system environments
  RESTful service interfaces
The implementation should be as technology independent as possible
Should also provide technology specific access to the services for best performance results
http://xml.com
Page: 27 HLR 2 Architecture and integration
Everything should be accessed by a URI
  Linked Data approach
The communication should be based on standardized text-based data formats
  e.g. XML
http://viralpatel.net/
Page: 28 HLR 3 Semantic lifting & tagging
Semantic tagging on content items
  Ontological classes
  RDF properties
  Microformats
Extract semantics from structured and unstructured data automatically or semi-automatically
Make suggestions about annotations
Navigate on the content items in a semantic fashion
http://microformats.org/
Page: 29 HLR 4 Semantic search & semantic query
Faceted search mechanisms on top of semantic query language support
Statements from the industry
  Similarity search, similarity detection
  User friendly RDF query
  Support for disambiguation of search
Page: 30 HLR 5 Reasoning on content items
Extracting implicit information from the explicit information residing in the content repositories
“Semantic consistency check in CMSs”
http://www.kent.ac.uk/
Page: 31 HLR 6 Links/relations among content items
Along with the semantic annotations of the content items, semantic relations among them should also be considered
“Instance linking, linked data cloud, whenever we create something link it with something existing”
http://ctmlogistics.co.uk/
Page: 32 HLR 7 Workflows
Control flow/lifecycle of the content
Workflows for semantic actions similar to workflows for content
“Intelligent content workflows, configured based on organization, hierarchy”
http://coredotnet.blogspot.com
Page: 33 HLR 8 Change management, versions and audit
The system should also be aware of changing content and provide solutions to invalidate semantic data
Prior extracted semantic information might become invalid as the content changes
Content evolution
Semantic data evolution
http://visiongss.com
Page: 34 HLR 9 Multilingualism
Services to be provided should be aware of content in different languages
Enabling a variety of users of different nationalities
Language support independent of the CMS application domain
http://ec.europa.eu/
Page: 35 HLR 10 Security
The system must consider existing access control restrictions in CMSs
New kinds of restrictions which reflect the semantic data access
  e.g. for algorithms that reason on existing data
Integration of permission, role and group models
http://www.oplin.org
Page: 36 Summary
The requirements evolved from a systematic requirements engineering approach
Started with the analysis of current CMSs and their similarities
Collection of needs of CMS vendors in the field of semantic enhancements of their systems
  Workshops
  Brainstorming sessions
  Interviews
From the High Level Requirements (HLRs)
  Necessary Actors are defined
  Scenarios are constructed
Page: 37 Summary
From the scenarios for each HLR
  Use cases are extracted
From the use cases, resulting requirements are refined into the following types of requirements
  Functional
  Data
  Integration
  Interface
  Non functional