The figure is adapted from: http://intentionaldesign.ca/www/pmh3472/public_html/wp-content/uploads/2010/04/Content-Lifecycle-Management1.png. It shows the content lifecycle in a content management system. The lifecycle determines the current and future status of each content item in the system, e.g. whether it is controlled, or whether it will be translated or deleted. The cycle starts with the analysis phase, in which the strategy for the content lifecycle is determined: how content will be produced, controlled, translated, etc. In the collect phase, the actual content is obtained, modified and versioned, and metadata is created if available. In the manage phase, the content is modeled and structured according to standard approaches and then stored. In the publishing phase, it is transformed if needed and published.
CMIS is an open standard that uses web protocols to provide a generic abstraction layer on top of content management systems. The web protocols specified by CMIS are Web Services (SOAP) and AtomPub. The specification is hosted by the OASIS consortium, which publishes many other standards related to the information society. The CMIS specification mainly contains service descriptions for storing content objects in, and retrieving them from, the underlying persistent store. It also introduces type and property definitions for content objects, and includes services for version management and access control.
The CMIS v1.0 specification offers four base objects (corresponding to the object types explained in the next slide). Every content object in a repository that is compliant with the CMIS specification should have a type, and the type definition determines the properties the object can have. Document objects hold the actual data; they are the elementary entities managed by a CMIS repository. Folder objects contain file-able objects, e.g. documents and other folders. Relationship objects represent a directional relationship between two objects. Policy objects specify administrative policies that can be applied to objects.
CMIS v1.0 specifies four base object types, namely cmis:document, cmis:folder, cmis:relationship and cmis:policy, associated with the four base objects explained in the previous slide. Each of these object types includes a list of predefined properties that instances of the type can have. Any new object type should extend only one of these four object types, and every content object in a CMIS repository should have an object type. All CMIS objects are strongly typed, which means that a content object cannot have a property that is not defined by its object type. Properties are not hierarchical, so an object type defines only a flat list of properties. For example, some of the properties defined in the cmis:document object type are: cmis:name, cmis:objectId, cmis:createdBy, cmis:creationDate, … The whole specification can be found at: http://docs.oasis-open.org/cmis/CMIS/v1.0/cs01/cmis-spec-v1.0.html
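The strong-typing rule described above can be illustrated with a minimal Python sketch. It is not a CMIS client or binding: the classes ObjectType and CmisObject, the subtype ex:newsArticle and its property ex:headline are hypothetical names introduced only for illustration, and the sketch assumes that a subtype inherits the property definitions of its parent type.

```python
from dataclasses import dataclass, field

# The four CMIS 1.0 base object types named in the notes above.
BASE_TYPES = {"cmis:document", "cmis:folder", "cmis:relationship", "cmis:policy"}


@dataclass
class ObjectType:
    type_id: str
    parent: "ObjectType | None" = None      # a single parent, or None for a base type
    property_ids: frozenset = frozenset()   # flat, non-hierarchical property list

    def __post_init__(self):
        # A type without a parent must be one of the four base types.
        if self.parent is None and self.type_id not in BASE_TYPES:
            raise ValueError(f"{self.type_id} must extend one of the base types")

    def allowed_properties(self):
        # Assumption of this sketch: property definitions are inherited.
        inherited = self.parent.allowed_properties() if self.parent else frozenset()
        return inherited | self.property_ids


@dataclass
class CmisObject:
    object_type: ObjectType
    properties: dict = field(default_factory=dict)

    def set_property(self, name, value):
        # Strong typing: reject properties the object type does not define.
        if name not in self.object_type.allowed_properties():
            raise KeyError(f"{name} is not defined by {self.object_type.type_id}")
        self.properties[name] = value


document = ObjectType("cmis:document", property_ids=frozenset(
    {"cmis:name", "cmis:objectId", "cmis:createdBy", "cmis:creationDate"}))
# Hypothetical custom type extending exactly one base type.
news_article = ObjectType("ex:newsArticle", parent=document,
                          property_ids=frozenset({"ex:headline"}))

article = CmisObject(news_article)
article.set_property("cmis:name", "Article1")
article.set_property("ex:headline", "Gene clue found")
```

Setting a property such as "ex:unknown" on the article raises a KeyError in this sketch, which mirrors the rule that an object cannot carry a property its type does not define.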
Figure obtained from: Jeff Potts, "Getting Started with CMIS: Examples using Content Management Interoperability Services, Abdera, & Chemistry", November 2009.
The figure is adapted from http://dev.day.com/content/ddc/blog/2009/05/jcrcmiscomparison.html. JCR is a content repository model with Java language API bindings. CMIS is a document management model with Web Service and AtomPub protocol bindings. Based on these statements, the complementarity of JCR and CMIS is similar to the complementarity of the Servlet API in Java and the HTTP protocol.
The repository models specified by JCR and CMIS are similar from a high-level perspective. Both have a hierarchical structure and specify elementary content items, i.e. objects and nodes. These items are restricted by their type definitions. However, CMIS offers a more specialized model in which an object must be a folder, document, relationship or policy object, or an instance of an object type derived from one of the four default base types, whereas JCR imposes no such obligation: its API allows node types to be created from scratch.
In the figure, the elements in boxes with straight-line borders represent the common hierarchy of JCR and CMIS content repositories. However, content repositories do not differentiate between actual data and metadata. So, to specify which objects keep metadata and which keep actual data, the Object element is extended with two new items: Content Object and Classification Object. For example, the content repository structure on slide 37 contains only classification objects, which are used to classify other documents; in other words, they do not hold actual data. A document about liver cancer, for instance, could be classified by the Cancer node.
Legacy content management systems are mostly built on top of the LAMP stack and do not incorporate recent advances in semantic technologies. All CMSs exhibit three main categories of content management functionality: content modeling, content creation, and search over content items. Applying semantic functionalities (several procedures are possible on domain ontologies) to these categories brings several improvements to CMS capabilities.
Content Modeling: In the content modeling phase of a CMS, ontology browsing, ontology generation and ontology alignment procedures lead to ontology-guided modeling and automatic lifting to a knowledge base. Ontology-guided modeling yields content types and content metadata that are in line with ontological concepts, so that inference and reasoning engines can process these models. A domain ontology helps the user model the content metadata by restricting and suggesting options according to the implicit and explicit knowledge it holds. These functionalities are provided through ontology browsing and alignment procedures, which might in turn be provided through strong semantic features. On the other hand, an ontology can be generated automatically from the CMS's existing data models, yielding an ontology that reflects the as-is data models of the legacy CMS. In addition, aligning the extracted ontology or ontologies with a domain ontology yields more concrete ontological concepts that merge the CMS's as-is knowledge with the knowledge of the domain ontology in a semantic way.
Content Creation/Editing: In the content creation/editing phase, a CMS user makes use of the data models generated in the previous step (content/metadata modeling).
Various functionalities provided on a domain ontology lead to semantic features such as auto-categorization of content items, suggestions for annotation, consistency checking, etc. For example, while a CMS user is creating a text-based content item, semantic functions might analyze the text and suggest the most appropriate concepts from the domain ontology for annotation. Or the content item might be categorized automatically under the best-fitting concept. If the user has defined semantic rules to be applied to the content items, automatic inference and semantic consistency checking mechanisms might be available.
Search: Most existing CMSs cannot go beyond ordinary keyword-based search. However, understanding the meaning of the keywords that users enter and comparing them with the meanings of the words inside the content items requires high-level semantic capabilities. The meaning of a word can change based on the domain ontology and even on the user context. For example, a user searching with the keyword “Jersey” might be interested in the island in the English Channel, in New Jersey in the U.S., or in Jersey (JAX-RS) from the web programming domain. Structural search capabilities can be combined with text-based search to produce hybrid methods that benefit from the strengths of both approaches.
Most of the time, content repository structures carry implicit semantics. For example, content is organized hierarchically according to categories at different levels, standard taxonomies are used to annotate content items, or content items contain semantic information within their properties. However, these implicit semantics cannot be parsed by machines. Therefore, there is a need for a methodology that is compatible with existing content repository models. Once the semantics of a content repository have been extracted, more intelligent operations can be performed, e.g. automatic annotation, automatic classification, reasoning, etc.
The methodology for extracting the semantics of content management systems should not interfere with the CMS itself. CMS developers should be able to use the provided services.
Considering the domain model of content repositories, the structure can be mapped to the OWL model as follows. The first step is to represent the type definitions as classes. By the same logic, content objects become individuals, each an instance of the associated ontology class. Properties of content repository objects can be represented as datatype or object properties, i.e. relationships between content items are represented as object properties, and literal-valued properties are transformed into datatype properties. If necessary, restrictions can also be defined, e.g. in class hierarchies.
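The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repository is given as plain dictionaries, triples are plain tuples of strings rather than objects from an RDF library, the function name lift_repository is hypothetical, and a property value is treated as a relationship (object property) only when it equals the id of another repository object.

```python
RDF_TYPE = "rdf:type"


def lift_repository(object_types, objects):
    """Return a set of (subject, predicate, object) triples.

    object_types: {type name: parent type name or None}
    objects: list of dicts with "id", "type" and "properties" (string values)
    """
    ids = {obj["id"] for obj in objects}
    triples = set()
    # Step 1: type definitions become classes (with their hierarchy).
    for type_name, parent in object_types.items():
        triples.add((type_name, RDF_TYPE, "owl:Class"))
        if parent is not None:
            triples.add((type_name, "rdfs:subClassOf", parent))
    # Step 2: content objects become individuals of the associated class.
    for obj in objects:
        triples.add((obj["id"], RDF_TYPE, obj["type"]))
        # Step 3: properties become object or datatype properties.
        for name, value in obj["properties"].items():
            kind = "owl:ObjectProperty" if value in ids else "owl:DatatypeProperty"
            triples.add((name, RDF_TYPE, kind))
            triples.add((obj["id"], name, value))
    return triples


# Hypothetical example data.
types = {"Document": None, "NewsArticle": "Document"}
objects = [
    {"id": "Article1", "type": "NewsArticle",
     "properties": {"title": "Flu season", "relatedItem": "Article2"}},
    {"id": "Article2", "type": "NewsArticle", "properties": {}},
]
triples = lift_repository(types, objects)
```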
This table shows the content repository model to OWL model mapping.
For mapping any content repository resources that have semantic information, there is a need for customizable bridges from content repository to ontology resources.
<ConceptBridge>
  <Query>/NewsSubjectCodes/%</Query>
  <PropertyBridge>
    <PredicateName>equiClass</PredicateName>
    <PropertyAnnotation>
      <Annotation>equivalentClass</Annotation>
    </PropertyAnnotation>
  </PropertyBridge>
</ConceptBridge>
According to this Concept Bridge, all content repository items under NewsSubjectCodes will be transformed into OWL classes in the ontology, preserving the hierarchical structure. Furthermore, the target values referred to from the processed content repository item through the equiClass property will be created as equivalent classes of the processed content repository item.
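As a rough illustration of how such a Concept Bridge definition might be interpreted, the following Python sketch parses the XML with the standard library and applies it to a toy repository. It makes several assumptions that are not stated in the slides: "%" in the query is treated as a wildcard matching any run of characters, the repository is a dictionary from paths to properties, and the function names and the target value ext:Disease are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

CONCEPT_BRIDGE = """
<ConceptBridge>
  <Query>/NewsSubjectCodes/%</Query>
  <PropertyBridge>
    <PredicateName>equiClass</PredicateName>
    <PropertyAnnotation>
      <Annotation>equivalentClass</Annotation>
    </PropertyAnnotation>
  </PropertyBridge>
</ConceptBridge>
"""


def query_to_regex(query):
    # Assumption: "%" matches any run of characters.
    parts = [re.escape(part) for part in query.split("%")]
    return re.compile("^" + ".*".join(parts) + "$")


def apply_concept_bridge(xml_text, repository):
    """repository: {path: {property name: value}}; returns a set of triples."""
    bridge = ET.fromstring(xml_text)
    pattern = query_to_regex(bridge.findtext("Query"))
    triples = set()
    for path, props in repository.items():
        if not pattern.match(path):
            continue
        segments = path.strip("/").split("/")
        name = segments[-1]
        triples.add((name, "rdf:type", "owl:Class"))
        # Preserve the repository hierarchy as a class hierarchy.
        if len(segments) > 1:
            triples.add((name, "rdfs:subClassOf", segments[-2]))
        # Lift only the properties named by the inner Property Bridges.
        for pb in bridge.findall("PropertyBridge"):
            predicate = pb.findtext("PredicateName")
            annotation = pb.findtext("PropertyAnnotation/Annotation")
            if predicate in props and annotation == "equivalentClass":
                triples.add((name, "owl:equivalentClass", props[predicate]))
    return triples


repository = {
    "/NewsSubjectCodes/Health": {},
    "/NewsSubjectCodes/Health/Disease": {"equiClass": "ext:Disease"},  # hypothetical
    "/NewsArticles/Article1": {},
}
triples = apply_concept_bridge(CONCEPT_BRIDGE, repository)
```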
<InstanceBridge>
  <Query>/NewsArticles/%</Query>
  <PropertyBridge>
    <PredicateName>relatedItem</PredicateName>
    <PropertyAnnotation>symmetric</PropertyAnnotation>
  </PropertyBridge>
</InstanceBridge>
According to this Instance Bridge, content repository items under the NewsArticles path are transformed into ontology individuals. Furthermore, the relatedItem properties of the content repository objects will be added as assertions to the corresponding individuals. Annotations can be used to specify the detailed semantics of properties. For instance, when there exists an assertion like contentItem1 -> relatedItem -> contentItem2, the symmetric annotation indicates that contentItem2 -> relatedItem -> contentItem1 is also true. These annotations can be augmented with other OWL model property annotations such as transitive, functional, inverse, etc.
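The effect of the symmetric annotation can be shown with a small Python sketch. The function name expand_annotations and the data layout are assumptions made for illustration; only the symmetric case from the note above is handled.

```python
def expand_annotations(assertions, annotations):
    """Add the assertions implied by property annotations.

    assertions: set of (subject, predicate, object) tuples
    annotations: {predicate: set of annotation names}
    """
    expanded = set(assertions)
    for subject, predicate, obj in assertions:
        # symmetric: a -> p -> b implies b -> p -> a
        if "symmetric" in annotations.get(predicate, ()):
            expanded.add((obj, predicate, subject))
    return expanded


assertions = {
    ("contentItem1", "relatedItem", "contentItem2"),
    ("contentItem1", "classifiedBy", "Health"),
}
result = expand_annotations(assertions, {"relatedItem": {"symmetric"}})
```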
This is an example content repository structure that will be used in the examples in the rest of this slide set. On the left-hand side, it contains the type hierarchy that is used to annotate the actual content items in the repository. For instance, Article2 is classified by the HealthTreatment category.
Lecture: Semantifying Your CMS (presentation)
Semantifying Your CMS. Semantic CMS Community. Lecturer, Organization, Date of presentation. Co-funded by the European Union. Copyright IKS Consortium
Page: Part I: Foundations: (1) Introduction of Content Management, (2) Foundations of Semantic Web Technologies. Part II: Semantic Content Management: (3) Knowledge Interaction and Presentation, (4) Knowledge Representation and Reasoning, (5) Semantic Lifting, (6) Storing and Accessing Semantic Data. Part III: Methodologies: (7) Requirements Engineering for Semantic CMS, (8) Designing Semantic CMS, (9) Semantifying your CMS, (10) Designing Interactive Ubiquitous IS. www.iks-project.eu
Page: What is this Lecture about? We have introduced an RE approach for semantic CMS and a component-based reference architecture for the design of semantic CMS. What's next? A systematic method that can be used by developers to extend “traditional” CMS with semantic capabilities. Part III: Methodologies: (7) Requirements Engineering for Semantic CMS, (8) Designing Semantic CMS, (9) Semantifying your CMS, (10) Designing Interactive Ubiquitous IS.
Page: Content Management Systems. Content management systems (CMS) are designed to support a content management cycle: analysis of content; creation and collection of content; publication of content for access by users and/or other systems; management of this content.
Page: Standardized API. Each CMS provides an API for interacting with the repository, which can be used within content-oriented applications. To prevent each CMS vendor from providing its own proprietary API, two main specifications are used in the community: JCR (Content Repository API for Java) and CMIS (Content Management Interoperability Services).
Page: What Is JCR? JCR is the abbreviation of Content Repository API for Java. It is a specification for a Java platform API for accessing content repositories in a uniform manner. JSRs (Java Specification Requests): JSR 283: Content Repository for Java™ Technology API Version 2.0.
Page: What Is JCR? Provides a functional view and a common vocabulary over the content repository. One does not need to learn dozens of proprietary APIs. Encourages code portability. Prevents content lock-in in isolated silos by providing a standardized repository model and access.
Page: Repository Model In JCR
Page: Repository Model In JCR. Each node has a node type definition. Each node type can have: property definitions, specifying the properties that instances of the node type can use; and child definitions, specifying the node types of the child nodes that instances of the node type can have.
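The role of property definitions and child definitions can be sketched in Python as follows. This is not the JCR API (which is a Java API); the classes NodeType and Node, the function violations and the type names ex:folder and ex:article are hypothetical and serve only to illustrate how a node type restricts its instances.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    name: str
    property_names: set = field(default_factory=set)    # property definitions
    child_type_names: set = field(default_factory=set)  # child definitions


@dataclass
class Node:
    name: str
    node_type: NodeType
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def violations(node):
    """Collect readable violations for a node and its whole subtree."""
    found = []
    for prop in node.properties:
        if prop not in node.node_type.property_names:
            found.append(f"{node.name}: property '{prop}' not allowed")
    for child in node.children:
        if child.node_type.name not in node.node_type.child_type_names:
            found.append(f"{node.name}: child type '{child.node_type.name}' not allowed")
        found.extend(violations(child))
    return found


folder_type = NodeType("ex:folder", {"title"}, {"ex:folder", "ex:article"})
article_type = NodeType("ex:article", {"title", "body"})

root = Node("news", folder_type, {"title": "News"}, [
    Node("a1", article_type, {"title": "Flu", "author": "x"}),
])
problems = violations(root)
```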
Page: What Is CMIS? CMIS is the abbreviation of Content Management Interoperability Services. It defines a domain model and bindings that are designed to be layered on top of existing content management systems and their existing programmatic interfaces.
Page: What Is CMIS? A standard repository model and binding interface allow: reducing the integration work in multi-vendor, multi-repository content management environments; removing the need to maintain proprietary code; developing independent business units without infrastructure considerations.
Page: Repository Model In CMIS. The entities managed by CMIS are modeled as typed objects. CMIS comes with four types of base objects: Document object, Folder object, Relationship object, Policy object. Every CMIS object has a set of properties.
Page: Repository Model In CMIS. All CMIS objects are strongly typed. An object type defines a fixed and non-hierarchical set of properties that all objects of that type have. CMIS has four base object types corresponding to the four base objects: cmis:document, cmis:folder, cmis:relationship, cmis:policy. Object types have their own specific sets of property definitions, as in the JCR specification.
Page: Repository Model In CMIS
Page: Comparison of JCR and CMIS. Both provide a high-level domain model to represent the content in the repository, and both remove the need for a proprietary API for each content repository.
Page: Comparison of JCR and CMIS
Page: Comparison of JCR and CMIS. Both JCR and CMIS define a hierarchical repository model: JCR calls the building blocks Nodes; CMIS calls them Objects. Both JCR and CMIS specify type definitions, which restrict properties and restrict the hierarchical structure. Content items in both JCR and CMIS have properties as defined by their type definitions.
Page: Metadata Management In CMS. Organizing the content as hierarchies. Through properties/parameters of nodes/objects/documents: free-format values, or values selected from a constrained vocabulary (which can be a taxonomy); these can be used as content categories. By representing relationships between nodes/objects/documents. Taxonomies can be represented as tag hierarchies (as a hierarchy of nodes).
Page: Generic Repository Model. Considering the JCR and CMIS repository models, to semantify a CMS we need a generic repository model. The generic repository model should allow CMS objects from both specifications to be represented.
Page: Generic Repository Model
Page: Generic Repository Model. In the generic repository model: the Object entity corresponds to the JCR node and the CMIS object; the Object Type entity corresponds to JCR node types and CMIS object types; the property and property definition notions are also represented.
Page: Generic Repository Model. Classification Object and Content Object notions are introduced on top of the representation that covers the JCR and CMIS models. They differentiate data from metadata. Content Objects represent repository items that contain actual data. Classification Objects represent the hierarchical taxonomies of CMSs that are used to classify Content Objects.
Page: Strength of Semantic Technologies. An ontology consists of the following artifacts: a vocabulary to describe a domain; a specification of the intended meaning of the vocabulary, including how concepts are classified; constraints providing additional knowledge about the domain. Thus, an ontology represents a formal and machine-manipulable model of a domain.
Page: Strength of Semantic Technologies. A machine-manipulable model of a domain enables reasoning on it. Reasoning makes it possible to recognise semantic similarity in spite of syntactic differences, and to recognise implicit consequences of explicitly stated facts.
Page: Enhancing CMS With Semantic Technologies. [Figure with two columns: provided functionalities on domain ontology; benefits to CMSs.]
Page: Extracting Semantics From CMSs as Ontologies. Content repositories already provide a certain amount of semantics for content items, through content hierarchies, properties, taxonomies and node/object types. However, these semantics are not “machine understandable” and cannot be reasoned on.
Page: Need For A Methodology. There is a need for an “integrated semantic engineering method” enabling CMS developers to easily utilize the semantic functionalities provided by ontologies and reasoners, without a major change in their systems.
Page: Extracting Semantics From CMSs as Ontologies. Node types/object types/document types can be automatically converted into OWL classes, with properties as object and datatype properties, and restrictions when necessary. Nodes of these node types can be created as instances…
Page: Extracting Semantics From CMSs as Ontologies
Page: What About Resources Having Semantic Worth? How should other resources be treated? Links between content items; taxonomies; content hierarchies. There should be configurable bridges from the CMS to the ontology.
Page: Bridges. Bridges should provide: extraction of certain CMS objects as ontology classes; extraction of certain CMS objects as ontology individuals; extraction of hierarchical structure through certain properties between CMS objects; extraction of certain properties of CMS objects that indicate a semantic value; different treatment of extracted properties according to their annotations.
Page: Concept Bridge. Takes a query specifying the target CMS objects. Transforms the target objects into ontology classes, together with their possible hierarchical relations. Can include Subsumption Bridges to establish hierarchy through certain properties. Can include Property Bridges to extract certain properties of the target objects and set appropriate annotations in the ontology.
Page: Subsumption Bridge. Takes a query specifying the target CMS objects. Takes a predicate name. Forms subclass/superclass relations between the target CMS objects through the specified predicate.
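A Subsumption Bridge could be sketched in Python as below. The slide does not fix the direction of the predicate or the query language, so the sketch assumes that the query is a simple path prefix and that the predicate points from an object to its narrower (child) objects; the property name "narrower" and the function names are hypothetical.

```python
def local_name(path):
    # Use the last path segment as the class name.
    return path.rstrip("/").rsplit("/", 1)[-1]


def subsumption_bridge(objects, path_prefix, predicate):
    """Form subclass relations between CMS objects through one predicate.

    objects: {path: {property name: [target paths]}}
    Assumption: each target of the predicate becomes a subclass of the source.
    """
    triples = set()
    for path, props in objects.items():
        if not path.startswith(path_prefix):
            continue  # not selected by the query
        for target in props.get(predicate, []):
            triples.add((local_name(target), "rdfs:subClassOf", local_name(path)))
    return triples


cms = {
    "/NewsSubjectCodes/Health": {
        "narrower": ["/NewsSubjectCodes/Disease", "/NewsSubjectCodes/Illness"]},
    "/NewsSubjectCodes/Disease": {"narrower": ["/NewsSubjectCodes/Cancer"]},
    "/NewsArticles/Article1": {"narrower": ["/NewsArticles/Article2"]},
}
relations = subsumption_bridge(cms, "/NewsSubjectCodes/", "narrower")
```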
Page: Instance Bridge. Takes a query to select the target CMS objects. Transforms the selected CMS objects into ontology individuals. Like the Concept Bridge, it can include Property Bridges to treat properties of CMS objects differently based on their annotations.
Page: Property Bridge. Selectively lifts some of the properties of CMS objects into the ontological representation. This makes it possible to lift only the properties that have semantic value. It can be included in a Concept Bridge or an Instance Bridge.
Page: Backend Knowledge Base For CMSs. As a result of the semantic lifting mechanism, we have the ontological representation of the content repository semantics. The ontological representation should be kept in a backend knowledge base and kept synchronized with changes in the repository. A reasoner should be used together with the knowledge base to recognize implicit facts from the explicit ones in the ontology.
Page: Backend Knowledge Base For CMSs. Existing triple stores such as Jena and Sesame provide built-in reasoners. While Sesame supports only RDFS reasoning, Jena provides RDFS, OWL and rule-based reasoners. It is also possible to integrate an external reasoner with a triple store. Considering the pros and cons of different triple stores, a generic interface for communicating with triple stores is needed: the knowledge base can be hosted on different triple stores, and through the generic interface the semantic lifting mechanism can feed and query the hosted ontologies.
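The idea of a generic triple store interface can be sketched in Python with an abstract base class. The interface below (TripleStore with add and match) and the in-memory implementation are assumptions made for illustration; adapters for concrete stores such as Jena or Sesame would implement the same two methods and are not shown.

```python
from abc import ABC, abstractmethod


class TripleStore(ABC):
    """Hypothetical generic interface the lifting mechanism talks to."""

    @abstractmethod
    def add(self, subject, predicate, obj):
        """Store one triple."""

    @abstractmethod
    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None is a wildcard."""


class InMemoryStore(TripleStore):
    """Stand-in backend; a real adapter would delegate to a triple store."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        pattern = (subject, predicate, obj)
        return {t for t in self._triples
                if all(p is None or p == v for p, v in zip(pattern, t))}


def feed(store, triples):
    # The lifting mechanism only depends on the generic interface.
    for triple in triples:
        store.add(*triple)


store = InMemoryStore()
feed(store, [
    ("Article1", "rdf:type", "Health"),
    ("Article2", "rdf:type", "HealthTreatment"),
    ("HealthTreatment", "rdfs:subClassOf", "Health"),
])
typed = store.match(predicate="rdf:type")
```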
Page: Using the Extracted Semantics in Content Discovery. After the semantics of a CMS have been extracted into an ontology, the ontology can be used to provide semantic functionalities on top of it, such as semantic search. It can be further enhanced by aligning/merging external domain ontologies.
Page: Initial CMS Structure. [Figure: Content Management System Structure. The Workspace has two branches, NewsSubjectCodes and NewsArticles. Nodes shown under NewsSubjectCodes: Disaster/Accident, Education, Economy/Business/Finance, Health, HealthTreatment, Illness, Obesity, Eating Disorder, Disease, ViralDiseases, SwineFlu, Cancer, Neurological Disease. Nodes shown under NewsArticles: Article1, Article2, Article3, each linked to a subject code by classifiedBy.]
Page: Ontological Representation Of CMS. Represent the CMS structure in the previous slide ontologically: represent the “news subject codes” branch as an ontology class hierarchy, and the “news articles” branch as a set of ontology individuals.
Page: Ontological Representation Of CMS. [Figure: representation of the News Subject Codes as hierarchical ontology classes (NewsSubjectCodes, ArtsCultureEntertainment, DisasterAccident, EconomyBusinessFinance, Education, EnvironmentalIssues, Health, Disease, ViralDisease, SwineFlu, Cancer, ..., HealthTreatment, Illness, Medicine, SocialIssues) and representation of the news articles Article1, Article2 and Article3 as ontology individuals, linked to classes by instanceOf.] Individual types are set with corresponding ontology
Page: Make a Search. Find me articles categorized by “Health”… The answer contains Article1, Article2 and Article3, due to the subsumption relation between the ontology classes.
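The subsumption-based search can be sketched in Python as follows. The class hierarchy is taken from the example figure, but the exact class of each article is an assumption read off the garbled figure (Article1 as Health, Article2 as HealthTreatment, Article3 as ViralDisease); the function names are hypothetical.

```python
# Subclass -> superclass links from the example hierarchy.
SUBCLASS_OF = {
    "Disease": "Health",
    "ViralDisease": "Disease",
    "SwineFlu": "ViralDisease",
    "Cancer": "Disease",
    "HealthTreatment": "Health",
}

# Assumed types of the article individuals.
INSTANCE_OF = {
    "Article1": "Health",
    "Article2": "HealthTreatment",
    "Article3": "ViralDisease",
}


def is_subclass_or_same(cls, ancestor):
    # Walk up the hierarchy until the ancestor is found or the root is passed.
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def search(category):
    """Return articles whose class is the category or one of its subclasses."""
    return sorted(article for article, cls in INSTANCE_OF.items()
                  if is_subclass_or_same(cls, category))


health_articles = search("Health")
disease_articles = search("Disease")
```

With these assumptions, searching for "Health" returns all three articles, while searching for "Disease" returns only Article3.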
Page: Make a Rule Based Search. Rule: If a Disease isCausedBy a PathogenicAgent, then it is an InfectiousDisease. Facts: Virus is a PathogenicAgent. Fungi is a PathogenicAgent. ViralDisease isCausedBy Virus. Find me InfectiousDisease articles… The answer is: Article3.
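The rule-based search can be sketched in Python by hard-coding the single rule from the slide. A real system would use a rule engine or reasoner; here the rule is a plain function, the article types are assumptions (Article3 is taken to be an instance of ViralDisease), and all function names are hypothetical.

```python
SUBCLASS_OF = {"ViralDisease": "Disease", "Disease": "Health",
               "HealthTreatment": "Health"}
PATHOGENIC_AGENTS = {"Virus", "Fungi"}      # facts: "X is a PathogenicAgent"
IS_CAUSED_BY = {"ViralDisease": "Virus"}    # fact: "ViralDisease isCausedBy Virus"
INSTANCE_OF = {"Article1": "Health", "Article2": "HealthTreatment",
               "Article3": "ViralDisease"}  # assumed article types


def ancestors(cls):
    """Yield cls followed by all of its superclasses."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def infectious_classes():
    # Rule: a Disease that isCausedBy a PathogenicAgent is an InfectiousDisease.
    return {cls for cls, agent in IS_CAUSED_BY.items()
            if "Disease" in ancestors(cls) and agent in PATHOGENIC_AGENTS}


def find_infectious_articles():
    infectious = infectious_classes()
    return sorted(article for article, cls in INSTANCE_OF.items()
                  if any(c in infectious for c in ancestors(cls)))


answer = find_infectious_articles()
```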
Page: Go Back To Example. To represent “news subject codes” as a class hierarchy in the ontological representation, we need a Concept Bridge with a query that targets the CMS objects under “/Workspace/NewsSubjectCodes”.
Page: Go Back To Example. To represent “news articles” as individuals in the ontological representation, we need an Instance Bridge with a query that targets the CMS objects under “/Workspace/NewsArticles” and an inner Property Bridge that has “classifiedBy” as its predicate name. This sets the type of each individual to the ontology class corresponding to the value of its “classifiedBy” property.
Page: Aligning External Ontologies. It is possible to align external domain ontologies with the ontology representing the structure of the CMS, in order to use the semantics in the external ontology.
Page: Go Over An Example. Initially, assume that we have the following ontology representation of the CMS. [Figure: representation of the News Subject Codes as hierarchical ontology classes (NewsSubjectCodes, ArtsCultureEntertainment, EnvironmentalIssues, Health, Disease, NeurologicalDisease, HealthTreatment, Illness, EatingDisorder, Obesity, Medicine, SocialIssues) and representation of two of the News Articles as individuals, linked to classes by instanceOf: MotorNeuroneDiseaseGeneClue (“… Professor Christopher Shaw, from the Institute of Psychiatry at Kings College London, said …”) and GeneticCluesToEatingDisorders (“… Doctors studying the causes of the eating disorders anorexia and bulimia believe it has less to do with media images of slim-figured models and more to do with biological and genetic factors …”).]
Page: Align CMS Representation With External Ontology. [Figure: the CMS classes (..., Education, EnvironmentalIssues, Health, Disease, HealthTreatment, Illness, EatingDisorder, Obesity, Medicine, SocialIssues) shown next to an external ontology (..., Organisms, Psychiatry, BehaviorMechanisms, BehaviorDisciplines, MentalDisorders, AnxietyDisorders, EatingDisorders, SleepingDisorders). The CMS class EatingDisorder is linked to the external class EatingDisorders by equivalentTo, and the article GeneticCluesToEatingDisorders is linked by instanceOf.]
Page: Make A Search. Find me articles related to “psychiatry”. The search results will include not only the article “MotorNeuroneDiseaseGeneClue” but also the article “GeneticCluesToEatingDisorders”. The keyword “psychiatry” is matched with the ontology class “Psychiatry”. Through reasoning, it is inferred that “GeneticCluesToEatingDisorders” is an indirect instance of the “Psychiatry” class.
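The combined effect of keyword matching and ontology alignment can be sketched in Python. The nesting of the external ontology (EatingDisorders under MentalDisorders under Psychiatry) and the class of each article are assumptions read off the figures, the article texts are the excerpts shown on the earlier slide, and the function names are hypothetical.

```python
SUBCLASS_OF = {
    # CMS ontology
    "EatingDisorder": "Illness", "Illness": "Health",
    "NeurologicalDisease": "Disease", "Disease": "Health",
    # external ontology (assumed nesting)
    "EatingDisorders": "MentalDisorders", "MentalDisorders": "Psychiatry",
}
# Alignment between the CMS ontology and the external ontology.
EQUIVALENT = {("EatingDisorder", "EatingDisorders")}

ARTICLES = {
    "MotorNeuroneDiseaseGeneClue": (
        "NeurologicalDisease",
        "Professor Christopher Shaw, from the Institute of Psychiatry at "
        "Kings College London, said"),
    "GeneticCluesToEatingDisorders": (
        "EatingDisorder",
        "Doctors studying the causes of the eating disorders anorexia and bulimia"),
}


def superclasses(cls):
    """Return cls plus every class reachable via subclass and equivalence links."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in SUBCLASS_OF:
            todo.append(SUBCLASS_OF[current])
        for left, right in EQUIVALENT:
            if current == left:
                todo.append(right)
            if current == right:
                todo.append(left)
    return seen


def search(keyword):
    key = keyword.lower()
    hits = []
    for name, (cls, text) in ARTICLES.items():
        by_text = key in text.lower()                               # keyword match
        by_class = any(key == c.lower() for c in superclasses(cls))  # reasoning
        if by_text or by_class:
            hits.append(name)
    return sorted(hits)


results = search("psychiatry")
```

In this sketch the first article is found through its text, while the second is found only because the alignment makes it an indirect instance of Psychiatry.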