Semantic CMS Community

Semantic Lifting for Traditional Content Resources

 Lecturer
 Organization
 Date of presentation

 Co-funded by the European Union
 Copyright IKS Consortium


 Part I: Foundations
   (1)  Introduction of Content Management
   (2)  Foundations of Semantic Web Technologies

 Part II: Semantic Content Management
   (3)  Knowledge Interaction and Presentation
   (4)  Knowledge Representation and Reasoning
   (5)  Semantic Lifting
   (6)  Storing and Accessing Semantic Data

 Part III: Methodologies
   (7)  Requirements Engineering for Semantic CMS
   (8)  Designing Semantic CMS
   (9)  Semantifying your CMS
   (10) Designing Interactive Ubiquitous IS


www.iks-project.eu                                                Copyright IKS Consortium




 What is this Lecture about?
 We have learned ...
    ... how to build ontologies representing complex
     knowledge domains.
    ... a way to reason about knowledge.
 We need a way ...
    ... to extract knowledge from content in an automatic way:
     Semantic Lifting






  Overview
 What is semantic lifting?
 Core concepts
 Scenarios
 Requirements
 Technologies
     Semantic Reengineering
     Semantic Enhancements of textual content








  What is “Semantic Lifting”?
 Semantic Lifting refers to the process of associating
 content items with suitable semantic objects as
 metadata to turn “unstructured” content items into
 semantic knowledge resources.

 Semantic Lifting makes “hidden” metadata in
 content items explicit.








  Semantic Lifting Targets
 Semantic Reengineering of structured data
    Semantic Lifting harmonizes metadata representations
    Semantic Lifting reengineers data from an existing resource so
     that the data can be reused within a semantic repository

 Semantic Content Enhancement
    Semantic Lifting generates additional metadata and annotations
     by semantic analysis of content items
    Semantic Lifting classifies content objects by means of semantic
     annotations








  Structured Content
 Structured content provides implicit semantics through
 the structure definition
   Table definitions in relational databases, XML
    schemata, field definitions for address books,
    calendars, etc.
 Application programs are designed to “know” how
  to interpret the structures and the data within.
 Semantic Lifting is used for reengineering to
  support data exchange and seamless interoperability
  between different systems
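
A minimal sketch of what reengineering a structured record might look like: one relational row is turned into subject/predicate/object triples, so that the semantics implied by the table definition become explicit. The base URI, the table name and the column names are hypothetical and chosen only for illustration; real tools (such as the D2R server mentioned later in this lecture) use configurable mappings instead.

```python
# Hypothetical base URI and table layout, chosen only for illustration.
BASE = "http://example.org/"

def row_to_triples(table, primary_key, row):
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = f"{BASE}{table}/{row[primary_key]}"
    # The table name becomes the class of the resource.
    triples = [(subject, "rdf:type", f"{BASE}schema/{table}")]
    for column, value in row.items():
        if column == primary_key or value is None:
            continue  # the key lives in the subject URI; NULLs carry no statement
        triples.append((subject, f"{BASE}schema/{table}#{column}", value))
    return triples

contact = {"id": 7, "name": "Ada Lovelace", "email": None, "city": "London"}
triples = row_to_triples("contact", "id", contact)
for triple in triples:
    print(triple)
```

Each column name becomes a property URI, which is the step that lets a different system interpret the data without knowing the original table definition.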




  Unstructured Content
 Unstructured content
   Images, texts, videos, music, web pages composed
    of various types of media items
   Meaningful only to humans, not to machines
 Content must be described semantically by metadata
  to become meaningful to machines, e.g. what the text
  or image is about.
 Semantic Lifting is used for content enhancement








      Mixed Content
   No dichotomy of structured and unstructured content
        Structured databases are used to store unstructured
         content types, such as texts, images etc.
        Documents can be composed of unstructured content
         items such as free text and images as well as more
         structured information, e.g. tables and charts

   [Figure: a document page with regions labeled "Free text" and "Structured content"]








    Metadata: Variants
   Metadata exist in many forms:
       Free text descriptions
       Descriptive content related keywords or tags from fixed vocabularies or
        in free form
       Taxonomic and classificatory labels
       Media specific metadata, such as MIME types, encoding, language, bit
        rate
       Media-type specific structured metadata schemes such as EXIF for
        photos, IPTC tags for images, ID3-tags for MP3, MPEG-7 for videos,
        etc.
       Content related structured knowledge markup, e.g. to specify what
        objects are shown in an image or mentioned in a text, what the actors
        are doing, etc.





  Metadata: Variants
 Inline metadata are part of the content
     ID3 tags embedded in MP3 files
 Offline metadata are kept separate from the content








  Formal semantic metadata
 Data representation in a formalism with a formal
 semantic interpretation that defines the concept of
 (logical) entailment for reasoning:
    Soundness: conclusions are valid entailments
    Completeness: every valid entailment can be deduced
    Decidability: a procedure exists to determine whether a
     conclusion can be deduced
 Embodiments:
    Logics
    Knowledge Representation Systems, Description Logics
    Semantic Web: RDF, OWL




  „Semantics“ in CMS
 CMSs provide various methods to include
 metadata
    Organize content in hierarchies
    Hierarchical taxonomies
    Attachment of properties to content items for metadata
    Content type definitions with inheritance


 These methods are used in CMSs in an ad-hoc
 fashion without clear semantics. Therefore no
 well-defined reasoning is possible.





    Semantic Lifting Usage
   Content Creation and Acquisition
       Authoring content
           Support content editors in providing metadata of specified types
       Uploading external content/documents
           automatic extraction and analysis, e.g. for indexing
       Importing content from external sources/documents
           Integration of external content into content repository
           Content needs to be transformed to match internal CMS structures and
            metadata schemes
       Cross-referencing/linking among CMS content items and external
        content
           Detect related or additional content
           Add pointers/links to related or additional content






  Semantic Lifting Usage
 Access to external documents and content repositories
    Semantic harmonization with CMS semantic structures
    Semantic interoperability in data exchange with other content
     repositories
 The CMS needs to understand the data structures used
 by external services and programs
    E.g. synchronization of a local calendar from Outlook with an
     external calendar based on the iCalendar format
    E.g. importing RDF from a Linked Data endpoint such as
     DBpedia
 The CMS must present its data in a form understood by
 external target services or programs




  Semantic Lifting Usage
 Publishing content with metadata
     Metadata need to be transformed into a form compatible
      with the publication format
       E.g. converting FreeDB metadata into ID3 tags for inclusion in
         an MP3 file
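
The FreeDB-to-ID3 example can be sketched as follows. This is an illustration under stated assumptions, not a complete converter: it reads only the `DTITLE`, `DYEAR` and `TTITLEn` keys of a FreeDB (xmcd) record and builds the fixed 128-byte ID3v1 block (the oldest and simplest ID3 variant), leaving the comment empty and the genre unspecified. The sample record and the function names are hypothetical.

```python
def parse_freedb(text):
    """Parse KEY=VALUE lines of a FreeDB (xmcd) record, skipping comments."""
    record = {}
    for line in text.splitlines():
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # A key that occurs on several lines is concatenated into one value.
        record[key] = record.get(key, "") + value
    return record

def _field(value, size):
    """Encode as Latin-1, truncated and zero-padded to a fixed width."""
    return value.encode("latin-1", "replace")[:size].ljust(size, b"\x00")

def id3v1_tag(record, track_index):
    """Build the 128-byte ID3v1 block for one track of the record."""
    artist, _, album = record.get("DTITLE", "").partition(" / ")
    title = record.get(f"TTITLE{track_index}", "")
    return (b"TAG" + _field(title, 30) + _field(artist, 30) + _field(album, 30)
            + _field(record.get("DYEAR", ""), 4)
            + _field("", 30)      # comment left empty
            + bytes([255]))       # 255: genre left unspecified

# Hypothetical sample record, not taken from a real disc entry.
SAMPLE = """# xmcd
DTITLE=Example Artist / Example Album
DYEAR=2009
TTITLE0=First Song
TTITLE1=Second Song
"""
tag = id3v1_tag(parse_freedb(SAMPLE), 1)
print(len(tag), tag[:3])
```

In ID3v1 the resulting block is appended to the end of the MP3 file; the point of the sketch is that the same metadata has to be re-expressed in the field layout of the publication format.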








    Publishing Web Content with
    semantic metadata
   Augmenting web content with structured information becomes
    increasingly important
   Several methods have emerged in recent years to include
    structured metadata in Web pages
       Microformats
       RDFa
       Microdata (HTML5)
   Supported by the major search engines to improve search and
    result presentation, e.g. Google (“Rich Snippets”), Bing, Yahoo








    Augmenting Web Content
   The HTML code contains a review of a restaurant in plain text
    using only line breaks for structuring
   Without specialized information extraction analysis tools it cannot
    be interpreted, e.g. that it is a review (of what and when?), who the
    reviewer was, etc.


<div>
L’Amourita Pizza
Reviewed by Ulysses Grant on Jan 6.
Delicious, tasty pizza on Eastlake!
L'Amourita serves up traditional wood-fired Neapolitan-style pizza,
brought to your table promptly and without fuss. An ideal neighborhood
pizza joint.
Rating: 4.5
</div>





    Microformats
   Same text but additional span elements with class attributes to
    encode the type of contained information (hReview) and the
    properties of that type
<div class="hreview">
   <span class="item">
       <span class="fn">L’Amourita Pizza</span>
   </span>
   Reviewed by <span class="reviewer">Ulysses Grant</span> on
   <span class="dtreviewed">
       Jan 6<span class="value-title" title="2009-01-06"></span>
   </span>.
   <span class="summary">Delicious, tasty pizza on Eastlake!</span>
   <span class="description">L'Amourita serves up traditional wood-fired
   Neapolitan-style pizza, brought to your table promptly and without fuss.
   An ideal neighborhood pizza joint.</span>
   Rating:
   <span class="rating">4.5</span>
</div>
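
To illustrate why the class attributes matter, the following sketch reads the hReview properties back out of such markup using only Python's standard `html.parser`. It is a simplified reader written for this example, not a full microformats parser: it looks only at `span` elements, knows only the six properties used on this slide, and treats a `value-title` span as the machine-readable value of the enclosing property.

```python
from html.parser import HTMLParser

HREVIEW_PROPS = {"fn", "reviewer", "dtreviewed", "summary", "description", "rating"}

class HReviewParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # one entry per open <span>: property name or None
        self.fixed = set()  # properties whose value came from a value-title
        self.review = {}

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        open_props = [p for p in self.stack if p]
        if "value-title" in classes and open_props:
            # The title attribute overrides the human-readable text.
            self.review[open_props[-1]] = attrs.get("title", "")
            self.fixed.add(open_props[-1])
        prop = next((c for c in classes if c in HREVIEW_PROPS), None)
        self.stack.append(prop)
        if prop:
            self.review.setdefault(prop, "")

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        for prop in self.stack:
            if prop and prop not in self.fixed:
                self.review[prop] += data

HTML = """<div class="hreview">
   <span class="item"><span class="fn">L'Amourita Pizza</span></span>
   Reviewed by <span class="reviewer">Ulysses Grant</span> on
   <span class="dtreviewed">
       Jan 6<span class="value-title" title="2009-01-06"></span>
   </span>.
   <span class="summary">Delicious, tasty pizza on Eastlake!</span>
   Rating: <span class="rating">4.5</span>
</div>"""

parser = HReviewParser()
parser.feed(HTML)
parser.close()
# Collapse layout whitespace inside the collected values.
review = {k: " ".join(v.split()) for k, v in parser.review.items()}
print(review)
```

No language analysis is involved: the reviewer, date and rating are found purely through the markup, which is exactly what the plain-text version of the review cannot offer.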







    RDFa
   Same text but additional attributes and span elements encoding an
    RDF structure:
     Namespace declaration of the ontology used
     RDF class encoded by a typeof attribute and its properties by
      property attributes
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
   <span property="v:itemreviewed">L’Amourita Pizza</span>
   Reviewed by
   <span property="v:reviewer">Ulysses Grant</span> on
   <span property="v:dtreviewed" content="2009-01-06">Jan 6</span>.
   <span property="v:summary">Delicious, tasty pizza on Eastlake!</span>
   <span property="v:description">L'Amourita serves up traditional wood-fired
   Neapolitan-style pizza, brought to your table promptly and without fuss.
   An ideal neighborhood pizza joint.</span>
   Rating:
   <span property="v:rating">4.5</span>
</div>
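
The RDF structure encoded above can be recovered mechanically. The sketch below is a deliberately small reader built on Python's standard `html.parser`; it is not a conformant RDFa processor. It assumes that prefixes are declared with `xmlns:` attributes, that a single `typeof` describes one item, and that a `content` attribute, where present, replaces the element text as the property value. The class name `RdfaSketchParser` is hypothetical.

```python
from html.parser import HTMLParser

class RdfaSketchParser(HTMLParser):
    """Collects typeof/property statements; NOT a conformant RDFa processor."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}     # prefix -> namespace URI from xmlns:* attributes
        self.item_type = None  # expanded value of the typeof attribute
        self.values = {}       # expanded property URI -> literal value
        self.stack = []        # (tag, property currently collecting text or None)

    def expand(self, curie):
        prefix, _, local = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "typeof" in attrs:
            self.item_type = self.expand(attrs["typeof"])
        collecting = None
        if "property" in attrs:
            prop = self.expand(attrs["property"])
            if "content" in attrs:
                self.values[prop] = attrs["content"]  # explicit machine-readable value
            else:
                self.values[prop] = ""
                collecting = prop                     # value is the element's text
        self.stack.append((tag, collecting))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates void elements).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        for _, prop in self.stack:
            if prop:
                self.values[prop] += data

HTML = """<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
   <span property="v:itemreviewed">L'Amourita Pizza</span>
   Reviewed by <span property="v:reviewer">Ulysses Grant</span> on
   <span property="v:dtreviewed" content="2009-01-06">Jan 6</span>.
   Rating: <span property="v:rating">4.5</span>
</div>"""

parser = RdfaSketchParser()
parser.feed(HTML)
parser.close()
ns = "http://rdf.data-vocabulary.org/#"
print("type:", parser.item_type)
for prop, value in parser.values.items():
    print(prop, "=", value)
```

Unlike the microformat class names, the property names here expand to full URIs, so they can be merged with other RDF data without name clashes.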






    Microdata (HTML5)
   Same text but additional attributes and span elements:
     A class declaration as value of an itemtype attribute and its
      properties as values of an itemprop attribute

<div>
  <div itemscope itemtype="http://data-vocabulary.org/Review">
    <span itemprop="itemreviewed">L’Amourita Pizza</span>
    Reviewed by <span itemprop="reviewer">Ulysses Grant</span> on
    <time itemprop="dtreviewed" datetime="2009-01-06">Jan 6</time>.
    <span itemprop="summary">Delicious, tasty pizza in Eastlake!</span>
    <span itemprop="description">L'Amourita serves up traditional wood-fired
    Neapolitan-style pizza, brought to your table promptly and without fuss.
    An ideal neighborhood pizza joint.</span>
    Rating: <span itemprop="rating">4.5</span>
  </div>
</div>








 Lifting Requirements:
 Overview
 Top-level requirements
   Semantic Associations with Content
   Semantic Harmonization
   Semantic Linking
   Interactive Lifting
   Customizability
   Semantically Transparent Structured Content Sources





  Semantic Associations with
  Content
 Unstructured content and information must be
 supplied with structured semantic annotations and
 metadata.
    Support for various content/media types
    Information extraction from text, topic classification, image
     tagging, …
    Support for creation of semantic annotations in content
     authoring








  Semantic Harmonization
 Metadata and annotations must be harmonized with
 requirements for semantic processing in the CMS
    Reengineering methods, interpreters and wrappers for all
     types and formats of metadata and annotations, e.g. tags,
     microformats, XML metadata (MPEG-7, …), ID3 tags,
     EXIF data, …
    Ensure semantic interoperability of data and annotation
     schemes within the CMS and across external resources
    Ontology mapping and harmonization of annotations
      External metadata
      Metadata generated by semantic analysis






  Semantic Linking
 Lifting must enable the interlinking of content
  objects by semantic relationships.
     Internal linking of content items within the CMS
     Links to external resources, e.g. Linked Open Data
     Establish semantic relatedness of content for different
      views as well as different search, navigation and browsing
      strategies, …
       Direct semantic links among content items and metadata
       Similarity relations over sets of content items
       Clustering of content items







  Interactive Lifting
 Lifting must interact with CMS users.
     Suggest semantic annotations during content creation
       Support for various publishing formats such as microformats,
         RDFa, etc.
     Automatic annotations (autotagging) that users can
      optionally correct
     Learning capabilities: automatic annotation components
      adapt based on user feedback








  Customizability
 Lifting components must be customizable by CMS
  users/customers.
     Users must not be restricted to predefined vocabularies,
      ontologies, …
     Domain ontologies, terminologies, tag sets are defined by
      CMS users/customers.
     Browsers and editors for component resources are
      necessary.








  Transparent Structured
  Content Sources
 Structured content sources need to be reengineered to
 semantic resources
     Support uniform data access to structured content
      repositories, e.g. SPARQL endpoints based on D2RQ
      technologies for transparent access to RDF and non-RDF
      databases
     Extraction of ontologies from database structures,
      schemata, XML, resources, …
     Alignment and mapping of the descriptions







    Semantic Reengineering of
    structured data sources
   Focus on tools for reengineering structured data sources to RDF
    representations
   Many tools and platforms exist, for example:
     D2R Server: exposes relational DBs as RDF
     Talis platform: Linked Open Data
     Triplify: like D2R but in PHP
     Virtuoso middleware
     Krextor/OntoCape: generating RDF from XML
     Various transformers for inducing RDF ontologies and instance
      data from XSD and XML
   More details in the presentation on Knowledge
    Representation (KReS)




  Semantic Content
  Enhancements: Overview
 Focus here is on textual content
 Metadata Extraction from existing content in various
  formats to make embedded metadata explicit
 Information Extraction from textual content:
     Named Entities
     Coreference
     Relationships
 Classification and Clustering of content items
     Statistical methods and tools
     Semantic classification based on ontological definitions





    Information Extraction
   Rule based approaches for shallow text analysis
     Usually based on Finite State technology: fast, robust
     Cascaded processing
     Based on templates as target structures to be filled
     Example platforms:
          GATE
          SProUT
   Can be used for nearly any kind of extraction/annotation task,
    including Named-Entity-Recognition (NER)
   Easy customization
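
A minimal sketch of the cascaded, template-filling idea, using plain regular expressions in place of a real finite-state platform such as GATE or SProUT. Stage 1 marks basic entities; stage 2 uses the left context of each entity to decide which slot of a review template it fills. The patterns, the slot names and the sample text (taken from the restaurant review used earlier) are assumptions made for this illustration.

```python
import re

# Stage 1: low-level patterns, in priority order (earlier rules win overlaps).
ENTITY_RULES = [
    ("DATE", re.compile(r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.? \d{1,2}\b")),
    ("PERSON", re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")),
    ("NUMBER", re.compile(r"\b\d+(?:\.\d+)?\b")),
]

# Stage 2: a slot is filled when an entity of the wanted type follows the context.
SLOT_RULES = [
    ("reviewer", "PERSON", re.compile(r"Reviewed by\s*$")),
    ("date", "DATE", re.compile(r"\bon\s*$")),
    ("rating", "NUMBER", re.compile(r"Rating:\s*$")),
]

def find_entities(text):
    """Return non-overlapping (label, start, end) spans, sorted by position."""
    spans = []
    for label, pattern in ENTITY_RULES:
        for m in pattern.finditer(text):
            if all(m.end() <= s or m.start() >= e for _, s, e in spans):
                spans.append((label, m.start(), m.end()))
    return sorted(spans, key=lambda span: span[1])

def fill_template(text):
    """Fill the review template from entities and their left context."""
    template = {"reviewer": None, "date": None, "rating": None}
    for label, start, end in find_entities(text):
        left_context = text[:start]
        for slot, wanted, context in SLOT_RULES:
            if label == wanted and template[slot] is None and context.search(left_context):
                template[slot] = text[start:end]
    return template

TEXT = ("L'Amourita Pizza\n"
        "Reviewed by Ulysses Grant on Jan 6.\n"
        "Delicious, tasty pizza on Eastlake!\n"
        "Rating: 4.5")
template = fill_template(TEXT)
print(template)
```

Note that stage 1 also marks "Amourita Pizza" as a PERSON candidate; the second stage discards it because it does not occur in a slot-filling context. This division of labour between cheap, over-generating patterns and later filtering stages is typical of cascaded processing.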







  Information Extraction
 Semi-supervised learning approaches
    Rule induction from corpora
    Use example annotations as seeds for bootstrapping
    Pattern Rules learned from contextual features with
     generalization over contexts








    Named Entities
   Statistical Approaches: examples
     Lingpipe: Hidden Markov Models
     OpenNLP: Maximum Entropy Models
     Stanford NER: Conditional Random Fields


   Statistical models created by supervised learning techniques
     Large annotated corpora required
   Customization difficult except by re-annotation/re-training
   Not suitable for every type of named entity








NER Document Markup








NER Markup for a Web Page








 IE Template
A Person template (as a
Typed Feature Structure)
instantiated from text.
The template supports the
extraction of various
properties of a person.








  Classification
 Assign a data item to some predefined class
 Statistical classification
 Numerous methods, e.g.:
     Bayes classifiers
     K-Nearest Neighbor (KNN)
     Support Vector Machines (SVM)
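
As an illustration of the first method in the list, here is a minimal multinomial naive Bayes text classifier in plain Python. It is a sketch under simple assumptions: whitespace tokenization, add-one (Laplace) smoothing, and a tiny hypothetical training set. Libraries such as WEKA or MALLET provide far more complete implementations.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label). Returns per-class word counts and class counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Return the class with the highest log posterior for the given text."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data for two predefined classes.
TRAINING = [
    ("wood-fired pizza and fresh pasta", "food"),
    ("tasty pizza with great service", "food"),
    ("the team won the football match", "sports"),
    ("a late goal decided the match", "sports"),
]
model = train(TRAINING)
print(classify("delicious pizza", *model))
print(classify("football match", *model))
```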








   Semantic Classification
 Semantic classification in Knowledge Representation
 Formalisms
      Infer the item's class from the item's properties by matching
       them with the class definitions: Which classes allow for these
       properties?
Assume that our ontology contains 2 classes with some properties
          SpatialThing:      latitude, longitude
          PopulatedPlace:    population
Paderborn is an object with latitude “51°43′0″N”, longitude “8°46′0″E” and a
population of 146283.
Then we can infer that Paderborn is a SpatialThing, since those are the things that
have latitudes and longitudes in our ontology. We can also infer that it is a
PopulatedPlace, since those are the things that have a population.
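
The Paderborn inference can be sketched in a few lines. This is a simplification, not a description-logic reasoner: it assumes that an item belongs to a class when it carries all of the properties listed for that class. (With RDFS `rdfs:domain` semantics, a single property would already be enough to entail the class.)

```python
# Class definitions from the example: the properties that characterise each class.
ONTOLOGY = {
    "SpatialThing": {"latitude", "longitude"},
    "PopulatedPlace": {"population"},
}

def infer_classes(item, ontology=ONTOLOGY):
    """Return every class whose defining properties all occur on the item."""
    present = set(item)
    return sorted(cls for cls, props in ontology.items() if props <= present)

paderborn = {"latitude": "51°43′0″N", "longitude": "8°46′0″E", "population": 146283}
print(infer_classes(paderborn))
```

An item that only has a population, for example, would be classified as a PopulatedPlace but not as a SpatialThing.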




  Clustering
 Detection of classes in a data set
 Partitioning data into classes in an unsupervised way
  with
     high intra-class similarity
     low inter-class similarity
 Main variants:
     Hierarchical clustering
       Agglomerative
     Partitioning clustering
       K-Means
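
A compact sketch of K-Means, the partitioning variant named above, in plain Python. It alternates between assigning every point to its nearest centroid and moving each centroid to the mean of its cluster. The fixed iteration count, the seeded random initialisation and the sample points are assumptions made for this illustration.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (clusters, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical 2-D data with two visibly separate groups.
GROUP_A = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5)]
GROUP_B = [(8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
clusters, centroids = kmeans(GROUP_A + GROUP_B, k=2)
for cluster in clusters:
    print(cluster)
```

Points within a resulting cluster are close to each other (high intra-class similarity) and far from the other cluster (low inter-class similarity); no class labels were supplied at any point.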






  Tools for Classification and
  Clustering
 Generic:
     WEKA: Java library implementing several dozen methods
      for data mining. Application to textual data requires special
      preprocessing.
 Text:
     MALLET: Java library with implementations of major
      methods for text and document classification and
      clustering








  Evaluation Measures
 Standard evaluation measures for IE/IR etc. systems:
    Accuracy:  $acc = \frac{tp + tn}{tp + fp + tn + fn}$
    Precision: $prec = \frac{tp}{tp + fp}$
    Recall:    $recall = \frac{tp}{tp + fn}$
    F-Measure: $F = \frac{2 \cdot prec \cdot recall}{prec + recall}$

    tp = true positive     tn = true negative
    fp = false positive    fn = false negative
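
The four measures translate directly into code. The sketch below assumes raw counts as input and returns 0.0 when a denominator is zero (a convention chosen for this example); the counts in the usage line are hypothetical.

```python
def ratio(numerator, denominator):
    """Safe division: 0.0 when the denominator is zero (a convention chosen here)."""
    return numerator / denominator if denominator else 0.0

def evaluation_measures(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f_measure) from the four counts."""
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f_measure = ratio(2 * precision * recall, precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts: 8 correct hits, 2 false alarms, 85 correct rejections, 5 misses.
acc, prec, rec, f = evaluation_measures(tp=8, fp=2, tn=85, fn=5)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F={f:.2f}")
```

With these counts accuracy is high (0.93) although more than a third of the true items are missed, which is why precision and recall are usually reported alongside it.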








  Evaluation Measures:
  Classification
 A confusion matrix which reports on the classification of
 27 wines by grape variety. The reference in this case is
 the true variety and the response arises from the blind
 evaluation of a human judge.

      Many-way Confusion Matrix

                            Response
      Reference      Cabernet  Syrah  Pinot   Precision  Recall  F-Measure
      Cabernet           9       3      0       0.69      0.75     0.72
      Syrah              3       5      1       0.56      0.56     0.56
      Pinot              1       1      4       0.80      0.67     0.73
      Macro average                             0.68      0.66     0.67
      Overall accuracy                          0.67

      Precision (Cabernet) = 9/(9+3+1)
      Recall (Pinot)       = 4/(1+1+4)
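
The figures in the table can be recomputed from the nine cell counts. In the sketch below, rows are the reference (true variety) and columns the response, so precision divides the diagonal cell by its column total and recall divides it by its row total. The variable names are chosen for this illustration.

```python
LABELS = ["Cabernet", "Syrah", "Pinot"]
# Rows: reference (true variety); columns: response (the judge's answer).
MATRIX = [
    [9, 3, 0],
    [3, 5, 1],
    [1, 1, 4],
]

def per_class_scores(matrix):
    """Return a (precision, recall, f_measure) tuple for every class."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        column_total = sum(matrix[r][i] for r in range(n))  # everything answered as class i
        row_total = sum(matrix[i])                          # everything that truly is class i
        precision = tp / column_total
        recall = tp / row_total
        f_measure = 2 * precision * recall / (precision + recall)
        scores.append((precision, recall, f_measure))
    return scores

scores = per_class_scores(MATRIX)
# Macro average: unweighted mean of the per-class values.
macro = [sum(s[j] for s in scores) / len(scores) for j in range(3)]
# Overall accuracy: correct decisions (the diagonal) over all decisions.
accuracy = sum(MATRIX[i][i] for i in range(3)) / sum(map(sum, MATRIX))

for label, (p, r, f) in zip(LABELS, scores):
    print(f"{label:9s} precision={p:.2f} recall={r:.2f} F={f:.2f}")
print(f"macro     precision={macro[0]:.2f} recall={macro[1]:.2f} F={macro[2]:.2f}")
print(f"accuracy  {accuracy:.2f}")
```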




    Evaluation Measures: NER
 Reference annotations:
      [Microsoft Corp.] CEO [Steve Ballmer] announced the release of [Windows 7] today

 Recognized annotations:
      [Microsoft Corp.] [CEO] [Steve] Ballmer announced the release of Windows 7 [today]

          Counts   Entities
     TP   1        [Microsoft Corp.]
     TN
     FP   3        [CEO], [Steve], [today]
     FN   2        [Windows 7], [Steve Ballmer]

 Precision: 1/(1+3) = 0.25
 Recall:    1/(1+2) = 0.33
 F-Measure: 2*0.25*0.33/(0.25+0.33) = 0.28
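
The counting scheme used here can be sketched as exact-match comparison of annotated spans: an entity counts as a true positive only if its boundaries agree exactly with the reference, so the partial match [Steve] is both a false positive and, for [Steve Ballmer], a false negative. The sketch represents each annotation as a (start, end) token range over the example sentence, which is an assumption made for this illustration.

```python
TOKENS = ["Microsoft", "Corp.", "CEO", "Steve", "Ballmer", "announced",
          "the", "release", "of", "Windows", "7", "today"]

# Annotations as (start, end) token ranges, end exclusive.
reference = {(0, 2), (3, 5), (9, 11)}            # Microsoft Corp., Steve Ballmer, Windows 7
recognized = {(0, 2), (2, 3), (3, 4), (11, 12)}  # Microsoft Corp., CEO, Steve, today

tp = len(reference & recognized)  # exact boundary matches
fp = len(recognized - reference)  # recognized but not in the reference
fn = len(reference - recognized)  # in the reference but not recognized

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

def text(span):
    """Render a token range as the annotated string."""
    return " ".join(TOKENS[span[0]:span[1]])

print("FP:", sorted(text(s) for s in recognized - reference))
print("FN:", sorted(text(s) for s in reference - recognized))
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.3f}")
```

Computed without intermediate rounding, the F-measure is 2/7, about 0.286; the value 0.28 on the slide results from inserting the rounded recall of 0.33 into the formula.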




  NER Evaluation
 Nobel Prize Corpus from NYT, BBC, CNN
 538 documents (Ø 735 words/document)
    28948 person, 16948 organization occurrences


                      SProUT    Calais    Stanford NER    OpenNLP
     Precision         77.26     94.22        73.21         57.69
     Recall            65.85     86.66        73.62         42.86
     F1                71.10     90.28        73.41         49.18







    References
   Microformats: http://microformats.org/
   RDFa: http://www.w3.org/TR/xhtml-rdfa-primer/
   Google Rich Snippets:
    http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html
   Linked Data: http://linkeddata.org/guides-and-tutorials
   Linked Data: Heath and Bizer, Linked Data: Evolving the Web into a Global Data
    Space. Morgan & Claypool, 2011. (Online: http://linkeddatabook.com/book)
   Information Extraction: Moens, Information Extraction: Algorithms and Prospects in
    a Retrieval Context. Springer 2006
   Text Mining: Feldman and Sanger, The Text Mining Handbook: Advanced
    Approaches in Analyzing Unstructured Data, CUP, 2007




       www.iks-project.eu                                             Copyright IKS Consortium

More Related Content

What's hot

Lecture the semantic_web_part_1
Lecture the semantic_web_part_1Lecture the semantic_web_part_1
Lecture the semantic_web_part_1IKS - Project
 
ECLAP Tutorial first part, ECLAP 2012 conference. the general overview
ECLAP Tutorial first part, ECLAP 2012 conference. the general overviewECLAP Tutorial first part, ECLAP 2012 conference. the general overview
ECLAP Tutorial first part, ECLAP 2012 conference. the general overviewPaolo Nesi
 
Information Quality Assessment in the WIQ-EI EU Project
Information Quality Assessment in the WIQ-EI EU ProjectInformation Quality Assessment in the WIQ-EI EU Project
Information Quality Assessment in the WIQ-EI EU ProjectElisabeth Lex
 
Seven Arguments for Semantic Technologies
Seven Arguments for Semantic TechnologiesSeven Arguments for Semantic Technologies
Seven Arguments for Semantic TechnologiesMike Bergman
 
Towards Socially Intelligent Media Computing
Towards Socially Intelligent Media ComputingTowards Socially Intelligent Media Computing
Towards Socially Intelligent Media ComputingPaolo Nesi
 
The Rationale for Semantic Technologies
The Rationale for Semantic TechnologiesThe Rationale for Semantic Technologies
The Rationale for Semantic TechnologiesMike Bergman
 
Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...
Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...
Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...SemLib Project
 
Dh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit systemDh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit systemMarco Grassi
 

What's hot (10)

Lecture the semantic_web_part_1
Lecture the semantic_web_part_1Lecture the semantic_web_part_1
Lecture the semantic_web_part_1
 
Limes webinar
Limes webinarLimes webinar
Limes webinar
 
ECLAP Tutorial first part, ECLAP 2012 conference. the general overview
ECLAP Tutorial first part, ECLAP 2012 conference. the general overviewECLAP Tutorial first part, ECLAP 2012 conference. the general overview
ECLAP Tutorial first part, ECLAP 2012 conference. the general overview
 
Information Quality Assessment in the WIQ-EI EU Project
Information Quality Assessment in the WIQ-EI EU ProjectInformation Quality Assessment in the WIQ-EI EU Project
Information Quality Assessment in the WIQ-EI EU Project
 
Seven Arguments for Semantic Technologies
Seven Arguments for Semantic TechnologiesSeven Arguments for Semantic Technologies
Seven Arguments for Semantic Technologies
 
Towards Socially Intelligent Media Computing
Towards Socially Intelligent Media ComputingTowards Socially Intelligent Media Computing
Towards Socially Intelligent Media Computing
 
The Rationale for Semantic Technologies
The Rationale for Semantic TechnologiesThe Rationale for Semantic Technologies
The Rationale for Semantic Technologies
 
Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...
Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...
Pundit: Semantically Structured Annotations for Web Contents and Digital Libr...
 
AKM PPT C4 ASSET FORMATION
AKM PPT C4 ASSET FORMATIONAKM PPT C4 ASSET FORMATION
AKM PPT C4 ASSET FORMATION
 
Dh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit systemDh2012 enriching digital libraries contents with pundit system
Dh2012 enriching digital libraries contents with pundit system
 

Similar to Lecture semantic lifting_presentation

Chapter 0 -_organization
Chapter 0 -_organizationChapter 0 -_organization
Chapter 0 -_organizationIKS - Project
 
Lecture semantic based_interaction_and_presentation_of_content
Lecture semantic based_interaction_and_presentation_of_contentLecture semantic based_interaction_and_presentation_of_content
Lecture semantic based_interaction_and_presentation_of_contentIKS - Project
 
Lecture semantic lifting_presentation

  • 1. Semantic Lifting for Traditional Content Resources
      Semantic CMS Community
      Lecturer, Organization, Date of presentation
      Co-funded by the European Union. Copyright IKS Consortium
  • 2. Part I: Foundations
          (1) Introduction of Content Management
          (2) Foundations of Semantic Web Technologies
      Part II: Semantic Content Management
          (3) Knowledge Interaction and Presentation
          (4) Knowledge Representation and Reasoning
          (5) Semantic Lifting
          (6) Storing and Accessing Semantic Data
      Part III: Methodologies
          (7) Requirements Engineering for Semantic CMS
          (8) Designing Semantic CMS
          (9) Semantifying your CMS
          (10) Designing Interactive Ubiquitous IS
      www.iks-project.eu Copyright IKS Consortium
  • 3. What is this Lecture about?
      - We have learned ...
          - ... how to build ontologies representing complex knowledge domains.
          - ... a way to reason about knowledge.
      - We need a way ...
          - ... to extract knowledge from content in an automatic way
          - Semantic Lifting (lecture 5 of Part II: Semantic Content Management)
  • 4. Overview
      - What is semantic lifting?
          - Core concepts
          - Scenarios
          - Requirements
      - Technologies
          - Semantic Reengineering
          - Semantic Enhancements of textual content
  • 5. What is "Semantic Lifting"?
      - Semantic Lifting refers to the process of associating content items with suitable semantic objects as metadata, turning "unstructured" content items into semantic knowledge resources
      - Semantic Lifting makes "hidden" metadata in content items explicit
  • 6. Semantic Lifting Targets
      - Semantic Reengineering of structured data
          - Semantic Lifting harmonizes metadata representations
          - Semantic Lifting reengineers data from an existing resource so that the data can be reused within a semantic repository
      - Semantic Content Enhancement
          - Semantic Lifting generates additional metadata and annotations by semantic analysis of content items
          - Semantic Lifting classifies content objects by means of semantic annotations
  • 7. Structured Content
      - Structured content provides implicit semantics through its structure definition
          - Table definitions in relational databases, XML schemata, field definitions for address books, calendars, etc.
          - Application programs are designed to "know" how to interpret the structures and the data within them.
      - Semantic Lifting is used for reengineering to support data exchange and seamless interoperability between different systems
  • 8. Unstructured Content
      - Unstructured content
          - Images, texts, videos, music, web pages composed of various types of media items
          - Meaningful only to humans, not to machines
      - Content must be described semantically by metadata to become meaningful to machines, e.g. what the text or image is about.
      - Semantic Lifting is used for content enhancement
  • 9. Mixed Content
      - There is no strict dichotomy between structured and unstructured content
          - Structured databases are used to store unstructured content types, such as texts and images
          - Documents can combine unstructured content items, such as free text and images, with more structured information, e.g. tables and charts
      - [Figure labels: Free text, Structured content]
  • 10. Metadata: Variants
      - Metadata exist in many forms:
          - Free text descriptions
          - Descriptive, content-related keywords or tags, from fixed vocabularies or in free form
          - Taxonomic and classificatory labels
          - Media-specific metadata, such as MIME types, encoding, language, bit rate
          - Media-type-specific structured metadata schemes, such as EXIF for photos, IPTC tags for images, ID3 tags for MP3, MPEG-7 for videos, etc.
          - Content-related structured knowledge markup, e.g. to specify what objects are shown in an image or mentioned in a text, what the actors are doing, etc.
  • 11. Metadata: Variants
      - Inline metadata are part of the content
          - e.g. ID3 tags embedded in MP3 files
      - Offline metadata are kept separate from the content
  • 12. Formal semantic metadata
      - Data representation in a formalism with a formal semantic interpretation that defines the concept of (logical) entailment for reasoning:
          - Soundness: conclusions are valid entailments
          - Completeness: every valid entailment can be deduced
          - Decidability: a procedure exists to determine whether a conclusion can be deduced
      - Embodiments:
          - Logics
          - Knowledge Representation Systems, Description Logics
          - Semantic Web: RDF, OWL
  • 13. "Semantics" in CMS
      - CMSs provide various methods to include metadata
          - Organizing content in hierarchies
          - Hierarchical taxonomies
          - Attaching properties to content items as metadata
          - Content type definitions with inheritance
      - These methods are used in CMSs in an ad hoc fashion without clear semantics. Therefore no well-defined reasoning is possible.
  • 14. Semantic Lifting Usage
      - Content Creation and Acquisition
          - Authoring content
              - Support content editors in providing metadata of specified types
          - Uploading external content/documents
              - Automatic extraction and analysis, e.g. for indexing
          - Importing content from external sources/documents
              - Integration of external content into the content repository
              - Content needs to be transformed to match internal CMS structures and metadata schemes
      - Cross-referencing/linking among CMS content items and external content
          - Detect related or additional content
          - Add pointers/links to related or additional content
  • 15. Semantic Lifting Usage
      - Access to external documents and content repositories
          - Semantic harmonization with CMS semantic structures
      - Semantic interoperability in data exchange with other content repositories
          - The CMS needs to understand the data structures used by external services and programs
              - E.g. synchronization of a local Outlook calendar with an external calendar based on the iCalendar format
              - E.g. importing RDF from a Linked Data endpoint such as DBpedia
          - The CMS must present its data in a form understood by external target services or programs
  • 16. Semantic Lifting Usage
      - Publishing content with metadata
          - Metadata need to be transformed into a form compatible with the publication format
          - E.g. converting FreeDB metadata into ID3 tags for inclusion in an MP3 file
  • 17. Publishing Web Content with semantic metadata
      - Augmenting web content with structured information is becoming increasingly important
      - Several methods have emerged in recent years to include structured metadata in Web pages
          - Microformats
          - RDFa
          - Microdata (HTML5)
      - Supported by the major search engines to improve search and result presentation, e.g. Google ("Rich Snippets"), Bing, Yahoo
  • 18. Augmenting Web Content
      - The HTML code contains a review of a restaurant in plain text, using only line breaks for structuring
      - Without specialized information extraction tools a machine cannot interpret it, e.g. that it is a review (of what and when?), who the reviewer was, etc.

        <div>
          L’Amourita Pizza
          Reviewed by Ulysses Grant on Jan 6.
          Delicious, tasty pizza on Eastlake!
          L'Amourita serves up traditional wood-fired Neapolitan-style pizza,
          brought to your table promptly and without fuss. An ideal neighborhood pizza joint.
          Rating: 4.5
        </div>
  • 19. Microformats
      - Same text, but with additional span elements whose class attributes encode the type of the contained information (hReview) and the properties of that type

        <div class="hreview">
          <span class="item">
            <span class="fn">L’Amourita Pizza</span>
          </span>
          Reviewed by <span class="reviewer">Ulysses Grant</span> on
          <span class="dtreviewed">
            Jan 6<span class="value-title" title="2009-01-06"></span>
          </span>.
          <span class="summary">Delicious, tasty pizza on Eastlake!</span>
          <span class="description">L'Amourita serves up traditional wood-fired
            Neapolitan-style pizza, brought to your table promptly and without fuss.
            An ideal neighborhood pizza joint.</span>
          Rating: <span class="rating">4.5</span>
        </div>
  • 20. RDFa
      - Same text, but with additional attributes and span elements encoding an RDF structure:
          - Namespace declaration of the ontology used
          - RDF class encoded by a typeof attribute and its properties by property attributes

        <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
          <span property="v:itemreviewed">L’Amourita Pizza</span>
          Reviewed by <span property="v:reviewer">Ulysses Grant</span> on
          <span property="v:dtreviewed" content="2009-01-06">Jan 6</span>.
          <span property="v:summary">Delicious, tasty pizza on Eastlake!</span>
          <span property="v:description">L'Amourita serves up traditional wood-fired
            Neapolitan-style pizza, brought to your table promptly and without fuss.
            An ideal neighborhood pizza joint.</span>
          Rating: <span property="v:rating">4.5</span>
        </div>
  • 21. Microdata (HTML5)
      - Same text, but with additional attributes and span elements:
          - A class declaration as the value of an itemtype attribute and its properties as values of itemprop attributes

        <div>
          <div itemscope itemtype="http://data-vocabulary.org/Review">
            <span itemprop="itemreviewed">L’Amourita Pizza</span>
            Reviewed by <span itemprop="reviewer">Ulysses Grant</span> on
            <time itemprop="dtreviewed" datetime="2009-01-06">Jan 6</time>.
            <span itemprop="summary">Delicious, tasty pizza on Eastlake!</span>
            <span itemprop="description">L'Amourita serves up traditional wood-fired
              Neapolitan-style pizza, brought to your table promptly and without fuss.
              An ideal neighborhood pizza joint.</span>
            Rating: <span itemprop="rating">4.5</span>
          </div>
        </div>
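To illustrate why such markup helps machines, the following is a minimal sketch of how a consumer might read the microdata version of the review. It uses only Python's standard `html.parser` module and handles a single flat item; the class name `MicrodataExtractor` and the helper `extract_review` are hypothetical names chosen for this example, and real microdata processors cover many more cases (nested items, `itemref`, URL-valued properties).

```python
from html.parser import HTMLParser

# Shortened copy of the microdata review example (description omitted).
REVIEW_HTML = """
<div itemscope itemtype="http://data-vocabulary.org/Review">
  <span itemprop="itemreviewed">L'Amourita Pizza</span>
  Reviewed by <span itemprop="reviewer">Ulysses Grant</span> on
  <time itemprop="dtreviewed" datetime="2009-01-06">Jan 6</time>.
  <span itemprop="summary">Delicious, tasty pizza on Eastlake!</span>
  Rating: <span itemprop="rating">4.5</span>
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collects the itemtype and itemprop values of one flat (non-nested) item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "time" and "datetime" in attrs:
            # Prefer the machine-readable date over the displayed text.
            self.properties[prop] = attrs["datetime"]
        else:
            self._current_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current_prop is not None:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current_prop = None


def extract_review(html):
    """Return (item type, {property: value}) for the first item in html."""
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.item_type, {k: v.strip() for k, v in parser.properties.items()}


item_type, props = extract_review(REVIEW_HTML)
print(item_type)
print(props)
```

The plain-text version of the review would need information extraction to recover the same fields; here a few attribute lookups suffice.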
  • 22. Lifting Requirements: Overview
      - Top-level requirements
          - Semantic Associations with Content
          - Semantic Harmonization
          - Semantic Linking
          - Interactive Lifting
          - Customizability
          - Semantically Transparent Structured Content Sources
  • 23. Semantic Associations with Content
      - Unstructured content and information must be supplied with structured semantic annotations and metadata.
          - Support for various content/media types
          - Information extraction from text, topic classification, image tagging, …
          - Support for creation of semantic annotations in content authoring
  • 24. Semantic Harmonization
      - Metadata and annotations must be harmonized with the requirements for semantic processing in the CMS
          - Reengineering methods, interpreters and wrappers for all types and formats of metadata and annotations, e.g. tags, microformats, XML metadata (MPEG-7, …), ID3 tags, EXIF data, …
      - Ensure semantic interoperability of data and annotation schemes within the CMS and across external resources
          - Ontology mapping and harmonization of annotations
              - External metadata
              - Metadata generated by semantic analysis
  • 25. Semantic Linking
      - Lifting must enable the interlinking of content objects by semantic relationships.
          - Internal linking of content items within the CMS
          - Links to external resources, e.g. Linked Open Data
      - Establish semantic relatedness of content for different views as well as different search, navigation and browsing strategies, …
          - Direct semantic links among content items and metadata
          - Similarity relations over sets of content items
          - Clustering of content items
  • 26. Interactive Lifting
      - Lifting must interact with CMS users.
          - Suggest semantic annotations during content creation
          - Support for various publishing formats such as microformats, RDFa, etc.
          - Automatic annotations (autotagging) that users can optionally correct
          - Automatic annotation components that learn and adapt from user feedback
  • 27. Customizability
      - Lifting components must be customizable by CMS users/customers.
          - Users must not be restricted to predefined vocabularies, ontologies, …
          - Domain ontologies, terminologies and tag sets are defined by CMS users/customers.
          - Browsers and editors for component resources are necessary.
  • 28. Transparent Structured Content Sources
      - Structured content sources need to be reengineered into semantic resources
          - Support uniform data access to structured content repositories, e.g. SPARQL endpoints based on D2RQ technologies for transparent access to RDF and non-RDF databases
          - Extraction of ontologies from database structures, schemata, XML resources, …
          - Alignment and mapping of the descriptions
  • 29. Semantic Reengineering of structured data sources
      - Focus on tools for reengineering structured data sources into RDF representations
      - Many tools and platforms, for example:
          - D2R Server: exposes relational DBs as RDF
          - Talis platform: Linked Open Data
          - Triplify: like D2R but in PHP
          - Virtuoso middleware
          - Krextor/OntoCape: generating RDF from XML
          - Various transformers for inducing RDF ontologies and instance data from XSD and XML
      - More details in the presentation on Knowledge Representation (KReS)
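The idea behind such reengineering tools can be sketched in a few lines: each table row becomes an RDF resource, each column a property, and each cell a literal. The sketch below is not how D2R Server or Triplify are implemented; it is a simplified illustration using Python's standard `sqlite3` module. The namespace `http://example.org/`, the `city` table and the function names are assumptions made for this example, and all values are emitted as plain literals without datatypes.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for generated URIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a value for use inside a double-quoted N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def table_to_ntriples(conn, table, key_column):
    """Map every row of `table` to N-Triples lines (row -> resource, column -> property)."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        # The table name serves as the RDF class of each row.
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}schema/{table}> .")
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the URI; NULLs yield no triple
            predicate = f"<{BASE}schema/{table}#{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def demo():
    """Build a tiny in-memory database and lift it to triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")
    conn.execute("INSERT INTO city VALUES (1, 'Paderborn', 146283)")
    return table_to_ntriples(conn, "city", "id")


for line in demo():
    print(line)
```

A production mapping would additionally type literals, turn foreign keys into links between resources and align the generated vocabulary with existing ontologies.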
  • 30. Semantic Content Enhancements: Overview
      - Focus here is on textual content
      - Metadata extraction from existing content in various formats to make embedded metadata explicit
      - Information extraction from textual content:
          - Named Entities
          - Coreference
          - Relationships
      - Classification and clustering of content items
          - Statistical methods and tools
          - Semantic classification based on ontological definitions
  • 31. Information Extraction
      - Rule-based approaches for shallow text analysis
          - Usually based on finite-state technology: fast, robust
          - Cascaded processing
          - Based on templates as target structures to be filled
      - Example platforms:
          - GATE
          - SProUT
      - Can be used for nearly any kind of extraction/annotation task, including Named Entity Recognition (NER)
      - Easy customization
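A rough sketch of the template-filling idea, using Python regular expressions in place of a real finite-state engine: low-level patterns recognize organizations, job titles and person names, and a higher-level rule combines them to fill a person template. The patterns, the template slots (`name`, `position`, `employer`) and the function name are assumptions made for this illustration; they are not the rule formats of GATE or SProUT.

```python
import re

# Stage 1: low-level patterns for basic annotations (illustrative, far from complete).
ORG = r"(?P<org>(?:[A-Z][\w&]*\s)+(?:Corp\.|Inc\.|Ltd\.))"
TITLE = r"(?P<title>CEO|CTO|President)"
PERSON = r"(?P<person>[A-Z][a-z]+(?:\s[A-Z][a-z]+)+)"

# Stage 2: a rule that combines the stage-1 patterns into one template.
PERSON_RULE = re.compile(rf"{ORG}\s{TITLE}\s{PERSON}")


def fill_person_templates(text):
    """Return one filled person template (a dict) per rule match in text."""
    templates = []
    for match in PERSON_RULE.finditer(text):
        templates.append({
            "name": match.group("person"),
            "position": match.group("title"),
            "employer": match.group("org"),
        })
    return templates


sentence = "Microsoft Corp. CEO Steve Ballmer announced the release of Windows 7 today"
print(fill_person_templates(sentence))
```

Customizing such a system means editing the patterns and rules, which is the sense in which rule-based extraction is easy to adapt.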
  • 32. Information Extraction
      - Semi-supervised learning approaches
          - Rule induction from corpora
          - Use example annotations as seeds for bootstrapping
          - Pattern rules learned from contextual features with generalization over contexts
  • 33. Named Entities
      - Statistical approaches: examples
          - LingPipe: Hidden Markov Models
          - OpenNLP: Maximum Entropy Models
          - Stanford NER: Conditional Random Fields
      - Statistical models are created by supervised learning techniques
          - Large annotated corpora required
          - Customization is difficult except by re-annotation/re-training
          - Not suitable for every type of named entity
  • 34. NER Document Markup
  • 35. NER Markup for a Web Page
  • 36. IE Template
      - A person template (as a typed feature structure) instantiated from text. The template supports the extraction of various properties of a person.
  • 37. Classification
      - Assign a data item to some predefined class
      - Statistical classification
          - Numerous methods, e.g.:
              - Bayes classifiers
              - K-Nearest Neighbor (KNN)
              - Support Vector Machines (SVM)
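As an illustration of one of these methods, here is a minimal K-Nearest Neighbor text classifier in plain Python. It represents texts as bags of words, compares them by cosine similarity and lets the k most similar training examples vote. The training texts, the labels `food` and `tech`, and the function names are invented for this sketch; toolkits such as WEKA or MALLET provide far more complete implementations.

```python
import math
from collections import Counter

# Tiny invented training set: (text, class label).
TRAINING = [
    ("wood-fired pizza and pasta restaurant", "food"),
    ("tasty pizza joint in the neighborhood", "food"),
    ("delicious dessert menu", "food"),
    ("new operating system release announced", "tech"),
    ("software company releases update", "tech"),
    ("CEO announced the new software", "tech"),
]


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training=TRAINING, k=3):
    """Return the majority label among the k most similar training texts."""
    query = bag_of_words(text)
    ranked = sorted(
        training,
        key=lambda example: cosine(query, bag_of_words(example[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(knn_classify("tasty pizza on Eastlake"))
print(knn_classify("company announced software release"))
```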
  • 38. Semantic Classification
      - Semantic classification in Knowledge Representation formalisms
          - Infer an item's class from the item's properties by matching them against the class definitions: which classes allow for these properties?
      - Example: assume that our ontology contains two classes with some properties:
          - SpatialThing: latitude, longitude
          - PopulatedPlace: population
      - Paderborn is an object with latitude "51°43′0″N", longitude "8°46′0″E" and a population of 146283.
      - We can infer that Paderborn is a SpatialThing, since in our ontology those are the things that have latitudes and longitudes. We can also infer that it is a PopulatedPlace, since those are the things that have a population.
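The Paderborn inference can be mimicked with a small sketch. It assumes a deliberately simple rule: an item belongs to a class if it carries all properties listed for that class. A description logic reasoner works on formal axioms rather than on such a lookup table, so this only illustrates the matching idea; the dictionary layout and the function name are assumptions of this example.

```python
# Class definitions from the example: class name -> properties of its members.
ONTOLOGY = {
    "SpatialThing": {"latitude", "longitude"},
    "PopulatedPlace": {"population"},
}


def infer_classes(item, ontology=ONTOLOGY):
    """Return the classes whose properties are all present on the item."""
    present = set(item)
    return sorted(
        name for name, required in ontology.items() if required <= present
    )


paderborn = {
    "latitude": "51°43′0″N",
    "longitude": "8°46′0″E",
    "population": 146283,
}

print(infer_classes(paderborn))
```

An item that only has a latitude would match neither class under this rule, which shows how strongly the outcome depends on how the class definitions are read.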
  • 39. Clustering
      - Detection of classes in a data set
          - Partitioning data into classes in an unsupervised way, with high intra-class similarity and low inter-class similarity
      - Main variants:
          - Hierarchical clustering
              - Agglomerative
          - Partitioning clustering
              - K-Means
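A minimal K-Means sketch in plain Python shows the partitioning idea: assign each point to its nearest centroid, recompute the centroids, and repeat until nothing changes. Taking the first k points as initial centroids is an assumption made here to keep the example deterministic; practical implementations usually choose starting points randomly or with a seeding heuristic. The sample points are invented.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Partition points into k clusters; returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # deterministic start (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(POINTS, k=2)
print(clusters)
```

The two resulting groups have small distances within each group and large distances between groups, which is the intra-class versus inter-class similarity criterion in miniature.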
  • 40. Tools for Classification and Clustering
      - Generic:
          - WEKA: Java library implementing several dozen methods for data mining. Application to textual data requires special preprocessing.
      - Text:
          - MALLET: Java library with implementations of major methods for text and document classification and clustering
  • 41. Evaluation Measures
      - Standard evaluation measures for IE/IR etc. systems, where tp = true positives, tn = true negatives, fp = false positives, fn = false negatives:
          - Accuracy: $acc = \frac{tp + tn}{tp + fp + tn + fn}$
          - Precision: $prec = \frac{tp}{tp + fp}$
          - Recall: $recall = \frac{tp}{tp + fn}$
          - F-Measure: $F = \frac{2 \cdot prec \cdot recall}{prec + recall}$
  • 42. Evaluation Measures: Classification
      - A confusion matrix reporting the classification of 27 wines by grape variety. The reference is the true variety; the response arises from the blind evaluation by a human judge.

        Many-way confusion matrix (rows: reference, columns: response)

        Reference          Cabernet  Syrah  Pinot   Precision  Recall  F-Measure
        Cabernet               9       3      0       0.69      0.75     0.72
        Syrah                  3       5      1       0.56      0.56     0.56
        Pinot                  1       1      4       0.80      0.67     0.73
        Macro average                                 0.68      0.66     0.67
        Overall accuracy                              0.67

      - Example calculations: precision for Cabernet = 9/(9+3+1); recall for Pinot = 4/(1+1+4)
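The figures in the wine table can be recomputed with a short sketch. Precision divides the diagonal cell by its column sum (everything the judge labeled as that variety), recall divides it by its row sum (everything that truly is that variety). The function names are chosen for this example only.

```python
LABELS = ["Cabernet", "Syrah", "Pinot"]
# Rows: reference (true variety); columns: response (judge's answer).
MATRIX = [
    [9, 3, 0],
    [3, 5, 1],
    [1, 1, 4],
]


def per_class_scores(matrix):
    """Return a list of (precision, recall, f_measure), one tuple per class."""
    size = len(matrix)
    scores = []
    for i in range(size):
        tp = matrix[i][i]
        predicted = sum(matrix[row][i] for row in range(size))  # column sum
        actual = sum(matrix[i])                                  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores.append((precision, recall, f_measure))
    return scores


def macro_average(scores):
    """Unweighted mean of each measure over all classes."""
    return tuple(sum(column) / len(scores) for column in zip(*scores))


def accuracy(matrix):
    """Share of items on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


scores = per_class_scores(MATRIX)
for label, (p, r, f) in zip(LABELS, scores):
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
print("macro average:", tuple(round(v, 2) for v in macro_average(scores)))
print("accuracy:", round(accuracy(MATRIX), 2))
```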
  • 43. Evaluation Measures: NER
      - Reference annotations:
          - [Microsoft Corp.] CEO [Steve Ballmer] announced the release of [Windows 7] today
      - Recognized annotations:
          - [Microsoft Corp.] [CEO] [Steve] Ballmer announced the release of Windows 7 [today]

        Counts   Entities
        TP   1   [Microsoft Corp.]
        TN
        FP   3   [CEO] [Steve] [today]
        FN   2   [Windows 7] [Steve Ballmer]

      - Precision: 1/(1+3) = 0.25
      - Recall: 1/(1+2) = 0.33
      - F-Measure: 2*0.25*0.33/(0.25+0.33) = 0.28
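The counts in this example follow from comparing the two annotation sets under exact matching, as the sketch below shows. It compares entity strings for brevity; a real evaluation would compare character offsets and entity types as well, so identical strings at different positions are not confused. The function name is hypothetical. Computed without intermediate rounding, the F-measure is 2/7 ≈ 0.286; the value 0.28 above results from using the rounded recall of 0.33.

```python
def evaluate_exact_match(reference, recognized):
    """Return (tp, fp, fn, precision, recall, f_measure) for exact matches."""
    reference, recognized = set(reference), set(recognized)
    tp = len(reference & recognized)   # found and correct
    fp = len(recognized - reference)   # found but not in the reference
    fn = len(reference - recognized)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return tp, fp, fn, precision, recall, f_measure


REFERENCE = ["Microsoft Corp.", "Steve Ballmer", "Windows 7"]
RECOGNIZED = ["Microsoft Corp.", "CEO", "Steve", "today"]

tp, fp, fn, precision, recall, f_measure = evaluate_exact_match(REFERENCE, RECOGNIZED)
print(f"TP={tp} FP={fp} FN={fn}")
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.3f}")
```

Note that the partial match [Steve] counts twice against the system: once as a false positive and once, through the missed [Steve Ballmer], as a false negative.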
  • 44. NER Evaluation
      - Nobel Prize Corpus from NYT, BBC, CNN
          - 538 documents (Ø 735 words/document)
          - 28948 person, 16948 organization occurrences

                     SProUT   Calais   Stanford NER   OpenNLP
        Precision    77.26    94.22    73.21          57.69
        Recall       65.85    86.66    73.62          42.86
        F1           71.10    90.28    73.41          49.18
  • 45. References
      - Microformats: http://microformats.org/
      - RDFa: http://www.w3.org/TR/xhtml-rdfa-primer/
      - Google Rich Snippets: http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html
      - Linked Data: http://linkeddata.org/guides-and-tutorials
      - Linked Data: Heath and Bizer, Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool, 2011. (Online: http://linkeddatabook.com/book)
      - Information Extraction: Moens, Information Extraction: Algorithms and Prospects in a Retrieval Context. Springer, 2006.
      - Text Mining: Feldman and Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. CUP, 2007.