Semantics & digital data processing
Beirut, 9-10 March 2015
Establishing a framework for scholarly editing and publishing in the 21st century
Mokhtar BEN HENDA
Semantics: Better computational description of science
• Information is given explicit meaning so that machines can process it more intelligently;
• Instead of just creating standard terms for concepts, as is done in XML, the Semantic Web also allows users to provide formal definitions for the standard terms they create, so that machines can use inference algorithms to reason about the terms;
• A crucial component of the Semantic Web is the definition and use of ontologies.
Basics of the Semantic Web
• XML
  - XML is a language for transmitting structured information. If the goal of the web is to enable communication not only between people but also between machines, then XML seems a good basis both for documents to be read by people and for data to be read by machines.
• RDF
  - RDF (Resource Description Framework) is a directed, labeled graph data format for representing information on the Web. It is just a data model that does not have any significant semantics. RDF Schema is used to define a vocabulary for use in RDF models. In particular, it allows defining the classes used to type resources and the properties that resources can have.
• OWL
  - OWL was designed to provide a common way to process the content of web information instead of displaying it. It is primarily concerned with defining terminology that can be used in RDF documents. Syntactically, an OWL ontology serialized as RDF/XML is a valid RDF document and as such also a well-formed XML document.
• SPARQL
  - SPARQL is a set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store.
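The idea of RDF as a directed, labeled graph that can be queried by pattern can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ex:` resource names are invented, the `match` helper is hypothetical, and the wildcard matching merely imitates what a SPARQL triple pattern does. It is not an RDF library and not SPARQL itself.

```python
# Each RDF statement is a triple: (subject, predicate, object).
# ASSUMPTION: the "ex:" names below are made up for illustration.
TRIPLES = {
    ("ex:violin", "rdf:type", "ex:StringInstrument"),
    ("ex:StringInstrument", "rdfs:subClassOf", "ex:Instrument"),
    ("ex:anna", "rdf:type", "ex:Musician"),
    ("ex:anna", "ex:plays", "ex:violin"),
}


def match(graph, s=None, p=None, o=None):
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for subj, pred, obj in graph:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


# Roughly the question a SPARQL pattern `?who ex:plays ex:violin` would ask.
players = sorted(t[0] for t in match(TRIPLES, p="ex:plays", o="ex:violin"))
print(players)
```

Every edge of the graph is one triple, and a query is a triple with some positions left open.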
Ontologies
• Principle:
  - From character strings to word meanings
  - From word lists to word relations
  - From indexes to ontologies
[Figure labels: Ontologies, Semantic links, Vocabularies]
Data Models
[Figure: the vocabulary of a language linked to concept records, which are linked to each other by relations]
Vocabulary of a language: bank, fiddle, violin, violist, fiddler, string
Concepts:
  rec:12345 - financial institute
  rec:54321 - side of a river
  rec:9876 - small string instrument
  rec:65438 - musician playing violin
  rec:42654 - musician
  rec:25876 - string instrument
  rec:35576 - string of instrument
  rec:29551 - underwear
Relations: type-of, type-of, part-of
© Piek Vossen
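The three columns of this slide (vocabulary, concepts, relations) map naturally onto three small data structures. The sketch below is a guess at the intended model, not a reconstruction of the original figure: the record ids and glosses come from the slide, but the word-to-record links and the endpoints of the `type-of` and `part-of` relations are assumptions inferred from the glosses, and the helper names are hypothetical. The word "violist" is left unmapped because no gloss clearly fits it.

```python
# Concept records as listed on the slide: record id -> gloss.
CONCEPTS = {
    12345: "financial institute",
    54321: "side of a river",
    9876: "small string instrument",
    65438: "musician playing violin",
    42654: "musician",
    25876: "string instrument",
    35576: "string of instrument",
    29551: "underwear",
}

# ASSUMPTION: these word-to-record links are inferred from the glosses.
LEXICON = {
    "bank": [12345, 54321],      # one word, two unrelated concepts
    "fiddle": [9876],
    "violin": [9876],            # two words, one concept
    "fiddler": [65438],
    "string": [35576, 29551],
}

# ASSUMPTION: the endpoints of the slide's three relations are guessed.
RELATIONS = [
    (9876, "type-of", 25876),
    (65438, "type-of", 42654),
    (35576, "part-of", 25876),
]


def senses(word):
    """Glosses of every concept a word can denote."""
    return [CONCEPTS[rec] for rec in LEXICON.get(word, [])]


def synonyms(word):
    """Other words that share at least one concept record with `word`."""
    own = set(LEXICON.get(word, []))
    return sorted(w for w, recs in LEXICON.items() if w != word and own & set(recs))


def related(record_id, relation):
    """Glosses of the records reached from `record_id` via `relation`."""
    return [CONCEPTS[t] for s, r, t in RELATIONS if s == record_id and r == relation]


print(senses("bank"))
print(synonyms("fiddle"))
print(related(35576, "part-of"))
```

Keeping words and concepts in separate tables is what lets one string carry several meanings and several strings share one.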
Capturing Semantics in XML Documents
[Figure: a single XML source feeds app#1 and app#2. Each application carries its own "Semantics: code to interpret the data" and its own "Action: code to process the data".]
Semantics are defined on an application basis.
© Katia Sycara and Massimo Paolucci
OWL (Web Ontology Language)
[Figure: the XML source feeds app#1 and app#2, which now carry only "Action: code to process the data"; the semantic definitions live in separate OWL documents.]
Separates concept definitions (semantics) from the application
Expresses concept definitions using a standard vocabulary
© Katia Sycara and Massimo Paolucci
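The contrast between the two slides can be sketched as follows: the meaning of terms is kept in one shared, declarative structure, and each application contains only its own action code. Everything here is a simplification under stated assumptions. `SHARED_DEFINITIONS` is a plain dictionary standing in for an OWL document, and the concept ids and function names are hypothetical.

```python
from collections import Counter

# Stand-in for a shared OWL document: term -> concept it denotes.
# ASSUMPTION: terms and concept ids are invented; real OWL is far richer.
SHARED_DEFINITIONS = {
    "fiddle": "concept:Violin",
    "violin": "concept:Violin",
    "viola": "concept:Viola",
}


def concept_of(term):
    """Resolve a term through the shared definitions (case-insensitive)."""
    return SHARED_DEFINITIONS[term.lower()]


def app1_count_by_concept(terms):
    """app#1, action only: count occurrences per concept, not per spelling."""
    return Counter(concept_of(t) for t in terms)


def app2_same_concept(a, b):
    """app#2, action only: decide whether two terms mean the same thing."""
    return concept_of(a) == concept_of(b)


counts = app1_count_by_concept(["Fiddle", "violin", "viola"])
print(counts["concept:Violin"])
print(app2_same_concept("fiddle", "violin"))
```

Neither application hard-codes what "fiddle" means; changing the shared definitions changes the behaviour of both.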
OWL and logics
• OWL relies on Description Logics
• Logics provide automatic:
  - Check of consistency of concept definitions
  - Completion of concept definitions
  - Classification of new instances and concepts
  - Extraction of implicit knowledge in the documents
• OWL greatly expands the vocabulary of available constructs
• XML Schema provides some of those properties to some extent
© Katia Sycara and Massimo Paolucci
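Two of the services listed above, classification and consistency checking, can be illustrated with a toy example. This is a minimal sketch assuming only subclass and disjointness axioms; the class names are invented, and a real Description Logic reasoner supports far more constructs than this.

```python
# ASSUMPTION: a toy ontology with invented class names.
SUBCLASS_OF = {
    "Violinist": {"Musician"},
    "Musician": {"Person"},
    "Violin": {"Instrument"},
}
DISJOINT = [("Person", "Instrument")]  # nothing can be both


def superclasses(cls):
    """All direct and inherited superclasses of `cls`."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify(asserted_types):
    """Classification: the asserted types plus everything they imply."""
    inferred = set(asserted_types)
    for t in asserted_types:
        inferred |= superclasses(t)
    return inferred


def is_consistent(asserted_types):
    """Consistency: no pair of inferred types may be declared disjoint."""
    types = classify(asserted_types)
    return not any(a in types and b in types for a, b in DISJOINT)


print(sorted(classify({"Violinist"})))
print(is_consistent({"Violinist", "Violin"}))
```

The first call extracts implicit knowledge (a violinist is also a person); the second detects that one individual cannot be both a violinist and a violin.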
RDF (Resource Description Framework)
• Provides the basic syntax for OWL
• Use of URIs for unique identification of concepts, instances and relations
• Expression of relations between objects and concepts (RDF triples)
Problem: no structure
© Katia Sycara and Massimo Paolucci
RDF Schema
• Adds basic structure to RDF
  - Class/Subclass declaration
  - Instances
  - Properties (relations)
  - Multiple inheritance
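The structure RDF Schema adds can be sketched with two of its inference patterns: a property's declared domain and range give types to the resources it connects, and subclass links (with more than one parent allowed) propagate those types upward. The names below are hypothetical, and the function is a simplified illustration, not a complete RDFS entailment engine.

```python
from collections import defaultdict

# ASSUMPTION: invented schema. A class may have several superclasses.
SUBCLASS_OF = {
    "ex:Fiddler": {"ex:Musician", "ex:FolkArtist"},  # multiple inheritance
    "ex:Musician": {"ex:Person"},
    "ex:FolkArtist": {"ex:Person"},
}
DOMAIN = {"ex:plays": "ex:Musician"}    # subjects of ex:plays are musicians
RANGE = {"ex:plays": "ex:Instrument"}   # objects of ex:plays are instruments

FACTS = [
    ("ex:anna", "rdf:type", "ex:Fiddler"),
    ("ex:bob", "ex:plays", "ex:violin"),
]


def infer_types(facts):
    """Return resource -> set of classes, asserted and inferred."""
    types = defaultdict(set)
    for s, p, o in facts:
        if p == "rdf:type":
            types[s].add(o)
        if p in DOMAIN:
            types[s].add(DOMAIN[p])
        if p in RANGE:
            types[o].add(RANGE[p])
    # Propagate every type along subclass links until nothing new appears.
    for classes in types.values():
        pending = list(classes)
        while pending:
            for parent in SUBCLASS_OF.get(pending.pop(), ()):
                if parent not in classes:
                    classes.add(parent)
                    pending.append(parent)
    return dict(types)


inferred = infer_types(FACTS)
print(sorted(inferred["ex:anna"]))
print(sorted(inferred["ex:bob"]))
```

Nothing in the facts says that bob is a musician; the schema's domain declaration is what makes that conclusion available.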
[Figure: layered architecture connected through the GATE APIs]
• IDE GUI Layer (VRs): ADiff, OntolVR, DocVR, ...
• Application Layer: ANNIE, OBIE, …
• Processing Layer (PRs): NE, Co-ref, TEs, TRs, POS, …
• Language Resource Layer (LRs): Ontology, Protégé Ontology, Wordnet, Gazetteers, ...
• Corpus Layer (LRs): Corpus, Document, Document Content, Annotation Set, Annotation, Feature Map
• Document Format Layer (LRs): XML, HTML, PDF, … Document Formats, fed by HTML docs, RTF docs, XML docs, PDF docs, email, …
• DataStore Layer: XML, Oracle, PostgreSql, .ser
• Web Services
NOTES
• everything is a replaceable bean
• all communication via fixed APIs
• low coupling, high modularity, high extensibility
NOTES (2)
• e.g. Protégé LR & VR both wrapped in Res. (bean) API
• ontology repositories and inference are the same: KAON + Sesame + Orenge + ?
Sheffield Natural Language Processing Group
Semantic Web: cartographic searching
• Conceptual charts
  - Semantic links are rendered as charts
  - One concept can have numerous semantic links
Examples:
• http://www.touchgraph.com/TGGoogleBrowser.html
• http://www.oskope.com
• http://vionto.com/show/
• Topic Maps: http://highwire.stanford.edu/lists/artbytopic.dtl
SKOS: Simple Knowledge Organization System
• SKOS: specifications and standards that support, within the framework of the Semantic Web, the use of knowledge organization systems (KOS) such as:
  - thesauri,
  - classification schemes,
  - subject heading lists,
  - taxonomies
http://www.ebusiness-unibw.org/tools/skos2owl/
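A thesaurus expressed in the SKOS style boils down to concepts that carry a preferred label, optional alternative labels, and a link to a broader concept. The sketch below models that with plain dictionaries and prints the hierarchy as an indented outline. The `prefLabel`, `altLabel` and `broader` keys mirror the SKOS property names, while the concept URIs, labels and helper functions are invented for illustration.

```python
# ASSUMPTION: a three-concept thesaurus with made-up "ex:" identifiers.
CONCEPTS = {
    "ex:instruments": {"prefLabel": "Musical instruments", "broader": None},
    "ex:strings": {"prefLabel": "String instruments", "broader": "ex:instruments"},
    "ex:violin": {
        "prefLabel": "Violin",
        "altLabel": ["Fiddle"],
        "broader": "ex:strings",
    },
}


def narrower(uri):
    """Concepts whose broader link points at `uri` (the inverse of broader)."""
    return sorted(c for c, d in CONCEPTS.items() if d["broader"] == uri)


def render(uri, depth=0):
    """Return the hierarchy below `uri` as indented preferred labels."""
    lines = ["  " * depth + CONCEPTS[uri]["prefLabel"]]
    for child in narrower(uri):
        lines.extend(render(child, depth + 1))
    return lines


outline = render("ex:instruments")
print("\n".join(outline))
```

Only the broader links are stored; the narrower direction is derived, which keeps the two from drifting apart.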
SKOS Play
• SKOS Play is a visualization service for SKOS-formatted thesauri, taxonomies or vocabularies.
• More generally, it is used to view or print a knowledge organization system expressed in SKOS.
http://labs.sparna.fr/skos-play/upload?lang=en
