Information Representation, 
Structured Data and Semantics 
Yogendra Tamang 
070-MSCS-670
OUTLINE 
• XML, DTD and XML Schema, XSLT 
• Metadata Standards 
• Information Representation in Semantic Web 
• RDF, RDFS 
• Syntactic Formats 
• RDF/XML, N-Triples, Turtle, etc. 
• Embedded Formats 
• RDFa, Microformats, eRDF, HTML5, GRDDL, SPARQL
XML 
• An XML document consists of a prolog, elements, and an epilogue 
• Prolog: XML declaration and a reference to external structuring documents (such as a DTD) 
• Elements: tag, attributes, and content 
<?xml version="1.0" encoding="UTF-16"?> 
<!DOCTYPE book SYSTEM "book.dtd"> 
<lecturer>David Billington</lecturer>
XML 
• Comments and Processing Instructions 
<!-- This is a comment --> 
<?stylesheet type="text/css" href="mystyle.css"?>
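A minimal sketch of how a parser treats such documents, assuming Python's standard xml.etree.ElementTree module. The helper name is_well_formed is hypothetical and the two sample strings are made up for illustration. Comments are skipped by the default parser, and overlapping tags are rejected:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A comment inside an element does not affect well-formedness.
good = "<lecturer><!-- This is a comment --><name>David Billington</name></lecturer>"
# Overlapping tags: <name> is closed after its parent <author>.
bad = "<author><name>Lee Hong</author></name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

This only checks well-formedness. It says nothing about validity against a DTD or schema.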
DTD and XML Schema 
• Used for defining the structure 
• what values an attribute may take 
• which elements may or must occur within other elements, etc. 
<lecturer> 
<name>David Billington</name> 
<phone> +61 − 7 − 3875 507 </phone> 
</lecturer> 
<!ELEMENT lecturer (name,phone)> 
<!ELEMENT name (#PCDATA)> 
<!ELEMENT phone (#PCDATA)>
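The content model above can be checked by hand. The following is a rough sketch, not a DTD validator: Python's standard library does not validate against DTDs, so the function matches_lecturer_model (a hypothetical name) hard-codes the three ELEMENT declarations from the slide.

```python
import xml.etree.ElementTree as ET


def matches_lecturer_model(xml_text: str) -> bool:
    """Hand-check <!ELEMENT lecturer (name,phone)> with #PCDATA children."""
    root = ET.fromstring(xml_text)
    if root.tag != "lecturer":
        return False
    # (name,phone): exactly these two children, in this order.
    if [child.tag for child in root] != ["name", "phone"]:
        return False
    # (#PCDATA): name and phone hold text only, no nested elements.
    return all(len(child) == 0 for child in root)


valid = (
    "<lecturer><name>David Billington</name>"
    "<phone> +61 − 7 − 3875 507 </phone></lecturer>"
)
swapped = (
    "<lecturer><phone> +61 − 7 − 3875 507 </phone>"
    "<name>David Billington</name></lecturer>"
)

print(matches_lecturer_model(valid))
print(matches_lecturer_model(swapped))
```

The swapped document is well-formed but fails the check, because a comma in a content model fixes the order of the children.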
DTD 
Disjunctions: 
<!ELEMENT lecturer (name|phone)> 
<!ELEMENT lecturer ((name,phone)|(phone,name))>
DTD 
<order orderNo="23456" customer="John Smith" date="October 15, 2002"> 
<item itemNo="a528" quantity="1"/> 
<item itemNo="c817" quantity="3"/> 
</order> 
<!ELEMENT order (item+)> 
<!ATTLIST order orderNo ID #REQUIRED 
customer CDATA #REQUIRED 
date CDATA #REQUIRED> 
<!ELEMENT item EMPTY> 
<!ATTLIST item itemNo ID #REQUIRED 
quantity CDATA #REQUIRED 
comments CDATA #IMPLIED>
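As an illustration of what the ATTLIST declarations require, here is a small sketch that transcribes them into a Python dictionary and reports violations. The table ATTLIST and the function attribute_errors are hypothetical names. The sketch covers only #REQUIRED versus #IMPLIED and undeclared attributes; it does not check ID uniqueness or the item+ content model.

```python
import xml.etree.ElementTree as ET

# Transcribed from the ATTLIST lines: element -> {attribute: is_required}
ATTLIST = {
    "order": {"orderNo": True, "customer": True, "date": True},
    "item": {"itemNo": True, "quantity": True, "comments": False},
}


def attribute_errors(xml_text: str) -> list[str]:
    """List missing #REQUIRED attributes and undeclared attributes."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        declared = ATTLIST.get(elem.tag, {})
        for name, required in declared.items():
            if required and name not in elem.attrib:
                errors.append(f"{elem.tag}: missing #REQUIRED attribute {name}")
        for name in elem.attrib:
            if name not in declared:
                errors.append(f"{elem.tag}: undeclared attribute {name}")
    return errors


order = """
<order orderNo="23456" customer="John Smith" date="October 15, 2002">
  <item itemNo="a528" quantity="1"/>
  <item itemNo="c817" quantity="3"/>
</order>
"""
broken = '<order orderNo="1" customer="X" date="Y"><item itemNo="a1"/></order>'

print(attribute_errors(order))
print(attribute_errors(broken))
```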
XML Schema 
<schema xmlns="http://www.w3.org/2000/10/XMLSchema" version="1.0"> 
Element and Attribute Types using Data Types 
• Numerical data types: integer, short, etc. 
• String types: string, ID, IDREF, CDATA, etc. 
• Date and time data types: time, month, etc. 
• User-defined (simple and complex)
XML Namespaces and XPATH 
• Form: 
xmlns:prefix="location" 
• XPath: 
Operates on the tree data model of XML and is the core of XML query languages
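A short sketch of both ideas together, assuming Python's xml.etree.ElementTree, which supports only a limited subset of XPath. The namespace URI http://example.org/orders and the prefix o are made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<o:order xmlns:o="http://example.org/orders" orderNo="23456">
  <o:item itemNo="a528" quantity="1"/>
  <o:item itemNo="c817" quantity="3"/>
</o:order>
"""
root = ET.fromstring(doc)

# Map the prefix used in path expressions to the namespace location.
ns = {"o": "http://example.org/orders"}

# Path expression: all item children of the root element.
items = root.findall("o:item", ns)
item_numbers = [item.get("itemNo") for item in items]

# Path expression with a predicate on an attribute value.
three = root.findall("o:item[@quantity='3']", ns)

print(root.tag)  # the parser expands the prefix to {location}localname
print(item_numbers)
print([item.get("itemNo") for item in three])
```

The prefix itself carries no meaning. Only the location it is bound to identifies the vocabulary, which is why the query supplies its own prefix mapping.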
XSLT 
• XSLT specifies rules with which an input XML document 
is transformed to 
• another XML document 
• an HTML document 
• plain text
RDF 
• It provides a uniform framework for interchange of 
data and metadata between applications 
• XML does not provide any means of talking about the 
semantics (meaning) of data 
• Object-Attribute-Value
RDF 
• Fundamental Concepts: 
• resources 
• properties 
• statements
Statements 
• Triples 
• A triple (x, P, y) can be read as the logical formula P(x, y)
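One simple way to picture statements is as a set of three-element tuples. The sketch below assumes that representation; the ex: names and the helper match are hypothetical and chosen only for illustration.

```python
Triple = tuple[str, str, str]

# Each statement is (resource, property, value).
graph: set[Triple] = {
    ("ex:DavidBillington", "ex:teaches", "ex:DiscreteMaths"),
    ("ex:DavidBillington", "ex:phone", "+61 7 3875 507"),
    ("ex:DiscreteMaths", "ex:title", "Discrete Mathematics"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that agree with every argument that is not None."""
    return sorted(
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All statements about one resource.
about_david = match(graph, s="ex:DavidBillington")
# The binary predicate view: does teaches(x, y) hold?
holds = ("ex:DavidBillington", "ex:teaches", "ex:DiscreteMaths") in graph

print(about_david)
print(holds)
```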
RDFS Core Classes 
• rdfs:Resource, the class of all resources 
• rdfs:Class, the class of all classes 
• rdfs:Literal, the class of all literals (strings) 
• rdf:Property, the class of all properties. 
• rdf:Statement, the class of all reified statements
RDFS Core Properties 
• rdf:type, which relates a resource to its class 
• The resource is declared to be an instance of that class 
• rdfs:subClassOf, which relates a class to one of its 
superclasses 
• All instances of a class are instances of its superclass 
• rdfs:subPropertyOf, relates a property to one of its 
superproperties 
• rdfs:domain, which specifies the domain of a property 
• rdfs:range, which specifies the range of a property
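The effect of these properties can be sketched as a few inference rules applied until nothing new is derived. This is a simplified illustration, not a complete RDFS reasoner: it covers only subclass transitivity, type propagation, subproperties, domain, and range. The function name rdfs_closure and all ex: names are hypothetical.

```python
def rdfs_closure(triples):
    """Apply a small subset of RDFS rules until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p == "rdfs:subClassOf":
                    # Transitivity: s subClassOf o, o subClassOf o2.
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))
                    # Instances of a class are instances of its superclass.
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))
                elif p == "rdfs:subPropertyOf" and p2 == s:
                    new.add((s2, o, o2))
                elif p == "rdfs:domain" and p2 == s:
                    new.add((s2, "rdf:type", o))
                elif p == "rdfs:range" and p2 == s:
                    new.add((o2, "rdf:type", o))
        if new <= inferred:
            return inferred
        inferred |= new


graph = {
    ("ex:AssociateProfessor", "rdfs:subClassOf", "ex:AcademicStaff"),
    ("ex:AcademicStaff", "rdfs:subClassOf", "ex:Staff"),
    ("ex:teaches", "rdfs:domain", "ex:AcademicStaff"),
    ("ex:teaches", "rdfs:range", "ex:Course"),
    ("ex:DavidBillington", "ex:teaches", "ex:DiscreteMaths"),
}
closure = rdfs_closure(graph)

for triple in sorted(closure - graph):
    print(triple)
```

From the single teaches statement, the domain rule types the subject as academic staff, the subclass rule lifts that to staff, and the range rule types the object as a course.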
RDF SPARQL Query 
• SPARQL is based on matching graph patterns 
• Example: 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT ?c 
WHERE 
{ 
?c rdf:type rdfs:Class . 
}
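To make "matching graph patterns" concrete, here is a hand-rolled sketch of evaluating a single triple pattern such as the one in the query above. It is not a SPARQL engine. The function match_pattern and the ex: data are hypothetical; terms beginning with ? are treated as variables.

```python
def match_pattern(graph, pattern):
    """Return one variable-binding dict per triple that matches `pattern`."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


graph = {
    ("ex:Lecturer", "rdf:type", "rdfs:Class"),
    ("ex:Course", "rdf:type", "rdfs:Class"),
    ("ex:DavidBillington", "rdf:type", "ex:Lecturer"),
}

# Same shape as: SELECT ?c WHERE { ?c rdf:type rdfs:Class . }
classes = sorted(b["?c"] for b in match_pattern(graph, ("?c", "rdf:type", "rdfs:Class")))
print(classes)
```

A real query would join the bindings of several such patterns; this sketch stops at one.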
RDFa 
• RDFa is an extension to HTML5 that helps you mark up things like 
People, Places, Events, Recipes and Reviews. Search engines and web 
services use this markup to generate better search listings and give 
you better visibility on the Web, so that people can find your website 
more easily.
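A rough sketch of how a consumer might read such markup, assuming Python's standard html.parser module. It only collects the text of elements that carry a property attribute and is far from a conforming RDFa processor. The class PropertyCollector is hypothetical, the HTML snippet is made up, and the sketch assumes the snippet contains no void elements such as br or img.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # property name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open and self._open[-1]:
            self.pairs.append((self._open[-1], text))


html = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">David Billington</span>
  <span property="telephone">+61 7 3875 507</span>
</div>
"""
collector = PropertyCollector()
collector.feed(html)
print(collector.pairs)
```

The vocab and typeof attributes, ignored here, are what tell a full processor which vocabulary the property names belong to and what kind of thing is being described.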
Microformats 
• Microformats are a set of simple, open data formats built upon 
existing and widely adopted standards.
Microformats Tools and Code Generators
N-Triples
Green Turtle RDFa
Green Turtle 
• An implementation of RDFa 1.1 for browsers 
• When triples are discovered in a web page, a little green turtle 
will appear in the address bar. If you click on that turtle, you can 
view the triple graph.
References 
• “Semantic Web Primer”, Grigoris Antoniou, Frank van Harmelen 
• http://rdfa.info/ 
• https://code.google.com/p/green-turtle/ 
• http://microformats.org/wiki/about


Editor's Notes

  • #5 Processing instructions (PIs) define procedural attachments. Comments are pieces of text that are ignored by the parser.
    Well-formed XML: syntactically correct documents.
    • Only one outermost element (called the root element)
    • Each element contains an opening and a corresponding closing tag
    • Tags may not overlap, as they do in <author><name>Lee Hong</author></name>
    • Attributes within an element have unique names
    • Element and tag names must be permissible
    The tree representation of an XML document is an ordered labeled tree:
    • There is exactly one root
    • There are no cycles
    • Each non-root node has exactly one parent
    • Each node has a label
    • The order of elements is important, but the order of attributes is not
  • #6 An XML document is valid if it is well-formed and respects the structuring information it uses.
  • #7 We express that a lecturer element contains either a name element or a phone element as follows:
    <!ELEMENT lecturer (name|phone)>
    A lecturer element that contains a name element and a phone element in any order:
    <!ELEMENT lecturer ((name,phone)|(phone,name))>
  • #9 Attribute value declarations:
    • #REQUIRED: the attribute must appear in every occurrence of the element type in the XML document
    • #IMPLIED: the appearance of the attribute is optional
    • #FIXED "value": every element must have this attribute
    • "value": this specifies the default value for the attribute
  • #10 XML Schema is a significantly richer language for defining the structure of XML documents.
    • Its syntax is based on XML itself, so it is not necessary to write separate tools
    • Reuse and refinement of schemas: expand or delete already existent schemas
    • Sophisticated set of data types, compared to DTDs (which only support strings)
    The schema element on the slide is the opening tag.
    Element types:
    <element name="email"/>
    <element name="head" minOccurs="1" maxOccurs="1"/>
    <element name="to" minOccurs="1"/>
    Cardinality constraints:
    • minOccurs="x" (default value 1)
    • maxOccurs="x" (default value 1)
    • Generalizations of *, ?, + offered by DTDs
  • #11 Namespaces: location is the address of the DTD or schema. If a prefix is not specified, as in xmlns="location", then the location is used by default.
    XPath is the core of XML query languages. It is a language for addressing parts of an XML document, it operates on the tree data model of XML, and it has a non-XML syntax.
  • #12 The output document may use the same DTD or schema, or a completely different vocabulary. XSLT can be used independently of the formatting language.
  • #14 Resources: a resource is an object or thing, such as authors, books, publishers. Every resource has a URI, which can be a URL or some other unique identifier.
    Properties: describe relations between resources, such as "written by", "age", "title". Properties are also identified by URIs. Values can be resources or literals.