DM110 Emerging Web Media Dr. John Breslin [email_address] http://sw.deri.org/~jbreslin/ Week 10: Semantic Web / Web 3.0
What is the Semantic Web? Sir Tim Berners-Lee et al., Scientific American, 2001: “An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” John Markoff, “Entrepreneurs see a Web guided by common sense”, New York Times, 2006: “Referred to as Web 3.0, the effort is in its infancy, and the very idea has given rise to skeptics who have called it an unobtainable vision. But the underlying technologies are rapidly gaining adherents, at big companies like IBM and Google as well as small ones.” The Semantic Web requires web pages to have metadata with underlying ontologies.
Where are we in the Semantic Web layer cake? You Are Here!
What is metadata? Metadata has been with us since the first librarian made a list of the items on a shelf of handwritten scrolls. The prefix “meta” comes from a Greek word that denotes “alongside, with, after, next”. More recent Latin and English usage employs “meta” to denote something transcendental, or beyond nature. Metadata can be thought of as “data about data”. It is the Internet-age term for the information that librarians have traditionally put into catalogues, and it most commonly refers to descriptive information about Web resources.
Why do we need metadata? To provide a structured description of characteristics such as the meaning (semantics), content, structure and purpose of a resource; to facilitate information sharing; to enable more sophisticated search engines on the Internet; to support intelligent agents and the pushing of data (e.g. from blog feeds); to minimise data loss or repetition; and to improve resource discovery by enabling field-based searches.
Why does the Web need metadata? No metadata: Google. Library analogy: index every word in every page in every book. Result: bad search results, lagging behind the growth and change in the Web. Metadata (basic): Yahoo! Directory. Library analogy: categories, titles, descriptions, ratings. Result: better results, but more work in classifying things and assigning properties!
What kind of resources, objects, things? HTML documents, digital images, databases, books, museum objects, archival records, metadata records, collections, services, physical places, people (using FOAF), abstract “works”, concepts, events.
Who or what makes use of metadata? People: an owner managing resources; a researcher seeking resources; third-party services. Software agents: aggregators (e.g. blog collections); “portals” presenting a “landscape” of data to users; “brokers” performing query tasks on behalf of users.
What can they do with metadata? An end user wants to: find, identify, select, obtain/use, interpret. A third-party service may want to: disclose/promote, enable and control access/use, annotate, re-contextualise.
Metadata and ontologies: Metadata elements are used to provide structure to the description of a resource, e.g. title, description, keywords, author, educational level, version, location, language, date created, etc. Further structure is provided by a metadata schema or ontology. For example, if there is metadata about a soccer team, an underlying ontology will say that a soccer team always has a goalkeeper and always has a manager, so each metadata entry for a soccer team should have that information.
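The soccer team example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `REQUIRED` table, the `missing_properties` helper and the team names are all hypothetical, and a real ontology language such as RDF Schema expresses far more than a list of required fields.

```python
# Hypothetical stand-in for an ontology: the properties that every
# entry of a given class is expected to carry.
REQUIRED = {"SoccerTeam": {"goalkeeper", "manager"}}


def missing_properties(entry_type, entry):
    """Return the required properties that a metadata entry lacks."""
    required = REQUIRED.get(entry_type, set())
    return {name for name in required if not entry.get(name)}


# Two made-up metadata entries, one complete and one not.
complete = {"name": "Example FC", "goalkeeper": "A. Keeper", "manager": "B. Boss"}
incomplete = {"name": "Sample United", "manager": "C. Coach"}

print(missing_properties("SoccerTeam", complete))
print(missing_properties("SoccerTeam", incomplete))
```

The point of the sketch is that the schema, not the individual entry, decides which information must be present.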
How is metadata created? By software tools: indexing robots and web crawlers, working from resource content and from server info. By people: descriptions added by the resource creator/owner; descriptions provided by third-party services, specialist cataloguers or resource users. Creating (and maintaining) good-quality metadata is not always cheap, and there may be rights issues for metadata as well as for resources.
Where can you find metadata? Embedded within the coding for a resource itself: this depends on the format of the resource (can metadata be extracted from the resource?). Linked to the resource. In a database of descriptions or repository of resources: this may be a remote database. … Adopt the approach which offers most flexibility: you may need to “present” different subsets of the full metadata in different contexts.
What about metadata standards? Metadata standards are agreed-on criteria for describing data to support interoperability. Simple example, the same date written five ways: “January 31, 2006”; “31 janvier 2006”; “2006-01-31”; “01-31-2006”; “31012006”. We need some consistent forms for exchanging metadata. There are many standards for different domains (Dublin Core, Warwick Framework, SCORM, IMS, ARIADNE, IEEE LOM, AICC, ADL SCORM, Merlot, RDF), so mappings between these standards may also be needed.
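As a rough illustration of why an agreed form matters, the five date strings above can be mapped onto one exchange format. The sketch below picks ISO 8601 (`YYYY-MM-DD`) as the target and makes two assumptions that the slide does not state: “01-31-2006” is read as month-day-year and “31012006” as day-month-year. The `to_iso` function and the month tables are hypothetical helpers, not part of any metadata standard.

```python
import re
from datetime import date

EN = ["january", "february", "march", "april", "may", "june", "july",
      "august", "september", "october", "november", "december"]
FR = ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
      "août", "septembre", "octobre", "novembre", "décembre"]
# Month name (English or French) -> month number.
MONTHS = {name: number
          for names in (EN, FR)
          for number, name in enumerate(names, start=1)}


def to_iso(text):
    """Normalise one of the five example date forms to YYYY-MM-DD."""
    text = text.strip()
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match:  # already year-month-day
        year, month, day = map(int, match.groups())
        return date(year, month, day).isoformat()
    match = re.fullmatch(r"(\d{2})-(\d{2})-(\d{4})", text)
    if match:  # assumed month-day-year
        month, day, year = map(int, match.groups())
        return date(year, month, day).isoformat()
    match = re.fullmatch(r"(\d{2})(\d{2})(\d{4})", text)
    if match:  # assumed day-month-year, no separators
        day, month, year = map(int, match.groups())
        return date(year, month, day).isoformat()
    match = re.fullmatch(r"([^\W\d]+) (\d{1,2}), (\d{4})", text)
    if match and match.group(1).lower() in MONTHS:  # "January 31, 2006"
        month = MONTHS[match.group(1).lower()]
        return date(int(match.group(3)), month, int(match.group(2))).isoformat()
    match = re.fullmatch(r"(\d{1,2}) ([^\W\d]+) (\d{4})", text)
    if match and match.group(2).lower() in MONTHS:  # "31 janvier 2006"
        month = MONTHS[match.group(2).lower()]
        return date(int(match.group(3)), month, int(match.group(1))).isoformat()
    raise ValueError(f"unrecognised date form: {text!r}")


examples = ["January 31, 2006", "31 janvier 2006", "2006-01-31",
            "01-31-2006", "31012006"]
for example in examples:
    print(example, "->", to_iso(example))
```

Note how much guessing the converter has to do: without a standard, nothing in “01-31-2006” itself says which field is the month.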
What is RDF? On the Semantic Web, we use a standard called RDF to express metadata about resources, and RDF Schema to create metadata schemas or ontologies. RDF stands for Resource Description Framework. RDF is a framework for describing and interchanging Semantic Web metadata. “RDF is an infrastructure that enables the encoding, exchange, and reuse of structured metadata” (Bearman et al., 1999).
A typical full-text search without RDF: Web pages at the moment are mainly text, e.g. “Stefan Decker works at DERI, funded by SFI.” NLP has not evolved enough to solve human problems, e.g. how can one find out Stefan’s funding agency? Google: “stefan decker +deri” (did I choose the right keywords?). Look through the results (how do I know Google’s rankings are correct?). Click on the most likely link (but is it really the best choice?). Search through the text for the answer (the answer in the text is ambiguous…).
Same example but with RDF metadata: If we use RDF metadata in a web page, e.g.
<Person>
  <name>Stefan Decker</name>
  <workplaceHomepage>http://www.deri.ie/</workplaceHomepage>
  <fundedBy>http://www.sfi.ie/</fundedBy>
</Person>
then a computer can return an answer to a question such as “who funds Stefan Decker?”, rather than requiring a combination of person plus computer to figure it out!
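To make the “who funds Stefan Decker?” question concrete, here is a minimal sketch that reads the simplified markup from this slide with Python's standard `xml.etree.ElementTree` module. The `funder_of` helper is hypothetical, and the snippet is treated as plain XML; a real application would use an RDF parser and full property URIs.

```python
import xml.etree.ElementTree as ET

# The simplified metadata from the slide, treated here as plain XML.
PAGE_METADATA = """
<Person>
  <name>Stefan Decker</name>
  <workplaceHomepage>http://www.deri.ie/</workplaceHomepage>
  <fundedBy>http://www.sfi.ie/</fundedBy>
</Person>
"""


def funder_of(xml_text, person_name):
    """Return the fundedBy value for the named person, or None."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root element itself when its tag matches.
    for person in root.iter("Person"):
        if (person.findtext("name") or "").strip() == person_name:
            funder = person.findtext("fundedBy")
            return funder.strip() if funder else None
    return None


print(funder_of(PAGE_METADATA, "Stefan Decker"))
```

No keyword guessing or reading of search results is involved: the answer is looked up by field name.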
What does RDF consist of? Resources: a resource is a thing you talk about (can reference); resources have URIs (e.g. they may be web pages, a part of an XML document, etc.). Properties: slots that define relationships to other resources or atomic values. Statements: “Resource has Property with Value” (expressed as a Subject / Predicate / Object statement); values can be resources or atomic XML data (e.g. a “literal” string). Frames: a straightforward way to express these abstract properties in XML.
A simple RDF example: Statement: “Ora Lassila is the creator of the resource (web page) http://www.w3.org/Home/Lassila”. Structure: Resource (subject): http://www.w3.org/Home/Lassila; Property (predicate): http://www.schema.org/#Creator; Value (object): “Ora Lassila”. Directed graph: http://www.w3.org/Home/Lassila --s:Creator--> “Ora Lassila”
Simple RDF example shown in RDF/XML: In the directed graphs, the arrows point from the subject to the object, and the text on the arrow is the predicate. The ellipses are resources and the rectangles are literals or text strings. We can also represent this graph model in RDF/XML:
<rdf:Description rdf:about="http://www.w3.org/Home/Lassila">
  <Creator>Ora Lassila</Creator>
</rdf:Description>
Expanding on the previous example: To add properties to the “Creator”, point through an intermediate resource (the ellipses are resources and the rectangles are literals or text strings). Directed graph: http://www.w3.org/Home/Lassila --s:Creator--> Person://fi/654645635; Person://fi/654645635 --Name--> “Ora Lassila”; Person://fi/654645635 --Email--> [email_address]
Expanded RDF example shown in RDF/XML:
<rdf:Description rdf:about="http://www.w3.org/Home/Lassila">
  <Creator rdf:resource="Person://fi/654645635"/>
</rdf:Description>
<rdf:Description rdf:about="Person://fi/654645635">
  <Name>Ora Lassila</Name>
  <Email>[email_address]</Email>
</rdf:Description>
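The graph behind this example can also be sketched as plain subject / predicate / object tuples. The Python below is an illustrative sketch, not an RDF library: the `objects` and `creator_names` helpers are hypothetical, the short predicate names stand in for full property URIs, and the Email statement is left out because the address is elided in the slide.

```python
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = [
    ("http://www.w3.org/Home/Lassila", "Creator", "Person://fi/654645635"),
    ("Person://fi/654645635", "Name", "Ora Lassila"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creator_names(triples, page):
    """Follow page -> Creator -> Name through the intermediate resource."""
    names = []
    for creator in objects(triples, page, "Creator"):
        names.extend(objects(triples, creator, "Name"))
    return names


print(creator_names(TRIPLES, "http://www.w3.org/Home/Lassila"))
```

Because the creator is now a resource rather than a literal string, further statements about that person can be added without touching the statement about the web page.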
What is an ontology? In a nutshell, ontologies are formal and consensual specifications of conceptualisations that provide a shared and common understanding of a domain. Ontologies define the terms used to describe and represent an area of knowledge. Ontologies are a key enabling technology for the Semantic Web. They interweave human understanding of symbols with their machine processability.
Ontologies on the Semantic Web: Semantic Web ontologies have computer-usable definitions. Concepts (AKA classes) are general things in the domain: Person, Document, Book, Web_Page. Relationships exist among things: Book and Web_Page are subclasses of Document. Properties (attributes) are what things may have: Person has an age, Web_Page has a creation_date.
Ontology structures From: http://aot.ce.unipr.it/team/poggi/teaching/ia/docs/Ontology.pdf
Why use ontologies? Labeling: if I say “car” and you say “automobile”, how do we know we mean the same thing? Semantics: if I say “vehicle”, how do you know if this includes buses or powered motorcycles? Knowledge sharing and reuse: we need to be able to create definitions of terms in a machine-understandable format. Systematic categorisation and computation require systematic representation, and systematic representation corresponds to an ontology.
What is a concept? Concepts or “classes”: are in general language-independent (the words ‘university’ and ‘ollscoil’ denote the same concept); are mental or logical representations of reality; are related to other concepts; do not need symbols but hold them as a means of communication. A concept has an intension, i.e. its meaning, and an extension, i.e. the set of objects that the concept refers to. On the difference between intension and extension, consider the phrases “Evening Star” and “Morning Star”, which have different meanings (intension) yet both refer to the planet Venus (extension). Ontology is mainly concerned with intension.
Components of an ontology: Concepts: Cat, Dog. Properties: Length, Age. Constraints: cardinality is at least 1; maximum value is 300. Axioms: cows are larger than dogs; cats cannot eat only vegetation. Relationships: “is a”, “part of”.
An ontology example in RDF:
<rdf:Description rdf:ID="Document">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/...#Resource"/>
</rdf:Description>
<rdf:Description rdf:ID="Book">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="#Document"/>
</rdf:Description>
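One thing software can do with these two subclass statements is infer that a Book is also a Resource. The sketch below is a simplification under stated assumptions: the `SUBCLASS_OF` table and both helpers are hypothetical, class names are shortened to plain strings, and each class is given a single superclass, whereas RDF Schema allows several.

```python
# Child class -> parent class, taken from the two descriptions above.
SUBCLASS_OF = {"Document": "Resource", "Book": "Document"}


def ancestors(cls, subclass_of):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    seen = {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in seen:  # guard against a cyclic hierarchy
            break
        seen.add(cls)
        chain.append(cls)
    return chain


def is_subclass(child, parent, subclass_of):
    """True if child is parent itself or a direct or indirect subclass."""
    return child == parent or parent in ancestors(child, subclass_of)


print(ancestors("Book", SUBCLASS_OF))
print(is_subclass("Book", "Resource", SUBCLASS_OF))
```

The statement “Book is a subclass of Resource” is never written down anywhere; it follows from the two statements that are.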
Implementing or creating ontologies: Implementation consists of defining all the ontology components through an ontology definition language. It generally proceeds in two stages. Informal stage: the ontology is sketched out using either natural language descriptions or some diagram technique. Formal stage: the ontology is encoded in a formal knowledge representation language that is machine computable. Different tools (e.g. Protégé) may help in the implementation.
Can already describe lots of things semantically: geographic coordinates (GEO); library books (Dublin Core, DC); online discussions (SIOC); people and social networks (Friend-of-a-Friend, FOAF); maybe even hormones (GeneOnt)!
The power of the Semantic Web: Interoperability and increased connectivity are possible through a commonality of expression. Vocabularies can be combined and used together, e.g. a description of a book using Dublin Core metadata can be augmented with specifics about the book author using the Friend-of-a-Friend vocabulary. Vocabularies can be easily extended (modules, etc.). Intelligent search with more granularity and relevance becomes possible, e.g. a search can be personalised to an individual by making use of their identity and relationship information.
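The Dublin Core plus FOAF combination can be sketched with plain Python sets of triples. The two namespace URIs are the published ones for Dublin Core elements and FOAF; everything under `example.org`, the titles and names, and the `author_names` helper are made up for illustration.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
FOAF = "http://xmlns.com/foaf/0.1/"       # Friend-of-a-Friend namespace

BOOK = "http://example.org/book/1"        # hypothetical resources
ALICE = "http://example.org/people/alice"

# Book description published by one source, using Dublin Core.
book_triples = {
    (BOOK, DC + "title", "An Example Book"),
    (BOOK, DC + "creator", ALICE),
}
# Author description published elsewhere, using FOAF.
author_triples = {
    (ALICE, FOAF + "name", "Alice Example"),
    (ALICE, FOAF + "knows", "http://example.org/people/bob"),
}

# Because both use the same triple model, combining them is a set union.
graph = book_triples | author_triples


def author_names(graph, book):
    """Names of a book's creators, crossing from DC into FOAF."""
    creators = {o for s, p, o in graph if s == book and p == DC + "creator"}
    return {o for s, p, o in graph if s in creators and p == FOAF + "name"}


print(author_names(graph, BOOK))
```

Neither vocabulary needed to know about the other in advance; the shared URI for the author is what links the two descriptions.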
The challenge for the Semantic Web: The Semantic Web can’t work all by itself (if it did, it would be called the “Magic Web”). It will need some help to become a reality. For example, it is not very likely that you will be able to sell your car just by putting your RDF file on the Web. We need society-scale applications: consumers and processors of Semantic Web data; Semantic Web agents or services; more advanced collaborative applications that make real use of shared data and annotations.
The path to Web 3.0: The Semantic Web effort is mainly directed towards producing standards and recommendations that will interlink applications. The Web 2.0 meme (already discussed) is about providing user applications. The two are not mutually exclusive: http://www.oreillynet.com/xml/blog/2005/10/is_web_20_killing_the_semantic.html With a little effort, many Web 2.0 applications can and do use Semantic Web technologies to great benefit.
Semantic Web + Web 2.0 = Web 3.0: Web 2.0 applications such as blogging and wikis have become very popular and have created an interconnected information space (through the “blogosphere” and inter-wiki links). At the same time, these applications are running into limits in information dissemination and automation, as they require more automated ways of distributing information. The Semantic Web is increasingly aiming at these application areas: Semantic Wikis, Semantic Desktops, etc.
