The 2009 Semantic Web Landscape
Technologies, tools, and projects
                                         Lee Feigenbaum
...
Thanks Upfront
Much material & wisdom used with gracious
 permission of:
 Ivan Herman
         W3C Semantic Web Activity L...
Thanks Upfront
Much material & wisdom used with gracious
 permission of:
 Michael Hausenblas
         Evangelist for RDFa,...
Executive Summary: The Semantic
Web in 2009
                 The Semantic Web in 2009 is characterized by a healthy
      ...
Agenda
   Introduction
   The data model (RDF)
   The query language (SPARQL)
   Adding structure & semantics (RDFS, OWL, ...
A Motivating Example: Drug Discovery
 The W3C HCLS interest group set out to use
 Semantic Web technologies to receive pre...
General search
               223,000 hits, 0 results




May 12, 2009                             7
Domain-limited search
               2,580 potential results




Specific databases
               Too many silos!




A Semantic Web Approach

Integrate disparate databases…

   MeSH
   PubMed
   Entrez Gene
   Gene Ontology
   …



A Semantic Web Approach (cont’d)
…so that one query…




A Semantic Web Approach (cont’d)
…(trivially) spans several databases…




A Semantic Web Approach (cont’d)
…to deliver targeted results…




What’s the trick?


        1. Agreement on common terms and
           relationships
        2. Incremental, flexible dat...
Names




Branding
    Semantic Web
    Web of Data
    Giant Global Graph
    Data Web
    Web 3.0
    Linked Data Web
    Semantic...
What is it & why do we care? (1)
   “The Semantic Web”
         Augments the World Wide Web
         Represents the Web’s ...
What is it & why do we care? (2)
   “Semantic Web technologies”
         A family of technology standards that ‘play nice
...
A Common & Coherent Set of Technology
Standards



   A common set of technologies:
         ...enables diverse uses
     ...
The (In)Famous Layer Cake




Semantic Web Technology Timeline

(Timeline figure: 1999, 2001, 2004, 2007, 2008, 2009)




                             ...
2009: Where we are
As technologies & tools have evolved, Semantic
Web advocates have progressed through stages:
          ...
2009: Where we are (cont’d)
                    http://www.w3.org/2001/sw/sweo/public/UseCases/




2009: Where we’re not
                                                Image from Trey Ideker via Enoch Huang




       Se...
2009: Where we’re not (cont’d)

                                                         XML vs. RDF?
     “Ontology” vs.
...
2009: Where we’re not (cont’d)




      People with appropriate skill sets for designing & building Semantic
            ...
2009: Where we’re not (cont’d)




       We don’t yet have standard solutions for privacy, trust, probability,
          ...
Introduction to the Semantic Web
approach

       How does a Semantic Web approach help us
     merge data sets, infer new...
The rough structure of data integration

 1. Map the various data onto an abstract data
    representation
           Make...
Data set “A”: A simplified book store
Books
          ID            Author            Title            Publisher     Year
...
1st: Export your data as a set of relations
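
A minimal sketch of what "export as a set of relations" could look like for data set "A". The namespace URI, the predicate names (`title`, `year`, `p_name`, and so on), and the `export_rows` helper are assumptions made for illustration; the slides name only the `a:` prefix and the table columns.

```python
# Hypothetical namespace for data set "A"; the slides use the a: prefix but give no URI.
A = "http://example.org/a/"

# The three tables of data set "A", one dict per row.
books = [{"ID": "ISBN0-00-651409-X", "Author": "id_xyz", "Title": "The Glass Palace",
          "Publisher": "id_qpr", "Year": "2000"}]
authors = [{"ID": "id_xyz", "Name": "Ghosh, Amitav",
            "Home page": "http://www.amitavghosh.com"}]
publishers = [{"ID": "id_qpr", "Publisher Name": "Harper Collins", "City": "London"}]


def export_rows(rows, column_to_predicate, reference_columns=()):
    """Turn each row into (subject, predicate, object) triples keyed by its ID."""
    triples = set()
    for row in rows:
        subject = A + row["ID"]
        for column, predicate in column_to_predicate.items():
            value = row[column]
            # Foreign keys become links to other resources; other cells stay literals.
            obj = A + value if column in reference_columns else value
            triples.add((subject, A + predicate, obj))
    return triples


graph_a = (
    export_rows(books,
                {"Author": "author", "Title": "title",
                 "Publisher": "publisher", "Year": "year"},
                reference_columns=("Author", "Publisher"))
    | export_rows(authors, {"Name": "name", "Home page": "homepage"})
    | export_rows(publishers, {"Publisher Name": "p_name", "City": "city"})
)

for triple in sorted(graph_a):
    print(triple)
```

Each table cell becomes one relation, and the foreign-key columns turn into edges between resources rather than plain strings.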




Some notes on the data export
     Data export does not necessarily mean
     physical conversion of the data
           R...
Data set “F”: Another book store’s data
                A             B          D               E
                       ...
2nd: Export your second set of data




3rd: start merging your data




3rd: start merging your data (cont’d)




4th: Merge identical resources




Start making queries…
   User of data set “F” can now ask queries like:
         “What is the title of the original versio...
5th: Query the merged data set
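
The merge-then-query step can be sketched with plain Python sets. This is an illustration under assumptions, not the tooling used in the talk: the namespace and ISBN URIs and the predicate names other than `a:author` and `f:auteur` are hypothetical, chosen to mirror the column names of data sets "A" and "F".

```python
# Hypothetical namespaces and resource URIs; the slides do not spell these out.
A = "http://example.org/a/"
F = "http://example.org/f/"
ORIGINAL = "http://example.org/isbn/000651409X"     # the English original
TRANSLATION = "http://example.org/isbn/2020386682"  # the French translation

graph_a = {
    (ORIGINAL, A + "title", "The Glass Palace"),
    (ORIGINAL, A + "year", "2000"),
    (ORIGINAL, A + "author", A + "id_xyz"),
    (A + "id_xyz", A + "name", "Ghosh, Amitav"),
    (A + "id_xyz", A + "homepage", "http://www.amitavghosh.com"),
}
graph_f = {
    (TRANSLATION, F + "titre", "Le Palais des miroirs"),
    (TRANSLATION, F + "original", ORIGINAL),
    (TRANSLATION, F + "traducteur", F + "A13"),
    (ORIGINAL, F + "auteur", F + "A12"),
    (F + "A12", F + "nom", "Ghosh, Amitav"),
    (F + "A13", F + "nom", "Besse, Christianne"),
}

# Merging two graphs is a set union; the shared ISBN URI joins them.
merged = graph_a | graph_f


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def original_title(graph, french_title):
    """Title of the original version of the book with the given French title."""
    titles = set()
    for book, predicate, title in graph:
        if predicate == F + "titre" and title == french_title:
            for original in objects(graph, book, F + "original"):
                titles |= objects(graph, original, A + "title")
    return titles


print(original_title(graph_f, "Le Palais des miroirs"))  # nothing in "F" alone
print(original_title(merged, "Le Palais des miroirs"))   # answered after the merge
```

The query walks from the French title to the original edition inside data set "F", then picks up the title that only data set "A" knows.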




However, more can be achieved…
   We “know” that a:author and f:auteur are
   really the same
   But our automatic merge d...
3rd revisited: Use the extra knowledge




Start making richer queries!
   User of data set “F” can now query:
         “What is the home page of Le Palais des miroi...
6th: Richer queries
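
A sketch of how one "glue" statement changes what can be queried. It models only the property equivalence (`a:author` is the same as `f:auteur`) and leaves out the Person typing; the URIs, the `apply_equivalences` helper, and the predicates other than `a:author` and `f:auteur` are assumptions for illustration. A real reasoner would also handle chains of equivalences, which this one-pass copy does not.

```python
# Hypothetical namespaces and resource URIs; the slides do not spell these out.
A = "http://example.org/a/"
F = "http://example.org/f/"
ORIGINAL = "http://example.org/isbn/000651409X"
TRANSLATION = "http://example.org/isbn/2020386682"

# A small slice of the already merged data sets "A" and "F".
merged = {
    (ORIGINAL, A + "author", A + "id_xyz"),
    (A + "id_xyz", A + "homepage", "http://www.amitavghosh.com"),
    (TRANSLATION, F + "titre", "Le Palais des miroirs"),
    (TRANSLATION, F + "original", ORIGINAL),
    (ORIGINAL, F + "auteur", F + "A12"),
    (F + "A12", F + "nom", "Ghosh, Amitav"),
}


def apply_equivalences(graph, equivalent_pairs):
    """Copy every triple onto each predicate declared equivalent to its own."""
    aliases = {}
    for left, right in equivalent_pairs:
        aliases.setdefault(left, set()).add(right)
        aliases.setdefault(right, set()).add(left)
    enriched = set(graph)
    for s, p, o in graph:
        for alias in aliases.get(p, ()):
            enriched.add((s, alias, o))
    return enriched


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def auteur_home_pages(graph, french_title):
    """Home pages of the 'auteur' of the book with the given French title."""
    pages = set()
    for book, predicate, title in graph:
        if predicate == F + "titre" and title == french_title:
            for original in objects(graph, book, F + "original"):
                for person in objects(graph, original, F + "auteur"):
                    pages |= objects(graph, person, A + "homepage")
    return pages


# The glue: a:author is the same as f:auteur.
enriched = apply_equivalences(merged, [(A + "author", F + "auteur")])

print(auteur_home_pages(merged, "Le Palais des miroirs"))    # empty without the glue
print(auteur_home_pages(enriched, "Le Palais des miroirs"))  # found with the glue
```

A user who only knows the French vocabulary can now reach the home page recorded under the English one.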




Bring in other data sources
   We can integrate new information into our
   merged data set from other sources
         e....
7th: Merge with Wikipedia data




7th (cont’d): Merge with Wikipedia data




7th (cont’d): Merge with Wikipedia data




Is that surprising?
   It may look like it but, in fact, it should not be…
   What happened via automatic means is done
  ...
What did we do?
   We combined different data sets that
         ...may be internal or somewhere on the Web
         ...ar...
What did we do? (cont’d)




The abstraction pays off because…
   …the graph representation is independent of
   the details of the native structures
 ...
So where is the Semantic Web?

               Semantic Web technologies make such integration possible




   The rest of ...
Agenda
   Introduction
   The data model (RDF)
   The query language (SPARQL)
   Adding structure & semantics (RDFS, OWL, ...
RDF is…




               Resource Description Framework




RDF is…




               The data model of the Semantic Web.




RDF is…



            A schema-less data model that features
          unambiguous identifiers and named relations
      ...
RDF is…
         A labeled, directed graph of relations between
                  resources and literal values.

   RDF gr...
Example RDF triples
   “Lee Feigenbaum works for Cambridge
   Semantics”
                              works for
         ...
Triples connect to form graphs
                                             works for
                             Lee    ...
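
To make "triples connect to form graphs" concrete, here is a small sketch that stores the example triples as tuples and follows a chain of predicates. The `http://example.org/` identifiers are hypothetical, and the directions of the "lives in" and "capital" edges are assumptions, since the arrows of the original diagram are not preserved in this transcript.

```python
EX = "http://example.org/"  # hypothetical namespace for the example resources

LEE = EX + "LeeFeigenbaum"
CAMBRIDGE_SEMANTICS = EX + "CambridgeSemantics"
MASSACHUSETTS = EX + "Massachusetts"
BOSTON = EX + "Boston"

triples = [
    (LEE, EX + "worksFor", CAMBRIDGE_SEMANTICS),
    (LEE, EX + "bornIn", "1978"),  # a literal value, not a resource
    (CAMBRIDGE_SEMANTICS, EX + "headquartered", MASSACHUSETTS),
    (LEE, EX + "livesIn", MASSACHUSETTS),    # edge direction assumed
    (MASSACHUSETTS, EX + "capital", BOSTON), # edge direction assumed
]


def follow(graph, start, path):
    """Follow a sequence of predicates from a start node; return the nodes reached."""
    nodes = {start}
    for predicate in path:
        nodes = {o for s, p, o in graph if s in nodes and p == predicate}
    return nodes


# "What is the capital of the place where Lee's employer is headquartered?"
print(follow(triples, LEE, [EX + "worksFor", EX + "headquartered", EX + "capital"]))
```

Because the object of one triple is the subject of another, separate statements chain into paths without any join table or schema change.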
Why RDF? What’s different here?
   The graph data structure makes merging data
   with shared identifiers trivial (as we s...
Why RDF? Incremental Integration
                                                      Agile,
               Flexible
    ...
Types of RDF Tools
   Triple stores
         Built on relational database
         Native RDF store
   Development librari...
Finding RDF Tools
   Community-maintained lists
         http://esw.w3.org/topic/SemanticWebTools
   Emphasis on large tri...
RDF Tools – (Some) Triple Stores
                          Commercial or
                   Tool                     Envir...
Agenda
   Introduction
   The data model (RDF)
   The query language (SPARQL)
   Adding structure & semantics (RDFS, OWL, ...
Motivating SPARQL



       With a query language, a client can design their
                       own interface.
       ...
SPARQL is…




               SPARQL Protocol And RDF Query Language




SPARQL is…




               The query language of the Semantic Web.




SPARQL is…




           A SQL-like language for querying sets of RDF
                             graphs.




SPARQL is…



               A simple protocol for issuing queries and
                   receiving results over HTTP. So…...
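
A sketch of what "queries over HTTP" amounts to on the client side, assuming the GET binding of the SPARQL protocol, where the query text travels in a `query` parameter. The endpoint URL is the DBpedia one given in the speaker notes; the `build_sparql_request` helper and the sample query are illustrative, and nothing is sent over the network here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the SPARQL Query Results XML Format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
request = build_sparql_request("http://dbpedia.org/sparql", query)

print(request.get_method())
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it. Since any endpoint accepts the same request shape, swapping servers is a matter of changing the endpoint URL.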
Why SPARQL?
SPARQL lets us:
  Pull information from structured and semi-structured data.
  Explore data by discovering ...
(Figure labels: Dealer 1, Dealer 2, Dealer 3)
                                                                         ...
SPARQL Example: Querying Wikipedia
    Find me all landlocked countries with a population
                  greater than 1...
SPARQL Example: Querying Wikipedia




           DBPedia SPARQL Endpoint
Types of SPARQL Tools
   Query engines
         Things that can run queries
         Most RDF stores provide a SPARQL engi...
Finding SPARQL Tools
   Community-maintained list of query engines
         http://esw.w3.org/topic/SparqlImplementations
...
(Some) SPARQL’able Data Sets




bio2rdf.org – querying life sciences data




Agenda
   Introduction
   The data model (RDF)
   The query language (SPARQL)
   Adding structure & semantics (RDFS, OWL, ...
Where’s the magic?
   We haven’t seen anything yet that begins to
   approach the long-term Semantic Web vision




From the explicit to the inferred

   3 pieces of the Semantic Web technology stack
   are about describing a domain well ...
RDFS is…




               RDF Schema




RDF Schema is…
   Elements of:
         Vocabulary (defining terms)
               I define a relationship called “prescri...
OWL is…




               Web Ontology Language




OWL is…
   Elements of ontology
         Same/different identity
               “author” and “auteur” are the same relatio...
What can we do with OWL?
   Answer questions of
         Consistency
               Are there any contradictions in this m...
Building Useful Ontologies
  Developing and maintaining quality ontologies is very challenging
  Users need tools and servi...
Example: SNOMED
   Large: 373,731 concepts & over 1 million terms
   NHS version extended to 542,380 classes with
        ...
Example: SNOMED




RIF is…




               Rule Interchange Format




RIF is…
   Standard representation for exchanging sets of logical
   and business rules
   Logical rules
         A buyer ...
Developing Tools and Infrastructure

   Editors/environments
         Oiled, Protégé, Swoop, TopBraid, Ontotrack, …




   Rea...
Visualizing and Publishing Vocabularies




Reusable, public ontologies

                                                      FOAF




               The Event Ontol...
Agenda
   Introduction
   The data model (RDF)
   The query language (SPARQL)
   Adding structure & semantics (RDFS, OWL, ...
Fantasy Land Architecture


                                           Ontology /

                               +       ...
Reality

                               Internet
                                                               DB2
      ...
GRDDL is…



        Gleaning Resource Descriptions from Dialects of
                          Languages




GRDDL is…

         A method for authoritatively getting RDF data
              from XML and XHTML documents.




GRDDL is…

          A mechanism for authoritatively deriving RDF
             data from families of XML and XHTML
       ...
GRDDL tools
        Most GRDDL tools are adapters to existing RDF
         stores or SPARQL engines to allow loading or
  ...
RDB2RDF is…




               Relational Database to RDF




RDB2RDF is…



         A proposed W3C Working Group to define a
       standard way to map from relational databases
    ...
RDB2RDF tools
   Survey of existing approaches:
         http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf...
What about… everything else?



      Standards don’t yet exist, but many tools exist to
       derive RDF and/or run SPAR...
LDAP Directories




                          Squirrel RDF
               http://jena.sourceforge.net/SquirrelRDF/


Excel spreadsheets




                                  Anzo for Excel
               http://www.cambridgesemantics.com/p...
Excel spreadsheets




                    Semantic Discovery System
               http://insilicodiscovery.com/installat...
Web-based data sources




                            Virtuoso Sponger Cartridges
               http://virtuoso.openlink...
Unstructured Text




                       Calais
               http://www.opencalais.com/


Unstructured Text




               Zemanta Web Service
               http://developer.zemanta.com/




Agenda
   Introduction
   The data model (RDF)
   The query language (SPARQL)
   Adding structure & semantics (RDFS, OWL, ...
Linked Data is…
   A simple set of 4 guidelines for publishing RDF data on the
   Web (over HTTP)
         Developed by Ti...
The Linking Open Data Project is...

 A community project started within the W3C
 Semantic Web Education & Outreach group ...
The LOD “cloud”, May 2007




The LOD “cloud”, March 2008




The LOD “cloud”, September 2008




The LOD “cloud”, March 2009




Application specific portions of the cloud
 Notably, bio-related data sets (in light purple)
      some by the W3C “Linkin...
Sindice - Another view of data on the Web




Tools: Publishing linked data
   Many tools we’ve already seen publish RDF
   data according to linked data principles
   ...
Tools: the Data Browser
               World Wide Web : Web pages :: The Semantic Web : Data




      World Wide Web : We...
Tabulator: Generic Data Browser




Disco Hyperdata Browser




OpenLink Data Explorer




Marbles Linked Data Browser




DBPedia Mobile




QDOS – your online digital status




BBC Music Beta




Producer-oriented Web to consumer-
oriented Web
   On the current Web…
         Content publishers decide what can be done...
UltraLink
     UltraLink is Novartis’s solution for cross-linking over 1,500,000 biologic
       and chemical terms, inclu...
UltraLink
   What if an acquisition brings with it a new
   Web-based corpus of pathway data that uses
   terms not recogn...
RDFa is…




               RDF in Attributes




RDFa is…




      A collection of HTML attributes that allow RDF to
             be embedded directly in Web pages.
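
A rough sketch of the idea behind RDFa, using Python's standard `html.parser`. It reads only the `about`, `property`, and `content` attributes, does not expand prefixes such as `dc:`, and assumes that elements carrying `property` contain plain text and that no void elements appear, so it is far from a conforming RDFa processor. The sample markup and the `RDFaSketch` class are made up for illustration.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collects (subject, property, literal) triples from about/property/content."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = [None]  # subject in scope for each open element
        self._pending = None     # (subject, property) waiting for element text
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An about attribute sets the subject; otherwise inherit from the parent.
        subject = attributes.get("about", self._subjects[-1])
        self._subjects.append(subject)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # An explicit content attribute overrides the visible text.
            self.triples.append((subject, prop, attributes["content"]))
        else:
            self._pending = (subject, prop)
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            subject, prop = self._pending
            self.triples.append((subject, prop, "".join(self._text).strip()))
            self._pending = None
        if len(self._subjects) > 1:
            self._subjects.pop()


page = """
<div about="http://example.org/books/glass-palace">
  <span property="dc:title">The Glass Palace</span> by
  <span property="dc:creator">Amitav Ghosh</span>,
  <span property="dc:date" content="2000">published in 2000</span>
</div>
"""

parser = RDFaSketch()
parser.feed(page)
for triple in parser.triples:
    print(triple)
```

The text a human reads and the data a machine extracts come from the same markup, so neither has to be repeated.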




Why RDFa?
   Don’t Repeat Yourself (DRY)
   In-context metadata (copy & paste)
   Authoritative (no screen scraping)




Who’s using RDFa?

                    STW Thesaurus for Economics




RDFa in action




POWDER is…




               Protocol for Web Description Resources




http://www.slideshare.net/fabien_gandon/powder-in-a-nutshell-presentation




       descriptions applied to
groups of onl...
many
              resources




one
description
grouping mechanisms...

     ... list URIs
    ... domain names, paths
     ... regular expressions on URIs
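
The three grouping mechanisms can be sketched as predicates over URIs, with a lookup that answers a query about one individual resource. This illustrates the idea only; it does not use POWDER's actual syntax or matching rules, and the helper names, example URIs, and description labels are invented.

```python
import re
from urllib.parse import urlsplit


def by_list(uris):
    """Group defined by listing URIs explicitly."""
    listed = set(uris)
    return lambda uri: uri in listed


def by_domain(domain, path_prefix="/"):
    """Group defined by a domain name (including subdomains) and a path prefix."""
    def matches(uri):
        parts = urlsplit(uri)
        host = parts.hostname or ""
        on_domain = host == domain or host.endswith("." + domain)
        return on_domain and (parts.path or "/").startswith(path_prefix)
    return matches


def by_regex(pattern):
    """Group defined by a regular expression on the URI."""
    compiled = re.compile(pattern)
    return lambda uri: compiled.search(uri) is not None


# One description applies to many resources: (group, description) pairs.
descriptions = [
    (by_list(["http://example.org/a.html"]), "listed explicitly"),
    (by_domain("example.com", "/kids/"), "child-friendly"),
    (by_regex(r"\.pdf$"), "PDF document"),
]


def describe(uri):
    """Queries are on individual resources: return every description that applies."""
    return [label for selects, label in descriptions if selects(uri)]


print(describe("http://www.example.com/kids/games.html"))
print(describe("http://example.com/kids/report.pdf"))
print(describe("http://example.net/"))
```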




          ...
descriptions
may be grouped




queries
are on individual resources


description…
• Which resources does the DR describe?
• What is the description?
• Who has created the description?
• When ...
in order to...
 adapt
               authorize
protect
               trust
search
               monitor


              ...
Thanks & Questions



               lee@cambridgesemantics.com





Semantic Web Landscape 2009


These slides were originally a tutorial presented for the SIG preceding the May 2009 meeting of the PRISM Forum.

They attempt to give a survey of the technologies, tools, and state of the world with respect to the Semantic Web as of the first half of 2009.

Published in: Education, Technology
  • Susie Stephens, Ben Adida, Eric Prud’hommeaux, Chris Bizer, Chris Becker
  • Executive summary.
  • Courtesy W3C SWEO group, http://linkeddata.org/docs/eswc2007-poster-linking-open-data.pdf
  • http://linkeddata.org/tools
  • http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/SemWebClients
  • See http://beckr.org/DBpediaMobile/ and http://wiki.dbpedia.org/DBpediaMobile
  • One of the goals of this tutorial is to demystify all of the names of technologies, tools, projects, etc. that swirl around the Semantic Web story. And since I saw, as I researched this presentation, that everyone seems to like this particular Gary Larson cartoon, it behooved me to include it.
  • Thanks to Fabien Gandon for the POWDER slides: http://www.slideshare.net/fabien_gandon/powder-in-a-nutshell-presentation
  • The good: emphasizes the importance of the foundational layers (URIs and RDF); emphasizes the long-term roadmap/vision of what’s needed for the Semantic Web. The bad: implies that perhaps things can’t be taken seriously until all the pieces are in place; implies an order to the research; various versions of the cake tell different stories (importance of XML, absence of query, lack of UI/application layer, …). Valentin Zacharias wrote about the “infamy” part of the layer cake here: http://www.valentinzacharias.de/blog/2007/04/ban-semantic-web-layer-cake.html
  • http://www.w3.org/2001/sw/sweo/public/UseCases/
  • Definition.
  • Prescriptive.
  • Descriptive.
  • Formal.
  • The first is as opposed to relational tables or XML schemas, where the schema needs to be explicitly adjusted to accommodate whatever data is being merged. The second is due to the expressivity of the model: it can handle lists, trees, n-ary relations, etc. The third is as opposed to table & column identifiers or XML attribute names.
  • Quotation from http://xtech06.usefulinc.com/schedule/paper/61
  • Definition.
  • Prescriptive.
  • Descriptive.
  • Descriptive (part 2). This is leagues ahead of the situation with SQL!
  • To run for real: http://dbpedia.org/sparql
      PREFIX type: <http://dbpedia.org/class/yago/>
      PREFIX prop: <http://dbpedia.org/property/>
      SELECT ?country_name ?population
      WHERE {
        ?country a type:LandlockedCountries ;
                 rdfs:label ?country_name ;
                 prop:populationEstimate ?population .
        FILTER (?population > 15000000 && langMatches(lang(?country_name), "EN")) .
      }
      ORDER BY DESC(?population)
  • http://bio2rdf.org/
  • Definition.
  • Definition.
  • Thanks to Bijan Parsia for much of this material: http://www.cs.man.ac.uk/~bparsia/2009/comp60462/17-03-casestudies.pdf
  • Semantic Web Landscape 2009

    1. The 2009 Semantic Web Landscape: Technologies, tools, and projects. Lee Feigenbaum, VP Technology & Standards, Cambridge Semantics; Co-chair, W3C SPARQL Working Group. For PRISM Forum SIG on Semantic Web, May 12, 2009
    2. Thanks Upfront. Much material & wisdom used with gracious permission of: Ivan Herman (W3C Semantic Web Activity Lead); Bijan Parsia (Co-editor of the core OWL 2 specification); Ian Horrocks (Co-chair of the W3C OWL 2 Working Group); Phil Archer (Chair of the W3C POWDER Working Group)
    3. Thanks Upfront. Much material & wisdom used with gracious permission of: Michael Hausenblas (Evangelist for RDFa, Linked Data, and Multimedia Semantics); Fabien Gandon (Member, GRDDL and OWL 2 Working Groups); Susie Stephens (Co-chair, W3C HCLS Interest Group); Eric Prud’hommeaux (W3C team member, Semantic Web expert)
    4. Executive Summary: The Semantic Web in 2009. The Semantic Web in 2009 is characterized by a healthy environment of stable, broadly-implemented core standard technologies complemented by a number of continually emerging new standards. Adopters of Semantic Web technologies in 2009 can choose from a wide range of commercial and open-source interoperable tools and systems. Enterprise Semantic Web projects are beginning to move beyond proofs of concept to serious production implementations. Community projects on the World Wide Web have linked hundreds of public data sets into an emergent Semantic Web.
    5. Agenda: Introduction; The data model (RDF); The query language (SPARQL); Adding structure & semantics (RDFS, OWL, RIF); Working in the real world (GRDDL, RDB2RDF); Working on the Web (Linked Data, RDFa, POWDER)
    6. A Motivating Example: Drug Discovery. The W3C HCLS interest group set out to use Semantic Web technologies to receive precise answers to a complex question: Find me genes involved in signal transduction that are related to pyramidal neurons.
    7. General search: 223,000 hits, 0 results
    8. Domain-limited search: 2,580 potential results
    9. Specific databases: Too many silos!
    10. A Semantic Web Approach: Integrate disparate databases… MeSH, PubMed, Entrez Gene, Gene Ontology, …
    11. A Semantic Web Approach (cont’d): …so that one query…
    12. A Semantic Web Approach (cont’d): …(trivially) spans several databases…
    13. A Semantic Web Approach (cont’d): …to deliver targeted results…
    14. What’s the trick? 1. Agreement on common terms and relationships 2. Incremental, flexible data structure 3. Good-enough modeling 4. Query interface tailored to the data model
    15. Names
    16. Branding: Semantic Web, Web of Data, Giant Global Graph, Data Web, Web 3.0, Linked Data Web, Semantic Data Web
    17. What is it & why do we care? (1) “The Semantic Web”: Augments the World Wide Web. Represents the Web’s information in a machine-readable fashion. Enables… …targeted search, …data browsing, …automated agents. World Wide Web : Web pages :: The Semantic Web : Data
    18. What is it & why do we care? (2) “Semantic Web technologies”: A family of technology standards that ‘play nice together’, including: flexible data model, expressive ontology language, distributed query language. Drive Web sites, enterprise applications. The technologies enable us to build applications and solutions that were not possible, practical, or feasible traditionally.
    19. A Common & Coherent Set of Technology Standards. A common set of technologies: …enables diverse uses, …encourages interoperability. A coherent set of technologies: …encourages incremental application, …provides a substantial base for innovation. A standard set of technologies: …reduces proprietary vendor lock-in, …encourages many choices for tool sets.
    20. The (In)Famous Layer Cake
    21. Semantic Web Technology Timeline (timeline figure spanning 1999, 2001, 2004, 2007, 2008, and 2009; legible labels: RIF, HCLS)
    22. 2009: Where we are. As technologies & tools have evolved, Semantic Web advocates have progressed through stages:
          Report on…            Execute on…
          Semantic Web vision   Initial experiments
          Experiments           Technology standards
          Technology standards  Software packages
          Software packages     Proofs of concept
          Proofs of concept     Production implementations
    23. 2009: Where we are (cont’d) http://www.w3.org/2001/sw/sweo/public/UseCases/
    24. 2009: Where we’re not. Image from Trey Ideker via Enoch Huang. Semantic Web technologies are not a ‘magic crank’ for discovering new drugs (or solving other problems, for that matter)!
    25. 2009: Where we’re not (cont’d). XML vs. RDF? “Ontology” vs. “ontology”? Semantic Web vs. Linked Data? Data integration vs. reasoning vs. KBs vs. search vs. app. development vs. … The Semantic Web still suffers from confusing and conflicting messages, each of which claims to be “correct”.
    26. 2009: Where we’re not (cont’d). People with appropriate skill sets for designing & building Semantic Web solutions are not widely available.
    27. 2009: Where we’re not (cont’d). We don’t yet have standard solutions for privacy, trust, probability, and other elements of the Semantic Web vision.
    28. Introduction to the Semantic Web approach. How does a Semantic Web approach help us merge data sets, infer new relations, and integrate outside data sources? Thanks to Ivan Herman for this example.
    29. The rough structure of data integration. 1. Map the various data onto an abstract data representation (make the data independent of its internal representation…) 2. Merge the resulting representations 3. Start making queries on the whole (queries not possible on the individual data sets)
    30. Data set “A”: A simplified book store
          Books
            ID                 Author  Title             Publisher  Year
            ISBN0-00-651409-X  id_xyz  The Glass Palace  id_qpr     2000
          Authors
            ID      Name           Home page
            id_xyz  Ghosh, Amitav  http://www.amitavghosh.com
          Publishers
            ID      Publisher Name  City
            id_qpr  Harper Collins  London
    31. 1st: Export your data as a set of relations
    32. Some notes on the data export. Data export does not necessarily mean physical conversion of the data. Relations can be virtual, generated on-the-fly at query time via SQL “bridges”, scraping HTML pages, extracting data from Excel sheets, etc. One can export part of the data.
    33. Data set “F”: Another book store’s data (spreadsheet; columns A, B, D, and E shown)
                A                   B                      D           E
            1   ID                  Titre                  Traducteur  Original
            2   ISBN0 2020386682    Le Palais des miroirs  A13         ISBN-0-00-651409-X
            6   ID                  Auteur
            7   ISBN-0-00-651409-X  A12
            11  Nom
            12  Ghosh, Amitav
            13  Besse, Christianne
    34. 2nd: Export your second set of data
    35. 3rd: start merging your data
    36. 3rd: start merging your data (cont’d)
    37. 4th: Merge identical resources
    38. Start making queries… A user of data set “F” can now ask queries like: “What is the title of the original version of Le Palais des miroirs?” This information is not in data set “F”, but it can be retrieved after merging with data set “A”!
    39. 5th: Query the merged data set
    40. However, more can be achieved… We “know” that a:author and f:auteur are really the same, but our automatic merge does not know that! Let us add some extra information to the merged data: a:author is the same as f:auteur; both identify a Person, a category (type) for certain resources.
    41. 3rd revisited: Use the extra knowledge
    42. Start making richer queries! A user of data set “F” can now query: “What is the home page of Le Palais des miroirs’s ‘auteur’?” The information is not in data set “F” or “A”, but was made available by merging data sets “A” and “F” and adding three simple “glue” statements.
    43. 6th: Richer queries
    44. Bring in other data sources. We can integrate new information into our merged data set from other sources, e.g. additional information about author Amitav Ghosh. Perhaps the largest public source of general knowledge is Wikipedia. Structured data can be extracted from Wikipedia using dedicated tools.
    45. 7th: Merge with Wikipedia data
    46. 7th (cont’d): Merge with Wikipedia data
    47. 7th (cont’d): Merge with Wikipedia data
    48. Is that surprising? It may look like it but, in fact, it should not be… What happened via automatic means is done every day by Web users! The difference: a bit of extra rigour so that machines could do this, too.
    49. What did we do? We combined different data sets that may be internal or somewhere on the Web, are of different formats (RDBMS, Excel spreadsheet, (X)HTML, etc.), and have different names for the same relations. We could combine the data because some URIs were identical (i.e. the ISBNs in this case). We could add some simple additional information (the “glue”) to help further merge data sets. The result? Answer queries that could not previously be asked.
    50. What did we do? (cont’d)
    51. The abstraction pays off because… …the graph representation is independent of the details of the native structures; …a change in local database schemas, HTML structures, etc. does not affect the whole (“schema independence”); …new data, new connections can be added seamlessly & incrementally.
    52. So where is the Semantic Web? Semantic Web technologies make such integration possible. The rest of this tutorial introduces many of these technologies.
    53. Agenda: Introduction; The data model (RDF); The query language (SPARQL); Adding structure & semantics (RDFS, OWL, RIF); Working in the real world (GRDDL, RDB2RDF); Working on the Web (Linked Data, RDFa, POWDER)
    54. 54. RDF is… Resource Description Framework May 12, 2009 54
    55. 55. RDF is… The data model of the Semantic Web. May 12, 2009 55
    56. 56. RDF is… A schema-less data model that features unambiguous identifiers and named relations between pairs of resources. May 12, 2009 56
    57. 57. RDF is… A labeled, directed graph of relations between resources and literal values. RDF graphs are collections of triples Triples are made up of a subject, a predicate, and an object predicate subject object Resources and relationships are named with URIs May 12, 2009 57
    58. 58. Example RDF triples “Lee Feigenbaum works for Cambridge Semantics” works for Lee Cambridge Feigenbaum Semantics “Lee Feigenbaum was born in 1978” born in Lee 1978 Feigenbaum “Cambridge Semantics is headquartered in Massachusetts” headquartered Cambridge Massachusetts Semantics May 12, 2009 58
    59. 59. Triples connect to form graphs works for Lee Cambridge Feigenbaum Semantics headquartered born in lives in Massachusetts 1978 capital Boston May 12, 2009 59
    60. 60. Why RDF? What’s different here? The graph data structure makes merging data with shared identifiers trivial (as we saw earlier) Triples act as a least common denominator for expressing data URIs for naming remove ambiguity …the same identifier means the same thing May 12, 2009 60
    61. 61. Why RDF? Incremental Integration [Diagram labels: Agile, Flexible Graph Model; URIs for naming; Incremental Integration; Relational Database; RDF] May 12, 2009 61
    62. 62. Types of RDF Tools Triple stores Built on relational database Native RDF store Development libraries Full-featured application servers Most RDF tools contain some elements of each of these. May 12, 2009 62
    63. 63. Finding RDF Tools Community-maintained lists http://esw.w3.org/topic/SemanticWebTools Emphasis on large triple stores http://esw.w3.org/topic/LargeTripleStores Michael Bergman’s Sweet Tools searchable list: http://www.mkbergman.com/?page_id=325 May 12, 2009 63
    64. 64. RDF Tools – (Some) Triple Stores
        Tool | Commercial or Open-source | Environment
        Anzo | Both | Java
        ARC | Open-source | PHP
        AllegroGraph | Commercial | Java, Prolog
        Jena | Open-source | Java
        Mulgara | Open-source | Java
        Oracle RDF | Commercial | SQL / SPARQL
        RDF::Query | Open-source | Perl
        Redland | Open-source | C, many wrappers
        Sesame | Open-source | Java
        Talis Platform | Commercial | HTTP (Hosted)
        Virtuoso | Both | C++
        May 12, 2009 64
    65. 65. Agenda Introduction The data model (RDF) The query language (SPARQL) Adding structure & semantics (RDFS, OWL, RIF) Working in the real world (GRDDL, RDF2RDB) Working on the Web (Linked Data, RDFa, POWDER) May 12, 2009 65
    66. 66. Motivating SPARQL With a query language, a client can design their own interface. --Leigh Dodds, Talis May 12, 2009 66
    67. 67. SPARQL is… SPARQL Protocol And RDF Query Language May 12, 2009 67
    68. 68. SPARQL is… The query language of the Semantic Web. May 12, 2009 68
    69. 69. SPARQL is… A SQL-like language for querying sets of RDF graphs. May 12, 2009 69
    70. 70. SPARQL is… A simple protocol for issuing queries and receiving results over HTTP. So… Every SPARQL client works with every SPARQL server! May 12, 2009 70
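A hedged sketch of what "a simple protocol over HTTP" looks like from the client side: the query text travels in a `query` parameter and the client asks for a results format via the `Accept` header. The endpoint URL and the helper name `build_sparql_request` are assumptions for illustration. The request is only constructed here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint URL, for illustration

def build_sparql_request(endpoint, query):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the server for SPARQL results serialized as JSON.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.get_method())
print(request.full_url)
```

Since the request shape does not depend on the server's implementation, the same client code can target any endpoint by changing `ENDPOINT`.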
    71. 71. Why SPARQL? SPARQL lets us: Pull information from structured and semi-structured data. Explore data by discovering unknown relationships. Query and search an integrated view of disparate data sources. Glue separate software applications together by transforming data from one vocabulary to another. May 12, 2009 71
    72. 72. [Diagram: data sources (Dealer 1, Dealer 2, Dealer 3, Employee Directory, ERP / Budget System, EPA Fuel Efficiency Spreadsheet on the Web) feed a SPARQL Query Engine, which answers a SPARQL query from a Web dashboard] What automobiles get more than 25 miles per gallon, fit within my department’s budget, and can be purchased at a dealer located within 10 miles of one of my employees? SELECT ?automobile WHERE { ?automobile a ex:Car ; epa:mpg ?mpg ; ex:dealer ?dealer . ?employee a ex:Employee ; geo:loc ?loc . ?dealer geo:loc ?dealerloc . FILTER(?mpg > 25 && geo:dist(?loc, ?dealerloc) <= 10) . }
    73. 73. SPARQL Example: Querying Wikipedia Find me all landlocked countries with a population greater than 15 million. PREFIX type: <http://dbpedia.org/class/yago/> PREFIX prop: <http://dbpedia.org/property/> SELECT ?country_name ?population WHERE { ?country a type:LandlockedCountries ; rdfs:label ?country_name ; prop:populationEstimate ?population . FILTER ( ?population > 15000000 && langMatches(lang(?country_name), "EN") ). } ORDER BY DESC(?population) May 12, 2009 73
    74. 74. SPARQL Example: Querying Wikipedia DBPedia SPARQL Endpoint
    75. 75. SPARQL Example: Querying Wikipedia
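To show roughly how a query like the landlocked-countries example is evaluated, here is a toy pattern matcher in plain Python. It is a sketch, not how any real SPARQL engine is implemented. The data is entirely made up (countries `ex:A`, `ex:B`, `ex:C` with placeholder populations), and `match` and `query` are hypothetical helpers.

```python
def match(pattern, triple, bindings):
    """Extend `bindings` so `pattern` matches `triple`; return None on failure."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def query(graph, patterns):
    """Return every set of bindings that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions

# Made-up data: placeholder countries and populations, not real figures.
graph = {
    ("ex:A", "rdf:type", "ex:LandlockedCountry"),
    ("ex:A", "ex:population", 20_000_000),
    ("ex:B", "rdf:type", "ex:LandlockedCountry"),
    ("ex:B", "ex:population", 5_000_000),
    ("ex:C", "ex:population", 90_000_000),
}
patterns = [
    ("?country", "rdf:type", "ex:LandlockedCountry"),
    ("?country", "ex:population", "?population"),
]
# The FILTER and ORDER BY steps, applied to the matched bindings.
results = [b for b in query(graph, patterns) if b["?population"] > 15_000_000]
results.sort(key=lambda b: b["?population"], reverse=True)
print([b["?country"] for b in results])  # ['ex:A']
```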
    76. 76. Types of SPARQL Tools Query engines Things that can run queries Most RDF stores provide a SPARQL engine Query rewriters E.g. to query relational databases (more later) Endpoints Things that accept queries on the Web and return results Client libraries Things that make it easy to ask queries May 12, 2009 76
    77. 77. Finding SPARQL Tools Community-maintained list of query engines http://esw.w3.org/topic/SparqlImplementations Publicly accessible SPARQL endpoints http://esw.w3.org/topic/SparqlEndpoints Michael Bergman’s Sweet Tools searchable list: http://www.mkbergman.com/?page_id=325 May 12, 2009 77
    78. 78. (Some) SPARQL’able Data Sets May 12, 2009 78
    79. 79. bio2rdf.org – querying life sciences data May 12, 2009 79
    80. 80. bio2rdf.org – querying life sciences data May 12, 2009 80
    81. 81. Agenda Introduction The data model (RDF) The query language (SPARQL) Adding structure & semantics (RDFS, OWL, RIF) Working in the real world (GRDDL, RDF2RDB) Working on the Web (Linked Data, RDFa, POWDER) May 12, 2009 81
    82. 82. Where’s the magic? We haven’t seen anything yet that begins to approach the long-term Semantic Web vision May 12, 2009 82
    83. 83. From the explicit to the inferred 3 pieces of the Semantic Web technology stack are about describing a domain well enough to capture (some of) the meaning of resources and relationships in the domain RDF Schema OWL RIF Apply knowledge to data to get more data. May 12, 2009 83
    84. 84. RDFS is… RDF Schema May 12, 2009 84
    85. 85. RDF Schema is… Elements of: Vocabulary (defining terms) I define a relationship called “prescribed dose.” Schema (defining types) “prescribed dose” relates “treatments” to “dosages” Taxonomy (defining hierarchies) Any “doctor” is a “medical professional” May 12, 2009 85
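"Apply knowledge to data to get more data" can be sketched as a small forward-chaining loop. This covers only three RDFS-style rules (subclass, domain, range) and is an illustration under those assumptions, not a complete RDFS reasoner. The `ex:` identifiers and the function name `rdfs_closure` are hypothetical.

```python
def rdfs_closure(triples):
    """Apply subclass, domain, and range rules until nothing new is inferred."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # Taxonomy: an instance of a class is an instance of its superclass.
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
                # Schema: using a property types its subject (domain)...
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))
                # ...and its object (range).
                if p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new

schema_and_data = {
    ("ex:Doctor", "rdfs:subClassOf", "ex:MedicalProfessional"),
    ("ex:prescribedDose", "rdfs:domain", "ex:Treatment"),
    ("ex:prescribedDose", "rdfs:range", "ex:Dosage"),
    ("ex:drSmith", "rdf:type", "ex:Doctor"),
    ("ex:treatment1", "ex:prescribedDose", "ex:dose1"),
}
closure = rdfs_closure(schema_and_data)
for triple in sorted(closure - schema_and_data):
    print(triple)
```

The three printed triples were never stated explicitly: they follow from the schema.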
    86. 86. WOL OWL is… Web Ontology Language May 12, 2009 86
    87. 87. OWL is… Elements of ontology Same/different identity “author” and “auteur” are the same relation two resources with the same “ISBN” are the same “book” More expressive type definitions A “cycle” is a “vehicle” with at least one “wheel” A “bicycle” is a “cycle” with exactly two “wheels” More expressive relation definitions “sibling” is a symmetric predicate the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc” May 12, 2009 87
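Two of the OWL ideas above, symmetric relations and identity through a shared key, can be illustrated with a toy rule application. This is a sketch only and covers a tiny fraction of OWL. The `ex:` names, the placeholder ISBN value, and the function `apply_owl_rules` are all made up.

```python
def apply_owl_rules(triples, symmetric, inverse_functional):
    """Add triples implied by symmetric and inverse-functional properties."""
    inferred = set(triples)
    for (s, p, o) in triples:
        # Symmetric: "a sibling b" implies "b sibling a".
        if p in symmetric:
            inferred.add((o, p, s))
    for (s1, p1, o1) in triples:
        for (s2, p2, o2) in triples:
            # Inverse functional: two subjects sharing a value are the same thing.
            if p1 == p2 and p1 in inverse_functional and o1 == o2 and s1 != s2:
                inferred.add((s1, "owl:sameAs", s2))
    return inferred

data = {
    ("ex:alice", "ex:sibling", "ex:bob"),
    ("ex:book1", "ex:isbn", "ISBN-PLACEHOLDER"),  # made-up value
    ("ex:book2", "ex:isbn", "ISBN-PLACEHOLDER"),
}
result = apply_owl_rules(data, symmetric={"ex:sibling"},
                         inverse_functional={"ex:isbn"})
for triple in sorted(result - data):
    print(triple)
```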
    88. 88. What can we do with OWL? Answer questions of Consistency Are there any contradictions in this model? Classification What are all the inferred types of this resource? Satisfiability Are there any classes in this ontology that cannot possibly have any members? May 12, 2009 88
    89. 89. Building Useful Ontologies Developing and maintaining quality ontologies is very challenging Users need tools and services, e.g., to help check if ontology is: Meaningful: all named classes can have instances http://www.aber.ac.uk/compsci/public/media/presentations/OUCL-seminar.ppt
    90. 90. Building Useful Ontologies Developing and maintaining quality ontologies is very challenging Users need tools and services, e.g., to help check if ontology is: Meaningful: all named classes can have instances Correct: captures intuitions of domain experts
    91. 91. Building Useful Ontologies Developing and maintaining quality ontologies is very challenging Users need tools and services, e.g., to help check if ontology is: Meaningful: all named classes can have instances Correct: captures intuitions of domain experts Minimally redundant: no unintended synonyms (e.g. “Banana split” and “Banana sundae”)
    92. 92. Example: SNOMED Large: 373,731 concepts & over 1 million terms NHS version extended to 542,380 classes with 19,828 additional named classes 148,821 class drug taxonomy (primitive hierarchy) OWL reasoner (FaCT++) classified NHS ontology Able to classify whole ontology in <4 hours Interesting results come from 19,828 additional named classes 180 missing subClass relationships were found, e.g.: Periocular_dermatitis subClassOf Disease_of_face May 12, 2009 92
    93. 93. Example: SNOMED May 12, 2009 93
    94. 94. RIF is… Rule Interchange Format May 12, 2009 97
    95. 95. RIF is… Standard representation for exchanging sets of logical and business rules Logical rules A buyer buys an item from a seller if the seller sells the item to the buyer A customer becomes a "Gold" customer as soon as his cumulative purchases during the current year top $5000 Production rules Customers that become "Gold" customers must be notified immediately, and a golden customer card will be printed and sent to them within one week For shopping carts worth more than $1000, "Gold" customers receive an additional discount of 10% of the total amount May 12, 2009 98
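RIF itself is an interchange format, not a programming language, so the following is only a sketch of what two of the example rules mean once a rule engine executes them. The function names and the "Standard" status label are hypothetical; the $5000 and $1000 thresholds and the 10% discount come from the slide.

```python
GOLD_THRESHOLD = 5000   # cumulative purchases this year, in dollars
CART_THRESHOLD = 1000   # shopping cart value, in dollars
GOLD_DISCOUNT_PERCENT = 10

def customer_status(purchases_this_year):
    """Logical rule: cumulative purchases above the threshold imply Gold."""
    return "Gold" if sum(purchases_this_year) > GOLD_THRESHOLD else "Standard"

def cart_discount(status, cart_total):
    """Production rule: Gold customers get 10% off carts worth over $1000."""
    if status == "Gold" and cart_total > CART_THRESHOLD:
        return cart_total * GOLD_DISCOUNT_PERCENT / 100
    return 0

status = customer_status([3000, 2500])
print(status, cart_discount(status, 1200))
```

The point of RIF is that rules like these can be exchanged between rule systems instead of being rewritten by hand for each one.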
    96. 96. Developing Tools and Infrastructure Editors/environments OilEd, Protégé, Swoop, TopBraid, Ontotrack, … May 12, 2009 99
    97. 97. Developing Tools and Infrastructure Editors/environments OilEd, Protégé, Swoop, TopBraid, Ontotrack, … Reasoning systems Cerebra, FaCT++, KAON2, Pellet, Racer, CEL, … May 12, 2009 100
    98. 98. Visualizing and Publishing Vocabularies May 12, 2009 101
    99. 99. Reusable, public ontologies FOAF The Event Ontology Measurement Units Ontology May 12, 2009 102
    100. 100. Agenda Introduction The data model (RDF) The query language (SPARQL) Adding structure & semantics (RDFS, OWL, RIF) Working in the real world (GRDDL, RDF2RDB) Working on the Web (Linked Data, RDFa, POWDER) May 12, 2009 103
    101. 101. Fantasy Land Architecture [Diagram: Ontology / Schema; six “Custom UI” boxes] May 12, 2009 104
    102. 102. Reality [Diagram: Internet, DB2, XML, LDAP Directory, Oracle RDB; six “Custom UI” boxes] May 12, 2009 105
    103. 103. GRDDL is… Gleaning Resource Descriptions from Dialects of Language May 12, 2009 106
    104. 104. GRDDL is… A method for authoritatively getting RDF data from XML and XHTML documents. May 12, 2009 107
    105. 105. GRDDL is… A mechanism for authoritatively deriving RDF data from families of XML and XHTML documents. May 12, 2009 108
    106. 106. GRDDL tools Most GRDDL tools are adapters to existing RDF stores or SPARQL engines to allow loading or querying data from XML and XHTML sources. Community-maintained list: http://esw.w3.org/topic/GrddlImplementations
        Host System | GRDDL tool
        Jena | GRDDL Reader for Jena
        RDFLib | GRDDL.py
        Redland | (built in)
        Swignition | (built in)
        Virtuoso | GRDDL “Sponger”
        May 12, 2009 109
    107. 107. RDB2RDF is… Relational Database to RDF May 12, 2009 110
    108. 108. RDB2RDF is… A proposed W3C Working Group to define a standard way to map from relational databases to RDF (and SPARQL). May 12, 2009 111
    109. 109. RDB2RDF tools Survey of existing approaches: http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
        Tool | Mapping Approach | Dynamic vs. Static (ETL)
        Anzo | D2RQ configuration graph | Both
        Asio Tools | OWL file, SWRL rules | Both
        Dartgrid | XML file, visual mapper | Dynamic
        D2RQ | D2RQ configuration file | Both
        R2O | R2O XML file | Both
        RDBtoOnto | Constraint rules | Static (ETL)
        SDS | EII Query Engine/OOM XML | Both
        Triplify | SQL config file | Linked Data
        Virtuoso RDF View | Meta-Schema Language | Both
        May 12, 2009 112
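A minimal sketch of the static (ETL) style of mapping, assuming a naive convention: each row becomes a subject URI built from the table name and key, and each column becomes a predicate. This convention, the base URI, and the function `table_to_triples` are assumptions for illustration and do not reproduce the mapping language of any tool listed above.

```python
import sqlite3

BASE = "http://example.org/"  # assumed base URI for generated identifiers

def table_to_triples(conn, table, key):
    """Dump one table as triples: row -> subject, column -> predicate."""
    # The table name is interpolated directly, so pass trusted names only.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            # Skip the key itself and NULLs (absent data yields no triple).
            if column != key and value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, employer TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Lee Feigenbaum', 'Cambridge Semantics')")
triples = table_to_triples(conn, "employee", "id")
for triple in triples:
    print(triple)
```

A dynamic mapping would instead translate each incoming SPARQL query to SQL at query time, leaving the data in the relational database.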
    110. 110. What about… everything else? Standards don’t yet exist, but many tools exist to derive RDF and/or run SPARQL queries against other sources of data. May 12, 2009 113
    111. 111. LDAP Directories Squirrel RDF http://jena.sourceforge.net/SquirrelRDF/ May 12, 2009 114
    112. 112. Excel spreadsheets Anzo for Excel http://www.cambridgesemantics.com/products/anzo_for_excel May 12, 2009 115
    113. 113. Excel spreadsheets Semantic Discovery System http://insilicodiscovery.com/installation/index.php May 12, 2009 116
    114. 114. Web-based data sources Virtuoso Sponger Cartridges http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger May 12, 2009 117
    115. 115. Unstructured Text Calais http://www.opencalais.com/ May 12, 2009 118
    116. 116. Unstructured Text Zemanta Web Service http://developer.zemanta.com/ May 12, 2009 119
    117. 117. Agenda Introduction The data model (RDF) The query language (SPARQL) Adding structure & semantics (RDFS, OWL, RIF) Working in the real world (GRDDL, RDF2RDB) Working on the Web (Linked Data, RDFa, POWDER) May 12, 2009 120
    118. 118. Linked Data is… A simple set of 4 guidelines for publishing RDF data on the Web (over HTTP) Developed by Tim Berners-Lee in 2006 1. Use URIs as names for things • Globally unique identity 2. Use HTTP URIs • Everyone has a Web browser/client 3. When someone looks up a URI, provide useful information • …in the form of RDF data 4. Include links to other URIs • Foster discovery of additional information May 12, 2009 121
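Guidelines 3 and 4 together let a client discover data by following links. The sketch below simulates that with an in-memory dictionary standing in for the Web, so nothing is fetched over the network. The URIs, the `WEB` table, and the `crawl` function are all hypothetical; a real client would issue an HTTP GET per URI and parse the RDF it receives.

```python
# Hypothetical stand-in for the Web: URI -> triples returned on lookup.
WEB = {
    "http://example.org/lee": [
        ("http://example.org/lee", "ex:worksFor", "http://example.org/cs"),
    ],
    "http://example.org/cs": [
        ("http://example.org/cs", "ex:headquarteredIn", "http://example.org/ma"),
    ],
    "http://example.org/ma": [],
}

def crawl(start, lookup, limit=10):
    """Look up a URI, then follow any HTTP URIs found in the returned data."""
    seen, queue, triples = set(), [start], []
    while queue and len(seen) < limit:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for triple in lookup(uri):
            triples.append(triple)
            obj = triple[2]
            # Guideline 4: links to other URIs lead to more information.
            if obj.startswith("http://"):
                queue.append(obj)
    return triples

collected = crawl("http://example.org/lee", lambda uri: WEB.get(uri, []))
print(len(collected))
```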
    119. 119. The Linking Open Data Project is... A community project started within the W3C Semantic Web Education & Outreach group in 2007 A wealth of existing, open Web-based data sets exposed in RDF and linked together A growing number of publicly available SPARQL endpoints The first steps of “The” Semantic Web? No longer easily measured or depicted! May 12, 2009 122
    120. 120. The LOD “cloud”, May 2007 May 12, 2009 123
    121. 121. The LOD “cloud”, March 2008 May 12, 2009 124
    122. 122. The LOD “cloud”, September 2008 May 12, 2009 125
    123. 123. The LOD “cloud”, March 2009 May 12, 2009 126
    124. 124. Application specific portions of the cloud Notably, bio-related data sets (in light purple) some by the W3C “Linking Open Drug Data” task force May 12, 2009 127
    125. 125. Sindice - Another view of data on the Web May 12, 2009 128
    126. 126. Tools: Publishing linked data Many tools we’ve already seen publish RDF data according to linked data principles E.g. Talis platform, Virtuoso, Triplify Others sit on top of existing systems and make the data available as Linked Data E.g. pubby May 12, 2009 129
    127. 127. Tools: the Data Browser World Wide Web : Web pages :: The Semantic Web : Data World Wide Web : Web browser :: Linked Data Web : Data browser May 12, 2009 130
    128. 128. Tabulator: Generic Data Browser May 12, 2009 131
    129. 129. Disco Hyperdata Browser May 12, 2009 132
    130. 130. OpenLink Data Explorer May 12, 2009 133
    131. 131. Marbles Linked Data Browser May 12, 2009 134
    132. 132. DBPedia Mobile May 12, 2009 135
    133. 133. DBPedia Mobile May 12, 2009 136
    134. 134. DBPedia Mobile May 12, 2009 137
    135. 135. DBPedia Mobile May 12, 2009 138
    136. 136. QDOS – your online digital status May 12, 2009 139
    137. 137. BBC Music Beta May 12, 2009 140
    138. 138. Producer-oriented Web to consumer-oriented Web On the current Web… Content publishers decide what can be done with the data (via links, script) On the Semantic Web… Content publishers publish actionable data Content consumers decide how to act on it May 12, 2009 141
    139. 139. UltraLink UltraLink is Novartis’s solution for cross-linking over 1,500,000 biologic and chemical terms, including synonyms, taxonomies, and pointers into data repositories. May 12, 2009 142
    140. 140. UltraLink What if an acquisition brings with it a new Web-based corpus of pathway data that uses terms not recognized by the annotators? New text miners must be created & deployed Finding & consuming data are too tightly coupled May 12, 2009 143
    141. 141. RDFa is… RDF in Attributes May 12, 2009 144
    142. 142. RDFa is… A collection of HTML attributes that allow RDF to be embedded directly in Web pages. May 12, 2009 145
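To make "HTML attributes that carry RDF" concrete, here is a deliberately tiny extractor built on the standard library's `html.parser`. It understands only the `about`, `property`, and `content` attributes, does not resolve prefixes such as `dc:`, and never resets the subject on closing tags, so it is a sketch and not a conforming RDFa processor. The sample markup and the class name `TinyRDFaParser` are made up.

```python
from html.parser import HTMLParser

class TinyRDFaParser(HTMLParser):
    """Extract triples from `about`, `property`, and `content` attributes only."""

    def __init__(self):
        super().__init__()
        self.subject = None   # set by the most recent `about` attribute
        self.pending = None   # a property waiting for its element text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            if "content" in attrs:
                # The value is given explicitly in the attribute.
                self.triples.append((self.subject, attrs["property"], attrs["content"]))
            else:
                # The value is the visible text that follows.
                self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

page = """<div about="http://example.org/talk">
  <span property="dc:title">The 2009 Semantic Web Landscape</span> by
  <span property="dc:creator">Lee Feigenbaum</span>
  <meta property="dc:date" content="2009-05-12" />
</div>"""

parser = TinyRDFaParser()
parser.feed(page)
for triple in parser.triples:
    print(triple)
```

The human-readable text and the machine-readable data come from the same markup, which is the "Don't Repeat Yourself" benefit.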
    143. 143. Why RDFa? Don’t Repeat Yourself (DRY) In-context metadata (copy & paste) Authoritative (no screen scraping) May 12, 2009 146
    144. 144. Who’s using RDFa? STW Thesaurus for Economics May 12, 2009 147
    145. 145. RDFa in action May 12, 2009 148
    146. 146. POWDER is… Protocol for Web Description Resources May 12, 2009 149
    147. 147. http://www.slideshare.net/fabien_gandon/powder-in-a-nutshell-presentation descriptions applied to groups of online resources 150
    148. 148. many resources one description 151
    149. 149. grouping mechanisms... ... list URIs ... domain names, paths ... regular expressions on URIs 152
    150. 150. descriptions may be grouped queries are on individual resources 153
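The grouping idea (one description, many resources, queried one resource at a time) can be sketched as a membership test. The dictionary layout below is a hypothetical structure invented for this example and is not POWDER's actual syntax; the group, the description, and the function `in_group` are likewise assumptions.

```python
import re
from urllib.parse import urlparse

def in_group(uri, group):
    """Decide whether a single URI belongs to a group of resources."""
    parts = urlparse(uri)
    host = parts.hostname or ""
    # Mechanism 1: an explicit list of URIs.
    listed = uri in group.get("uris", ())
    # Mechanism 2: domain names (including subdomains), optionally with paths.
    host_ok = any(host == d or host.endswith("." + d)
                  for d in group.get("hosts", ()))
    path_ok = any(parts.path.startswith(p)
                  for p in group.get("path_prefixes", ("",)))
    # Mechanism 3: regular expressions over the whole URI.
    regex_ok = any(re.search(rx, uri) for rx in group.get("regexes", ()))
    return listed or (host_ok and path_ok) or regex_ok

# One made-up description applied to many resources.
group = {"hosts": ["example.org"], "path_prefixes": ["/kids/"]}
description = {"audience": "children"}

for uri in ("http://www.example.org/kids/games.html", "http://example.org/news/"):
    print(uri, description if in_group(uri, group) else None)
```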
    151. 151. description… • Which resources does the DR describe? • What is the description? • Who has created the description? • When was the description created? • Until when is the description considered valid? • From when is the description considered valid? • Does anybody agree with this description? • Do other descriptions exist about this group of resources? 154
    152. 152. in order to... adapt authorize protect trust search monitor 155
    153. 153. Thanks & Questions lee@cambridgesemantics.com May 12, 2009 156