CSHALS 2010 W3C Semantic Web Tutorial

These slides were presented as part of a W3C tutorial at the CSHALS 2010 conference (http://www.iscb.org/cshals2010). The slides are adapted from a longer introduction to the Semantic Web available at http://www.slideshare.net/LeeFeigenbaum/semantic-web-landscape-2009 .

A PDF version of the slides is available at http://thefigtrees.net/lee/sw/cshals/cshals-w3c-semantic-web-tutorial.pdf .

  • One of the goals of this tutorial is to demystify all of the names of technologies, tools, projects, etc. that swirl around the Semantic Web story. And since, as I researched this presentation, I saw that everyone seems to like this particular Gary Larson cartoon, it behooved me to include it.
  • The good – emphasizes the importance of the foundational layers (URIs and RDF); emphasizes the long-term roadmap/vision of what’s needed for the Semantic Web. The bad – implies that perhaps things can’t be taken seriously until all the pieces are in place; implies an order to the research; various versions of the cake tell different stories (importance of XML, absence of query, lack of UI/application layer, …). Valentin Zacharias wrote about the “infamy” part of the layer cake here: http://www.valentinzacharias.de/blog/2007/04/ban-semantic-web-layer-cake.html
  • The Ontology/ontology dichotomy is captured well by Jim Hendler at http://www.cs.rpi.edu/%7Ehendler/presentations/SemTech2008-2Towers.pdf
  • Definition.
  • Prescriptive.
  • Descriptive.
  • Formal.
  • The first is as opposed to relational tables or XML schemas where the schema needs to be explicitly adjusted to accommodate whatever data is being merged. The second is due to the expressivity of the model – can handle lists, trees, n-ary relations, etc. The third is as opposed to table & column identifiers or XML attribute names.
  • Definition.
  • Prescriptive.
  • Descriptive.
  • Descriptive (part 2). This is leagues ahead of the situation with SQL!
  • http://bio2rdf.org/
  • http://bio2rdf.org/
  • Definition.
  • Definition.
  • Definition.

CSHALS 2010 W3C Semantic Web Tutorial Presentation Transcript

  • 1. The Semantic Web Landscape: A Practical Introduction
    Lee Feigenbaum
    VP Technology & Standards, Cambridge Semantics
    Co-chair, W3C SPARQL Working Group
    For CSHALS 2010 Tutorial Attendees
    February 24, 2010
  • 2. The W3C HCLS interest group set out to use Semantic Web technologies to receive precise answers to a complex question:
    A Motivating Example: Drug Discovery
    Find me genes involved in signal transduction that are related to pyramidal neurons.
  • 3. General search
    223,000 hits, 0 results
  • 4. Domain-limited search
    2,580 potential results
  • 5. Specific databases
    Too many silos!
  • 6. A Semantic Web Approach
    Integrate disparate databases…
    MeSH
    PubMed
    Entrez Gene
    Gene Ontology

  • 7. A Semantic Web Approach (cont’d)
    …so that one query…
  • 8. A Semantic Web Approach (cont’d)
    …(trivially) spans several databases…
  • 9. A Semantic Web Approach (cont’d)
    …to deliver targeted results…
  • 10. Agreement on common terms and relationships
    Incremental, flexible data structure
    Good-enough modeling
    Query interface tailored to the data model
    What’s the trick?
  • 11. What is the semantic web?
  • 12. Names
  • 13. Semantic Web
    Web of Data
    Giant Global Graph
    Data Web
    Web 3.0
    Linked Data Web
    Semantic Data Web
    Branding
  • 14. “The Semantic Web” a.k.a “Linked Open Data”
    Augments the World Wide Web
    Represents the Web’s information in a machine-readable fashion
    Enables…
    …targeted search
    …data browsing
    …automated agents
    What is it & why do we care? (1)
    World Wide Web : Web pages :: The Semantic Web : Data
  • 15. “Semantic Web technologies”
    A family of technology standards that ‘play nice together’, including:
    Flexible data model
    Expressive ontology language
    Distributed query language
    Drive Web sites, enterprise applications
    What is it & why do we care? (2)
    The technologies enable us to build applications and solutions that were not possible, practical, or feasible traditionally.
  • 16. A common set of technologies:
    ...enables diverse uses
    ...encourages interoperability
    A coherent set of technologies:
    …encourage incremental application
    …provide a substantial base for innovation
    A standard set of technologies:
    ...reduces proprietary vendor lock-in
    ...encourages many choices for tool sets
    A Common & Coherent Set of Technology Standards
  • 17. The (In)Famous Layer Cake
  • 18. Semantic Web Technology Timeline
    [Timeline figure spanning 1999, 2001, 2004, 2007, 2008, and 2010; labels recoverable from the slide include RIF and HCLS]
  • 19. As technologies & tools have evolved, Semantic Web advocates have progressed through stages:
    2010: Where we are
  • 20. 2010: Where we’re not
    Image from Trey Ideker via Enoch Huang
    Semantic Web technologies are not a ‘magic crank’ for discovering new drugs (or solving other problems, for that matter)!
  • 21. 2010: Where we’re not (cont’d)
    XML vs. RDF?
    “Ontology” vs. “ontology”?
    Data integration vs. reasoning vs. KBs vs. search vs. app. development vs. …
    Semantic Web vs. Linked Data?
    The Semantic Web still suffers from confusing and conflicting messaging, each of which asserts it’s “correct”.
  • 22. 2010: Where we’re not (cont’d)
    People with appropriate skill sets for designing & building Semantic Web solutions are not widely available.
  • 23. 2010: Where we’re not (cont’d)
    We don’t yet have standard solutions for privacy, trust, probability, and other elements of the Semantic Web vision.
  • 24. What do Semantic Web solutions look like?
  • 25. RDF is…
    Resource Description Framework
  • 26. RDF is…
    The data model of the Semantic Web.
  • 27. RDF is…
    A schema-less data model that features unambiguous identifiers and named relations between pairs of resources.
  • 28. RDF is…
    A labeled, directed graph of relations between resources and literal values.
    RDF graphs are collections of triples
    Triples are made up of a subject, a predicate, and an object
    Resources and relationships are named with URIs
    predicate
    subject
    object
  • 29. “Lee Feigenbaum works for Cambridge Semantics”
    “Lee Feigenbaum was born in 1978”
    “Cambridge Semantics is headquartered in Massachusetts”
    Example RDF triples
    works for
    born in
    headquartered
    Lee Feigenbaum
    Cambridge Semantics
    Lee Feigenbaum
    Cambridge Semantics
    1978
    Massachusetts
  • 30. Triples connect to form graphs
    headquartered
    lives in
    Massachusetts
    born in
    capital
    works for
    Lee Feigenbaum
    Cambridge Semantics
    Boston
    1978
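The way triples link into a graph can be sketched in a few lines of Python. This is an illustration only: plain strings stand in for the URIs that real RDF would use, only the three triples stated on the previous slide are included, and the `objects` helper is a hypothetical name rather than part of any RDF library.

```python
# Each triple is a (subject, predicate, object) tuple; a graph is a set of triples.
# Plain strings stand in for URIs here (an assumption made for readability).
graph = {
    ("Lee Feigenbaum", "works for", "Cambridge Semantics"),
    ("Lee Feigenbaum", "born in", "1978"),
    ("Cambridge Semantics", "headquartered", "Massachusetts"),
}


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Follow two edges: person -> employer -> headquarters location.
employers = objects(graph, "Lee Feigenbaum", "works for")
locations = {loc for e in employers for loc in objects(graph, e, "headquartered")}
print(locations)
```

Because "Cambridge Semantics" is the object of one triple and the subject of another, the two triples join into a path without any extra schema.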
  • 31. The graph data structure makes merging data with shared identifiers trivial
    Triples act as a least common denominator for expressing data
    URIs for naming remove ambiguity
    …the same identifier means the same thing
    Why RDF? What’s different here?
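As a rough sketch of why merging is trivial, two sets of triples that share identifiers can be combined with a plain set union. The two source graphs below are hypothetical, and prefixed names are kept as strings for brevity; a real RDF store would work with full URIs.

```python
# Two hypothetical sources describing overlapping resources.
people = {
    ("ex:LeeFeigenbaum", "ex:employer", "ex:CambridgeSemantics"),
    ("ex:LeeFeigenbaum", "ex:birthYear", 1978),
}
companies = {
    ("ex:LeeFeigenbaum", "ex:employer", "ex:CambridgeSemantics"),
    ("ex:CambridgeSemantics", "ex:headquarters", "geo:BostonMA"),
}

# Merging is a set union: no schema changes, and the duplicate triple collapses.
merged = people | companies
print(len(merged))
```

The shared identifier `ex:CambridgeSemantics` is what lets facts from both sources attach to the same node after the merge.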
  • 32. Why RDF? Incremental Integration
    Relational Database
    RDF
  • 33. RDF is the model, for which there are several concrete syntaxes:
    RDF/XML – standard, complex XML syntax
    Turtle – common, textual, triples-oriented syntax
    N3 – more expressive superset of Turtle
    N-Triples – textual, line-oriented, useful for streaming
    What does RDF look like?
    When writing RDF by hand and in many guides, examples, and discussions these days, you’ll see Turtle most often.
  • 34. Write a triple by writing its parts separated by spaces (subject predicate object)
    A Bit of Turtle
    @prefix ex: <http://example.org/myvocab/> .
    @prefix geo: <http://geonames.example/> .
    ex:LeeFeigenbaum ex:employer ex:CambridgeSemantics .
    ex:LeeFeigenbaum ex:birthYear 1978 .
    ex:CambridgeSemantics ex:headquarters geo:BostonMA .
    geo:BostonMA ex:population 574000 .
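The `@prefix` lines are shorthand: each prefixed name stands for a full URI. A minimal Python sketch of that expansion follows, using the two prefixes from the slide; the `expand` function is a hypothetical helper, not a Turtle parser, and it ignores details such as escaping.

```python
# Prefix table copied from the @prefix declarations on the slide.
PREFIXES = {
    "ex": "http://example.org/myvocab/",
    "geo": "http://geonames.example/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'ex:employer' into a full URI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


full_uri = expand("ex:LeeFeigenbaum")
print(full_uri)
```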
  • 35. SPARQL is…
    SPARQL Protocol And RDF Query Language
  • 36. SPARQL is…
    The query language of the Semantic Web.
  • 37. SPARQL is…
    A SQL-like language for querying sets of RDF graphs.
  • 38. SPARQL is…
    A simple protocol for issuing queries and receiving results over HTTP. So…
    Every SPARQL client works with every SPARQL server!
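A hedged sketch of what the protocol looks like from the client side: a query can be sent as an HTTP GET with the query text in a `query` parameter. The endpoint URL below is hypothetical, and the code only builds the request URL; it does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, for illustration only


def query_url(endpoint, sparql):
    """Build a GET URL carrying the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


sparql = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
url = query_url(ENDPOINT, sparql)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
```

Because every endpoint accepts the same request shape, the same client code can target any server by changing only the endpoint URL.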
  • 39. SPARQL lets us:
    Pull information from structured and semi-structured data.
    Explore data by discovering unknown relationships.
    Query and search an integrated view of disparate data sources.
    Glue separate software applications together by transforming data from one vocabulary to another.
    Why SPARQL?
  • 40. Dealer 1
    Dealer 2
    Dealer 3
    Employee
    Directory
    ERP / Budget
    System
    Web
    EPA Fuel Efficiency
    Spreadsheet
    SPARQL Query Engine
    What automobiles get more than 25 miles per gallon, fit within my department’s budget, and can be purchased at a dealer located within 10 miles of one of my employees?
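    # Prefix IRIs below are placeholders added for completeness; the slide omits them.
    PREFIX ex:  <http://example.org/myvocab/>
    PREFIX epa: <http://epa.example/>
    PREFIX geo: <http://geonames.example/>
    SELECT ?automobile
    WHERE {
      ?automobile a ex:Car ;
                  epa:mpg ?mpg ;
                  ex:dealer ?dealer .
      ?employee a ex:Employee ;
                geo:loc ?loc .
      ?dealer geo:loc ?dealerloc .
      FILTER(?mpg > 25 && geo:dist(?loc, ?dealerloc) <= 10)
    }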
    Web dashboard
    SPARQL query
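To give a feel for what a query engine does with the `WHERE` clause, here is a toy pattern matcher in Python. It is a sketch under simplifying assumptions: terms are plain strings or numbers, variables are strings starting with `?`, the sample data is invented, and the filter is applied in ordinary Python rather than parsed from SPARQL. Real engines are far more sophisticated.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, graph):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in graph
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


# Invented sample data, loosely following the automobile query on the slide.
graph = {
    ("ex:car1", "a", "ex:Car"), ("ex:car1", "epa:mpg", 31),
    ("ex:car2", "a", "ex:Car"), ("ex:car2", "epa:mpg", 19),
}
patterns = [("?automobile", "a", "ex:Car"), ("?automobile", "epa:mpg", "?mpg")]

# The FILTER step, written directly in Python.
results = [b["?automobile"] for b in solve(patterns, graph) if b["?mpg"] > 25]
print(results)
```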
  • 41. bio2rdf.org – querying life sciences data
  • 42. bio2rdf.org – querying life sciences data
  • 43. 3 pieces of the Semantic Web technology stack are about describing a domain well enough to capture (some of) the meaning of resources and relationships in the domain
    RDF Schema
    OWL
    RIF
    From the explicit to the inferred
    Apply knowledge to data to get more data.
  • 44. RDFS is…
    RDF Schema
  • 45. Elements of:
    Vocabulary (defining terms)
    I define a relationship called “prescribed dose.”
    Schema (defining types)
    “prescribed dose” relates “treatments” to “dosages”
    (my prescribed dose is 2mg; therefore 2mg is a dosage)
    Taxonomy (defining hierarchies)
    Any “doctor” is a “medical professional”
    (therefore Dr. Brown is a medical professional)
    RDF Schema is…
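The two inferences on this slide can be sketched as simple rules over triples. This is an illustration, not a conforming RDFS reasoner: only the range and subclass rules are implemented, the `ex:` names are invented, and the dosage literal is treated as an ordinary node for simplicity.

```python
facts = {
    ("ex:prescribedDose", "rdfs:range", "ex:Dosage"),
    ("ex:myTreatment", "ex:prescribedDose", "2mg"),
    ("ex:Doctor", "rdfs:subClassOf", "ex:MedicalProfessional"),
    ("ex:DrBrown", "rdf:type", "ex:Doctor"),
}


def infer(triples):
    """Apply the range and subclass rules until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # Range rule: the object of a property has the property's range type.
                if p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))
                # Subclass rule: an instance of a class is an instance of its superclass.
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = infer(facts)
print(sorted(closure - facts))
```

The printed difference contains the two conclusions from the slide: 2mg is a dosage, and Dr. Brown is a medical professional.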
  • 46. WOL OWL is…
    Web Ontology Language
  • 47. Elements of ontology
    Same/different identity
    “author” and “auteur” are the same relation
    two resources with the same “ISBN” are the same “book”
    More expressive type definitions
    A “cycle” is a “vehicle” with at least one “wheel”
    A “bicycle” is a “cycle” with exactly two “wheels”
    More expressive relation definitions
    “sibling” is a symmetric predicate
    the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
    OWL is…
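Two of the examples above can be approximated in Python. This is a loose sketch: the symmetric-property rule is a genuine inference, but the enumerated-values check is written as simple validation, which simplifies how OWL actually treats such definitions. All `ex:` names and the sample data are invented.

```python
SYMMETRIC = {"ex:sibling"}
ALLOWED_VALUES = {
    "ex:favoriteDwarf": {"happy", "sleepy", "sneezy", "grumpy", "dopey", "bashful", "doc"},
}


def with_symmetry(triples):
    """Add the reversed triple for every symmetric predicate."""
    return set(triples) | {(o, p, s) for (s, p, o) in triples if p in SYMMETRIC}


def disallowed(triples):
    """Return triples whose value falls outside the enumerated set for the predicate."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if p in ALLOWED_VALUES and o not in ALLOWED_VALUES[p]
    }


data = {
    ("ex:Alice", "ex:sibling", "ex:Bob"),
    ("ex:Alice", "ex:favoriteDwarf", "snow white"),
}
closed = with_symmetry(data)
problems = disallowed(data)
print(closed - data, problems)
```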
  • 48. A class is a (named) collection of things with similar attributes
    OWL: Rich Class Definitions
  • 49. A class is a (named) collection of things with similar attributes
    OWL: Rich Class Definitions
  • 50. A class is a (named) collection of things with similar attributes
    OWL: Rich Class Definitions
  • 51. OWL: Rich Class Definitions
  • 52. RIF is…
    Rule Interchange Format
  • 53. Standard representation for exchanging sets of logical and business rules
    Logical rules
    A buyer buys an item from a seller if the seller sells the item to the buyer
    A customer becomes a "Gold" customer as soon as his cumulative purchases during the current year top $5000
    Production rules
    Customers that become "Gold" customers must be notified immediately, and a golden customer card will be printed and sent to them within one week
    For shopping carts worth more than $1000, "Gold" customers receive an additional discount of 10% of the total amount
    RIF is…
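RIF itself is an interchange format, not a programming language, but the logic that two of the example rules encode can be written out in Python to make it concrete. The function names and the threshold constants are hypothetical choices for this sketch; the dollar figures come from the slide.

```python
GOLD_THRESHOLD = 5000   # cumulative purchases that must be exceeded in the current year
DISCOUNT_MINIMUM = 1000  # cart value that must be exceeded to earn the discount


def is_gold(purchases_this_year):
    """A customer is "Gold" once cumulative purchases top the threshold."""
    return sum(purchases_this_year) > GOLD_THRESHOLD


def gold_discount(cart_total, gold):
    """Gold customers get 10% off carts worth more than the minimum."""
    if gold and cart_total > DISCOUNT_MINIMUM:
        return cart_total / 10
    return 0


customer_is_gold = is_gold([3000, 2500])
discount = gold_discount(1200, customer_is_gold)
print(customer_is_gold, discount)
```

A rule format such as RIF aims to let rules like these move between rule engines instead of being hard-coded in one application.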
  • 54. Fantasy Land Architecture
    Ontology / Schema
    +
    Custom UI
    Custom UI
    Custom UI
    Custom UI
    Custom UI
    Custom UI
  • 55. Reality
    Internet
    DB2
    XML
    LDAP Directory
    Oracle
    RDB
    Custom UI
    Custom UI
    Custom UI
    Custom UI
    Custom UI
    Custom UI
  • 56. GRDDL is…
    Gleaning Resource Descriptions from Dialects of Languages
  • 57. GRDDL is…
    A method for authoritatively getting RDF data from XML and XHTML documents.
  • 58. GRDDL is…
    A mechanism for authoritatively deriving RDF data from families of XML and XHTML documents.
  • 59. RDB2RDF is…
    Relational Database to RDF
  • 60. RDB2RDF is…
    A W3C Working Group to define a standard way to map from relational databases to RDF (and SPARQL).
  • 61. A simple set of 4 guidelines for publishing RDF data on the Web (over HTTP)
    Developed by Tim Berners-Lee in 2006
    Use URIs as names for things
    • Globally unique identity
    Use HTTP URIs
    • Everyone has a Web browser/client
    When someone looks up a URI, provide useful information
    • …in the form of RDF data
    Include links to other URIs
    • Foster discovery of additional information
    Linked Data is…
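The third guideline, returning RDF when a URI is looked up, is commonly handled with HTTP content negotiation. The sketch below builds such a request in Python without sending it. The resource URI is hypothetical, and asking for `text/turtle` is an assumption about what a given server offers.

```python
from urllib.request import Request


def rdf_request(uri):
    """Build a GET request that asks the server for an RDF (Turtle) representation."""
    return Request(uri, headers={"Accept": "text/turtle"})


# Hypothetical resource URI; the request is constructed but never sent.
req = rdf_request("http://example.org/id/LeeFeigenbaum")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The same URI can still serve an HTML page to a browser, since the browser sends a different `Accept` header.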
  • 62. The LOD “cloud”, March 2009
  • 63. Application specific portions of the cloud
    • Notably, bio-related data sets (in light purple), some by the W3C “Linking Open Drug Data” task force
  • 64. RDFa is…
    RDF in Attributes
  • 65. RDFa is…
    A collection of HTML attributes that allow RDF to be embedded directly in Web pages.
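To illustrate the idea of RDF carried in HTML attributes, the sketch below pulls triples out of a small page using Python's standard HTML parser. It handles only the `about`, `property`, and `content` attributes and ignores most RDFa processing rules (scoping, text content, prefixes), so it is a rough approximation rather than an RDFa processor. The sample markup and class name are invented.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (subject, property, value) triples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # 'about' sets the subject for the properties that follow.
        if "about" in attrs:
            self.subject = attrs["about"]
        # 'property' plus 'content' yields one triple about the current subject.
        if "property" in attrs and "content" in attrs:
            self.triples.append((self.subject, attrs["property"], attrs["content"]))


page = (
    '<div about="http://example.org/lee">'
    '<span property="foaf:name" content="Lee Feigenbaum">Lee</span>'
    "</div>"
)
parser = RDFaSketch()
parser.feed(page)
print(parser.triples)
```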
  • 66. Don’t Repeat Yourself (DRY)
    In-context metadata (copy & paste)
    Authoritative (no screen scraping)
    Why RDFa?
  • 67. RDFa in action
  • 68. Semantic Web landscape today
  • 69. Semantic Web Tools
    In 2010, there are a wide variety of open-source and commercial Semantic Web tools available.
  • 70. Triple stores
    Built on relational database
    Native RDF store
    Development libraries
    Full-featured application servers
    Types of RDF Tools
    Most RDF tools contain some elements of each of these.
  • 71. Community-maintained lists
    http://esw.w3.org/topic/SemanticWebTools
    Emphasis on large triple stores
    http://esw.w3.org/topic/LargeTripleStores
    Michael Bergman’s Sweet Tools searchable list:
    http://www.mkbergman.com/?page_id=325
    Finding RDF Tools
  • 72. Query engines
    Things that can run queries
    Most RDF stores provide a SPARQL engine
    Query rewriters
    E.g. to query relational databases (more later)
    Endpoints
    Things that accept queries on the Web and return results
    Client libraries
    Things that make it easy to ask queries
    Types of SPARQL Tools
  • 73. Community-maintained list of query engines
    http://esw.w3.org/topic/SparqlImplementations
    Publicly accessible SPARQL endpoints
    http://esw.w3.org/topic/SparqlEndpoints
    Michael Bergman’s Sweet Tools searchable list:
    http://www.mkbergman.com/?page_id=325
    Finding SPARQL Tools
  • 74. Editors/environments
    Oiled, Protégé, Swoop, TopBraid, Ontotrack, …
    Developing Tools and Infrastructure
  • 75. Editors/environments
    Oiled, Protégé, Swoop, TopBraid, Ontotrack, …
    Reasoning systems
    Cerebra, FaCT++, Kaon2, Pellet, Racer, CEL, …
    Developing Tools and Infrastructure
    Pellet
    KAON2
    CEL
  • 76. Visualizing and Publishing Vocabularies
  • 77. Reusable, public ontologies
    FOAF
    The Event Ontology
    Measurement Units Ontology
  • 78. Community-maintained list:
    http://esw.w3.org/topic/GrddlImplementations
    GRDDL tools
    Most GRDDL tools are adapters to existing RDF stores or SPARQL engines to allow loading or querying data from XML and XHTML sources.
  • 79. What about… everything else?
    Standards don’t yet exist, but many tools exist to derive RDF and/or run SPARQL queries against other sources of data.
  • 80. LDAP Directories
    Squirrel RDF
    http://jena.sourceforge.net/SquirrelRDF/
  • 81. Excel spreadsheets
    Anzo for Excel
    http://www.cambridgesemantics.com/products/anzo_for_excel
  • 82. Web-based data sources
    Virtuoso Sponger Cartridges
    http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger
  • 83. Unstructured Text
    Calais
    http://www.opencalais.com/
  • 84. Unstructured Text
    Zemanta Web Service
    http://developer.zemanta.com/
  • 85. On the Web
    Google, Yahoo!
    Best Buy
    NY Times
    US Government
    UK Government
    Where is it being used?
  • 86. Industries
    Oil & Gas (integration, classification)
    Finance (structured data, ontologies, XBRL)
    Publishing (metadata)
    Government (structured data, metadata, classification)
    Libraries & museums (metadata, classification)
    IT (rapid application development & evolution)
    Where is it being used?
  • 87. Health Care
    Cleveland Clinic
    Clinical research
    Data integration, classification (= better search)
    UT School of Health
    Public health surveillance
    SAPPHIRE—classification, ontology-driven development
    Various
    Clinical Decision Support
    Agile, rule-driven, scalable in the face of change
    Where is it being used?
  • 88. Life Sciences
    Agile knowledgebases at Pfizer
    Target assessment at Eli Lilly
    Integrated information links at Novartis
    Astra Zeneca, J&J, UCB, …
    Where is it being used?
    CSHALS chronicles many of these uses and many more.
  • 89. Take-away Advice
  • 90. These are horizontal, enabling technologies.
    But they apply particularly well to problems with these characteristics:
    Heterogeneous data from multiple sources
    Increasing reliance on connections within this data
    Rapidly changing information needs
    Significant early-mover advantage
    Large amounts of data that would benefit from classification
    Why are Semantic Web technologies appropriate for the life sciences?
    Many tactical and strategic challenges in the life sciences industry feature these traits.
  • 91. Getting Started with Semantic Web technologies
    Don’t boil the ocean.
  • 92. Getting Started with Semantic Web technologies
    Goal: quick tactical wins on the path to large strategic value
    Be sure to consider the operational ramifications
    Who does what differently?
    Ideal Semantic Web projects/applications have an incremental path towards broad deployment that generates demonstrable value along the way
  • 93. Look beyond the core Semantic Web capabilities and consider:
    integration with existing enterprise systems
    development & extension models
    deployment, logging, maintenance, backup
    tooling
    user experience
    Choose practical, enterprise-ready tools
    If you choose to build new components and assemble existing components together, it’s quite likely you’ll end up reinventing the wheel.
  • 94. What level of expertise is necessary?
    Technologies only?
    Technologies + API?
    Technologies + tooling?
    Tooling only?

    How will we acquire the expertise?
    In-house (and if so, how?)
    Vendor services
    3rd-party services
    Open-source community
    Plan for Acquiring Expertise
  • 95. I’m always happy to field questions & engage in discussion:
    lee@cambridgesemantics.com
    Thanks & Discussion