Ivan Herman, W3CSemantic Technology & Business Conference              5th June, 2012         San Francisco, CA, USA
   A system manipulating and analyzing knowledge           e.g., via big ontologies, vocabularies           Google’s Kn...
   We have to acknowledge that the field has grown          and has become multi-faceted         All different “views” h...
   Some technologies are, essentially, done:             Ontology for Media Resources             Media Fragments URI  ...
   Some areas are subject of work           Update of RDF           Provenance           Linked Data Platform(8)
   We are discussing new works, new areas, e.g.,           Access Control issues           Constraint checking on Seman...
   Various communities have different emphasis on           which part of the Semantic Web they want to use          W3C...
Photo credit Robert Freund(11)
(12)
(13)
   The “usual” Semantic Web problem: what           vocabularies to use?          The problem is not that there aren’t a...
   Questions:            what is the standard URI for, say, a temporal fragment of a             video?            what...
   Ontologies for Media Resources: published as a           Recommendation in February 2012          Media Fragments URI...
Photo credit “reedster”, Flickr
   Nested queries (i.e., SELECT within a WHERE clause)          Negation (MINUS, and a NOT EXIST filter)          Aggre...
SPARQL Endpoint                                                                               SPARQL Endpoint             ...
   Technology has been finalized          Goes to “Proposed Recommendation” soon          Should be published as a stan...
Photo credit “mayhem”, Flickr
   Relational database vendors realize the importance           of the Semantic Web market          Many systems have a ...
   “Export” does not necessarily mean physical           conversion            for very large databases a “duplication” ...
   A canonical RDF “view” of RDB tables          Only needs the information in the RDB Schema(24)
Each column name provides a predicate                 ISBN        Author           Title        Publisher            Year ...
Tables         RDB     Direct       Schema   Mapping                          “Direct Graph”(26)
   Pros:            Direct Mapping is simple, does not require any other             concepts            know the Schem...
Tables         RDB         Direct       Schema       Mapping                                “Direct Graph”                ...
   Separate vocabulary to control the details of the           mapping, e.g.:              finer control over the choice...
R2RML           Tables         RDB    Instance       Schema                 R2RML                Mapping                  ...
   Fundamentals are similar:            each row is turned into a series of triples with a common             subject   ...
   Technology has been finalized          Implementations revealed some minor issues to fold           into the specific...
   “Implementations of R2RML”, Wednesday, 8:45am            Souripriya Das (Oracle), Jans Aasman (Franz Inc.), Juan     ...
Structured data in HTML:RDFa & microdata                    Photo credit “shetladd”, Flickr
   Not necessarily large amount of data per page, but           lots of them…          Have become very valuable to sear...
<http://www.ivan-herman.net/foaf#me>           schema:alumniOf     <http://www.elte.hu> ;           foaf:schoolHomePage <h...
[ rdf:type schema:Review ;         schema:name "Oscars 2012: The Artist, review" ;         schema:description "The Artist,...
   Both have similar philosophies:            the structured data is expressed via attributes only (no             speci...
   Microdata has been optimized for simpler use cases:            one vocabulary at a time            tree shaped data ...
… 25% of webpages containing RDFa       data […] over 7% of web pages       containing microdata.                         ...
   For RDFa 1.1            Technology has been finalized            Is in Proposed Recommendation            Should be...
   “Schema.org panel”, Wednesday, 9:45am            Dan Brickley (schema.org), Ramanathan Guha (Google),             Ste...
Nexus Simulation Credit Erich Bremmer
   Many issues have come up since 2004:            deployment issues            new functionalities are needed         ...
   Standardize Turtle as a serialization format          Clean up some aspects of datatyping, e.g.:            plain vs...
   Cleanup the documents, make them more readable            maybe a completely new primer            probably a new st...
   Turtle is almost finalized          Literal cleanup is done          JSON-LD is in a good shape          Lots of di...
   “Updates to the Core RDF Standards”, Wednesday,           3:30pm            David Wood (3 Round Stones, Inc., also co...
   We should be able to express all sorts of “meta”           information on the data            who played what role in...
   Requires a complete model describing the various           constituents (actors, revisions, etc.)          The model ...
ex:photosDB        ex:facetedView                                ex:integratedData                                        ...
   Drafts have been published            abstract data model, OWL version            protocol (where to find provenance...
   “Benefits and Applications of W3C’s Provenance           Standards in Enterprise Semantic Web Applications”,          ...
(63)Courtesy   of Richard Cyganiak and Anja Jentzsch
   The datasets are essentially read-only            they are curated “out of band”: regularly extracted from other     ...
(66)
   Application Lifecycle Management            integration of development teams around the globe            management ...
Point-to-point via API                                            Centralized repository                                  ...
   They force to re-invent the wheel on many fronts            distribution of data around the Internet            acce...
   Distributed at its core          Scalable in terms of users, of data, of hardware and           software architecture...
Application            Application                            #1                     #2                                  H...
   Provide a simple, HTTP based infrastructure to           publish, read, write, or modify linked data          The inf...
   Main Work items:            define a RESTful way to access/update RDF data via HTTP            • what does HTTP GET/P...
   Has just started a few days ago!(74)
   “Linked Enterprise Data Patterns”, Tuesday, 11:30am            David Wood (3 Round Stones, Inc.), Arnaud Le Hors (IBM...
   Knowledge vs. data ratio is different: very shallow,           simple vocabularies for huge sets of data            t...
   Check on the quality of the published data          Reconsider rule languages for (e.g., for Linked Data           ap...
   Issues around internationalization of Semantic Web           technologies          Relationship between Semantic Web ...
   Lots of issues to be           solved          But… W3C needs           experts!            consider joining        ...
These slides are also available on the Web:         http://www.w3.org/2012/Talks/0605-SemTech-IH/(82)
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
Upcoming SlideShare
Loading in …5
×

Semantic Web and Related Work at W3C

1,088 views

Published on

Talk given at the SemTech 2012 conference on the current activities at W3C

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,088
On SlideShare
0
From Embeds
0
Number of Embeds
68
Actions
Shares
0
Downloads
19
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide
  • For the second bullet: in some cases graph based databases simply work better
  • 295 datasets, cca. 31.6B triples
  • Integrated datasets: dbpedia, open street maps, INSEE geographical information, official french list of historical monuments; not all made available in RDF, but integrated in RDF with the available conversion tools; results are merged in an RDF database (over 4.5 million triples) and a company tools is created for the user interface. Done in a few days using available tools
  • Semantic Web and Related Work at W3C

    1. 1. Ivan Herman, W3CSemantic Technology & Business Conference 5th June, 2012 San Francisco, CA, USA
    2. 2.  A system manipulating and analyzing knowledge  e.g., via big ontologies, vocabularies  Google’s Knowledge Graph?  Improve search by adding structure to embedded data  A means to integrate many different pieces of data  Integrate data-oriented applications  And a mixture of all these…(3)
    3. 3.  We have to acknowledge that the field has grown and has become multi-faceted  All different “views” have their success stories  There are also no clear and water-proof boundaries between the different views(5)
    4. 4.  Some technologies are, essentially, done:  Ontology for Media Resources  Media Fragments URI  SPARQL 1.1 (SPARQL Protocol and RDF Query Language)  RDB2RDF (Relational Databases to RDF)  RDFa 1.1 (RDF in attributes)(7)
    5. 5.  Some areas are subject of work  Update of RDF  Provenance  Linked Data Platform(8)
    6. 6.  We are discussing new works, new areas, e.g.,  Access Control issues  Constraint checking on Semantic Web data …(9)
    7. 7.  Various communities have different emphasis on which part of the Semantic Web they want to use  W3C has contacts with some of those  health care and life sciences (a separate IG is up and running)  libraries, publishing  financials  the oil, gas, and chemicals community  governments  … but there are many more!(10)
    8. 8. Photo credit Robert Freund(11)
    9. 9. (12)
    10. 10. (13)
    11. 11.  The “usual” Semantic Web problem: what vocabularies to use?  The problem is not that there aren’t any… but that there are too many!  EXIF, MPEG7, XMP, MRSS, …  none of these cover all aspects  The Ontology for Media Resources document  defines a core vocabulary  defines a set of mappings to other formats(14)
    12. 12.  Questions:  what is the standard URI for, say, a temporal fragment of a video?  what should be the behavior of the user agents for these URIs?  These are covered by the Media Fragments URI document, e.g.  http://www.example.com/video.ogv#t=10,20  http://www.example.com/video.ogv#track=audio  http://www.example.com/video.ogv#xywh=160,120,320,240(15)
    13. 13.  Ontologies for Media Resources: published as a Recommendation in February 2012  Media Fragments URI: should be published as a Recommendation very soon(16)
    14. 14. Photo credit “reedster”, Flickr
    15. 15.  Nested queries (i.e., SELECT within a WHERE clause)  Negation (MINUS, and a NOT EXIST filter)  Aggregate function on search results (SUM, MIN,…)  Property path expression (?x foaf:knows+ ?y)  SPARQL UPDATE facilities (INSERT, DELETE, CREATE)  Combination with entailment regimes  Return format definition in JSON and in CSV(18)
    16. 16. SPARQL Endpoint SPARQL Endpoint Application Triple store Database SPARQL Processor Inferencing NLP Techniques SQLRDF RDF Graph Relational Database Inferencing HTML Unstructured Text XML/XHTML(19)
    17. 17.  Technology has been finalized  Goes to “Proposed Recommendation” soon  Should be published as a standard by this fall(20)
    18. 18. Photo credit “mayhem”, Flickr
    19. 19.  Relational database vendors realize the importance of the Semantic Web market  Many systems have a “hybrid” view:  traditional, relational storage, usually coupled with SQL  RDF storage, usually coupled with SPARQL  examples: Oracle 3g, IBM’s DB2, OpenLink Virtuoso,…(22)
    20. 20.  “Export” does not necessarily mean physical conversion  for very large databases a “duplication” would not be an option  systems may provide SPARQL⇔SQL “bridges” to make queries on the fly  Result of export is a “logical” view of the RDB content(23)
    21. 21.  A canonical RDF “view” of RDB tables  Only needs the information in the RDB Schema(24)
    22. 22. Each column name provides a predicate ISBN Author Title Publisher Year Each row is a 0006511409X id_xyz The Glass Palace id_qpr 2000 0007179871 id_xyz The Hungry Tide id_qpr 2004 subject Table references are URI objects Cells are Literal objects ID Name Homepage id_xyz Ghosh, Amitav http://www.amitavghosh.com(25)
    23. 23. Tables RDB Direct Schema Mapping “Direct Graph”(26)
    24. 24.  Pros:  Direct Mapping is simple, does not require any other concepts  know the Schema ⇒ know the RDF graph structure  know the RDF graph structure ⇒ good idea of the Schema(!)  Cons:  the resulting graph is not what the application really wants(27)
    25. 25. Tables RDB Direct Schema Mapping “Direct Graph” Graph Processing (Rules, SPARQL, …) Final, Application Graph(28)
    26. 26.  Separate vocabulary to control the details of the mapping, e.g.:  finer control over the choice of the subject  creation of URI references from cells  predicates may be chosen from a vocabulary  datatypes may be assigned  etc.  Gets to the final RDF graph with one processing step(29)
    27. 27. R2RML Tables RDB Instance Schema R2RML Mapping Final, Application Graph(30)
    28. 28.  Fundamentals are similar:  each row is turned into a series of triples with a common subject  Direct mapping is a “default” R2RML mapping  Which of the two approaches is used depend on local tools, personal experiences and background,…  e.g., user can begin with a “default” R2RML, and gradually refine it(31)
    29. 29.  Technology has been finalized  Implementations revealed some minor issues to fold into the specification  Should be finished this summer(32)
    30. 30.  “Implementations of R2RML”, Wednesday, 8:45am  Souripriya Das (Oracle), Jans Aasman (Franz Inc.), Juan Sequeda (Capsenta), and Tony Vachino (Spry, Inc.)(33)
    31. 31. Structured data in HTML:RDFa & microdata Photo credit “shetladd”, Flickr
    32. 32.  Not necessarily large amount of data per page, but lots of them…  Have become very valuable to search engines  Google, Bing, Yahoo!, or Yandex (i.e., schema.org) all committed to use such data  Two syntaxes have emerged at W3C:  microdata with HTML5  RDFa with HTML5, XHTML, and with XML languages in general(35)
    33. 33. <http://www.ivan-herman.net/foaf#me> schema:alumniOf <http://www.elte.hu> ; foaf:schoolHomePage <http://www.elte.hu> ; schema:worksFor <http://www.w3.org/W3C#data> ; … <http://www.elte.hu> dc:title "Eötvös Loránd University of Budapest" . … <http://www.w3.org/W3C#data> dc:title "World Wide Web Consortium (W3C)” …(39)
    34. 34. [ rdf:type schema:Review ; schema:name "Oscars 2012: The Artist, review" ; schema:description "The Artist, an utterly beguiling…" ; schema:ratingValue "5" ; … ](43)
    35. 35.  Both have similar philosophies:  the structured data is expressed via attributes only (no specialized elements)  both define some special attributes • e.g., itemscope for microdata, resource for RDFa  both reuse some HTML core attributes (e.g., href)  both reuse the textual content of the HTML source, if needed  RDF data can be extracted from both(45)
    36. 36.  Microdata has been optimized for simpler use cases:  one vocabulary at a time  tree shaped data  no datatypes  RDFa provides a full serialization of RDF in XML or HTML  the price is an extra complexity compared to microdata  RDFa 1.1 Lite is a simplified authoring profile of RDFa, very similar to microdata(46)
    37. 37. … 25% of webpages containing RDFa data […] over 7% of web pages containing microdata. Mail from Peter Mika, Yahoo! Based on a crawl evaluation by P. Mika and T. Potter LDOW2012 Workshop, April 2012, Lyon, France … web pages that contain structured data has increased from 6% in 2010 to 12% in 2012. Hannes Mühleisen and Christian Bizer Web Data Commons—Extracting Structured Data from Two Large Web Corpora, LDOW2012 Workshop, April 2012, Lyon, France(47)
    38. 38.  For RDFa 1.1  Technology has been finalized  Is in Proposed Recommendation  Should be published as a Recommendation any day now  For microdata  Technology has been finalized  There is microdata→RDF mapping in a separate Note  Is part of HTML5, hence its formal advancement depends on other technologies(48)
    39. 39.  “Schema.org panel”, Wednesday, 9:45am  Dan Brickley (schema.org), Ramanathan Guha (Google), Steve Macbeth (Microsoft), Peter Mika (Yahoo!), Jeff Preston (Disney Interactive), Evan Sandhaus (NYT), Alexander Shubin (Yandex); moderator: Ivan Herman (W3C)(49)
    40. 40. Nexus Simulation Credit Erich Bremmer
    41. 41.  Many issues have come up since 2004:  deployment issues  new functionalities are needed  underlying technology may have moved on (e.g., datatypes)  The goal of the RDF Working Group is to refresh RDF  NOT a complete reshaping of the standard!(51)
    42. 42.  Standardize Turtle as a serialization format  Clean up some aspects of datatyping, e.g.:  plain vs. typed literals  introduction of an rdf:HTML datatype  better definition of rdf:XMLLiteral  Proper definition for “named graphs”  including concepts, semantics, syntax, … • obviously important for linked data access • but generates quite some discussions on the details  Standardize a JSON format for linked data (JSON- LD)(52)
    43. 43.  Cleanup the documents, make them more readable  maybe a completely new primer  probably a new structure for the Semantics document(53)
    44. 44.  Turtle is almost finalized  Literal cleanup is done  JSON-LD is in a good shape  Lots of discussion currently on named graphs…  A new version of the RDF 1.1 concepts has just been published this morning!  http://www.w3.org/TR/rdf11-concepts/(54)
    45. 45.  “Updates to the Core RDF Standards”, Wednesday, 3:30pm  David Wood (3 Round Stones, Inc., also co-chair of the W3C RDF WG)  “JSON-LD: JSON for Linked Data”, Thursday, 9:45am  Gregg Kellogg (Kellogg Associates)(55)
    46. 46.  We should be able to express all sorts of “meta” information on the data  who played what role in creating the data (author, reviewer, etc.)  view of the full revision chain of the data  in case of data integration: which part comes from which original data and under what process  what vocabularies/ontologies/rules were used to generate some portions of the data  etc.(57)
    47. 47.  Requires a complete model describing the various constituents (actors, revisions, etc.)  The model should be usable with RDF  Has to find a balance between  simple provenance: easily usable and editable  complex provenance: allows for a detailed reporting of origins, versions, etc.  That is the role of the Provenance Working Group (started in 2011)(58)
    48. 48. ex:photosDB ex:facetedView ex:integratedData used ex:metadata wasGeneratedBy used wasGeneratedBy used simile:exhibit ex:integrate wasAttributedTo wasAssociatedWith wasAssociatedWith foaf:name foaf:mbox foaf:mbox foaf:name mailto:derek@ex.com mailto:betty@ex.com Betty Derek(59) Screendump from Zepheira
    49. 49.  Drafts have been published  abstract data model, OWL version  protocol (where to find provenance data)  primer  Goal is to finalize the technical design in the fall of 2012(60)
    50. 50.  “Benefits and Applications of W3C’s Provenance Standards in Enterprise Semantic Web Applications”, Thursday, 10:45am  Reza Bfar (Oracle)(61)
    51. 51. (63)Courtesy of Richard Cyganiak and Anja Jentzsch
    52. 52.  The datasets are essentially read-only  they are curated “out of band”: regularly extracted from other databases, changed manually by data owners, etc  The dominating paradigm is to extract data via SPARQL queries  Applications use (very) large datasets via (RDF based) integration(65)
    53. 53. (66)
    54. 54.  Application Lifecycle Management  integration of development teams around the globe  management of bug report, user requirements  versioning  Distributed access to, and management of Library Catalogue data  Integrated view of corporate and private address book data(67)
    55. 55. Point-to-point via API Centralized repository Central Hub/Bus(68) Inspired by Arnaud Le Hors, IBM, LDOW2012 Presentation
    56. 56.  They force to re-invent the wheel on many fronts  distribution of data around the Internet  access control issues  definition and implementation of new API-s, Protocols, data formats  etc.(69)
    57. 57.  Distributed at its core  Scalable in terms of users, of data, of hardware and software architecture  Open to anyone  Has a wide variety of available tools  Widely known and deployed Sounds Familiar? (70)
    58. 58. Application Application #1 #2 HTTP GET, HEAD URI+HTTP HTTP PUT, DELETE Linked Data Linked Data on Server #1 on Server #2(71)
    59. 59.  Provide a simple, HTTP based infrastructure to publish, read, write, or modify linked data  The infrastructure should be easy to implement and install  more complex applications may require more sophisticated tools like SPARQL, Provenance, OWL,…  provides an “entry point” for Linked Data applications!(72)
    60. 60.  Main Work items:  define a RESTful way to access/update RDF data via HTTP • what does HTTP GET/PUT/DELETE/POST/… mean for Linked Data?  define a “profile” of minimal requirements for applications: • what RDF datatypes are used • what serialization syntax(es) must be supported • how to access reasonable chunks of information (paging) • how to manage collections of RDF data • what vocabulary items to use for metadata • etc.(73)
    61. 61.  Has just started a few days ago!(74)
    62. 62.  “Linked Enterprise Data Patterns”, Tuesday, 11:30am  David Wood (3 Round Stones, Inc.), Arnaud Le Hors (IBM), Ashok Malhotra (Oracle)  LDP Working Group BOF: Tuesday, 12:30pm, Franciscan “C”(75)
    63. 63.  Knowledge vs. data ratio is different: very shallow, simple vocabularies for huge sets of data  the role of reasoning is different (vocabularies, OWL DL, etc., may not be feasible)  Not enough links among datasets  lots of work on “creating” further links  Scale: billions of triples, increasing every day  setting a SPARQL endpoint everywhere may not be realistic  Highly distributed  data may not be in one single database, even within the same organization(77)
    64. 64.  Check on the quality of the published data  Reconsider rule languages for (e.g., for Linked Data applications)  Relationship to JSON  Constraint checking of Data  API-s for client-side Web Application Developers(78)
    65. 65.  Issues around internationalization of Semantic Web technologies  Relationship between Semantic Web technologies and Big Data, Cloud Storage and Computing,…  Specific standard vocabularies (e.g., data annotation, governmental vocabularies)  some of these may be defined at W3C, some elsewhere(79)
    66. 66.  Lots of issues to be solved  But… W3C needs experts!  consider joining W3C, as well as the work done there!(80)
    67. 67. These slides are also available on the Web: http://www.w3.org/2012/Talks/0605-SemTech-IH/(82)

    ×