Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

Slides of my presentation at WWW 2012 PhD Symposium

Published in: Technology, Education
  1. Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data. Javier D. Fernández. Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez. University of Valladolid (Spain), University of Chile (Chile). PhD Symposium.
  2. Brief RDF Introduction. (1) Resource Description Framework: Webs, services, protocols; persons, proteins, geography… (2) A standard model for data exchange on the Web, understandable by computers. (3) W3C Recommendation (2004). (4) Data model: (subject, predicate, object).
  3. RDF Example. A triple is (Subject, Predicate, Object), with terms drawn from (U,B), U, (U,B,L), where U is a URI, B a blank node and L a literal. URIs: <http://books/author33>, <http://books/book21>, <http://myblog/lectures>. Literals: “Pablo Neruda”, “Spain in the Heart”. Blank node: _collection. Prefixed name: lectures:to_read_list.
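The position rule on this slide, (U,B), U, (U,B,L), can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Term` class and `is_valid_triple` helper are hypothetical names, and pairing `foaf:name` with the author and the literal is an assumption, since the slide text does not preserve which edges connect which nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # "U" (URI), "B" (blank node) or "L" (literal)
    value: str


# Kinds allowed in each position, as stated on the slide.
ALLOWED = {
    "subject": {"U", "B"},
    "predicate": {"U"},
    "object": {"U", "B", "L"},
}


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Return True if each term kind is allowed in its position."""
    return (
        s.kind in ALLOWED["subject"]
        and p.kind in ALLOWED["predicate"]
        and o.kind in ALLOWED["object"]
    )


author = Term("U", "http://books/author33")
name = Term("L", "Pablo Neruda")
foaf_name = Term("U", "foaf:name")  # assumed predicate for this pair

valid = is_valid_triple(author, foaf_name, name)    # literal as object: allowed
invalid = is_valid_triple(name, foaf_name, author)  # literal as subject: rejected
```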
  4. (1) Use URIs as names for things. (2) Use HTTP URIs so that people can look up those names. (3) When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). (4) Include links to other URIs, so that they can discover more things. Image: Danilo Rizzuti / FreeDigitalPhotos.net
  5. Image: Danilo Rizzuti / FreeDigitalPhotos.net
  6. Scalability problems. DBPedia (en): 233 M triples, ~33 GB. Uniprot: 845 M triples, ~230 GB. Publish? Exchange? Process/Consume/Query?
  7. RDF Publication: dereferenceable URIs; RDF dump; sensor; SPARQL Endpoints/APIs. No recommendations or methodology to publish at large scale. Related work: some metadata for discovery, such as VoID and Semantic Sitemaps.
  8. RDF Exchanging issues. RDF/XML, N3, Turtle, JSON. Document-centric (verbose) versus data-centric (machine) view. No structure (chunks, universal compression). Related work: universal compression (gzip, bzip2) and the Efficient XML Interchange Format (EXI). Image: renjith krishnan / FreeDigitalPhotos.net
  9. RDF Processing/Consumption (after exchanging). Costly post-processing: decompression; indexing (RDF store); finally… consume. Related work (indexing): based on relational storage (Virtuoso), multi-indexes (RDF3X), distributed systems (Map-Reduce) and others (Bit-Mat). Image: renjith krishnan / FreeDigitalPhotos.net
  10. The scalability problems have a main impact on users. Would you download hundreds of GB if you don’t know exactly what they contain, if they need costly exchange and post-processing, and if they require a powerful store to query them? Image: renjith krishnan / FreeDigitalPhotos.net
  11. In the following... (1) Proposed approach for scalable publishing, exchanging and consumption of large RDF datasets. (2) Preliminary results. (3) Methodology. (4) On-going work and conclusions. Image: jscreationzs / FreeDigitalPhotos.net
  12. An integrated solution. We call for, and we study in this thesis, a Binary RDF Serialization format: machine oriented (binary); clean publication; metadata; modular; efficient exchange; compression; basic data operations; easy to parse and consume; primitive query resolution. Image: jscreationzs / FreeDigitalPhotos.net
  13. HDT Overview.
  14. Dictionary+Triples partition. Dictionary: 1 = <http://books/author33>; 2 = <http://books/book21>; 3 = dc:author; 4 = dc:title; 5 = foaf:name; 6 = “Pablo Neruda”; 7 = “Spain in the Heart”. Triples: the example graph redrawn with the IDs 1, 2, 6 and 7 in place of the terms.
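The split into a dictionary and ID-based triples can be sketched in a few lines. This is an illustration under stated assumptions, not the HDT implementation: the ID assignment is copied from the slide, but the three triples are assumed (the slide text does not preserve the graph edges), and `encode` and `decode` are hypothetical helper names.

```python
# Dictionary exactly as listed on the slide: ID -> term.
ID_TO_TERM = {
    1: "<http://books/author33>",
    2: "<http://books/book21>",
    3: "dc:author",
    4: "dc:title",
    5: "foaf:name",
    6: '"Pablo Neruda"',
    7: '"Spain in the Heart"',
}
TERM_TO_ID = {term: term_id for term_id, term in ID_TO_TERM.items()}

# Assumed triples: the edges of the example graph are not in the slide text.
triples = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]


def encode(term_triples):
    """Replace every term with its dictionary ID."""
    return [tuple(TERM_TO_ID[t] for t in triple) for triple in term_triples]


def decode(id_triples):
    """Map ID triples back to their original terms."""
    return [tuple(ID_TO_TERM[i] for i in triple) for triple in id_triples]


id_triples = encode(triples)
```

Each long term is stored once in the dictionary, while the triples component holds only small integers, which is what makes the triples amenable to specific compression.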
  15. Key concepts: The Dictionary. Largest component (up to 74%): long URIs, shared prefixes; lang and datatype tags in literals. Efficient ID-String operations. We plan to work on a specific organization which optimizes space (regularities) and provides efficient performance in operations.
  16. Preliminary results in Rich Functional Dictionaries. We propose to adapt techniques for string dictionaries: Front-Coding; making dictionary partitions. [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
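Front-Coding exploits the shared prefixes of sorted URIs: each string is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. The sketch below shows the plain, unbucketed variant on the URIs of the running example; it is an illustration only, the function names are hypothetical, and the cited paper's actual layout (bucketing, partitions) is not reproduced here.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(sorted_terms):
    """Encode each term as (shared prefix length, remaining suffix)."""
    encoded = []
    prev = ""
    for term in sorted_terms:
        shared = shared_prefix_len(prev, term)
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded


def front_decode(encoded):
    """Rebuild the terms by reusing the prefix of the previous term."""
    terms = []
    prev = ""
    for shared, suffix in encoded:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms


terms = sorted([
    "http://books/book21",
    "http://books/author33",
    "http://myblog/lectures",
])
encoded = front_encode(terms)
```

Decoding a term requires walking from the start of the sequence, which is why practical variants restart the encoding at regular intervals.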
  17. Key concepts: Triples. Specific compression: more efficient compression than just gzip. Data indexing for consumption: allows direct resolution of the patterns (s,p,o), (s,?p,?o) and (s,p,?o) without decompression. We plan to work on a specific technique which optimizes space and provides efficient performance in primitive operations.
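The three patterns named on this slide all fix a prefix of (subject, predicate, object), so they can be answered by binary search over ID triples kept in subject-predicate-object order. The sketch below illustrates that idea with a plain sorted list; it is not the HDT structure itself, `match` is a hypothetical helper, and the ID triples are assumed from the running example.

```python
from bisect import bisect_left


def match(sorted_triples, s, p=None, o=None):
    """Resolve (s,?p,?o), (s,p,?o) or (s,p,o) over SPO-sorted integer triples.

    Only prefix patterns are supported: o is ignored when p is None.
    """
    if p is None:
        low_key, high_key = (s,), (s + 1,)
    elif o is None:
        low_key, high_key = (s, p), (s, p + 1)
    else:
        low_key, high_key = (s, p, o), (s, p, o + 1)
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so both keys land on the boundaries of the matching range.
    low = bisect_left(sorted_triples, low_key)
    high = bisect_left(sorted_triples, high_key)
    return sorted_triples[low:high]


# Assumed ID triples, sorted by subject, then predicate, then object.
spo = sorted([(2, 3, 1), (2, 4, 7), (1, 5, 6)])

by_subject = match(spo, 2)             # (s,?p,?o)
by_subject_pred = match(spo, 2, 3)     # (s,p,?o)
exact = match(spo, 2, 4, 7)            # (s,p,o)
```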
  18. Preliminary results in Triples Encoding. We propose to use Bitmap indexes. [*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference (ISWC 2010).
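The slide only names bitmap indexes, so the following is a simplified sketch of one way such an encoding can work, not the layout of the cited paper. It assumes SPO-sorted triples whose subject IDs are exactly 1..n with no gaps, and it assumes the convention that a 1 bit marks the first element of each list; the published format may use a different convention. All function names are hypothetical.

```python
def build_bitmap_triples(sorted_triples):
    """Split SPO-sorted ID triples into two sequences plus two bitmaps.

    preds/pred_bits: predicates per subject; 1 starts a new subject.
    objs/obj_bits: objects per (subject, predicate); 1 starts a new pair.
    Subjects are implicit: assumed to be 1..n with no gaps.
    """
    preds, pred_bits, objs, obj_bits = [], [], [], []
    prev_s = prev_p = None
    for s, p, o in sorted_triples:
        new_subject = s != prev_s
        new_pair = new_subject or p != prev_p
        if new_pair:
            preds.append(p)
            pred_bits.append(1 if new_subject else 0)
        objs.append(o)
        obj_bits.append(1 if new_pair else 0)
        prev_s, prev_p = s, p
    return preds, pred_bits, objs, obj_bits


def decode_bitmap_triples(preds, pred_bits, objs, obj_bits):
    """Rebuild the triples by counting the 1 bits while scanning."""
    triples = []
    subject = 0
    pair = -1
    for o, bit in zip(objs, obj_bits):
        if bit:
            pair += 1
            if pred_bits[pair]:
                subject += 1
        triples.append((subject, preds[pair], o))
    return triples


# Assumed ID triples from the running example, in SPO order.
spo = [(1, 5, 6), (2, 3, 1), (2, 4, 7)]
preds, pred_bits, objs, obj_bits = build_bitmap_triples(spo)
```

Subject IDs are never stored: they are recovered from the bitmaps, so only two ID sequences and two bit sequences remain.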
  19. Methodology. RDF structure in theory and practice. Binary RDF Specification. Succinct Dictionaries. Triples Indexes. Practical deployment. Image: jscreationzs / FreeDigitalPhotos.net
  20. Some Results… HDT acknowledged as W3C Member Submission: http://www.w3.org/Submission/2011/03/ (supporter logos not captured).
  21. Some Results... HDT for exchanging.
  22. Some Results... HDT for consumption. Direct consumption, without decompression after exchanging. Example of use: HDT-it (thanks to Mario Arias, DERI). Image: jscreationzs / FreeDigitalPhotos.net
  23. On-going promising work: HDT-FoQ. [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC 2012). To appear.
  24. In conclusion. Binary RDF aims to lighten the Web of Data: logical decomposition into Header, Dictionary, and Triples; clean publication; compressed RDF format for exchanging; machine-friendly, direct consumption; Rich Functional Dictionary/Triples representations for querying.
  25. Still much work on… Getting a global understanding of the real structure of RDF networks. Applying this knowledge in innovative dictionary and triples indexes: full SPARQL at consumption. Supporting dynamic operations: inserting, deleting, and updating binary RDF.
  26. Thanks! HDT: http://www.rdfhdt.org/ Group: http://dataweb.infor.uva.es/ Slides: http://www.slideshare.net/javifer Javier D. Fernández (jfergar@infor.uva.es). Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez. University of Valladolid (Spain), University of Chile (Chile).
