"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data
1. Binary RDF for Scalable Publishing,
Exchanging and Consumption
in the Web of Data
Javier D. Fernández
Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez
University of Valladolid (Spain)
University of Chile (Chile)
PhD Symposium
2. Brief RDF Introduction
(1) Resource Description Framework
Webs, services, protocols
Persons, Proteins, geography…
(2) A standard model for data exchange on the Web
Understandable by computers
(3) W3C Recommendation (2004)
(4) Data model
(subject, predicate, object)
3. RDF Example
Subject, Predicate, Object
(U,B), U, (U,B,L)
(U = URI, B = blank node, L = literal)
[Figure: RDF graph containing the URIs <http://books/author33>, <http://books/book21> and <http://myblog/lectures>, the literals “Pablo Neruda” and “Spain in the Heart”, the blank node _collection, and the term lectures:to_read_list.]
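The positional constraint (U,B), U, (U,B,L) can be illustrated with a minimal Python sketch. The `Term` type, the `is_valid_triple` helper and the predicate URI `http://example.org/name` are assumptions made for illustration; they are not taken from the slides.

```python
from typing import NamedTuple


class Term(NamedTuple):
    kind: str   # "U" (URI), "B" (blank node) or "L" (literal)
    value: str


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Check the constraint (U,B), U, (U,B,L) on subject, predicate, object."""
    return s.kind in ("U", "B") and p.kind == "U" and o.kind in ("U", "B", "L")


author = Term("U", "http://books/author33")
name = Term("U", "http://example.org/name")   # hypothetical predicate URI
neruda = Term("L", "Pablo Neruda")

triple = (author, name, neruda)   # URI subject, URI predicate, literal object
bad = (neruda, name, author)      # a literal is not allowed as subject

print(is_valid_triple(*triple))
print(is_valid_triple(*bad))
```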
4. Linked Data principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
Image: Danilo Rizzuti / FreeDigitalPhotos.net
7. RDF Publication
dereferenceable URIs
RDF dump
sensor
SPARQL endpoints / APIs
No recommendations or methodology for publishing at large scale
Related Work: some metadata for discovery, such as VoID and Semantic Sitemaps.
8. RDF Exchanging issues
RDF/XML, N3, Turtle, JSON.
Document-centric (verbose) vs. data-centric view (machine)
No structure (chunks, universal compression)
Related Work: Universal compression (gzip, bzip2) and the Efficient XML
Interchange Format (EXI).
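As a rough illustration of the universal-compression point, the sketch below gzips some verbose, N-Triples-style text with Python's standard `gzip` module. The URIs are made up for illustration. The output is smaller, but it is an opaque byte stream: nothing can be read or queried until the whole payload is decompressed again.

```python
import gzip

# Hypothetical N-Triples-style lines; the URIs are invented for this sketch.
lines = [
    f"<http://example.org/book/{i}> "
    f"<http://example.org/vocab/author> "
    f"<http://example.org/author/{i % 10}> ."
    for i in range(1000)
]
raw = "\n".join(lines).encode("utf-8")

# Universal compression exploits the repeated URI prefixes in the text.
compressed = gzip.compress(raw)
print(len(raw), len(compressed))

# The consumer has to decompress everything before reading a single triple.
restored = gzip.decompress(compressed).decode("utf-8").splitlines()
```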
Image: renjith krishnan / FreeDigitalPhotos.net
9. RDF Processing/Consumption (After Exchanging)
Costly Post-processing
Decompression
Indexing (RDF Store)
Finally… consume
Related Work (indexing): relational storage (Virtuoso), multi-indexes (RDF-3X), distributed systems (MapReduce) and others (BitMat).
10. The scalability problems have
a major impact on users
Would you download hundreds of GB...
… if you don’t know exactly what they contain,
that need costly exchange and post-processing,
and require a powerful store to query them?
11. In the following...
1. Proposed approach for scalable publishing, exchanging and consumption of large RDF datasets
2. Preliminary results
3. Methodology
4. On-going work and conclusions
Image: jscreationzs / FreeDigitalPhotos.net
12. An integrated solution
We call for, and we study in this thesis, a Binary RDF Serialization format:
Machine oriented (binary)
Clean publication
Metadata
Modular
Efficient exchange
Compression
Basic data operations
Easy to parse and consume
Primitive query resolution
15. Key concepts: The Dictionary
Largest component (up to 74%)
Long URIs, shared prefixes
Lang, datatype tags in literals
Efficient ID ↔ String operations
We plan to work on a specific organization which
Optimizes space (regularities)
Provides efficient performance in operations
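The two dictionary operations (string to ID and ID to string) can be sketched as follows. This is a minimal in-memory illustration, not the organization proposed in the thesis; the `Dictionary` class and its method names are hypothetical.

```python
class Dictionary:
    """Maps RDF terms (strings) to integer IDs and back (illustrative only)."""

    def __init__(self, terms):
        # Sorting places terms that share a prefix next to each other,
        # which is the regularity a compact dictionary can exploit.
        self._terms = sorted(set(terms))
        self._ids = {term: i for i, term in enumerate(self._terms, start=1)}

    def string_to_id(self, term: str) -> int:
        return self._ids[term]

    def id_to_string(self, ident: int) -> str:
        return self._terms[ident - 1]


d = Dictionary([
    "http://books/book21",
    "http://books/author33",
    "http://myblog/lectures",
    "http://books/book21",      # duplicates are stored once
])
book_id = d.string_to_id("http://books/book21")
print(book_id, d.id_to_string(book_id))
```

Triples can then be stored as tuples of small integers, while the long strings are kept once in the dictionary.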
16. Preliminary results in Rich Functional Dictionaries
We propose to adapt techniques for string dictionaries:
Front-Coding
Making dictionary partitions
[*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández,
Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
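Front-Coding, named above, stores each string in a sorted list as the length of the prefix it shares with the previous string plus the remaining suffix. The sketch below shows the plain textbook technique only; it does not cover dictionary partitioning and is not the implementation from the cited paper. The function names are assumptions.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(sorted_terms):
    """Encode a sorted list of strings as (shared_prefix_length, suffix) pairs."""
    encoded, prev = [], ""
    for term in sorted_terms:
        k = common_prefix_len(prev, term)
        encoded.append((k, term[k:]))
        prev = term
    return encoded


def front_decode(encoded):
    """Rebuild the original strings from (shared_prefix_length, suffix) pairs."""
    terms, prev = [], ""
    for k, suffix in encoded:
        prev = prev[:k] + suffix
        terms.append(prev)
    return terms


terms = sorted([
    "http://books/author33",
    "http://books/book21",
    "http://myblog/lectures",
])
encoded = front_encode(terms)
for pair in encoded:
    print(pair)
```

Long URIs with shared prefixes shrink to short suffixes, which is why sorting the dictionary before encoding matters.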
17. Key concepts: Triples
Specific compression:
More efficient compression than just gzip.
Data indexing for consumption:
Allows direct pattern resolution without decompression
(s,p,o), (s,?p,?o) and (s,p,?o)
We plan to work on a specific technique which
optimizes space
provides efficient performance in primitive operations
18. Preliminary results in Triples Encoding
We propose to use Bitmap indexes:
[*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference (ISWC 2010).
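One way to picture a bitmap-based triples encoding is sketched below. Triples of integer IDs are sorted by subject, predicate and object, and stored as a predicate sequence and an object sequence, each with a bitmap that marks where a list ends. This is a simplified illustration under stated assumptions, not the exact layout of the cited paper: subject IDs are assumed to be exactly 1..n, a 1-bit marking the last element of each list is a convention chosen here, and the linear scan in `_list_range` stands in for the rank/select structures a real implementation would use. All function names are hypothetical.

```python
def build_bitmap_triples(triples):
    """Encode ID triples as (seq_p, bits_p, seq_o, bits_o).

    Assumes subject IDs are exactly 1..n, so subjects stay implicit.
    """
    seq_p, bits_p, seq_o, bits_o = [], [], [], []
    prev_s = prev_p = None
    for s, p, o in sorted(set(triples)):
        if s != prev_s or p != prev_p:
            if bits_o:
                bits_o[-1] = 1          # close the previous object list
            if s != prev_s and bits_p:
                bits_p[-1] = 1          # close the previous subject's predicates
            seq_p.append(p)
            bits_p.append(0)
        seq_o.append(o)
        bits_o.append(0)
        prev_s, prev_p = s, p
    if bits_p:
        bits_p[-1] = 1
        bits_o[-1] = 1
    return seq_p, bits_p, seq_o, bits_o


def _list_range(bits, k):
    """Return the [start, end) positions of the k-th list (1-based)."""
    start, seen = 0, 0
    for i, bit in enumerate(bits):
        if bit:
            seen += 1
            if seen == k:
                return start, i + 1
            start = i + 1
    raise KeyError(k)


def match(enc, s, p=None, o=None):
    """Resolve (s,?p,?o), (s,p,?o) and (s,p,o) directly on the encoding."""
    seq_p, bits_p, seq_o, bits_o = enc
    result = []
    p_lo, p_hi = _list_range(bits_p, s)
    for pos in range(p_lo, p_hi):
        if p is not None and seq_p[pos] != p:
            continue
        o_lo, o_hi = _list_range(bits_o, pos + 1)
        for obj in seq_o[o_lo:o_hi]:
            if o is None or obj == o:
                result.append((s, seq_p[pos], obj))
    return result


enc = build_bitmap_triples([(1, 2, 3), (1, 2, 4), (1, 5, 3), (2, 2, 6)])
print(enc)
print(match(enc, 1))
print(match(enc, 1, p=2))
```

The subject column disappears entirely and repeated predicates per subject are stored once, while the three subject-bound patterns are answered without rebuilding the original triple list.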
19. Methodology
RDF structure in theory and practice.
Binary RDF Specification.
Succinct Dictionaries.
Triples Indexes.
Practical deployment.
20. Some Results… HDT acknowledged as a W3C Member Submission:
http://www.w3.org/Submission/2011/03/
supported by:
22. Some Results... HDT for consumption
Direct Consumption, without decompression after exchanging
Example of use: HDT-it (Thanks to Mario Arias, DERI)
23. On-going promising work: HDT-FoQ
[*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC 2012). To appear.
24. In conclusion
Binary RDF aims to lighten the Web of Data:
Logical decomposition: Header, Dictionary, and Triples
Clean publication
Compressed RDF format for exchanging
Machine-friendly, direct consumption
Rich Functional Dictionary/Triples representations for querying
25. Still much work on…
Getting a global understanding of the real structure of RDF networks.
Applying this knowledge in innovative dictionary and triples indexes.
Full SPARQL at consumption
Supporting dynamic operations
inserting, deleting, and updating binary RDF
26. Thanks!
HDT: http://www.rdfhdt.org/
Group: http://dataweb.infor.uva.es/
Slides: http://www.slideshare.net/javifer
Javier D. Fernández (jfergar@infor.uva.es)
Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez
University of Valladolid (Spain)
University of Chile (Chile)