Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

Javier D. Fernández
Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez

                                        University of Valladolid (Spain)
                                            University of Chile (Chile)




PhD Symposium
Brief RDF Introduction

(1) Resource Description Framework
     Webs, services, protocols
     Persons, Proteins, geography…


(2) A standard model for data exchange on the Web
    Understandable by computers


(3) W3C Recommendation (2004)

(4) Data model
    (subject, predicate, object)


RDF Example
Subject, Predicate, Object
(U,B) , U        , (U,B,L)
    U = URI, B = blank node, L = literal

[Figure: example RDF graph. URIs shown: <http://books/book21>, <http://books/author33>, <http://myblog/lectures>. Blank node: _collection. Literals: “Pablo Neruda”, “Spain in the Heart”. Edge label: lectures:to_read_list.]
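The positional constraints on this slide, (U,B), U, (U,B,L), can be sketched as a small data model. This is a minimal illustration, not part of the original deck: the class names, the helper functions and the full foaf:name URI used in the usage line are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    URI = "U"
    BLANK = "B"
    LITERAL = "L"


@dataclass(frozen=True)
class Term:
    kind: Kind
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Term
    predicate: Term
    obj: Term

    def __post_init__(self):
        # Constraints from the slide: subject in (U,B), predicate in U.
        # The object may be any of (U,B,L), so it needs no check.
        if self.subject.kind not in (Kind.URI, Kind.BLANK):
            raise ValueError("subject must be a URI or a blank node")
        if self.predicate.kind is not Kind.URI:
            raise ValueError("predicate must be a URI")


def uri(value):
    return Term(Kind.URI, value)


def blank(value):
    return Term(Kind.BLANK, value)


def literal(value):
    return Term(Kind.LITERAL, value)


# Assumed example triple: the edge itself is not recoverable from the figure.
name_triple = Triple(
    uri("http://books/author33"),
    uri("http://xmlns.com/foaf/0.1/name"),
    literal("Pablo Neruda"),
)
```

A literal in subject position, such as Triple(literal("x"), ...), is rejected with a ValueError under this sketch.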

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.




Scalability problems



                    Dataset        Triples         Size
                    DBpedia (en)   233 M triples   ~ 33 GB
                    UniProt        845 M triples   ~ 230 GB



   Publish?
   Exchange?
   Process/Consume/Query?



RDF Publication



                                                   dereferenceable URIs

                                     RDF dump
                  sensor
                                                SPARQL Endpoints/
                                                      APIs


 No recommendations or methodology for publishing at large scale
 Related Work: some metadata for discovery, such as VoID and Semantic
  Sitemaps.




RDF Exchanging issues
 RDF/XML, N3, Turtle, JSON.
          Document-centric view (verbose) vs. data-centric view (machine)
 No structure (chunks, universal compression)



 Related Work: Universal compression (gzip, bzip2) and the Efficient XML
  Interchange Format (EXI).
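As a rough illustration of the universal-compression baseline, the sketch below gzips a block of N-Triples text with Python's standard library. The example URIs and titles are made up for the purpose; the point is only that the repeated URI prefixes compress well, while the result must be fully decompressed before any triple can be read.

```python
import gzip

# Hypothetical N-Triples lines; URIs and titles are invented for illustration.
lines = [
    f'<http://example.org/book/{i}> <http://purl.org/dc/terms/title> "Title {i}" .'
    for i in range(1000)
]
raw = "\n".join(lines).encode("utf-8")
packed = gzip.compress(raw)

ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({ratio:.1%} of original)")

# The data only becomes usable again after a full decompression pass.
restored = gzip.decompress(packed)
```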




RDF Processing/Consumption (After Exchanging)
 Costly Post-processing
          Decompression
          Indexing (RDF Store)
          Finally… consume


 Related Work (indexing): based on relational storage (Virtuoso), multi-indexes
  (RDF-3X), distributed systems (MapReduce) and others (BitMat).




The scalability problems mainly
impact users

         Would you download hundreds of GB...

         … if you don’t know exactly what they contain,
         if they need costly exchange and post-processing,
         and if they require a powerful store to be queried?




In the following...
1. Proposed approach for scalable publishing, exchanging and consumption
   of large RDF datasets
2. Preliminary results
3. Methodology
4. On-going work and conclusions




An integrated solution
We call for, and we study in this thesis, a Binary RDF Serialization format:
     Machine oriented (binary)
     Clean publication
               Metadata
               Modular
     Efficient exchange
               Compression
     Basic data operations
               Easy to parse and consume
               Primitive query resolution




HDT Overview




Dictionary+Triples partition



   Dictionary (ID, term):
   1   <http://books/author33>
   2   <http://books/book21>
   3   dc:author
   4   dc:title
   5   foaf:name
   6   “Pablo Neruda”
   7   “Spain in the Heart”

   [Figure: the example graph with terms replaced by their IDs (nodes 1, 2, 6 and 7).]
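The partition can be sketched in a few lines: the dictionary maps each term to an integer ID, and the triples are then stored as ID tuples. The dictionary below follows the slide; the three triples are an assumption, since the edges of the original figure did not survive, and the function names are illustrative only.

```python
# Dictionary as listed on the slide: ID -> term (IDs start at 1).
TERMS = [
    "<http://books/author33>",
    "<http://books/book21>",
    "dc:author",
    "dc:title",
    "foaf:name",
    '"Pablo Neruda"',
    '"Spain in the Heart"',
]
id_to_term = {i: term for i, term in enumerate(TERMS, start=1)}
term_to_id = {term: i for i, term in id_to_term.items()}

# ASSUMPTION: a plausible reading of the example graph, not taken from the slide.
triples = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]


def encode(triple):
    """Replace each term of a triple by its dictionary ID."""
    return tuple(term_to_id[term] for term in triple)


def decode(ids):
    """Map an ID triple back to its terms."""
    return tuple(id_to_term[i] for i in ids)


id_triples = sorted(encode(t) for t in triples)
print(id_triples)  # [(1, 5, 6), (2, 3, 1), (2, 4, 7)]
```

Each long term is stored once in the dictionary, so the triples component only has to represent small integers.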




Key concepts: The Dictionary

   Largest component (up to 74%)
     Long URIs, shared prefixes
     Lang, datatype tags in literals
   Efficient ID-to-string and string-to-ID operations



We plan to work on a specific organization which
  Optimizes space (regularities)
  Provides efficient performance in operations




Preliminary results in Rich Functional Dictionaries

We propose to adapt techniques for string dictionaries:
  Front-Coding
     Making dictionary partitions
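Front-coding exploits the shared prefixes of sorted strings: each term is stored as the length of the prefix it shares with the previous term plus the remaining suffix. The sketch below is a plain, unbucketed variant written for illustration; the function names and the third URI (book22) are assumptions, and practical variants typically restart the prefix chain every few strings so a lookup does not have to decode from the beginning.

```python
import os


def front_encode(sorted_terms):
    """Encode each term as (shared-prefix length, remaining suffix)."""
    encoded, previous = [], ""
    for term in sorted_terms:
        shared = len(os.path.commonprefix([previous, term]))
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


def front_decode(encoded):
    """Rebuild the original terms by replaying the shared prefixes."""
    terms, previous = [], ""
    for shared, suffix in encoded:
        term = previous[:shared] + suffix
        terms.append(term)
        previous = term
    return terms


# book22 is a hypothetical extra URI added to show a longer shared prefix.
uris = sorted([
    "http://books/book21",
    "http://books/author33",
    "http://books/book22",
])
coded = front_encode(uris)
print(coded)
# [(0, 'http://books/author33'), (13, 'book21'), (18, '2')]
```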




  [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández,
     Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).

Key concepts: Triples

   Specific compression:
       More efficient compression than just gzip.
   Data indexing for consumption:
       Allows direct resolution of triple patterns without decompression
           (s,p,o), (s,?p,?o) and (s,p,?o)


We plan to work on a specific technique which
  optimizes space
  provides efficient performance in primitive operations




Preliminary results in Triples Encoding

We propose to use Bitmap indexes:
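The figure for this slide is not preserved, so the following is only one possible reading of a bitmap-based triples layout, under stated assumptions: ID triples are sorted by subject, predicate and object; subject IDs are the consecutive integers 1..n; and a 1 bit marks the last element of each adjacency list. All names and the sample IDs are hypothetical, and the linear scan in list_bounds stands in for the constant-time rank/select structures a real implementation would use.

```python
def build_bitmap_triples(id_triples):
    """Build predicate/object sequences and their bitmaps from (s, p, o) IDs."""
    preds, pred_bits, objs, obj_bits = [], [], [], []
    ordered = sorted(set(id_triples))
    for i, (s, p, o) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if i == 0 or ordered[i - 1][:2] != (s, p):
            preds.append(p)          # first triple of a new (s, p) pair
            pred_bits.append(0)
        objs.append(o)
        # 1 closes the object list of the current (s, p) pair.
        obj_bits.append(1 if nxt is None or nxt[:2] != (s, p) else 0)
        if nxt is None or nxt[0] != s:
            pred_bits[-1] = 1        # 1 closes the predicate list of subject s
    return preds, pred_bits, objs, obj_bits


def list_bounds(bits, k):
    """Return [start, end) of the k-th (0-based) list delimited by 1 bits."""
    start, seen = 0, 0
    for i, bit in enumerate(bits):
        if bit:
            if seen == k:
                return start, i + 1
            seen += 1
            start = i + 1
    raise IndexError(k)


def resolve(index, s, p=None):
    """Resolve (s, p, ?o) or, with p=None, (s, ?p, ?o) on the compact form."""
    preds, pred_bits, objs, obj_bits = index
    lo, hi = list_bounds(pred_bits, s - 1)           # predicates of subject s
    result = []
    for pos in range(lo, hi):
        if p is None or preds[pos] == p:
            o_lo, o_hi = list_bounds(obj_bits, pos)  # objects of that pair
            result.extend((s, preds[pos], o) for o in objs[o_lo:o_hi])
    return result


# Hypothetical ID triples, chosen so that one (s, p) pair has two objects.
index = build_bitmap_triples([(1, 2, 3), (1, 2, 4), (1, 5, 3), (2, 2, 4)])
print(index)              # ([2, 5, 2], [0, 1, 1], [3, 4, 3, 4], [0, 1, 1, 1])
print(resolve(index, 1, 2))  # [(1, 2, 3), (1, 2, 4)]
```

Subjects are never stored explicitly in this layout: the position of each predicate list identifies its subject, which is why the patterns with a bound subject can be answered directly on the compact form.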




   [*] Compact Representation of Large RDF Data Sets for Publishing and
       Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez.
       International Semantic Web Conference (ISWC 2010).

Methodology
 RDF structure in theory and practice.
 Binary RDF Specification.
 Succinct Dictionaries.
 Triples Indexes.
 Practical deployment.




Some Results... HDT acknowledged as a W3C Member Submission:
http://www.w3.org/Submission/2011/03/
                                        supported by:




Some Results... HDT for exchanging




Some Results... HDT for consumption
Direct Consumption, without decompression after exchanging
           Example of use: HDT-it (Thanks to Mario Arias, DERI)




On-going promising work: HDT-FoQ




    [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto,
        Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC
        2012). To appear.
In conclusion
Binary RDF aims to lighten the Web of Data:
    Logical decomposition: Header, Dictionary, and Triples
    Clean publication
    Compressed RDF format for exchanging
    Machine-friendly, direct consumption
         Rich Functional Dictionary/Triples representations for querying




Still much work on…
 Getting a global understanding of the real structure of RDF networks.
 Applying this knowledge in innovative dictionary and triples indexes.
     full SPARQL at consumption
 Supporting dynamic operations
     inserting, deleting, and updating binary RDF




Thanks!



HDT:        http://www.rdfhdt.org/
Group: http://dataweb.infor.uva.es/
Slides: http://www.slideshare.net/javifer


                                   Javier D. Fernández (jfergar@infor.uva.es)
                   Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez

                                                   University of Valladolid (Spain)
                                                       University of Chile (Chile)


  • 2. Brief RDF Introduction (1) Resource Description Framework: Webs, services, protocols; Persons, Proteins, geography… (2) A standard model for data exchange on the Web: Understandable by computers (3) W3C Recommendation (2004) (4) Data model: (subject, predicate, object)
  • 3. RDF Example Subject, Predicate, Object: (U,B), U, (U,B,L), where U = URI, B = Blank, L = literal. Graph nodes: <http://books/book21> (URI), <http://books/author33> (URI), “Pablo Neruda” (literal), “Spain in the Heart” (literal), _collection (Blank), <http://myblog/lectures> (URI). Edge label: lectures:to_read_list
  • 4. 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 4. Include links to other URIs, so that they can discover more things. Image: Danilo Rizzuti / FreeDigitalPhotos.net
  • 5. Image: Danilo Rizzuti / FreeDigitalPhotos.net
  • 6. Scalability problems DBPedia (en): 233 M triples, ~33 GB; Uniprot: 845 M triples, ~230 GB. Publish? Exchange? Process/Consume/Query?
  • 7. RDF Publication: dereferenceable URIs, RDF dump, sensor, SPARQL Endpoints/APIs. No recommendations or methodology to publish at large scale. Related Work: Some metadata for discovery, such as VoID and Semantic Sitemaps.
  • 8. RDF Exchanging issues: RDF/XML, N3, Turtle, JSON; Document-centric (verbose); data-centric view (machine); No structure (chunks, universal compression). Related Work: Universal compression (gzip, bzip2) and the Efficient XML Interchange Format (EXI). Image: renjith krishnan / FreeDigitalPhotos.net
  • 9. RDF Processing/Consumption (After Exchanging): Costly post-processing; Decompression; Indexing (RDF Store); Finally… consume. Related Work (indexing): Based on Relational Storage (Virtuoso), Multi-indexes (RDF3X), Distributed Systems (Map-Reduce) and others (Bit-Mat). Image: renjith krishnan / FreeDigitalPhotos.net
  • 10. The scalability problems have a major impact on users. Would you download hundreds of GB if you don't know exactly what they contain, if they need costly exchange and post-processing, and if they require a powerful store to query them? Image: renjith krishnan / FreeDigitalPhotos.net
  • 11. In the following... 1. Proposed approach for scalable publishing, exchanging and consumption of large RDF datasets 2. Preliminary results 3. Methodology 4. On-going work and conclusions Image: jscreationzs / FreeDigitalPhotos.net
  • 12. An integrated solution We call for, and we study in this thesis, a Binary RDF Serialization format: Machine oriented (binary); Clean publication (Metadata, Modular); Efficient exchange (Compression); Basic data operations (Easy to parse and consume, Primitive query resolution). Image: jscreationzs / FreeDigitalPhotos.net
  • 13. HDT Overview
  • 14. Dictionary+Triples partition Dictionary: 1 <http://books/author33>, 2 <http://books/book21>, 3 dc:author, 4 dc:title, 5 foaf:name, 6 “Pablo Neruda”, 7 “Spain in the Heart”. Triples: graph over the IDs (nodes 1, 2, 6, 7).
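The Dictionary+Triples partition on this slide can be sketched in a few lines of Python. This is only an illustration, not the HDT implementation: the term list reuses the slide's IDs, while the three triples in `ASSUMED_TRIPLES` and all function names are assumptions made for the example (the transcript does not preserve the edges of the diagram).

```python
# Terms in the order shown on the slide, so that IDs 1..7 match it.
SLIDE_TERMS = [
    "<http://books/author33>",
    "<http://books/book21>",
    "dc:author",
    "dc:title",
    "foaf:name",
    '"Pablo Neruda"',
    '"Spain in the Heart"',
]

# Assumed triples (hypothetical reading of the example graph).
ASSUMED_TRIPLES = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]


def build_dictionary(terms):
    """Map each term to an integer ID, starting at 1, in the given order."""
    return {term: i for i, term in enumerate(terms, start=1)}


def encode(triples, term_to_id):
    """Replace every term of every triple by its dictionary ID."""
    return [tuple(term_to_id[term] for term in triple) for triple in triples]


def decode(encoded, term_to_id):
    """Invert the dictionary and map ID triples back to terms."""
    id_to_term = {i: term for term, i in term_to_id.items()}
    return [tuple(id_to_term[i] for i in triple) for triple in encoded]


if __name__ == "__main__":
    dictionary = build_dictionary(SLIDE_TERMS)
    print(encode(ASSUMED_TRIPLES, dictionary))
```

Long terms are stored once in the dictionary, and the triples component only holds small integers, which is what makes the two parts separately compressible.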
  • 15. Key concepts: The Dictionary Largest component (up to 74%); Long URIs, shared prefixes; Lang, datatype tags in literals; Efficient ID↔String operations. We plan to work on a specific organization which optimizes space (regularities) and provides efficient performance in operations.
  • 16. Preliminary results in Rich Functional Dictionaries We propose to adapt techniques for string dictionaries: Front-Coding; Making dictionary partitions. [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
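Front-Coding, as named on this slide, exploits the shared prefixes of sorted strings: each entry stores only the length of the prefix it shares with the previous entry plus the remaining suffix. The sketch below is a plain, unbucketed variant written for illustration; the function names are hypothetical, and the cited paper's actual layout (bucketing, byte-level encoding, partitions) is not reproduced here.

```python
def front_encode(sorted_terms):
    """Encode a sorted list of strings as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for term in sorted_terms:
        # Count how many leading characters this term shares with the previous one.
        shared = 0
        limit = min(len(previous), len(term))
        while shared < limit and previous[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


def front_decode(encoded):
    """Rebuild the original sorted list from (shared_prefix_length, suffix) pairs."""
    terms = []
    previous = ""
    for shared, suffix in encoded:
        term = previous[:shared] + suffix
        terms.append(term)
        previous = term
    return terms


if __name__ == "__main__":
    uris = sorted(["http://books/book21", "http://books/author33"])
    print(front_encode(uris))
```

With the two book URIs from the earlier example, the second entry keeps only the suffix after the common `http://books/` prefix. A practical dictionary would typically restart the coding every few entries so that a single string can be located without decoding the whole list.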
  • 17. Key concepts: Triples Specific compression: more efficient than just gzip. Data indexing for consumption: allows direct resolution of the patterns (s,p,o), (s,?p,?o) and (s,p,?o) without decompression. We plan to work on a specific technique which optimizes space and provides efficient performance in primitive operations.
  • 18. Preliminary results in Triples Encoding We propose to use Bitmap indexes: [*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference (ISWC 2010).
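One way to picture a bitmap-based triples encoding is the sketch below. It assumes ID triples sorted by subject, predicate and object, with subject IDs running contiguously from 1, and stores two ID sequences plus two bit sequences that mark where each subject's predicate list and each (subject, predicate) object list begins. The bit convention and all names are assumptions for illustration; the cited paper's exact layout may differ.

```python
def bitmap_encode(triples):
    """Encode ID triples as (preds, pred_bits, objs, obj_bits).

    pred_bits[i] == 1 when preds[i] starts the predicate list of a new subject.
    obj_bits[j] == 1 when objs[j] starts the object list of a new (s, p) pair.
    Subjects are implicit: they are assumed to be 1, 2, ..., n with no gaps.
    """
    preds, pred_bits, objs, obj_bits = [], [], [], []
    prev_s = prev_p = None
    for s, p, o in sorted(set(triples)):
        new_subject = s != prev_s
        new_pair = new_subject or p != prev_p
        if new_pair:
            preds.append(p)
            pred_bits.append(1 if new_subject else 0)
        objs.append(o)
        obj_bits.append(1 if new_pair else 0)
        prev_s, prev_p = s, p
    return preds, pred_bits, objs, obj_bits


def bitmap_decode(preds, pred_bits, objs, obj_bits):
    """Rebuild the sorted list of (s, p, o) ID triples."""
    triples = []
    subject = 0
    pred_index = -1
    for o, starts_pair in zip(objs, obj_bits):
        if starts_pair:
            pred_index += 1
            if pred_bits[pred_index]:
                subject += 1  # a set bit in pred_bits opens the next subject
        triples.append((subject, preds[pred_index], o))
    return triples


if __name__ == "__main__":
    id_triples = [(2, 3, 1), (2, 4, 7), (1, 5, 6)]
    print(bitmap_encode(id_triples))
```

Because the subject of every entry is recovered by counting set bits, a rank/select structure over the bit sequences would let a pattern such as (s,?p,?o) jump straight to the right range instead of scanning; that part is not shown here.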
  • 19. Methodology RDF structure in theory and practice; Binary RDF Specification; Succinct Dictionaries; Triples Indexes; Practical deployment. Image: jscreationzs / FreeDigitalPhotos.net
  • 20. Some Results… HDT acknowledged as a W3C Member Submission: http://www.w3.org/Submission/2011/03/ supported by:
  • 21. Some Results... HDT for exchanging
  • 22. Some Results... HDT for consumption Direct consumption, without decompression after exchanging. Example of use: HDT-it (Thanks to Mario Arias, DERI). Image: jscreationzs / FreeDigitalPhotos.net
  • 23. On-going promising work: HDT-FoQ [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC 2012). To appear.
  • 24. In conclusion Binary RDF aims to lighten the Web of Data: Logical decomposition into Header, Dictionary, and Triples; Clean publication; Compressed RDF format for exchanging; Machine-friendly, direct consumption; Rich Functional Dictionary/Triples representations for querying.
  • 25. Still much work on… Getting a global understanding of the real structure of RDF networks; Applying this knowledge in innovative dictionary and triples indexes; Full SPARQL at consumption; Supporting dynamic operations (inserting, deleting, and updating binary RDF).
  • 26. Thanks! HDT: http://www.rdfhdt.org/ Group: http://dataweb.infor.uva.es/ Slides: http://www.slideshare.net/javifer Javier D. Fernández (jfergar@infor.uva.es) Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez University of Valladolid (Spain) University of Chile (Chile)