Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

Javier D. Fernández
Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez

                                        University of Valladolid (Spain)
                                            University of Chile (Chile)




PhD Symposium
Brief RDF Introduction

(1) Resource Description Framework
     Webs, services, protocols
     Persons, Proteins, geography…


(2) A standard model for data exchange on the Web
    Understandable by computers


(3) W3C Recommendation (2004)

(4) Data model
    (subject, predicate, object)
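
As a minimal sketch of the (subject, predicate, object) model, the snippet below stores triples as plain tuples. The URIs and literals are the ones used in this deck's running example; the predicate wiring (`dc:author`, `dc:title`, `foaf:name`) and the `Triple` and `objects_of` names are assumptions made for illustration only.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # URI or blank node
    predicate: str  # always a URI
    obj: str        # URI, blank node, or literal

# Assumed wiring of the deck's running example (hypothetical edges).
graph = [
    Triple("<http://books/book21>", "dc:author", "<http://books/author33>"),
    Triple("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    Triple("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]

def objects_of(graph, subject, predicate):
    """Return every object attached to the given (subject, predicate) pair."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]

print(objects_of(graph, "<http://books/book21>", "dc:title"))
```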


RDF Example
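Subject, Predicate, Object
(U,B) , U        , (U,B,L)        (U = URI, B = blank node, L = literal)

[Figure: example RDF graph. URI nodes: <http://books/book21>, <http://books/author33>, <http://myblog/lectures>; literal nodes: “Pablo Neruda”, “Spain in the Heart”; blank node: _collection; surviving edge label: lectures:to_read_list]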
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.




Scalability problems



                    Dataset        Triples         Size
                    DBpedia (en)   233 M triples   ~33 GB
                    UniProt        845 M triples   ~230 GB
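
To put these figures in perspective, the short calculation below estimates the average serialized size of one triple. It is a rough sketch that assumes 1 GB = 10^9 bytes and takes the slide's rounded numbers at face value; `bytes_per_triple` is an illustrative helper name.

```python
def bytes_per_triple(triples, size_bytes):
    """Average number of bytes spent on one triple in the plain dump."""
    return size_bytes / triples

# Figures from the slide, assuming 1 GB = 10**9 bytes.
datasets = {
    "DBpedia (en)": (233e6, 33e9),
    "UniProt": (845e6, 230e9),
}

for name, (triples, size_bytes) in datasets.items():
    print(f"{name}: ~{bytes_per_triple(triples, size_bytes):.0f} bytes per triple")
```

Under these assumptions each triple costs on the order of hundreds of bytes, mostly long URI and literal strings that repeat across the dataset.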



   Publish?
   Exchange?
   Process/Consume/Query?



RDF Publication



[Figure: publication channels for RDF data (source shown: sensor): dereferenceable URIs, RDF dump, SPARQL endpoints/APIs]


 No recommendations or methodology for publishing at large scale
 Related Work: some metadata for discovery, such as VoID and Semantic
  Sitemaps.




RDF Exchanging issues
 RDF/XML, N3, Turtle, JSON.
          Document-centric (verbose) rather than data-centric view (machine)
 No structure (chunks, universal compression)



 Related Work: Universal compression (gzip, bzip2) and the Efficient XML
  Interchange Format (EXI).
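
The effect of universal compression on a verbose plain-text serialization can be sketched with Python's standard `gzip` module. The synthetic N-Triples lines below are made-up data, and the choice of N-Triples is an assumption for illustration; the point is only that repeated URI prefixes compress well, while the result still has to be decompressed before it can be parsed or queried.

```python
import gzip

def synthetic_ntriples(n):
    """Build n made-up N-Triples lines that share long URI prefixes."""
    lines = [
        f'<http://books/book{i}> <http://purl.org/dc/terms/title> "Title {i}" .'
        for i in range(n)
    ]
    return "\n".join(lines).encode("utf-8")

raw = synthetic_ntriples(1000)
packed = gzip.compress(raw)

print(f"plain: {len(raw)} bytes, gzip: {len(packed)} bytes")

# The compressed bytes are opaque: nothing can be looked up in them
# until the whole stream has been decompressed again.
restored = gzip.decompress(packed)
```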




RDF Processing/Consumption (After Exchanging)
 Costly Post-processing
          Decompression
          Indexing (RDF Store)
          Finally… consume


 Related Work (indexing): based on relational storage (Virtuoso), multi-indexes
  (RDF-3X), distributed systems (MapReduce) and others (BitMat).




The scalability problems have
a major impact on users

         Would you download hundreds of GB...

         … if you don’t know exactly what they contain,
         if they need costly exchange and post-processing,
         and if they require a powerful store to query them?




In the following...
1. Proposed approach for scalable publishing, exchanging and consumption
   of large RDF datasets
2. Preliminary results
3. Methodology
4. On-going work and conclusions




An integrated solution
We call for, and we study in this thesis, a Binary RDF Serialization format:
     Machine oriented (binary)
     Clean publication
               Metadata
               Modular
     Efficient exchange
               Compression
     Basic data operations
               Easy to parse and consume
               Primitive query resolution




HDT Overview




Dictionary+Triples partition



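   ID  Term
   1   <http://books/author33>
   2   <http://books/book21>
   3   dc:author
   4   dc:title
   5   foaf:name
   6   “Pablo Neruda”
   7   “Spain in the Heart”

   [Figure: the example graph redrawn with dictionary IDs in place of terms; surviving node labels: 6, 1, 2, 7]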
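
The Dictionary+Triples split can be sketched as follows. The ID assignment copies the seven entries listed on this slide into a single ID space; the three triples and their predicate wiring are an assumption about the example graph, and the variable names are illustrative.

```python
# Dictionary: terms in the order shown on the slide, IDs starting at 1.
terms = [
    "<http://books/author33>",
    "<http://books/book21>",
    "dc:author",
    "dc:title",
    "foaf:name",
    '"Pablo Neruda"',
    '"Spain in the Heart"',
]
term_to_id = {term: i for i, term in enumerate(terms, start=1)}
id_to_term = {i: term for term, i in term_to_id.items()}

# Assumed triples of the running example (hypothetical wiring).
triples = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]

# Triples component: the same graph, with every term replaced by its ID.
encoded = [tuple(term_to_id[t] for t in triple) for triple in triples]
decoded = [tuple(id_to_term[i] for i in triple) for triple in encoded]

print(encoded)
```

Each long string is stored once in the dictionary, and the graph structure shrinks to small integers that can be compressed and indexed separately.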




Key concepts: The Dictionary

   Largest component (up to 74%)
     Long URIs, shared prefixes
     Lang, datatype tags in literals
   Efficient ID ↔ String operations



We plan to work on a specific organization which
  Optimizes space (regularities)
  Provides efficient performance in operations




Preliminary results in Rich Functional Dictionaries

We propose to adapt techniques for string dictionaries;
  Front-Coding
     Making dictionary partitions
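
A minimal sketch of bucketed Front-Coding over a sorted string list is shown below. It is a generic textbook variant, not the exact structure of the cited paper: the bucket size, the tuple layout, and the function names are assumptions made for illustration.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def front_code(sorted_terms, bucket_size=4):
    """Store the first term of each bucket in full and every other term as
    (length of prefix shared with the previous term, remaining suffix)."""
    buckets = []
    for start in range(0, len(sorted_terms), bucket_size):
        chunk = sorted_terms[start:start + bucket_size]
        header, deltas, prev = chunk[0], [], chunk[0]
        for term in chunk[1:]:
            k = common_prefix_len(prev, term)
            deltas.append((k, term[k:]))
            prev = term
        buckets.append((header, deltas))
    return buckets

def extract(buckets, term_id, bucket_size=4):
    """ID -> string: jump to the bucket, then replay its deltas (1-based IDs)."""
    bucket, offset = divmod(term_id - 1, bucket_size)
    term, deltas = buckets[bucket]
    for k, suffix in deltas[:offset]:
        term = term[:k] + suffix
    return term

terms = sorted([
    "http://books/author33",
    "http://books/book21",
    "http://books/book22",
    "http://myblog/lectures",
    "http://myblog/posts",
])
buckets = front_code(terms)
print(buckets[0])
print(extract(buckets, 3))
```

Shared URI prefixes are stored only once per run of similar strings, while the full header at the start of each bucket keeps ID-to-string lookups bounded by the bucket size.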




  [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández,
     Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).

Key concepts: Triples

   Specific compression:
       More efficient compression than just gzip.
   Data indexing for consumption:
       Allows direct resolution of triple patterns without decompression
           (s,p,o), (s,?p,?o) and (s,p,?o)


We plan to work on a specific technique which
  optimizes space
  provides efficient performance in primitive operations




Preliminary results in Triples Encoding

We propose to use Bitmap indexes:
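
One way to picture a bitmap-based triples encoding is sketched below: ID-triples are sorted by subject, predicate, object, then flattened into a predicate sequence and an object sequence, each paired with a bitmap that delimits the adjacency lists. The convention that a 1 bit closes a list, the linear scan used instead of rank/select structures, the example triples, and all names are assumptions of this sketch, not the published layout.

```python
from itertools import groupby

def build_bitmap_triples(id_triples):
    """Encode ID-triples (subjects numbered 1..n without gaps) as two ID
    sequences plus two bitmaps; a 1 bit marks the last entry of a list."""
    sp, bp, so, bo = [], [], [], []
    for _, by_subject in groupby(sorted(set(id_triples)), key=lambda t: t[0]):
        pred_groups = [(p, [t[2] for t in g])
                       for p, g in groupby(by_subject, key=lambda t: t[1])]
        for i, (p, objects) in enumerate(pred_groups):
            sp.append(p)
            bp.append(1 if i == len(pred_groups) - 1 else 0)
            for j, o in enumerate(objects):
                so.append(o)
                bo.append(1 if j == len(objects) - 1 else 0)
    return sp, bp, so, bo

def list_bounds(bitmap, k):
    """(start, end) positions, end inclusive, of the k-th list (1-based).
    A linear scan stands in for a constant-time select operation."""
    start, seen = 0, 0
    for pos, bit in enumerate(bitmap):
        if bit:
            seen += 1
            if seen == k:
                return start, pos
            start = pos + 1
    raise KeyError(k)

def match_subject(sp, bp, so, bo, s):
    """Resolve the pattern (s, ?p, ?o) directly on the encoded form."""
    results = []
    p_start, p_end = list_bounds(bp, s)
    for p_pos in range(p_start, p_end + 1):
        # The object list of the predicate at p_pos is list number p_pos + 1.
        o_start, o_end = list_bounds(bo, p_pos + 1)
        for o_pos in range(o_start, o_end + 1):
            results.append((s, sp[p_pos], so[o_pos]))
    return results

# Assumed ID-triples for the running example (IDs from the dictionary slide).
sp, bp, so, bo = build_bitmap_triples([(2, 3, 1), (2, 4, 7), (1, 5, 6)])
print(sp, bp, so, bo)
print(match_subject(sp, bp, so, bo, 2))
```

Subjects are never stored explicitly: the position of each list inside the bitmap identifies its subject, which is what allows subject-bound patterns to be answered without decompressing the data.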




   [*] Compact Representation of Large RDF Data Sets for Publishing and
       Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez.
       International Semantic Web Conference (ISWC 2010).

Methodology
 RDF structure in theory and practice.
 Binary RDF Specification.
 Succinct Dictionaries.
 Triples Indexes.
 Practical deployment.




Some Results… HDT acknowledged as a W3C
Member Submission:
http://www.w3.org/Submission/2011/03/
                                        supported by: [logos not recovered]




Some Results... HDT for exchanging




Some Results... HDT for consumption
Direct Consumption, without decompression after exchanging
           Example of use: HDT-it (Thanks to Mario Arias, DERI)




On-going promising work: HDT-FoQ




    [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto,
        Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC
        2012). To appear.
In conclusion
Binary RDF aims to make the Web of Data more lightweight:
    Logical decomposition: Header, Dictionary, and Triples
    Clean publication
    Compressed RDF format for exchanging
    Machine-friendly, direct consumption
         Rich Functional Dictionary/Triples representations for querying




Still much work on…
 Getting a global understanding of the real structure of RDF networks.
 Applying this knowledge in innovative dictionary and triples indexes.
     full SPARQL at consumption
 Supporting dynamic operations
     inserting, deleting, and updating binary RDF




Thanks!



HDT:        http://www.rdfhdt.org/
Group: http://dataweb.infor.uva.es/
Slides: http://www.slideshare.net/javifer


                                   Javier D. Fernández (jfergar@infor.uva.es)
                   Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez

                                                   University of Valladolid (Spain)
                                                       University of Chile (Chile)


  PhD Symposium

More Related Content

What's hot

RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
Juan Sequeda
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare Metal
Databricks
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
Julian Hyde
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
Andrew Lamb
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
Jose Emilio Labra Gayo
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
inovex GmbH
 
Spark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your applicationSpark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your application
Databricks
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 
November 2022 IGRA Release
November 2022 IGRA ReleaseNovember 2022 IGRA Release
November 2022 IGRA Release
Israel Genealogy Research Association
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Edureka!
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflow
Databricks
 
A search engine in a world of events and microservices - SF Pot @Meetic
A search engine in a world of events and microservices - SF Pot @MeeticA search engine in a world of events and microservices - SF Pot @Meetic
A search engine in a world of events and microservices - SF Pot @Meetic
meeticTech
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Databricks
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
Cloudera, Inc.
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
Neo4j
 
Building a Highly Scalable, Open Source Twitter Clone
Building a Highly Scalable, Open Source Twitter CloneBuilding a Highly Scalable, Open Source Twitter Clone
Building a Highly Scalable, Open Source Twitter Clone
Paul Brown
 

What's hot (20)

RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare Metal
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Spark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your applicationSpark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your application
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
Spark sql
Spark sqlSpark sql
Spark sql
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
November 2022 IGRA Release
November 2022 IGRA ReleaseNovember 2022 IGRA Release
November 2022 IGRA Release
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflow
 
A search engine in a world of events and microservices - SF Pot @Meetic
A search engine in a world of events and microservices - SF Pot @MeeticA search engine in a world of events and microservices - SF Pot @Meetic
A search engine in a world of events and microservices - SF Pot @Meetic
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Building a Highly Scalable, Open Source Twitter Clone
Building a Highly Scalable, Open Source Twitter CloneBuilding a Highly Scalable, Open Source Twitter Clone
Building a Highly Scalable, Open Source Twitter Clone
 

Viewers also liked

Exel budget
Exel budgetExel budget
Exel budget
parevalo0041
 
The pitch[1]
The pitch[1]The pitch[1]
The pitch[1]
salesianas2011
 
Ch 33 macroeconomic theory open economy
Ch 33 macroeconomic theory open economyCh 33 macroeconomic theory open economy
Ch 33 macroeconomic theory open economy
Gale Pooley
 
Proposal salam bgi
Proposal salam bgiProposal salam bgi
Proposal salam bgi
Achmad Susani
 
Magnolia Residences @ New Manila Quezon City
Magnolia Residences @ New Manila Quezon CityMagnolia Residences @ New Manila Quezon City
Magnolia Residences @ New Manila Quezon City
Norman Garcia
 
A good story
A good storyA good story
A good story
George Alex
 
B. indonesia
B. indonesiaB. indonesia
B. indonesia
Jay
 
МагIарулазул маргьу
МагIарулазул маргьуМагIарулазул маргьу
МагIарулазул маргьуŞamil Tzva
 
10 Tips & Tricks for Your next crowdsourcing campaign!
10 Tips & Tricks for Your next crowdsourcing campaign!10 Tips & Tricks for Your next crowdsourcing campaign!
10 Tips & Tricks for Your next crowdsourcing campaign!
Timo Savolainen
 
101 lecture 19 earnings and discrimination
101 lecture 19 earnings and discrimination101 lecture 19 earnings and discrimination
101 lecture 19 earnings and discrimination
Gale Pooley
 
Hen 368 lecture 8 production and costs
Hen 368 lecture 8 production and costsHen 368 lecture 8 production and costs
Hen 368 lecture 8 production and costs
Gale Pooley
 
London
LondonLondon
London
87honey
 
Mollejuo Enter 2013 presentation
Mollejuo Enter 2013 presentationMollejuo Enter 2013 presentation
Mollejuo Enter 2013 presentation
Joanan Hernandez
 
The pitch
The pitchThe pitch
The pitch
salesianas2011
 
Lecture 9 saving investment and the financial system
Lecture 9 saving investment and the financial systemLecture 9 saving investment and the financial system
Lecture 9 saving investment and the financial system
Gale Pooley
 
O drama na Síria ...
O drama na Síria ...O drama na Síria ...
O drama na Síria ...
Umberto Pacheco
 
My life as social media manager kbc
My life as social media manager kbcMy life as social media manager kbc
My life as social media manager kbc
Hatti Knuts
 
202 lecture 1
202 lecture 1202 lecture 1
202 lecture 1
Gale Pooley
 

Viewers also liked (20)

Exel budget
Exel budgetExel budget
Exel budget
 
The pitch[1]
The pitch[1]The pitch[1]
The pitch[1]
 
Ch 33 macroeconomic theory open economy
Ch 33 macroeconomic theory open economyCh 33 macroeconomic theory open economy
Ch 33 macroeconomic theory open economy
 
Proposal salam bgi
Proposal salam bgiProposal salam bgi
Proposal salam bgi
 
TGV Pequim-Xangai
TGV Pequim-XangaiTGV Pequim-Xangai
TGV Pequim-Xangai
 
Magnolia Residences @ New Manila Quezon City
Magnolia Residences @ New Manila Quezon CityMagnolia Residences @ New Manila Quezon City
Magnolia Residences @ New Manila Quezon City
 
A good story
A good storyA good story
A good story
 
B. indonesia
B. indonesiaB. indonesia
B. indonesia
 
МагIарулазул маргьу
МагIарулазул маргьуМагIарулазул маргьу
МагIарулазул маргьу
 
10 Tips & Tricks for Your next crowdsourcing campaign!
10 Tips & Tricks for Your next crowdsourcing campaign!10 Tips & Tricks for Your next crowdsourcing campaign!
10 Tips & Tricks for Your next crowdsourcing campaign!
 
101 lecture 19 earnings and discrimination
101 lecture 19 earnings and discrimination101 lecture 19 earnings and discrimination
101 lecture 19 earnings and discrimination
 
Hen 368 lecture 8 production and costs
Hen 368 lecture 8 production and costsHen 368 lecture 8 production and costs
Hen 368 lecture 8 production and costs
 
London
LondonLondon
London
 
Mollejuo Enter 2013 presentation
Mollejuo Enter 2013 presentationMollejuo Enter 2013 presentation
Mollejuo Enter 2013 presentation
 
The pitch
The pitchThe pitch
The pitch
 
CONCURSO BOLETÍN
CONCURSO BOLETÍNCONCURSO BOLETÍN
CONCURSO BOLETÍN
 
Lecture 9 saving investment and the financial system
Lecture 9 saving investment and the financial systemLecture 9 saving investment and the financial system
Lecture 9 saving investment and the financial system
 
O drama na Síria ...
O drama na Síria ...O drama na Síria ...
O drama na Síria ...
 
My life as social media manager kbc
My life as social media manager kbcMy life as social media manager kbc
My life as social media manager kbc
 
202 lecture 1
202 lecture 1202 lecture 1
202 lecture 1
 

Similar to Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
WU (Vienna University of Economics and Business)
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
Mark Wilkinson
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
CONUL Conference
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
henkvandenberg16
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
Evert Lammerts
 
The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...
João Rocha da Silva
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
François Belleau
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
Ivan Herman
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
Carole Goble
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
Oscar Corcho
 
Exploring Linked Data
Exploring Linked DataExploring Linked Data
Exploring Linked Data
Roberto García
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
Oscar Corcho
 
IBC FAIR Data Prototype Implementation slideshow
IBC FAIR Data Prototype Implementation   slideshowIBC FAIR Data Prototype Implementation   slideshow
IBC FAIR Data Prototype Implementation slideshow
Mark Wilkinson
 
DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World." DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World."
Avalon Media System
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
WU (Vienna University of Economics and Business)
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
Carole Goble
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
Laura Po
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
Jun Zhao
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Anita de Waard
 

Similar to Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data (20)

Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...The Dendro research data management platform: Applying ontologies to long-ter...
The Dendro research data management platform: Applying ontologies to long-ter...
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
State of the Semantic Web
State of the Semantic WebState of the Semantic Web
State of the Semantic Web
 
RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Exploring Linked Data
Exploring Linked DataExploring Linked Data
Exploring Linked Data
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
IBC FAIR Data Prototype Implementation slideshow
IBC FAIR Data Prototype Implementation   slideshowIBC FAIR Data Prototype Implementation   slideshow
IBC FAIR Data Prototype Implementation slideshow
 
DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World." DLF 2015 Presentation, "RDF in the Real World."
DLF 2015 Presentation, "RDF in the Real World."
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 

Recently uploaded

Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke

Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

  • 1. Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data Javier D. Fernández Supervised by: Miguel A. Martínez-Prieto and Claudio Gutierrez University of Valladolid (Spain) University of Chile (Chile) PhD Symposium
  • 2. Brief RDF Introduction: (1) Resource Description Framework: Webs, services, protocols; persons, proteins, geography… (2) A standard model for data exchange on the Web, understandable by computers. (3) W3C Recommendation (2004). (4) Data model: (subject, predicate, object).
  • 3. RDF Example. Triples have the form (Subject, Predicate, Object), drawn from (U,B), U, (U,B,L): URIs (U), blank nodes (B) and literals (L). [Figure: RDF graph with the URI nodes <http://books/book21>, <http://books/author33> and <http://myblog/lectures>, the literals “Pablo Neruda” and “Spain in the Heart”, the blank node _collection, and the edge label lectures:to_read_list.]
  • 4. Linked Data principles: (1) Use URIs as names for things. (2) Use HTTP URIs so that people can look up those names. (3) When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). (4) Include links to other URIs, so that they can discover more things. Image: Danilo Rizzuti / FreeDigitalPhotos.net
  • 5. Image: Danilo Rizzuti / FreeDigitalPhotos.net
  • 6. Scalability problems. DBpedia (en): 233 M triples, ~33 GB. Uniprot: 845 M triples, ~230 GB. Publish? Exchange? Process/Consume/Query?
  • 7. RDF Publication: dereferenceable URIs, RDF dump, sensor, SPARQL endpoints/APIs. No recommendations or methodology for publishing at large scale. Related work: some metadata for discovery, such as VoID and Semantic Sitemaps.
  • 8. RDF Exchanging issues. Formats: RDF/XML, N3, Turtle, JSON. Document-centric (verbose); data-centric view (machine). No structure (chunks, universal compression). Related work: universal compression (gzip, bzip2) and the Efficient XML Interchange Format (EXI). Image: renjith krishnan / FreeDigitalPhotos.net
  • 9. RDF Processing/Consumption (after exchanging). Costly post-processing: decompression, then indexing (RDF store), and finally… consume. Related work (indexing): based on relational storage (Virtuoso), multi-indexes (RDF-3X), distributed systems (Map-Reduce) and others (BitMat).
  • 10. The scalability problems have a major impact on users. Would you download hundreds of GB if you don’t know exactly what they contain, if they need costly exchange and post-processing, and if they require a powerful store to query them?
  • 11. In the following: (1) Proposed approach for scalable publishing, exchanging and consumption of large RDF datasets. (2) Preliminary results. (3) Methodology. (4) On-going work and conclusions. Image: jscreationzs / FreeDigitalPhotos.net
  • 12. An integrated solution. We call for, and we study in this thesis, a Binary RDF Serialization format: machine oriented (binary); clean publication (metadata, modular); efficient exchange (compression); basic data operations (easy to parse and consume, primitive query resolution).
  • 13. HDT Overview
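  • 14. Dictionary+Triples partition. Dictionary: 1 = <http://books/author33>, 2 = <http://books/book21>, 3 = dc:author, 4 = dc:title, 5 = foaf:name, 6 = “Pablo Neruda”, 7 = “Spain in the Heart”. [Figure: the example graph redrawn with the node IDs 1, 2, 6 and 7 in place of the terms.]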
  • 15. Key concepts: The Dictionary. Largest component (up to 74%). Long URIs, shared prefixes. Language and datatype tags in literals. Efficient ID↔String operations. We plan to work on a specific organization which optimizes space (regularities) and provides efficient performance in operations.
  • 16. Preliminary results in Rich Functional Dictionaries. We propose to adapt techniques for string dictionaries: Front-Coding; making dictionary partitions. [*] Compression of RDF Dictionaries. Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas. ACM Symposium on Applied Computing (SAC 2012).
  • 17. Key concepts: Triples. Specific compression: more efficient compression than just gzip. Data indexing for consumption: allows direct resolution of the patterns (s,p,o), (s,?p,?o) and (s,p,?o) without decompression. We plan to work on a specific technique which optimizes space and provides efficient performance in primitive operations.
  • 18. Preliminary results in Triples Encoding. We propose to use Bitmap indexes. [*] Compact Representation of Large RDF Data Sets for Publishing and Exchange. Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez. International Semantic Web Conference (ISWC 2010).
  • 19. Methodology: RDF structure in theory and practice; Binary RDF Specification; Succinct Dictionaries; Triples Indexes; practical deployment.
  • 20. Some Results… HDT acknowledged as a W3C Member Submission: http://www.w3.org/Submission/2011/03/ supported by:
  • 21. Some Results... HDT for exchanging
  • 22. Some Results... HDT for consumption. Direct consumption, without decompression after exchanging. Example of use: HDT-it (thanks to Mario Arias, DERI).
  • 23. On-going promising work: HDT-FoQ. [*] Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto, Mario Arias, Javier D. Fernández. Extended Semantic Web Conference (ESWC 2012). To appear.
  • 24. In conclusion, Binary RDF aims to make the Web of Data more lightweight: logical decomposition into Header, Dictionary, and Triples; clean publication; compressed RDF format for exchanging; machine-friendly, direct consumption; Rich Functional Dictionary/Triples representations for querying.
  • 25. Still much work on… getting a global understanding of the real structure of RDF networks; applying this knowledge in innovative dictionary and triples indexes; full SPARQL at consumption; supporting dynamic operations (inserting, deleting, and updating binary RDF).
  • 26. Thanks! HDT: http://www.rdfhdt.org/ Group: http://dataweb.infor.uva.es/ Slides: http://www.slideshare.net/javifer Javier D. Fernández (jfergar@infor.uva.es). Supervised by: Miguel A. Martínez-Prieto, Claudio Gutierrez. University of Valladolid (Spain), University of Chile (Chile).
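
The Dictionary+Triples partition (slide 14) and the bitmap-based triples encoding with direct (s,?p,?o) resolution (slides 17 and 18) can be illustrated with a minimal Python sketch. It is not HDT's actual implementation or API. The term IDs follow slide 14, but the three triples are an assumed reading of the example graph, and the layout (arrays here called sp/so with bitmaps bp/bo, where a 1 marks the last entry of each list) and all function names are illustrative choices. The linear-scan select stands in for the succinct rank/select structures a real implementation would use.

```python
from itertools import groupby

# Dictionary: term <-> integer ID, using the IDs shown on slide 14.
TERMS = [
    "<http://books/author33>",  # 1
    "<http://books/book21>",    # 2
    "dc:author",                # 3
    "dc:title",                 # 4
    "foaf:name",                # 5
    '"Pablo Neruda"',           # 6
    '"Spain in the Heart"',     # 7
]
TERM_TO_ID = {term: i for i, term in enumerate(TERMS, start=1)}
ID_TO_TERM = {i: term for term, i in TERM_TO_ID.items()}

# Assumption: these triples are an illustrative reading of the example graph.
TRIPLES = [
    ("<http://books/book21>", "dc:author", "<http://books/author33>"),
    ("<http://books/book21>", "dc:title", '"Spain in the Heart"'),
    ("<http://books/author33>", "foaf:name", '"Pablo Neruda"'),
]
ID_TRIPLES = [tuple(TERM_TO_ID[t] for t in triple) for triple in TRIPLES]


def encode_bitmap_triples(id_triples):
    """Encode ID triples as two ID arrays plus two bitmaps.

    Triples are sorted by (s, p, o). Subjects stay implicit (1..n);
    sp lists the predicates of each subject and so lists the objects of
    each (subject, predicate) pair. A 1 in bp/bo marks the last entry
    of the corresponding list.
    """
    ordered = sorted(set(id_triples))
    subjects = sorted({s for s, _, _ in ordered})
    if subjects != list(range(1, len(subjects) + 1)):
        raise ValueError("subject IDs must be 1..n so they can stay implicit")
    sp, bp, so, bo = [], [], [], []
    for _, by_subject in groupby(ordered, key=lambda t: t[0]):
        pred_groups = [
            (p, [o for _, _, o in group])
            for p, group in groupby(by_subject, key=lambda t: t[1])
        ]
        for i, (p, objects) in enumerate(pred_groups):
            sp.append(p)
            bp.append(1 if i == len(pred_groups) - 1 else 0)
            for j, o in enumerate(objects):
                so.append(o)
                bo.append(1 if j == len(objects) - 1 else 0)
    return sp, bp, so, bo


def select(bitmap, k):
    """Return the position of the k-th 1-bit (k is 1-based); -1 for k == 0."""
    if k == 0:
        return -1
    seen = 0
    for pos, bit in enumerate(bitmap):
        seen += bit
        if seen == k:
            return pos
    raise IndexError("bitmap has fewer than k set bits")


def match_subject(s, sp, bp, so, bo):
    """Resolve the pattern (s, ?p, ?o) directly on the encoded arrays."""
    results = []
    first_p, last_p = select(bp, s - 1) + 1, select(bp, s)
    for i in range(first_p, last_p + 1):
        first_o, last_o = select(bo, i) + 1, select(bo, i + 1)
        for j in range(first_o, last_o + 1):
            results.append((s, sp[i], so[j]))
    return results


def decode(id_triple):
    """Translate an ID triple back to its terms through the dictionary."""
    return tuple(ID_TO_TERM[i] for i in id_triple)
```

Calling `encode_bitmap_triples(ID_TRIPLES)` and then `match_subject(2, ...)` on the result returns the ID triples of `<http://books/book21>`, which `decode` maps back to terms. The triples are never expanded back into text, which is the kind of direct consumption the slides argue for.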