Digital Enterprise Research Institute                                                           www.deri.ie




Exchange and Consumption of Huge RDF Data

Miguel A. Martínez-Prieto (1,2) <migumar2@infor.uva.es>
Mario Arias (1,3) <mario.arias@deri.org>
Javier D. Fernández (1,2) <jfergar@infor.uva.es>

1. Department of Computer Science, Universidad de Valladolid (Spain)
2. Department of Computer Science, Universidad de Chile (Chile)
3. Digital Enterprise Research Institute, National University of Ireland Galway




 Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Sharing RDF in the Web of Data.
            Parsing / Indexing / Reasoning

                   •    Dataset analysis.
                   •    Set up a SPARQL server.
                   •    Vocabulary interlinking / integration.
                   •    Browsing and visualization.
                   •    Exchange between servers.
                   •    Data-intensive tasks.

            [Diagram labels: sensor, dereferenceable URIs, RDF dump, SPARQL Endpoints / APIs]
Dataset Exchange Workflow
            1. Publication:    Convert, Serialize, Compress
            2. Exchange:       Transfer
            3. Consumption:    Decompress, Parse, Index

            If RDF is meant to be machine processable,
            why are we using plain text serialization formats?
HDT: RDF Binary Format
            Compact Data Structure for RDF.
            W3C Submission. http://www.w3.org/Submission/2011/03/
            Open Source C++/Java library.
HDT Focused on Querying (HDT-FoQ)
            Contributions of this paper:
                   A complementary index that makes HDT fully queryable.
                   An analysis of how HDT reduces exchange and indexing time.
                   An evaluation of in-memory query performance.
Dictionary
            Mapping of strings to consecutive IDs {1..n}.
            Lexicographically sorted, no duplicates.
            Section compression is explained in [8].
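A minimal sketch of the string-to-ID mapping described on this slide. It assumes one flat sorted list; the section compression of [8] is not modeled, and the function names and `ex:` terms are hypothetical.

```python
from bisect import bisect_left

def build_dictionary(terms):
    """Sorted, duplicate-free list of terms; the ID of a term is its index + 1."""
    return sorted(set(terms))

def term_to_id(dictionary, term):
    """Binary search for a term. IDs start at 1; 0 means 'not found'."""
    pos = bisect_left(dictionary, term)
    if pos < len(dictionary) and dictionary[pos] == term:
        return pos + 1
    return 0

def id_to_term(dictionary, term_id):
    """Inverse mapping: ID back to string."""
    return dictionary[term_id - 1]

# Illustrative terms only (not taken from the slides).
terms = ["ex:bob", "ex:alice", "ex:knows", "ex:alice"]
dictionary = build_dictionary(terms)
print(dictionary, term_to_id(dictionary, "ex:bob"))
```

Because the list is sorted, string-to-ID lookup is a binary search and ID-to-string lookup is a direct index.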
Triples Model
            Triples:
                   (1,2,6) (1,3,2) (2,1,3) (2,2,4) (2,2,5) (2,4,1) (3,3,2)

            Grouped by subject, then by predicate:
                   S          1                2              3
                   P      [ 2   3 ]      [ 1    2    4 ]    [ 3 ]
                   O      [6]  [2]       [3]  [4 5]  [1]    [2]
Adjacency Lists
            List number:      1           2            3
            Lists:         [2, 3]     [1, 2, 4]       [3]

            Position:      1   2   3   4   5   6
            Array:         2   3   1   2   4   3
            Bitmap:        1   0   1   0   0   1

            Operations:
              –      access(g) = Given a global position, get the value.                     O(1)
              –      findList(g) = Given a global position, get the list number.             O(1)
              –      first(l), last(l) = Given a list, find its first and last positions.    O(log log n)
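The three operations can be sketched as follows, assuming (as in the slide's example) that a 1 in the bitmap marks the first element of each list and that no list is empty. The class name is hypothetical. The sketch uses plain linear scans, so it shows the semantics only, not the complexities listed above, which need succinct rank/select bitmaps.

```python
class AdjacencyLists:
    """Lists stored back to back in one array, with a bitmap marking list starts."""

    def __init__(self, lists):
        self.array, self.bitmap = [], []
        for lst in lists:
            for i, value in enumerate(lst):
                self.array.append(value)
                self.bitmap.append(1 if i == 0 else 0)

    def access(self, g):
        """Value at global position g (1-based)."""
        return self.array[g - 1]

    def find_list(self, g):
        """Number of the list containing position g: count of 1-bits up to g."""
        return sum(self.bitmap[:g])

    def first(self, l):
        """Global position of the first element of list l: the l-th 1-bit."""
        seen = 0
        for pos, bit in enumerate(self.bitmap, start=1):
            seen += bit
            if seen == l:
                return pos
        raise IndexError("no such list")

    def last(self, l):
        """Global position of the last element of list l."""
        if l < sum(self.bitmap):
            return self.first(l + 1) - 1
        return len(self.array)

adj = AdjacencyLists([[2, 3], [1, 2, 4], [3]])
print(adj.array, adj.bitmap)
print(adj.access(4), adj.find_list(4), adj.first(2), adj.last(2))
```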
Triples Model and Coding
            Triples:
                   (1,2,6) (1,3,2) (2,1,3) (2,2,4) (2,2,5) (2,4,1) (3,3,2)

                   S            1                2              3
                   P          2   3          1    2    4        3
                   O          6   2          3   4 5   1        2

                   Array Y      2  3  1  2  4  3
                   Bitmap Y     1  0  1  0  0  1

                   Array Z      6  2  3  4  5  1  2
                   Bitmap Z     1  1  1  1  0  1  1
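A small sketch of how the four sequences can be derived from the sorted triples. It assumes subject IDs are consecutive with no gaps and that a 1-bit marks the first element of each list, as in the slide's example. The function name is hypothetical.

```python
def encode_bitmap_triples(triples):
    """Encode (s, p, o) ID triples as Array Y / Bitmap Y / Array Z / Bitmap Z."""
    array_y, bitmap_y, array_z, bitmap_z = [], [], [], []
    prev_s = prev_p = None
    for s, p, o in sorted(set(triples)):
        new_subject = s != prev_s
        new_pair = new_subject or p != prev_p
        if new_pair:
            # A new (subject, predicate) pair adds one entry to level Y.
            array_y.append(p)
            bitmap_y.append(1 if new_subject else 0)
        # Every triple adds one entry to level Z.
        array_z.append(o)
        bitmap_z.append(1 if new_pair else 0)
        prev_s, prev_p = s, p
    return array_y, bitmap_y, array_z, bitmap_z

triples = [(1, 2, 6), (1, 3, 2), (2, 1, 3), (2, 2, 4),
           (2, 2, 5), (2, 4, 1), (3, 3, 2)]
array_y, bitmap_y, array_z, bitmap_z = encode_bitmap_triples(triples)
print(array_y, bitmap_y)
print(array_z, bitmap_z)
```

Run on the seven triples of the slide, this reproduces the Array Y, Bitmap Y, Array Z and Bitmap Z rows shown above.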
Searching by Subject
            Example pattern: ( 2, 2, ? )
            Patterns resolved by subject: SPO, SP?, S??, S?O

                   Array Y      2  3  1  2  4  3
                   Bitmap Y     1  0  1  0  0  1

                   Array Z      6  2  3  4  5  1  2
                   Bitmap Z     1  1  1  1  0  1  1
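A sketch of subject-bound search over the example arrays. The helper names are hypothetical, and the linear scans stand in for the rank/select operations and for the binary search that a sorted predicate list would allow.

```python
ARRAY_Y  = [2, 3, 1, 2, 4, 3]
BITMAP_Y = [1, 0, 1, 0, 0, 1]
ARRAY_Z  = [6, 2, 3, 4, 5, 1, 2]
BITMAP_Z = [1, 1, 1, 1, 0, 1, 1]

def list_range(bitmap, l):
    """1-based inclusive (first, last) positions of list number l."""
    starts = [pos for pos, bit in enumerate(bitmap, start=1) if bit == 1]
    first = starts[l - 1]
    last = starts[l] - 1 if l < len(starts) else len(bitmap)
    return first, last

def search_sp(s, p):
    """Objects matching the pattern (s, p, ?)."""
    y_first, y_last = list_range(BITMAP_Y, s)
    for y in range(y_first, y_last + 1):
        if ARRAY_Y[y - 1] == p:
            # Position y in level Y owns list number y in level Z.
            z_first, z_last = list_range(BITMAP_Z, y)
            return ARRAY_Z[z_first - 1:z_last]
    return []

def search_s(s):
    """Triples matching the pattern (s, ?, ?)."""
    results = []
    y_first, y_last = list_range(BITMAP_Y, s)
    for y in range(y_first, y_last + 1):
        z_first, z_last = list_range(BITMAP_Z, y)
        for o in ARRAY_Z[z_first - 1:z_last]:
            results.append((s, ARRAY_Y[y - 1], o))
    return results

print(search_sp(2, 2))
print(search_s(1))
```

For the slide's pattern ( 2, 2, ? ), subject 2 owns positions 3 to 5 of Array Y. Predicate 2 sits at position 4, which points to the fourth list of Array Z.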
Searching by Predicate
            Example pattern: ( ?, 2, ? )
            Pattern: ?P?

                   Array Y      2  3  1  2  4  3
                   Bitmap Y     1  0  1  0  0  1

                   Array Z      6  2  3  4  5  1  2
                   Bitmap Z     1  1  1  1  0  1  1
Wavelet Tree
            Compact sequence of integers in {0..σ}.

            [Figure: example sequence with positions 1 to 16, illustrating rank(3, 7) = 2 and select(6, 3) = 9]

                   access(position) = Value at position.                                          O(log σ)
                   rank(entry, position) = Number of appearances of “entry” up to “position”.     O(log σ)
                   select(entry, i) = Position where “entry” appears for the i-th time.           O(log σ)
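A pointer-based sketch of the three operations, with 1-based positions as on the slide. It is not the compact representation: each node stores a plain prefix-count array instead of a succinct bitmap, and select uses a binary search per level, so it does not reach O(log σ). The example sequence is an assumption chosen to agree with the slide's two examples; it is not necessarily the sequence in the original figure.

```python
class WaveletTree:
    """Wavelet tree over a non-empty integer sequence (illustrative, not succinct)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi, self.n = lo, hi, len(seq)
        if lo == hi:
            return  # leaf: all n symbols equal lo
        mid = (lo + hi) // 2
        # prefix[i] = how many of the first i symbols go to the right child
        self.prefix = [0]
        for v in seq:
            self.prefix.append(self.prefix[-1] + (1 if v > mid else 0))
        self.left = WaveletTree([v for v in seq if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, hi)

    def access(self, pos):
        if self.lo == self.hi:
            return self.lo
        ones = self.prefix[pos]
        if ones - self.prefix[pos - 1] == 1:
            return self.right.access(ones)
        return self.left.access(pos - ones)

    def rank(self, value, pos):
        if pos == 0 or not (self.lo <= value <= self.hi):
            return 0
        if self.lo == self.hi:
            return pos
        ones = self.prefix[pos]
        if value > (self.lo + self.hi) // 2:
            return self.right.rank(value, ones)
        return self.left.rank(value, pos - ones)

    def select(self, value, i):
        """Position of the i-th occurrence of value, or 0 if there is none."""
        if i < 1 or not (self.lo <= value <= self.hi):
            return 0
        if self.lo == self.hi:
            return i if i <= self.n else 0
        go_right = value > (self.lo + self.hi) // 2
        j = (self.right if go_right else self.left).select(value, i)
        if j == 0:
            return 0
        a, b = 1, self.n  # smallest position whose bit count reaches j
        while a < b:
            m = (a + b) // 2
            count = self.prefix[m] if go_right else m - self.prefix[m]
            if count >= j:
                b = m
            else:
                a = m + 1
        return a

# Assumed sequence, consistent with rank(3, 7) = 2 and select(6, 3) = 9.
wt = WaveletTree([2, 3, 6, 3, 6, 1, 2, 3, 6, 2, 5, 2, 4, 1, 4, 2])
print(wt.rank(3, 7), wt.select(6, 3), wt.access(9))
```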
Searching by Predicate w/ Wavelet
            Example pattern: ( ?, 2, ? )
            Pattern: ?P?

                   Wavelet Y    2  3  1  2  4  3
                   Bitmap Y     1  0  1  0  0  1

                   Array Z      6  2  3  4  5  1  2
                   Bitmap Z     1  1  1  1  0  1  1
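A sketch of how a ?P? pattern can be resolved once select is available on level Y. Each occurrence of the predicate gives a position in Y. The subject is the number of the list holding that position, and the objects are the matching list in Z. The `select` helper here is a naive linear scan that stands in for the wavelet tree operation, and the function names are hypothetical.

```python
ARRAY_Y  = [2, 3, 1, 2, 4, 3]   # stored as a wavelet tree in HDT-FoQ
BITMAP_Y = [1, 0, 1, 0, 0, 1]
ARRAY_Z  = [6, 2, 3, 4, 5, 1, 2]
BITMAP_Z = [1, 1, 1, 1, 0, 1, 1]

def select(seq, value, i):
    """1-based position of the i-th occurrence of value in seq, or 0."""
    seen = 0
    for pos, v in enumerate(seq, start=1):
        if v == value:
            seen += 1
            if seen == i:
                return pos
    return 0

def search_p(p):
    """Triples matching the pattern (?, p, ?)."""
    results = []
    i = 1
    y = select(ARRAY_Y, p, i)
    while y != 0:
        s = sum(BITMAP_Y[:y])                 # findList: subject owning position y
        z_first = select(BITMAP_Z, 1, y)      # start of the y-th object list
        z_next = select(BITMAP_Z, 1, y + 1)   # start of the following list, if any
        z_last = z_next - 1 if z_next else len(ARRAY_Z)
        for o in ARRAY_Z[z_first - 1:z_last]:
            results.append((s, p, o))
        i += 1
        y = select(ARRAY_Y, p, i)
    return results

print(search_p(2))
```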
Triples: Object-Search
            Example pattern: ( ?, ?, 2 )
            Patterns resolved with the OP-Index: ??O, ?PO

                   S            1                2              3
                   P          2   3          1    2    4        3
                   O          6   2          3   4 5   1        2

                   OP-Index     [6]   [2 7]   [3]   [4]   [5]   [1]
                                O1    O2      O3    O4    O5    O6
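A sketch of an object index built from Array Z: for each object ID, the list of positions where it occurs. Keeping those positions in ascending order is an assumption of this sketch, and the function names are hypothetical. Resolving ??O then walks back up from each position to its predicate and subject.

```python
ARRAY_Y  = [2, 3, 1, 2, 4, 3]
BITMAP_Y = [1, 0, 1, 0, 0, 1]
ARRAY_Z  = [6, 2, 3, 4, 5, 1, 2]
BITMAP_Z = [1, 1, 1, 1, 0, 1, 1]

def build_op_index(array_z, num_objects):
    """For each object ID 1..num_objects, the 1-based positions where it occurs in Z."""
    op_index = [[] for _ in range(num_objects)]
    for pos, o in enumerate(array_z, start=1):
        op_index[o - 1].append(pos)
    return op_index

def search_o(op_index, o):
    """Triples matching the pattern (?, ?, o)."""
    results = []
    for z in op_index[o - 1]:
        y = sum(BITMAP_Z[:z])   # list number in Z = position in level Y
        p = ARRAY_Y[y - 1]
        s = sum(BITMAP_Y[:y])   # list number in Y = subject
        results.append((s, p, o))
    return results

op_index = build_op_index(ARRAY_Z, 6)
print(op_index)
print(search_o(op_index, 2))
```

On the example data this yields the same lists as the OP-Index row of the slide.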
Data Structure Summary.
            From HDT to HDT-FoQ:
                   Convert Array Y to Wavelet.
                   Generate OP-Index.


            Triple Patterns:

                         SPO, SP?, S??, S?O       Original HDT
                         ?P?                      Wavelet Tree
                         ?PO, ??O                 OP-Index
Evaluation Environment
            Datasets:
                   Dataset        Triples      Size (N-Triples)
                   LinkedMDB      6.1M         850 MB
                   DBLP           73M          11.1 GB
                   Geonames       112M         12.3 GB
                   DBPedia        258M         37.3 GB

            Producer:      Xeon @ 2.4 GHz, 96 GB RAM
            Consumer:      Phenom-II @ 3.2 GHz, 8 GB RAM

            Compressors:   GZIP, LZMA
            RDF Storage:   Virtuoso, RDF-3x, Hexastore
Compression Ratio
            [Bar chart: compression ratio (% against plain N-Triples), x-axis 0 to 14,
             for DBPedia, Geonames, DBLP and LinkedMDB; series: hdt, gz, lzma, hdt.gz, hdt.lzma]
Publication Times
                           NT+GZIP      NT+LZMA      HDT          HDT+GZIP     HDT+LZMA
            linkedMDB      11.3 sec     14.7 min     1.05 min     1.09 min     1.52 min
            DBLP           2.72 min     103 min      12 min       13.5 min     21.9 min
            Geonames       3.28 min     244 min      25 min       26.4 min     38.9 min
            DBPedia        15.9 min     466 min      56 min       60 min       121 min

            [Bar chart: times slower than N-Triples + GZIP, x-axis 0 to 80,
             for dbpedia, geonames, dblp and linkedMDB; series: gz, lzma, hdt, hdt.gz, hdt.lzma]
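The chart's "times slower than N-Triples + GZIP" values can be recomputed from the table by dividing each publication time by the NT+GZIP time for the same dataset. A short sketch of that arithmetic, using the table figures converted to seconds (the variable and function names are hypothetical):

```python
# Publication times from the table, converted to seconds.
publication_seconds = {
    "linkedMDB": {"NT+GZIP": 11.3, "NT+LZMA": 14.7 * 60, "HDT": 1.05 * 60,
                  "HDT+GZIP": 1.09 * 60, "HDT+LZMA": 1.52 * 60},
    "DBLP": {"NT+GZIP": 2.72 * 60, "NT+LZMA": 103 * 60, "HDT": 12 * 60,
             "HDT+GZIP": 13.5 * 60, "HDT+LZMA": 21.9 * 60},
    "Geonames": {"NT+GZIP": 3.28 * 60, "NT+LZMA": 244 * 60, "HDT": 25 * 60,
                 "HDT+GZIP": 26.4 * 60, "HDT+LZMA": 38.9 * 60},
    "DBPedia": {"NT+GZIP": 15.9 * 60, "NT+LZMA": 466 * 60, "HDT": 56 * 60,
                "HDT+GZIP": 60 * 60, "HDT+LZMA": 121 * 60},
}

def slowdown(dataset, fmt):
    """How many times slower a format is to publish than NT+GZIP."""
    times = publication_seconds[dataset]
    return times[fmt] / times["NT+GZIP"]

for dataset in publication_seconds:
    row = {fmt: round(slowdown(dataset, fmt), 1)
           for fmt in publication_seconds[dataset]}
    print(dataset, row)
```

For example, linkedMDB with NT+LZMA comes out at about 78 times the NT+GZIP time, which matches the 0 to 80 range of the chart axis.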
Publication Times (2)
            [Bar chart: the same publication times without NT+LZMA, as times slower than N-Triples + GZIP,
             x-axis 0 to 13, for dbpedia, geonames, dblp and linkedMDB; series: gz, hdt, hdt.gz, hdt.lzma]
Exchange & Decompression Time
            [Stacked bar chart: exchange and decompression time in seconds (geometric mean of all datasets),
             x-axis 0 to 300, for GZIP, LZMA, HDT+GZIP and HDT+LZMA]

            *Assuming a network bandwidth of 2 MByte/s
Overall Client Time
            [Stacked bar chart: exchange, decompression and indexing time in seconds (geometric mean of all datasets),
             x-axis 0 to 3600, for LZMA+Virtuoso, GZ+Virtuoso, LZMA+RDF3x, GZ+RDF3x, HDT+LZMA+FOQ and HDT+GZIP+FOQ]

                           LZMA+RDF3x     HDT+LZMA
            linkedMDB      2.1 min        9.21 sec
            dblp           27 min         2.02 min
            geonames       49.2 min       3.04 min
            dbpedia        159 min        14.3 min
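As a quick check of the table, the per-dataset speedup of HDT+LZMA over LZMA+RDF3x is the ratio of the two client times. A sketch of that arithmetic, with the table values converted to seconds (the variable names are hypothetical):

```python
# (LZMA+RDF3x, HDT+LZMA) overall client times from the table, in seconds.
client_seconds = {
    "linkedMDB": (2.1 * 60, 9.21),
    "dblp": (27 * 60, 2.02 * 60),
    "geonames": (49.2 * 60, 3.04 * 60),
    "dbpedia": (159 * 60, 14.3 * 60),
}

# Speedup = baseline time divided by HDT time.
speedups = {name: baseline / hdt
            for name, (baseline, hdt) in client_seconds.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

The ratios fall between roughly 11x and 16x across the four datasets.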
In-memory Data Store.
                           Triples     Index size (MB)
                                       Virtuoso     Hexastore     RDF3x     HDT-FoQ
            LinkedMDB      6.1M        518          6976          337       68
            DBLP           46M         3982         -             3252      850
            Geonames       112M        9216         -             6678      1435
            DBPedia        258M        -            -             15802     5260

            Smaller size = more data in memory = less I/O access!
Query Performance, Triple Patterns
            [Bar charts, one panel each for LinkedMDB and Geonames: times HDT is faster (y-axis 0 to 16)
             than RDF-3x and Virtuoso, per triple pattern: SP?, S?O, S??, ?PO, ?P?, ??O]
Query Performance Two-way Joins
            [Bar charts, one panel each for LinkedMDB and Geonames: times HDT is faster (y-axis 0 to 3)
             than RDF-3x and Virtuoso, per join type: SSbig, SSsmall, SObig, SOsmall, OObig, OOsmall]
Conclusions
         Data is ready to be consumed 10-15x faster.
               Exchange time reduced.
               Indexing burden on server = Lightweight client processing.
         Competitive query performance.
               Very fast on triple patterns.
               Joins on the same scale as existing solutions.
         This is useful to you:
               If you need a fast, compact read-only in-memory RDF store.
               If you want to share self-queryable RDF dumps.
               If you need fast download & query.
         Addresses the volume issue of Big Data.
Future work.
            Full SPARQL support.
                   UNION, OPTIONAL, multiple joins.
                   Optimized query evaluation.
            Integration:
                   Jena, Any23…
            Dissemination.
                   Get more people to use it!
            Additional services on top of HDT.
                   SPARQL Endpoint.
                   Distributed Stream Processing.
                   Mobile Applications.
Thanks! http://www.rdf-hdt.org
Digital Enterprise Research Institute   www.deri.ie

More Related Content

What's hot

From SKOS over SKOS-XL to Custom Ontologies
From SKOS over SKOS-XL to Custom OntologiesFrom SKOS over SKOS-XL to Custom Ontologies
From SKOS over SKOS-XL to Custom Ontologies
Semantic Web Company
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
LeeFeigenbaum
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
DATAVERSITY
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
Neo4j
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
Jose Emilio Labra Gayo
 
Building and using ontologies
Building and using ontologies Building and using ontologies
Building and using ontologies
Elena Simperl
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data Mud
Richard Cyganiak
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQL
Olaf Hartig
 
Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)
Fabien Gandon
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
Nilesh Wagmare
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 
Functional requirements for bibliographic records & functional requirements f...
Functional requirements for bibliographic records & functional requirements f...Functional requirements for bibliographic records & functional requirements f...
Functional requirements for bibliographic records & functional requirements f...
UDAYA VARADARAJAN
 
SPARQL - Basic and Federated Queries
SPARQL - Basic and Federated QueriesSPARQL - Basic and Federated Queries
SPARQL - Basic and Federated Queries
Knud Möller
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
Neo4j
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Ontotext
 
MicroStrategy Design Challenges - Tips and Best Practices
MicroStrategy Design Challenges - Tips and Best PracticesMicroStrategy Design Challenges - Tips and Best Practices
MicroStrategy Design Challenges - Tips and Best Practices
BiBoard.Org
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
Jeff Z. Pan
 
Will Lyon- Entity Resolution
Will Lyon- Entity ResolutionWill Lyon- Entity Resolution
Will Lyon- Entity Resolution
Neo4j
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
Jose Emilio Labra Gayo
 

What's hot (20)

From SKOS over SKOS-XL to Custom Ontologies
From SKOS over SKOS-XL to Custom OntologiesFrom SKOS over SKOS-XL to Custom Ontologies
From SKOS over SKOS-XL to Custom Ontologies
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
Building and using ontologies
Building and using ontologies Building and using ontologies
Building and using ontologies
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data Mud
 
An Introduction to SPARQL
An Introduction to SPARQLAn Introduction to SPARQL
An Introduction to SPARQL
 
Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)Ontology In A Nutshell (version 2)
Ontology In A Nutshell (version 2)
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 
Functional requirements for bibliographic records & functional requirements f...
Functional requirements for bibliographic records & functional requirements f...Functional requirements for bibliographic records & functional requirements f...
Functional requirements for bibliographic records & functional requirements f...
 
SPARQL - Basic and Federated Queries
SPARQL - Basic and Federated QueriesSPARQL - Basic and Federated Queries
SPARQL - Basic and Federated Queries
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 
MicroStrategy Design Challenges - Tips and Best Practices
MicroStrategy Design Challenges - Tips and Best PracticesMicroStrategy Design Challenges - Tips and Best Practices
MicroStrategy Design Challenges - Tips and Best Practices
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
Will Lyon- Entity Resolution
Will Lyon- Entity ResolutionWill Lyon- Entity Resolution
Will Lyon- Entity Resolution
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 

Similar to Exchange and Consumption of Huge RDF Data

SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
net2-project
 
Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015
dhiguero
 
Exchange and Consumption of Huge RDF Data

  • 1. Digital Enterprise Research Institute (www.deri.ie). Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto (1,2) <migumar2@infor.uva.es>, Mario Arias (1,3) <mario.arias@deri.org>, Javier D. Fernández (1,2) <jfergar@infor.uva.es>. Affiliations: 1. Department of Computer Science, Universidad de Valladolid (Spain); 2. Department of Computer Science, Universidad de Chile (Chile); 3. Digital Enterprise Research Institute, National University of Ireland Galway. Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  • 2. Sharing RDF in the Web of Data. Sharing channels: dereferenceable URIs; RDF dump; SPARQL endpoints / APIs. Tasks: parsing / indexing; reasoning; dataset analysis; setting up a SPARQL server; vocabulary interlinking / integration; browsing and visualization; exchange between servers; data-intensive tasks. [Figure: data sources, including a sensor, feeding the Web of Data.]
  • 3. Dataset Exchange Workflow. Step 1, Publication: convert, serialize, compress. Step 2, Exchange: transfer. Step 3, Consumption: decompress, parse, index. If RDF is meant to be machine processable, why are we using plain-text serialization formats?
  • 4. HDT: RDF Binary Format. Compact data structure for RDF. W3C Submission (http://www.w3.org/Submission/2011/03/). Open-source C++/Java library.
  • 5. HDT Focused on Querying (HDT-FoQ). Contributions of this paper: a complementary index that makes HDT fully queryable; an analysis of how HDT reduces exchange and indexing time; an evaluation of in-memory query performance.
  • 6. Dictionary. Maps strings to consecutive IDs {1..n}. Lexicographically sorted, no duplicates. Section compression is explained in [8].
  • 7. Triples Model. Example ID triples (S P O): 1 2 6; 1 3 2; 2 1 3; 2 2 4; 2 2 5; 2 4 1; 3 3 2. As a tree, level by level: S: 1, 2, 3. P: [2 3] [1 2 4] [3]. O: [6] [2] [3] [4 5] [1] [2].
  • 8. Adjacency Lists. Lists 1, 2, 3 = [2, 3] [1, 2, 4] [3], stored over global positions 1 to 6 as Array = 2 3 1 2 4 3 and Bitmap = 1 0 1 0 0 1 (a 1 marks the first element of each list). Operations: access(g) = given a global position, get the value; findList(g) = given a global position, get the list number; first(l), last(l) = given a list, find its first and last positions. Complexities as listed on the slide, in order: O(1), O(1), O(log log n).
  • 9. Triples Model and Coding. Triples (S P O): 1 2 6; 1 3 2; 2 1 3; 2 2 4; 2 2 5; 2 4 1; 3 3 2. S: 1 2 3. P level: Array Y = 2 3 1 2 4 3, Bitmap Y = 1 0 1 0 0 1. O level: Array Z = 6 2 3 4 5 1 2, Bitmap Z = 1 1 1 1 0 1 1.
  • 10. Searching by Subject. Example query: (2, 2, ?), resolved over the structures of slide 9 (Array Y, Bitmap Y, Array Z, Bitmap Z). Patterns resolved this way: SPO, SP?, S??, S?O.
  • 11. Searching by Predicate. Example query: (?, 2, ?), over the structures of slide 9 (Array Y, Bitmap Y, Array Z, Bitmap Z). Pattern: ?P?.
  • 12. Wavelet Tree. Compact sequence of integers {0,σ}. Operations, each O(log σ): access(position) = value at position; rank(entry, position) = number of appearances of "entry" up to "position"; select(entry, i) = position where "entry" appears for the i-th time. Examples from the figure (a 16-element sequence, positions 1 to 16): rank(3, 7) = 2; select(6, 3) = 9.
  • 13. Searching by Predicate w/ Wavelet. Example query: (?, 2, ?). Same structures as slide 9, with Array Y stored as Wavelet Y = 2 3 1 2 4 3 (Bitmap Y, Array Z and Bitmap Z unchanged). Pattern: ?P?.
  • 14. Triples: Object-Search. Example query: (?, ?, 2), over the same example triples (O level = 6 2 3 4 5 1 2). OP-Index, one list of positions per object: O1 = [6], O2 = [2 7], O3 = [3], O4 = [4], O5 = [5], O6 = [1]. Patterns: ??O, ?PO.
  • 15. Data Structure Summary. From HDT to HDT-FoQ: convert Array Y to a wavelet tree; generate the OP-Index. Triple patterns: SPO, SP?, S??, S?O use the original HDT; ?P? uses the wavelet tree; ?PO, ??O use the OP-Index.
  • 16. Evaluation Environment.
      Dataset     Triples   Size (N-Triples)
      LinkedMDB   6.1M      850 MB
      DBLP        73M       11.1 GB
      Geonames    112M      12.3 GB
      DBPedia     258M      37.3 GB
      Producer: Xeon @ 2.4 GHz, 96 GB RAM. Consumer: Phenom-II @ 3.2 GHz, 8 GB RAM.
      Compressors: GZIP, LZMA. RDF storage: Virtuoso, RDF-3x, Hexastore.
  • 17. Compression Ratio. [Bar chart: compression ratio (% against plain N-Triples), axis 0 to 14, for LinkedMDB, DBLP, Geonames and DBPedia; series: hdt, gz, lzma, hdt.gz, hdt.lzma.]
  • 18. Publication Times.
      Dataset     NT+GZIP    NT+LZMA    HDT        HDT+GZIP   HDT+LZMA
      linkedMDB   11.3 sec   14.7 min   1.05 min   1.09 min   1.52 min
      DBLP        2.72 min   103 min    12 min     13.5 min   21.9 min
      Geonames    3.28 min   244 min    25 min     26.4 min   38.9 min
      DBPedia     15.9 min   466 min    56 min     60 min     121 min
      [Bar chart: times slower than N-Triples + GZIP, axis 0 to 80, per dataset; series: gz, lzma, hdt, hdt.gz, hdt.lzma.]
  • 19. Publication Times (2). Same table as slide 18. [Bar chart: times slower than N-Triples + GZIP, axis 0 to 13, per dataset; series: gz, hdt, hdt.gz, hdt.lzma.]
  • 20. Exchange & Decompression Time. [Stacked bar chart: exchange and decompression time in seconds (geometric mean of all datasets), axis 0 to 300, for GZIP, LZMA, HDT+GZIP and HDT+LZMA.] Assumes a network bandwidth of 2 MByte/s.
  • 21. Overall Client Time. [Stacked bar chart: exchange, decompress and index time in seconds (geometric mean of all datasets), axis 0 to 3600, for LZMA+Virtuoso, GZ+Virtuoso, LZMA+RDF3x, GZ+RDF3x, HDT+LZMA+FOQ and HDT+GZIP+FOQ.]
      Dataset     LZMA+RDF3x   HDT+LZMA
      linkedMDB   2.1 min      9.21 sec
      dblp        27 min       2.02 min
      geonames    49.2 min     3.04 min
      dbpedia     159 min      14.3 min
  • 22. In-memory Data Store. Index size (MB):
      Dataset     Triples   Virtuoso   Hexastore   RDF3x   HDT-FoQ
      LinkedMDB   6.1M      518        6976        337     68
      DBLP        46M       3982       -           3252    850
      Geonames    112M      9216       -           6678    1435
      DBPedia     258M      -          -           15802   5260
      Less size = more data in memory = less I/O access!
  • 23. Query Performance, Triple Patterns. [Bar charts, panels LinkedMDB and Geonames: times HDT is faster than RDF-3x and Virtuoso, axis 0 to 16, for patterns SP?, S?O, S??, ?PO, ?P?, ??O.]
  • 24. Query Performance, Two-way Joins. [Bar charts, panels LinkedMDB and Geonames: times HDT is faster than RDF-3x and Virtuoso, axis 0 to 3, for joins SSbig, SSsmall, SObig, SOsmall, OObig, OOsmall.]
  • 25. Conclusions. Data is ready to be consumed 10-15x faster: exchange time is reduced; indexing burden on the server = lightweight client processing. Competitive query performance: very fast on triple patterns; joins on the same scale as existing solutions. This is useful to you if you need a fast, compact, read-only in-memory RDF store; if you want to share self-queryable RDF dumps; if you need fast download & query. Addresses the volume issue of Big Data.
  • 26. Future work. Full SPARQL support: UNION, OPTIONAL, multiple joins; optimized query evaluation. Integration: Jena, Any23… Diffusion: get more people to use it! Additional services on top of HDT: SPARQL endpoint; distributed stream processing; mobile applications.
  • 27. Thanks! http://www.rdf-hdt.org
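
The triples coding of slides 9 and 10 can be illustrated with a small sketch. This is not the HDT library's code: the names `encode`, `select1` and `search_subject` are made up for illustration, the bitmaps are plain Python lists, and `select1` is a naive linear scan where a real implementation would use succinct bitmaps. It assumes subject IDs are consecutive from 1 and follows the slide's convention that a 1 bit marks the first element of each list.

```python
def encode(triples):
    """Encode ID triples as (array_y, bitmap_y, array_z, bitmap_z).

    Subjects stay implicit: the i-th list in array_y belongs to subject i.
    """
    array_y, bitmap_y, array_z, bitmap_z = [], [], [], []
    prev_s, prev_sp = None, None
    for s, p, o in sorted(set(triples)):
        if (s, p) != prev_sp:
            # New (subject, predicate) pair: one more entry at the P level.
            array_y.append(p)
            bitmap_y.append(1 if s != prev_s else 0)
            bitmap_z.append(1)  # first object of this pair
        else:
            bitmap_z.append(0)  # further object of the same pair
        array_z.append(o)
        prev_s, prev_sp = s, (s, p)
    return array_y, bitmap_y, array_z, bitmap_z


def select1(bitmap, i):
    """0-based position of the i-th 1 bit (i >= 1), or len(bitmap) if absent."""
    seen = 0
    for pos, bit in enumerate(bitmap):
        seen += bit
        if bit and seen == i:
            return pos
    return len(bitmap)


def search_subject(encoded, s):
    """Resolve the pattern (s, ?, ?) by walking the subject's lists."""
    array_y, bitmap_y, array_z, bitmap_z = encoded
    results = []
    # Positions in array_y that hold the predicates of subject s.
    for y in range(select1(bitmap_y, s), select1(bitmap_y, s + 1)):
        # Positions in array_z that hold the objects of the y-th pair.
        for z in range(select1(bitmap_z, y + 1), select1(bitmap_z, y + 2)):
            results.append((s, array_y[y], array_z[z]))
    return results


example = [(1, 2, 6), (1, 3, 2), (2, 1, 3), (2, 2, 4),
           (2, 2, 5), (2, 4, 1), (3, 3, 2)]
encoded = encode(example)
```

With the example triples from the slides, `encode` reproduces Array Y, Bitmap Y, Array Z and Bitmap Z as shown on slide 9, and `search_subject(encoded, 2)` returns the four triples of subject 2. Restricting the inner results to one predicate would give the SP? pattern.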

Editor's Notes

  1. Importance of exchange. The Web is for exchanging data. Data flows between nodes. We are in the "Big Data era". We need high speed, from the network to the application layers. Role of providers / consumers. Consumption =~ querying. How data is shared: dereferenceable URIs; SPARQL endpoints; big datasets: RDF dump (similar to XML, PDF). Examples where RDF dumps are important: setting up a mirror; overloaded SPARQL server; data analysis; vocabulary integration; download instead of crawl; visualization. Opens new applications: processing-intensive; cooperating applications.
  2. Triples are sorted component by component. We represent them in a tree: each level represents S, P, O; each path / leaf node represents one triple. How we encode the tree for (1) space and (2) traversal: level by level; S is implicit; P and O are arrays; relations are marked with brackets / bitmaps.
  3. CPUs are fast; memory and bandwidth are precious. Variable-length. Compression. Compact in-memory representations.
  4. Datasets. Servers. Data stores. Compiler. Compressors: GZIP, LZMA.
  5. From N-Triples to XXX. Generating from a data store could be faster (already sorted).
  6. Includes dictionary! Great for mobile.
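
The dictionary mentioned in the notes (slide 6: strings mapped to consecutive IDs {1..n}, lexicographically sorted, no duplicates) could be sketched as below. This is a minimal illustration, not the HDT library's dictionary: the class and method names are hypothetical, a single ID space is a simplifying assumption, and the section compression cited as [8] is left out.

```python
from bisect import bisect_left


class Dictionary:
    """Toy string dictionary: sorted, duplicate-free terms with IDs 1..n."""

    def __init__(self, terms):
        # Sorting the de-duplicated terms fixes the ID of each string.
        self.terms = sorted(set(terms))

    def locate(self, term):
        """Return the ID of term, or 0 if it is not in the dictionary."""
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return i + 1
        return 0

    def extract(self, term_id):
        """Return the string stored under term_id (1-based)."""
        return self.terms[term_id - 1]


d = Dictionary(["ex:c", "ex:a", "ex:b", "ex:a"])
```

Because the terms are sorted, `locate` is a binary search and `extract` is a direct array access, which is what lets triples be stored as integer IDs only.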