Digital Enterprise Research Institute                                                           www.deri.ie




                       Exchange and Consumption of Huge RDF Data

              Miguel A. Martínez-Prieto (1,2) <migumar2@infor.uva.es>
              Mario Arias (1,3) <mario.arias@deri.org>
              Javier D. Fernández (1,2) <jfergar@infor.uva.es>

              1. Department of Computer Science, Universidad de Valladolid (Spain)
              2. Department of Computer Science, Universidad de Chile (Chile)
              3. Digital Enterprise Research Institute, National University of Ireland Galway




 Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Sharing RDF in the Web of Data.
            Parsing / Indexing
            Reasoning
            •    Dataset analysis.
            •    Set up a SPARQL server.
            •    Vocabulary interlinking / integration.
            •    Browsing and visualization.
            •    Exchange between servers.
            •    Data-intensive tasks.
            [Diagram labels: sensor; dereferenceable URIs; RDF dump; SPARQL Endpoints / APIs]
Dataset Exchange Workflow
            1. Publication:   Convert, Serialize, Compress
            2. Exchange:      Transfer
            3. Consumption:   Decompress, Parse, Index

            If RDF is meant to be machine processable,
            why are we using plain text serialization formats?
HDT: RDF Binary Format
            Compact Data Structure for RDF.
            W3C Submission. http://www.w3.org/Submission/2011/03/
            Open Source C++/Java library.
HDT Focused on Querying
                                                                 FoQ
            Contribution of this paper:
                   A complementary Index to make the HDT fully queryable.
                   Analysis on how HDT reduces exchange and indexing time.
                   Evaluate in-memory query performance.
Dictionary
        Mapping of strings to consecutive IDs {1..n}.
        Lexicographically sorted, no duplicates.
        Section compression explained in [8].
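
The dictionary idea can be illustrated with a minimal Python sketch. It assumes a plain sorted list with binary search; the class name `Dictionary`, its method names, the `ex:` terms, and the convention of returning 0 for an unknown term are all hypothetical choices for illustration, and the section compression mentioned above is not modelled.

```python
from bisect import bisect_left


class Dictionary:
    """Toy string-to-ID dictionary: sorted, duplicate-free, IDs in 1..n."""

    def __init__(self, terms):
        # Lexicographic order, duplicates dropped.
        self.terms = sorted(set(terms))

    def string_to_id(self, term):
        """Return the 1-based ID of a term, or 0 if absent (assumed convention)."""
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return i + 1
        return 0

    def id_to_string(self, ident):
        """Return the term stored under a 1-based ID."""
        if not 1 <= ident <= len(self.terms):
            raise KeyError(ident)
        return self.terms[ident - 1]


d = Dictionary(["ex:bob", "ex:alice", "ex:knows", "ex:alice"])
```

Because the terms are sorted, a lookup by string is a binary search and a lookup by ID is a direct array access.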
Triples Model
            Triples (S P O): 126, 132, 213, 224, 225, 241, 332

            S:        1                 2                 3
            P:    [ 2     3 ]    [ 1     2      4 ]     [ 3 ]
            O:    [6]    [2]     [3]   [4 5]   [1]      [2]
Adjacency Lists
            List:          1          2            3
            Values:     [ 2, 3 ]  [ 1, 2, 4 ]    [ 3 ]
            Position:     1  2      3  4  5        6

            Array:        2  3      1  2  4        3
            Bitmap:       1  0      1  0  0        1

            Operations:
              –      access(g) = Given a global position, get the value.  O(1)
              –      findList(g) = Given a global position, get the list number.  O(1)
              –      first(l), last(l) = Given a list, find the first and last positions.  O(log log n)
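
The three operations can be sketched in Python as below. This is a minimal sketch under stated assumptions: the bitmap holds a 1 at the first element of each list (as in the slide's example), every list is non-empty, and linear scans stand in for the succinct rank/select structures that give the complexities quoted above. The class and method names are hypothetical.

```python
class AdjacencyList:
    """Toy adjacency list: a flat array plus a bitmap marking list starts."""

    def __init__(self, lists):
        self.array, self.bitmap = [], []
        for values in lists:
            for i, v in enumerate(values):
                self.array.append(v)
                self.bitmap.append(1 if i == 0 else 0)

    def access(self, g):
        """Value stored at global position g (1-based)."""
        return self.array[g - 1]

    def find_list(self, g):
        """List number containing position g: a rank of 1-bits up to g."""
        return sum(self.bitmap[:g])

    def first(self, l):
        """First global position of list l: a select of the l-th 1-bit."""
        seen = 0
        for pos, bit in enumerate(self.bitmap, start=1):
            seen += bit
            if seen == l:
                return pos
        raise IndexError(l)

    def last(self, l):
        """Last global position of list l."""
        if l == sum(self.bitmap):
            return len(self.array)
        return self.first(l + 1) - 1


adj = AdjacencyList([[2, 3], [1, 2, 4], [3]])
```

With the slide's lists, `adj.array` and `adj.bitmap` reproduce the Array and Bitmap rows shown above.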
Triples Model and Coding
            Triples (S P O): 126, 132, 213, 224, 225, 241, 332

            S:            1         2            3
            P:            2  3      1  2     4   3
            O:            6  2      3  4  5  1   2

            Array Y:      2  3  1  2  4  3
            Bitmap Y:     1  0  1  0  0  1

            Array Z:      6  2  3  4  5  1  2
            Bitmap Z:     1  1  1  1  0  1  1
Searching by Subject
            Query: ( 2, 2, ? )
            Patterns: SPO, SP?, S??, S?O

            Triples (S P O): 126, 132, 213, 224, 225, 241, 332

            S:            1         2            3
            P:            2  3      1  2     4   3
            O:            6  2      3  4  5  1   2

            Array Y:      2  3  1  2  4  3
            Bitmap Y:     1  0  1  0  0  1

            Array Z:      6  2  3  4  5  1  2
            Bitmap Z:     1  1  1  1  0  1  1
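
A subject-bound lookup over these arrays can be sketched as follows. This is a minimal Python sketch, assuming the bitmaps mark the first element of each list as drawn on the slide. `list_range` and `search_subject` are hypothetical helper names, and linear scans replace both the rank/select structures and the binary search over sorted predicates that a real implementation would presumably use.

```python
ARRAY_Y = [2, 3, 1, 2, 4, 3]
BITMAP_Y = [1, 0, 1, 0, 0, 1]
ARRAY_Z = [6, 2, 3, 4, 5, 1, 2]
BITMAP_Z = [1, 1, 1, 1, 0, 1, 1]


def list_range(bitmap, l):
    """1-based (first, last) positions of the l-th list; a 1 marks a list start."""
    starts = [pos for pos, bit in enumerate(bitmap, start=1) if bit == 1]
    first = starts[l - 1]
    last = starts[l] - 1 if l < len(starts) else len(bitmap)
    return first, last


def search_subject(s, p=None, o=None):
    """Resolve S??, SP?, S?O and SPO patterns; None means unbound."""
    results = []
    y_first, y_last = list_range(BITMAP_Y, s)      # predicates of subject s
    for y in range(y_first, y_last + 1):
        pred = ARRAY_Y[y - 1]
        if p is not None and pred != p:
            continue
        z_first, z_last = list_range(BITMAP_Z, y)  # objects of the pair (s, pred)
        for z in range(z_first, z_last + 1):
            obj = ARRAY_Z[z - 1]
            if o is None or obj == o:
                results.append((s, pred, obj))
    return results
```

For the slide's query ( 2, 2, ? ), `search_subject(2, 2)` walks the second list of Array Y, stops at predicate 2, and reads the matching list of Array Z.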
Searching by Predicate
            Query: ( ?, 2, ? )
            Pattern: ?P?

            Triples (S P O): 126, 132, 213, 224, 225, 241, 332

            S:            1         2            3
            P:            2  3      1  2     4   3
            O:            6  2      3  4  5  1   2

            Array Y:      2  3  1  2  4  3
            Bitmap Y:     1  0  1  0  0  1

            Array Z:      6  2  3  4  5  1  2
            Bitmap Z:     1  1  1  1  0  1  1
Wavelet Tree
            Compact Sequence of Integers {0,σ}.

            [Figure: example integer sequence with numbered positions, illustrating rank(3, 7) = 2 and select(6, 3) = 9.]

                   access(position) = Value at position.  O(log σ)
                   rank(entry, position) = Number of appearances of “entry” up to “position”.  O(log σ)
                   select(entry, i) = Position where “entry” appears for the i-th time.  O(log σ)
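
The three operations can be sketched with a small pointer-based wavelet tree in Python. This is a minimal sketch, not the library's implementation: the class name is hypothetical, bitmaps are plain lists with prefix counts, and `select` scans each level linearly, so it does not reach the O(log σ) bound. The sequence `SEQ` is an illustrative assumption, chosen only so that the two results quoted on the slide hold.

```python
class WaveletTree:
    """Toy wavelet tree over integers in [lo, hi]; all positions are 1-based."""

    def __init__(self, seq, lo=None, hi=None):
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.n = len(seq)
        if self.lo == self.hi or self.n == 0:
            return  # leaf (single symbol) or empty node: nothing to store
        mid = (self.lo + self.hi) // 2
        # bit 0 = symbol goes to the left child, bit 1 = right child
        self.bits = [0 if v <= mid else 1 for v in seq]
        # zeros[i] = number of 0-bits among the first i positions
        self.zeros = [0]
        for b in self.bits:
            self.zeros.append(self.zeros[-1] + (1 - b))
        self.left = WaveletTree([v for v in seq if v <= mid], self.lo, mid)
        self.right = WaveletTree([v for v in seq if v > mid], mid + 1, self.hi)

    def access(self, pos):
        """Value at a valid 1-based position."""
        if self.lo == self.hi:
            return self.lo
        if self.bits[pos - 1] == 0:
            return self.left.access(self.zeros[pos])
        return self.right.access(pos - self.zeros[pos])

    def rank(self, entry, pos):
        """Number of occurrences of entry among the first pos positions."""
        pos = min(pos, self.n)
        if pos <= 0 or not self.lo <= entry <= self.hi:
            return 0
        if self.lo == self.hi:
            return pos
        if entry <= (self.lo + self.hi) // 2:
            return self.left.rank(entry, self.zeros[pos])
        return self.right.rank(entry, pos - self.zeros[pos])

    def select(self, entry, i):
        """Position of the i-th occurrence of entry, or 0 if there is none."""
        if i <= 0 or i > self.n or not self.lo <= entry <= self.hi:
            return 0
        if self.lo == self.hi:
            return i
        go_right = entry > (self.lo + self.hi) // 2
        child = self.right if go_right else self.left
        child_pos = child.select(entry, i)
        if child_pos == 0:
            return 0
        # Map the child's position back: find the child_pos-th matching bit.
        count = 0
        for pos, b in enumerate(self.bits, start=1):
            if b == int(go_right):
                count += 1
                if count == child_pos:
                    return pos
        return 0


# Illustrative sequence (an assumption, not transcribed from the slide).
SEQ = [2, 3, 6, 3, 6, 1, 2, 3, 6, 2, 5, 2, 4, 1, 4, 2]
wt = WaveletTree(SEQ)
```

Each level halves the alphabet range, which is where the logarithmic dependence on σ comes from once the per-level bitmaps support constant-time rank and select.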
Searching by Predicate w/ Wavelet
            Query: ( ?, 2, ? )
            Pattern: ?P?

            Triples (S P O): 126, 132, 213, 224, 225, 241, 332

            S:            1         2            3
            P:            2  3      1  2     4   3
            O:            6  2      3  4  5  1   2

            Wavelet Y:    2  3  1  2  4  3
            Bitmap Y:     1  0  1  0  0  1

            Array Z:      6  2  3  4  5  1  2
            Bitmap Z:     1  1  1  1  0  1  1
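
A predicate-bound lookup can then jump between occurrences of the predicate with select instead of scanning all of Y. The sketch below is a minimal Python illustration under stated assumptions: a naive linear `select` stands in for the wavelet tree, the bitmaps mark list starts as on the slide, and all function names are hypothetical.

```python
ARRAY_Y = [2, 3, 1, 2, 4, 3]
BITMAP_Y = [1, 0, 1, 0, 0, 1]
ARRAY_Z = [6, 2, 3, 4, 5, 1, 2]
BITMAP_Z = [1, 1, 1, 1, 0, 1, 1]


def select(seq, entry, i):
    """Position (1-based) of the i-th occurrence of entry, or 0 if none.
    Stand-in for the wavelet tree's select operation."""
    count = 0
    for pos, value in enumerate(seq, start=1):
        if value == entry:
            count += 1
            if count == i:
                return pos
    return 0


def rank1(bitmap, pos):
    """Number of 1-bits among the first pos positions."""
    return sum(bitmap[:pos])


def list_range(bitmap, l):
    """1-based (first, last) positions of the l-th list; a 1 marks a list start."""
    starts = [pos for pos, bit in enumerate(bitmap, start=1) if bit == 1]
    first = starts[l - 1]
    last = starts[l] - 1 if l < len(starts) else len(bitmap)
    return first, last


def search_predicate(p):
    """Resolve the ?P? pattern."""
    results = []
    i = 1
    while True:
        y = select(ARRAY_Y, p, i)              # i-th occurrence of p in Y
        if y == 0:
            break
        s = rank1(BITMAP_Y, y)                 # subject owning position y
        z_first, z_last = list_range(BITMAP_Z, y)
        for z in range(z_first, z_last + 1):   # objects of the pair (s, p)
            results.append((s, p, ARRAY_Z[z - 1]))
        i += 1
    return results
```

For the slide's query ( ?, 2, ? ), the two occurrences of predicate 2 in Y sit at positions 1 and 4, which belong to subjects 1 and 2 respectively.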
Triples: Object-Search
            Query: ( ?, ?, 2 )
            Patterns: ??O, ?PO

            Triples (S P O): 126, 132, 213, 224, 225, 241, 332

            S:            1         2            3
            P:            2  3      1  2     4   3
            O:            6  2      3  4  5  1   2

            OP-Index:   [ 6 ]  [ 2  7 ]  [ 3 ]  [ 4 ]  [ 5 ]  [ 1 ]
                         O1      O2       O3     O4     O5     O6
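
One way to read the OP-Index is as a list, per object, of the positions where that object occurs in the object sequence (Array Z on the earlier slides), ordered by predicate. The Python sketch below builds such an index and uses it for ??O and ?PO lookups. It is a sketch under that reading, with hypothetical function names, bitmaps marking list starts as on the slides, and a dictionary of lists in place of whatever compact encoding the library uses.

```python
ARRAY_Y = [2, 3, 1, 2, 4, 3]
BITMAP_Y = [1, 0, 1, 0, 0, 1]
ARRAY_Z = [6, 2, 3, 4, 5, 1, 2]
BITMAP_Z = [1, 1, 1, 1, 0, 1, 1]


def rank1(bitmap, pos):
    """Number of 1-bits among the first pos positions."""
    return sum(bitmap[:pos])


def predicate_at(z):
    """Predicate of the triple whose object sits at position z of Array Z."""
    y = rank1(BITMAP_Z, z)        # list number in Z = position in Y
    return ARRAY_Y[y - 1]


def build_op_index():
    """Map each object ID to its positions in Array Z, ordered by predicate."""
    index = {}
    for z, obj in enumerate(ARRAY_Z, start=1):
        index.setdefault(obj, []).append(z)
    for positions in index.values():
        positions.sort(key=predicate_at)   # stable sort keeps ties in Z order
    return index


def search_object(o, p=None, index=None):
    """Resolve ??O and ?PO patterns; None means unbound."""
    index = build_op_index() if index is None else index
    results = []
    for z in index.get(o, []):
        y = rank1(BITMAP_Z, z)
        pred = ARRAY_Y[y - 1]
        if p is not None and pred != p:
            continue
        s = rank1(BITMAP_Y, y)    # subject owning position y of Y
        results.append((s, pred, o))
    return results
```

On the example data, `build_op_index()` yields the same lists as the OP-Index row above, for instance positions 2 and 7 for object 2.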
Data Structure Summary.
            From HDT to HDT-FoQ:
                   Convert Array Y to Wavelet.
                   Generate OP-Index.


            Triple Patterns:

                         SPO, SP?, S??, S?O       Original HDT
                         ?P?                      Wavelet Tree
                         ?PO, ??O                 OP-Index
Evaluation Environment
          Datasets:
            Dataset           Triples      Size (NTriples)
            LinkedMDB         6.1M         850 MB
            DBLP              73M          11.1 GB
            Geonames          112M         12.3 GB
            DBPedia           258M         37.3 GB

          Producer: Xeon @ 2.4 GHz, 96 GB RAM
          Consumer: Phenom-II @ 3.2 GHz, 8 GB RAM

          Compressors:
            • GZIP
            • LZMA

          RDF Storage:
            • Virtuoso
            • RDF-3x
            • Hexastore
Compression Ratio
            [Figure: bar chart of compression ratio (% against plain NTriples, x-axis 0 to 14) for DBPedia, Geonames, DBLP and LinkedMDB; series: hdt, gz, lzma, hdt.gz, hdt.lzma.]
Publication Times
                          NT+GZIP         NT+LZMA          HDT             HDT+GZIP        HDT+LZMA
            linkedMDB     11.3 sec        14.7 min         1.05 min        1.09 min        1.52 min
            DBLP          2.72 min        103 min          12 min          13.5 min        21.9 min
            Geonames      3.28 min        244 min          25 min          26.4 min        38.9 min
            DBPedia       15.9 min        466 min          56 min          60 min          121 min

            [Figure: bar chart per dataset (dbpedia, geonames, dblp, linkedMDB); x-axis: times slower than NTriples+GZIP, 0 to 80; series: gz, lzma, hdt, hdt.gz, hdt.lzma.]
Publication Times (2)
            [Figure: bar chart per dataset (dbpedia, geonames, dblp, linkedMDB) without NT+LZMA; x-axis: times slower than NTriples+GZIP, 0 to 13; series: gz, hdt, hdt.gz, hdt.lzma.]
Exchange & Decompression Time
            [Figure: bar chart of Exchange and Decompress time for GZIP, LZMA, HDT+GZIP and HDT+LZMA; x-axis: seconds (geometric mean of all datasets), 0 to 300.]

            *Assuming a network bandwidth of 2 MByte/s.
Overall Client Time
            [Figure: bar chart of Exchange, Decompress and Index time for LZMA+Virtuoso, GZ+Virtuoso, LZMA+RDF3x, GZ+RDF3x, HDT+LZMA+FOQ and HDT+GZIP+FOQ; x-axis: seconds (geometric mean of all datasets), 0 to 3600.]

                          LZMA+RDF3x      HDT+LZMA
            linkedMDB     2.1 min         9.21 sec
            dblp          27 min          2.02 min
            geonames      49.2 min        3.04 min
            dbpedia       159 min         14.3 min
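
The speedup implied by the LZMA+RDF3x and HDT+LZMA timings on this slide can be worked out directly. The short Python sketch below only converts the slide's figures to seconds and divides them; the names `TIMES` and `speedups` are hypothetical.

```python
# (LZMA+RDF3x, HDT+LZMA) overall client times in seconds, taken from the slide.
TIMES = {
    "linkedMDB": (2.1 * 60, 9.21),
    "dblp": (27 * 60, 2.02 * 60),
    "geonames": (49.2 * 60, 3.04 * 60),
    "dbpedia": (159 * 60, 14.3 * 60),
}


def speedups(times):
    """How many times faster HDT+LZMA is than LZMA+RDF3x, per dataset."""
    return {name: baseline / hdt for name, (baseline, hdt) in times.items()}


for name, factor in speedups(TIMES).items():
    print(f"{name}: {factor:.1f}x")
```

The ratios come out at roughly 13.7x, 13.4x, 16.2x and 11.1x, which is in the neighbourhood of the 10-15x figure quoted in the conclusions.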
In-memory Data Store.
            Index size (MB):
                          Triples    Virtuoso    Hexastore    RDF3x    HDT-FoQ
            LinkedMDB     6.1M       518         6976         337      68
            DBLP          46M        3982        -            3252     850
            Geonames      112M       9216        -            6678     1435
            DBPedia       258M       -           -            15802    5260

            Smaller size = more data in memory = less I/O access!
Query Performance, Triple Patterns
            [Figure: two bar charts (LinkedMDB, Geonames); y-axis: times HDT is faster than RDF-3x and Virtuoso, 0 to 16; x-axis: triple patterns SP?, S?O, S??, ?PO, ?P?, ??O.]
Query Performance Two-way Joins
            [Figure: two bar charts (LinkedMDB, Geonames); y-axis: times HDT is faster than RDF-3x and Virtuoso, 0 to 3; x-axis: join types SSbig, SSsmall, SObig, SOsmall, OObig, OOsmall.]
Conclusions
         Data is ready to be consumed 10-15x faster.
               Exchange time reduced.
               Indexing burden on server = Lightweight client processing.
         Competitive query performance.
               Very fast on triple patterns.
               Joins on the same scale as existing solutions.
         This is useful to you:
               If you need a fast, compact read-only in-memory RDF store.
               If you want to share self-queryable RDF dumps.
               If you need fast download & query.
         Addresses the volume issue of Big Data.
Future work.
            Full SPARQL support.
                   UNION, OPTIONAL, multiple joins.
                   Optimized query evaluation.
            Integration:
                   Jena, Any23…
            Diffusion.
                   Get more people to use it!
            Additional services on top of HDT.
                   SPARQL Endpoint.
                   Distributed Stream Processing.
                   Mobile Applications.
Thanks! http://www.rdf-hdt.org
Digital Enterprise Research Institute   www.deri.ie

More Related Content

What's hot

ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
Peter Haase
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
Ioan Toma
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Imply
 
SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)
Thomas Francart
 
Mapping french open data actors on the web with common crawl
Mapping french open data actors on the web with common crawlMapping french open data actors on the web with common crawl
Mapping french open data actors on the web with common crawl
data publica
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS Applications
Steven Francia
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
Databricks
 
Mongo DB
Mongo DBMongo DB
Mongo DB
Edureka!
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
Chris Mungall
 
Vector Similarity Search & Indexing Methods
Vector Similarity Search & Indexing MethodsVector Similarity Search & Indexing Methods
Vector Similarity Search & Indexing Methods
Kate Shao
 
ontop: A tutorial
ontop: A tutorialontop: A tutorial
ontop: A tutorial
Mariano Rodriguez-Muro
 
SPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeSPARQL-DL - Theory & Practice
SPARQL-DL - Theory & Practice
Adriel Café
 
Weaviate Air #3 - New in AI segment.pdf
Weaviate Air #3 - New in AI segment.pdfWeaviate Air #3 - New in AI segment.pdf
Weaviate Air #3 - New in AI segment.pdf
ConnorShorten2
 
Building better knowledge graphs through social computing
Building better knowledge graphs through social computingBuilding better knowledge graphs through social computing
Building better knowledge graphs through social computing
Elena Simperl
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Simplilearn
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and Iceberg
Walaa Eldin Moustafa
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataAn introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked Data
Fabien Gandon
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
Open Data Support
 
Python and MongoDB
Python and MongoDBPython and MongoDB
Python and MongoDB
Christiano Anderson
 
How to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized OptimizationsHow to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized Optimizations
Databricks
 

What's hot (20)

ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
 
SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)SPARQL introduction and training (130+ slides with exercices)
SPARQL introduction and training (130+ slides with exercices)
 
Mapping french open data actors on the web with common crawl
Mapping french open data actors on the web with common crawlMapping french open data actors on the web with common crawl
Mapping french open data actors on the web with common crawl
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS Applications
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
Vector Similarity Search & Indexing Methods
Vector Similarity Search & Indexing MethodsVector Similarity Search & Indexing Methods
Vector Similarity Search & Indexing Methods
 
ontop: A tutorial
ontop: A tutorialontop: A tutorial
ontop: A tutorial
 
SPARQL-DL - Theory & Practice
SPARQL-DL - Theory & PracticeSPARQL-DL - Theory & Practice
SPARQL-DL - Theory & Practice
 
Weaviate Air #3 - New in AI segment.pdf
Weaviate Air #3 - New in AI segment.pdfWeaviate Air #3 - New in AI segment.pdf
Weaviate Air #3 - New in AI segment.pdf
 
Building better knowledge graphs through social computing
Building better knowledge graphs through social computingBuilding better knowledge graphs through social computing
Building better knowledge graphs through social computing
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and Iceberg
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataAn introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked Data
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Python and MongoDB
Python and MongoDBPython and MongoDB
Python and MongoDB
 
How to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized OptimizationsHow to Extend Apache Spark with Customized Optimizations
How to Extend Apache Spark with Customized Optimizations
 

Exchange and Consumption of Huge RDF Data

  • 1. Digital Enterprise Research Institute, www.deri.ie. Exchange and Consumption of Huge RDF Data. Miguel A. Martínez-Prieto [1,2] <migumar2@infor.uva.es>, Mario Arias [1,3] <mario.arias@deri.org>, Javier D. Fernández [1,2] <jfergar@infor.uva.es>. [1] Department of Computer Science, Universidad de Valladolid (Spain). [2] Department of Computer Science, Universidad de Chile (Chile). [3] Digital Enterprise Research Institute, National University of Ireland Galway. Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  • 2. Sharing RDF in the Web of Data. Diagram labels: Parsing / Indexing; Reasoning; sensor; dereferenceable URIs; RDF dump; SPARQL Endpoints / APIs. Uses of shared RDF: dataset analysis; setting up a SPARQL server; vocabulary interlinking / integration; browsing and visualization; exchange between servers; data-intensive tasks.
  • 3. Dataset Exchange Workflow. 1st, Publication: convert, serialize, compress. 2nd, Exchange: transfer. 3rd, Consumption: decompress, parse, index. If RDF is meant to be machine-processable, why are we using plain-text serialization formats?
  • 4. HDT: RDF Binary Format. Compact data structure for RDF. W3C Submission (http://www.w3.org/Submission/2011/03/). Open-source C++/Java library.
  • 5. HDT Focused on Querying (HDT-FoQ). Contributions of this paper: a complementary index that makes HDT fully queryable; an analysis of how HDT reduces exchange and indexing time; an evaluation of in-memory query performance.
  • 6. Dictionary. Maps strings to consecutive IDs {1..n}. Lexicographically sorted, no duplicates. Section compression is explained in [8].
  • 7. Triples Model. ID triples (S P O): 126, 132, 213, 224, 225, 241, 332. Tree levels: S: 1, 2, 3. P, grouped by subject: [2 3] [1 2 4] [3]. O, grouped by subject-predicate pair: [6] [2] [3] [4 5] [1] [2].
  • 8. Adjacency Lists. Lists 1, 2, 3: [2 3] [1 2 4] [3]. Positions: 1 2 3 4 5 6. Array: 2 3 1 2 4 3. Bitmap: 1 0 1 0 0 1 (a 1 marks the first element of each list). Operations: access(g): given a global position, get the value, O(1). findList(g): given a global position, get the list number, O(1). first(l), last(l): given a list, find its first and last positions, O(log log n).
  • 9. Triples Model and Coding. ID triples (S P O): 126, 132, 213, 224, 225, 241, 332. Tree levels: S: 1 2 3. P: 2 3 1 2 4 3. O: 6 2 3 4 5 1 2. Encoding: Array Y: 2 3 1 2 4 3. Bitmap Y: 1 0 1 0 0 1. Array Z: 6 2 3 4 5 1 2. Bitmap Z: 1 1 1 1 0 1 1.
  • 10. Searching by Subject. Example pattern: (2, 2, ?). ID triples: 126, 132, 213, 224, 225, 241, 332. Array Y: 2 3 1 2 4 3. Bitmap Y: 1 0 1 0 0 1. Array Z: 6 2 3 4 5 1 2. Bitmap Z: 1 1 1 1 0 1 1. Patterns resolved by subject search: SPO, SP?, S??, S?O.
  • 11. Searching by Predicate. Example pattern: (?, 2, ?). ID triples: 126, 132, 213, 224, 225, 241, 332. Array Y: 2 3 1 2 4 3. Bitmap Y: 1 0 1 0 0 1. Array Z: 6 2 3 4 5 1 2. Bitmap Z: 1 1 1 1 0 1 1. Pattern resolved: ?P?.
  • 12. Wavelet Tree. Compact sequence of integers {0,σ}. [Figure: example sequence of integers at positions 1 to 16, illustrating rank(3, 7) = 2 and select(6, 3) = 9.] Operations: access(position): value at position, O(log σ). rank(entry, position): number of appearances of "entry" up to "position", O(log σ). select(entry, i): position where "entry" appears for the i-th time, O(log σ).
  • 13. Searching by Predicate w/ Wavelet. Example pattern: (?, 2, ?). ID triples: 126, 132, 213, 224, 225, 241, 332. Wavelet Y: 2 3 1 2 4 3. Bitmap Y: 1 0 1 0 0 1. Array Z: 6 2 3 4 5 1 2. Bitmap Z: 1 1 1 1 0 1 1. Pattern resolved: ?P?.
  • 14. Triples: Object-Search. Example pattern: (?, ?, 2). ID triples: 126, 132, 213, 224, 225, 241, 332. P: 2 3 1 2 4 3. O: 6 2 3 4 5 1 2. OP-Index (positions of each object in the O array): O1: [6]. O2: [2 7]. O3: [3]. O4: [4]. O5: [5]. O6: [1]. Patterns resolved: ??O, ?PO.
  • 15. Data Structure Summary. From HDT to HDT-FoQ: convert Array Y to a wavelet tree; generate the OP-Index. Triple patterns and the structure that resolves them: SPO, SP?, S??, S?O: original HDT. ?P?: wavelet tree. ?PO, ??O: OP-Index.
  • 16. Evaluation Environment.
      Dataset     Triples   Size (N-Triples)
      LinkedMDB   6.1M      850 MB
      DBLP        73M       11.1 GB
      Geonames    112M      12.3 GB
      DBPedia     258M      37.3 GB
      Producer: Xeon @ 2.4 GHz, 96 GB RAM. Consumer: Phenom-II @ 3.2 GHz, 8 GB RAM.
      Compressors: GZIP, LZMA. RDF storage: Virtuoso, RDF-3x, Hexastore.
  • 17. Compression Ratio. [Bar chart: compression ratio (% against plain N-Triples, axis 0 to 14) for DBPedia, Geonames, DBLP, and LinkedMDB; series: hdt, gz, lzma, hdt.gz, hdt.lzma.]
  • 18. Publication Times.
      Dataset     NT+GZIP    NT+LZMA    HDT        HDT+GZIP   HDT+LZMA
      LinkedMDB   11.3 sec   14.7 min   1.05 min   1.09 min   1.52 min
      DBLP        2.72 min   103 min    12 min     13.5 min   21.9 min
      Geonames    3.28 min   244 min    25 min     26.4 min   38.9 min
      DBPedia     15.9 min   466 min    56 min     60 min     121 min
      [Bar chart: times slower than N-Triples+GZIP (axis 0 to 80) per dataset; series: gz, lzma, hdt, hdt.gz, hdt.lzma.]
  • 19. Publication Times (2). Same timing table as slide 18. [Bar chart: times slower than N-Triples+GZIP, zoomed to axis 0 to 13, per dataset; series: gz, hdt, hdt.gz, hdt.lzma (lzma omitted).]
  • 20. Exchange & Decompression Time. [Stacked bar chart: exchange and decompression time in seconds (geometric mean of all datasets, axis 0 to 300) for GZIP, LZMA, HDT+GZIP, and HDT+LZMA.] Assuming a network bandwidth of 2 MByte/s.
  • 21. Overall Client Time.
      [Stacked bar chart: exchange, decompress, and index time in seconds (geometric mean of all datasets, axis 0 to 3600) for LZMA+Virtuoso, GZ+Virtuoso, LZMA+RDF3x, GZ+RDF3x, HDT+LZMA+FOQ, and HDT+GZIP+FOQ.]
      Dataset     LZMA+RDF3x   HDT+LZMA
      LinkedMDB   2.1 min      9.21 sec
      DBLP        27 min       2.02 min
      Geonames    49.2 min     3.04 min
      DBPedia     159 min      14.3 min
  • 22. In-memory Data Store. Index size (MB):
      Dataset     Triples   Virtuoso   Hexastore   RDF3x   HDT-FoQ
      LinkedMDB   6.1M      518        6976        337     68
      DBLP        46M       3982       -           3252    850
      Geonames    112M      9216       -           6678    1435
      DBPedia     258M      -          -           15802   5260
      Smaller size = more data in memory = fewer I/O accesses!
  • 23. Query Performance, Triple Patterns. [Bar charts for LinkedMDB and Geonames: times HDT is faster than RDF-3x and Virtuoso (axis 0 to 16) on the patterns SP?, S?O, S??, ?PO, ?P?, ??O.]
  • 24. Query Performance, Two-way Joins. [Bar charts for LinkedMDB and Geonames: times HDT is faster than RDF-3x and Virtuoso (axis 0 to 3) on the join types SSbig, SSsmall, SObig, SOsmall, OObig, OOsmall.]
  • 25. Conclusions. Data is ready to be consumed 10-15x faster. Exchange time is reduced. Indexing burden on the server = lightweight client processing. Competitive query performance: very fast on triple patterns; joins on the same scale as existing solutions. This is useful to you if you need a fast, compact, read-only in-memory RDF store; if you want to share self-queryable RDF dumps; if you need fast download & query. Addresses the volume issue of Big Data.
  • 26. Future work. Full SPARQL support: UNION, OPTIONAL, multiple joins; optimized query evaluation. Integration: Jena, Any23… Diffusion: get more people to use it! Additional services on top of HDT: SPARQL endpoint; distributed stream processing; mobile applications.
  • 27. Thanks! http://www.rdf-hdt.org
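
The triple-pattern lookups on slides 9 to 14 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HDT library's code: the arrays are the example values from slide 9, the function names (`list_range`, `find_list`, `search_subject`, `search_predicate`, `search_object`) are hypothetical, and plain linear scans stand in for the constant-time bitmap operations, the wavelet tree, and the OP-Index that the slides describe.

```python
# Example structures copied from slide 9 (a 1 in a bitmap marks the first
# element of each adjacency list, as drawn on the slides).
ARRAY_Y = [2, 3, 1, 2, 4, 3]        # predicates, grouped by subject
BITMAP_Y = [1, 0, 1, 0, 0, 1]
ARRAY_Z = [6, 2, 3, 4, 5, 1, 2]     # objects, grouped by (subject, predicate)
BITMAP_Z = [1, 1, 1, 1, 0, 1, 1]


def list_range(bitmap, k):
    """Return the 0-based half-open range [start, end) of the k-th list (k >= 1).

    Naive stand-in for first(l) / last(l).
    """
    starts = [i for i, bit in enumerate(bitmap) if bit == 1]
    start = starts[k - 1]
    end = starts[k] if k < len(starts) else len(bitmap)
    return start, end


def find_list(bitmap, pos):
    """Return the 1-based list number that contains 0-based position pos.

    Naive stand-in for findList(g): counts the list starts up to pos.
    """
    return sum(bitmap[: pos + 1])


def search_subject(s):
    """Resolve the pattern (s, ?, ?) by walking the subject's lists."""
    result = []
    y_start, y_end = list_range(BITMAP_Y, s)
    for y in range(y_start, y_end):
        z_start, z_end = list_range(BITMAP_Z, y + 1)
        for z in range(z_start, z_end):
            result.append((s, ARRAY_Y[y], ARRAY_Z[z]))
    return result


def search_predicate(p):
    """Resolve (?, p, ?). The scan over ARRAY_Y is what a wavelet tree's
    select operation would replace."""
    result = []
    for y, value in enumerate(ARRAY_Y):
        if value == p:
            s = find_list(BITMAP_Y, y)
            z_start, z_end = list_range(BITMAP_Z, y + 1)
            result.extend((s, p, ARRAY_Z[z]) for z in range(z_start, z_end))
    return result


def search_object(o):
    """Resolve (?, ?, o). The scan over ARRAY_Z is what the OP-Index would
    replace by storing each object's positions directly."""
    result = []
    for z, value in enumerate(ARRAY_Z):
        if value == o:
            y = find_list(BITMAP_Z, z) - 1   # 0-based position in ARRAY_Y
            s = find_list(BITMAP_Y, y)
            result.append((s, ARRAY_Y[y], o))
    return result


if __name__ == "__main__":
    print(search_subject(2))
    print(search_predicate(2))
    print(search_object(2))
```

Each function walks from one level of the tree to the next using only the bitmaps, which is why the subject never needs to be stored explicitly.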

Editor's Notes

  1. Importance of exchange. The Web is for exchanging data; data flows between nodes. We are in the "Big Data era" and need speed at every layer, from the network to the application. Roles of providers and consumers. Consumption =~ querying. How data is shared: dereferenceable URIs; SPARQL endpoints; for big datasets, RDF dumps (similar to XML, PDF). Examples where RDF dumps are important: setting up a mirror; an overloaded SPARQL server; data analysis; vocabulary integration; downloading instead of crawling; visualization. Opens new applications: processing-intensive applications; cooperating applications.
  2. Triples are sorted component by component. We represent them in a tree: each level represents S, P, or O; each path / leaf node represents one triple. How we encode the tree for (1) space and (2) traversal: level by level; S is implicit; P and O are arrays; relations are marked with brackets / a bitmap.
  3. CPUs are fast; memory and bandwidth are precious. Variable-length. Compression. Compact in-memory representations.
  4. Datasets. Servers. Data stores. Compiler. Compressors: GZIP, LZMA.
  5. From NTRIPLES to XXX. Converting from a data store could be faster (already sorted).
  6. Includes dictionary!!! Great for mobile.
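
The level-by-level tree encoding described in the notes can be sketched as below. This is a minimal illustration, not the HDT library's implementation: the function name `encode_bitmap_triples` is hypothetical, it assumes subject IDs are consecutive from 1 with at least one triple each (so that S can stay implicit), and it follows the bit convention drawn on the slides, where a 1 marks the first element of each list.

```python
def encode_bitmap_triples(triples):
    """Encode (s, p, o) ID triples as (Array Y, Bitmap Y, Array Z, Bitmap Z).

    Array Y holds the predicates grouped by subject; Array Z holds the
    objects grouped by (subject, predicate) pair. Each bitmap carries a 1
    at the first element of every group.
    """
    array_y, bitmap_y = [], []
    array_z, bitmap_z = [], []
    prev_s = prev_p = None
    # Sort component by component (S, then P, then O) and drop duplicates.
    for s, p, o in sorted(set(triples)):
        new_subject = s != prev_s
        new_pair = new_subject or p != prev_p
        if new_pair:
            # A new (s, p) pair adds one entry to the predicate level.
            array_y.append(p)
            bitmap_y.append(1 if new_subject else 0)
        # Every triple adds one entry to the object level.
        array_z.append(o)
        bitmap_z.append(1 if new_pair else 0)
        prev_s, prev_p = s, p
    return array_y, bitmap_y, array_z, bitmap_z


if __name__ == "__main__":
    # The seven ID triples from the slides: 126, 132, 213, 224, 225, 241, 332.
    example = [(1, 2, 6), (1, 3, 2), (2, 1, 3), (2, 2, 4),
               (2, 2, 5), (2, 4, 1), (3, 3, 2)]
    for name, values in zip(("Array Y", "Bitmap Y", "Array Z", "Bitmap Z"),
                            encode_bitmap_triples(example)):
        print(name, values)
```

Run on the seven example triples, the sketch reproduces the Array Y, Bitmap Y, Array Z, and Bitmap Z values shown on slide 9.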