SIB . 23.03.2011 . Page 1                         http://lod2.eu




WP2
Storing and Querying
Very Large Knowledge Bases
                             Vienna Update
                             March 2012 – M18

                             Peter Boncz


                                                http://lod2.eu




 Table of Contents

 • WP2 Refresher
 • LOD Cloud Hosted on the Knowledge Store Cluster
    * 50B mark reached, column-store Virtuoso deployed
 • State of the Art LOD Laboratory (“Benchmarking”)
    * LDBC – RDF Store Industry council
    * BSBM at large scale
    * RDF-H + Social Intelligence Benchmark (SIB)
 • Technical work
    * column-store Virtuoso → cluster version
    * recycling query results
 • Next up
    * LOD cloud @250B triples
    * Virtuoso: adaptive query optimizer (and more)
    * first MonetDB/SPARQL version (RDF clustering, graph indexing)




 WP2 Organization

 CWI (MonetDB):
 • Peter Boncz (also in VUA group of Frank van Harmelen)
 • Duc Pham Minh (PhD student)
 • Irini Fundulaki (1-year sabbatical from FORTH)

 OpenLink (Virtuoso):
 • Orri Erling
 • Hugh Williams
 • Ivan Mikhailov

 + FU Berlin (BSBM)
 + DERI (BSBM text + LOD cloud + text retrieval/Sindice)
 + ULEI (DBpedia benchmark)


      WP2
      Storing and Querying Very Large Knowledge Bases

Goal: enabling large-scale, feature-rich & enterprise-ready Linked
  Data management solutions

Database Partners in LOD2:
CWI: Leading open source analytics RDBMS
OpenLink: Leading Linked Data deployment platform

Technological Excellence:
Creating and publishing metrics for choosing RDF solutions
Bringing Column Store Technology for Business Intelligence on RDF
Ground-breaking database innovations for RDF stores
   (Dynamic Query optimization, Adaptive Caching of Joins,
   Optimized Graph Processing, Cluster/Cloud scalability)




 Task 2.1: State of the Art, Evaluation & Benchmarking

 LOD cloud cache scalability
 • M0: 20B triples
 • M12: 50B triples
 • M24: 250B triples
 • M36: 1T triples

 D2.4 completed: 50B triples in LOD cache @ DERI
 First deployment of Virtuoso7 Cluster
 • Currently hosting about 55 billion triples
 • 8 node Virtuoso v7 (column store) Cluster
 • 384GB RAM
 • 2TB Disk Storage
 • 14 bytes/quad, excl. literals

 Next up:
 • hardware provisioning for 250B and 1T triples
  (need 512GB and 2TB RAM, respectively, somewhere)
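
The milestones above can be turned into a rough storage estimate. The sketch below assumes that the "14B/quads" figure means roughly 14 bytes per quad with literals excluded; that reading, the constant `BYTES_PER_QUAD`, and the helper `index_size_tb` are assumptions made for illustration, not figures confirmed by the deployment.

```python
# Rough quad-index size per milestone.
# ASSUMPTION: "14B/quads" means about 14 bytes per quad, literals excluded.
BYTES_PER_QUAD = 14

# Project month -> target triple count, as listed on the slide.
MILESTONES = {
    "M0": 20e9,
    "M12": 50e9,
    "M24": 250e9,
    "M36": 1e12,
}


def index_size_tb(triples, bytes_per_quad=BYTES_PER_QUAD):
    """Return the estimated quad-index size in decimal terabytes."""
    return triples * bytes_per_quad / 1e12


if __name__ == "__main__":
    for month, triples in MILESTONES.items():
        size = index_size_tb(triples)
        print(f"{month}: {triples / 1e9:,.0f}B triples -> ~{size:.2f} TB")
```

Under the same assumption, the 55 billion triples currently hosted come to about 0.77 TB, which is consistent with the 2TB of disk listed above.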




 Task 2.1: State of the Art, Evaluation & Benchmarking

 Benchmarking

 • creating new benchmarks
      • BSBM-BI (FU Berlin)
      • DBpedia Benchmark (ULEI) – best paper award
      • RDF-H (OGL,CWI)
      • Social Intelligence Benchmark (OGL,CWI)
 • running benchmark evaluations
      • BSBM on a large cluster (Lisa @ SARA)
      • BSBM on large single-server (40 cores, 1TB RAM)
 • creating industry consensus
      • Benchmark Auditing Service
      • LOD Benchmark Council




 BSBM Large Scale Experiments (still ongoing)

 New Aspects:
 • The Business Intelligence Use Case (BI)
 • Benchmark Rules
 • BSBM V3 Results
 • trying cluster versions

 SARA LISA cluster
 • experiments with up to 64 nodes

 VectorWise high-end server
 • 40-core machine with 1TB RAM

 Benchmarked at SARA and VectorWise
 4store 1.1.2       Garlik        http://4store.org/
 BigData r4169      SYSTAP LLC    http://www.systap.com/bigdata.htm
 BigOwlim 3.4.3129  OntoText      http://www.ontotext.com/owlim/
 Jena TDB 0.8.9     openjena.org  http://www.openjena.org/TDB/
 Fuseki 0.1.0       openjena.org  http://openjena.org/wiki/Fuseki
 Virtuoso 7.0       OpenLink      http://virtuoso.openlinksw.com/




 Social Intelligence Benchmark

 [Figure: SIB overview. Labels: Facebook schema style; 14 dictionaries of real data; realistic scenario simulation; Synthetic Generated Data; Linked Open Data]




 Technical Work: Recycling (D2.4)

 Dynamic caching of intermediate query results
 • SPARQL problem: hard to index workload / expensive backward chaining
 Idea: compute once, re-use many times
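
The "compute once, re-use many times" idea can be sketched as a small cache of intermediate results keyed by the sub-plan that produced them. This is a minimal illustration and not the actual recycler implementation; the class `Recycler`, the plan-key format, and the toy triples are all hypothetical.

```python
from collections import OrderedDict


class Recycler:
    """Tiny LRU cache of intermediate query results, keyed by sub-plan."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.cache = OrderedDict()  # plan key -> materialized result
        self.hits = 0
        self.misses = 0

    def evaluate(self, plan_key, compute):
        """Return the cached result for plan_key, computing it on a miss."""
        if plan_key in self.cache:
            self.hits += 1
            self.cache.move_to_end(plan_key)  # mark as most recently used
            return self.cache[plan_key]
        self.misses += 1
        result = compute()
        self.cache[plan_key] = result
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return result


# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
]


def scan(predicate):
    """Full scan for one predicate; stands in for an expensive sub-plan."""
    return [(s, o) for s, p, o in TRIPLES if p == predicate]


recycler = Recycler(capacity=2)
# Two queries that share the same sub-plan: all "knows" edges.
first = recycler.evaluate(("scan", "knows"), lambda: scan("knows"))
second = recycler.evaluate(("scan", "knows"), lambda: scan("knows"))
print(recycler.hits, recycler.misses)
```

The second call is answered from the cache, so the scan runs only once. A real system would also have to invalidate entries on updates and decide which intermediates are worth keeping, which this sketch leaves out.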




 Technical Work: Virtuoso 7

 Major upcoming release V7, due in 2012

 • column store technology:
       • aggressive compression → more data fits in RAM
       • vectored execution → things run faster
 • elastic cluster implementation
       • partitions can migrate across nodes
 • bringing computation to the data
       • arbitrary recursive functions in the cluster
 • geospatial support
       • full OpenGIS support, R-tree backed, EWKT format
 • future enhancements
       • adaptive query optimization (CWI ROX)
       • re-use of intermediates (CWI recycling)
       • using SSDs as cache
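
To illustrate why column-wise storage compresses well, the sketch below run-length encodes a sorted column and answers a count directly on the compressed runs. It is a generic illustration of the technique, not Virtuoso's actual encoding; the function names and the sample predicate column are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def count_equal(runs, wanted):
    """Count matching rows on the compressed runs, one run at a time."""
    return sum(count for value, count in runs if value == wanted)


# Hypothetical predicate column of a quad table sorted by predicate.
predicates = ["knows"] * 5 + ["name"] * 3 + ["type"] * 4
runs = rle_encode(predicates)
print(runs)
print(count_equal(runs, "type"))
```

Twelve values shrink to three runs here, and the count touches three entries instead of twelve. Processing a run (or a batch of values) per step rather than a single row per step is also the basic intuition behind vectored execution.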




 Next 6 months


 Virtuoso: sampled query optimizer
 • query optimization in SPARQL is difficult (no stats)
 • use adaptive, run-time, query optimization with sampling
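
A minimal sketch of the sampling idea: when no statistics are available, probe a small sample of the current intermediate result against each candidate join and pick the one with the lowest observed fan-out. This illustrates run-time sampling in general and is not the ROX algorithm or Virtuoso's optimizer; all function names and data are hypothetical.

```python
import random


def build_index(pairs):
    """Hash index: join key -> list of matching values."""
    index = {}
    for key, value in pairs:
        index.setdefault(key, []).append(value)
    return index


def estimate_fanout(left_keys, right_index, sample_size=100, seed=0):
    """Estimate average matches per left key by probing a random sample."""
    if not left_keys:
        return 0.0
    if len(left_keys) <= sample_size:
        sample = left_keys  # small input: probe every key
    else:
        sample = random.Random(seed).sample(left_keys, sample_size)
    matches = sum(len(right_index.get(key, ())) for key in sample)
    return matches / len(sample)


def pick_next_join(current_keys, candidates, sample_size=100):
    """Choose the candidate join with the smallest sampled fan-out."""
    return min(
        candidates,
        key=lambda name: estimate_fanout(current_keys, candidates[name], sample_size),
    )


# Hypothetical toy data: four people and two candidate joins.
people = ["p1", "p2", "p3", "p4"]
candidates = {
    "knows": build_index(
        [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4"), ("p4", "p1")]
    ),
    "worksAt": build_index([("p1", "acme")]),
}
best = pick_next_join(people, candidates)
print(best)
```

Here the `worksAt` join is chosen first because it shrinks the intermediate result, while `knows` would grow it. An adaptive optimizer would repeat this choice after each executed step, using the actual intermediate result as the next sample source.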

 MonetDB and SPARQL
 • First version in sight (cooperation with FORTH)
 • research tracks
       • RDF clustering on Characteristic Sets
       • correlated join path indexing
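
As a sketch of the first research track: the characteristic set of a subject is the set of predicates it occurs with, and subjects that share a characteristic set can be clustered together much like rows of one relational table. The code below only computes the grouping; how it would be used for physical storage in MonetDB is not specified here, and the function name and sample triples are hypothetical.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Group subjects by the set of predicates they occur with."""
    predicates_of = defaultdict(set)
    for subject, predicate, _obj in triples:
        predicates_of[subject].add(predicate)

    clusters = defaultdict(list)
    for subject, predicates in predicates_of.items():
        clusters[frozenset(predicates)].append(subject)
    return dict(clusters)


# Hypothetical toy data: (subject, predicate, object) triples.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "knows", "carol"),
    ("acme", "label", "ACME"),
]
clusters = characteristic_sets(triples)
for predicates, subjects in clusters.items():
    print(sorted(predicates), subjects)
```

In this toy input, `alice` and `bob` share the set {name, knows} and fall into one cluster, while `acme` forms its own.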

 LOD cache at 250B triples
 • what triples to use?
 • what hardware to use? (need 512GB RAM)




      Contact

      Address

      Centrum Wiskunde & Informatica (CWI)
      Science Park 123
      1098 XG Amsterdam
      The Netherlands

      monetdb.cwi.nl




Thanks for your attention!




 LOD2 Benchmark Auditing Service

 Benchmarking needs of SPARQL engine vendors:
 • vendors want to publish on their own timescale
 • using new or upcoming releases (not yet public)
 • using settings and hardware properly tuned to their solution
 • yet need credibility (is it fair?)

 Tournaments organized by one institution
 • suffer from bad timing, wrong version, one more bug to fix, etc.
 • do not use the right hardware or settings
 • may become a legal liability once matters become more serious

 LOD2 should reach out to the SPARQL technical community and
 provide independent benchmark auditing services
 • start with BSBM → working on Auditing Rules Document
 • maybe other benchmarks later

 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 

LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases

  • 1. WP2: Storing and Querying Very Large Knowledge Bases
      Vienna Update, March 2012 – M18
      Peter Boncz
      http://lod2.eu
  • 2. Table of Contents
      • WP2 Refresher
      • LOD Cloud Hosted on the Knowledge Store Cluster
        * 50B mark reached, column-store Virtuoso deployed
      • State of the Art LOD Laboratory (“Benchmarking”)
        * LDBC – RDF Store Industry council
        * BSBM at large scale
        * RDF-H + Social Intelligence Benchmark (SIB)
      • Technical work
        * column-store Virtuoso → cluster version
        * recycling query results
      • Next up
        * LOD cloud @250B triples
        * Virtuoso: adaptive query optimizer (and more)
        * first MonetDB/SPARQL version (RDF clustering, graph indexing)
  • 3. WP2 Organization
      CWI (MonetDB):
      • Peter Boncz (also in VUA group of Frank v Harmelen)
      • Duc Pham Minh (PhD student)
      • Irini Fundulaki (1-year sabbatical from FORTH)
      OpenLink (Virtuoso):
      • Orri Erling
      • Hugh Williams
      • Ivan Mikhailov
      + FU Berlin (BSBM)
      + DERI (BSBM text + LOD cloud + text retrieval/sindice)
      + ULEI (DBpedia benchmark)
  • 4. WP2: Storing and Querying Very Large Knowledge Bases
      Goal: enabling large-scale, feature-rich & enterprise-ready Linked Data management solutions
      Database Partners in LOD2:
      • CWI: Leading open source analytics RDBMS
      • OpenLink: Leading Linked Data deployment platform
      Technological Excellence:
      • Creating and publishing metrics for choosing RDF solutions
      • Bringing Column Store Technology for Business Intelligence on RDF
      • Ground-breaking database innovations for RDF stores (Dynamic Query Optimization, Adaptive Caching of Joins, Optimized Graph Processing, Cluster/Cloud scalability)
  • 5. Task 2.1: State of the Art, Evaluation & Benchmarking
      LOD cloud cache scalability
      • M0: 20B triples
      • M12: 50B triples
      • M24: 250B triples
      • M36: 1T triples
      D2.4 completed: 50B triples in LOD cache @ DERI
      First deployment of Virtuoso7 Cluster
      • Currently hosting about 55 billion triples
      • 8 node Virtuoso v7 (column store) Cluster
      • 384GB RAM
      • 2TB Disk Storage
      • 14B/quads, excl literals
      Next up:
      • hardware provisioning for 250B and 1T triples (need 512GB RAM and 2TB RAM respectively, somewhere)
  • 6. Task 2.1: State of the Art, Evaluation & Benchmarking
      Benchmarking
      • creating new benchmarks
        * BSBM-BI (FU Berlin)
        * DBpedia Benchmark (ULEI) – best paper award
        * RDF-H (OGL, CWI)
        * Social Intelligence Benchmark (OGL, CWI)
      • running benchmark evaluations
        * BSBM on a large cluster (Lisa @ SARA)
        * BSBM on a large single server (40 cores, 1TB RAM)
      • creating industry consensus
        * Benchmark Auditing Service
        * LOD Benchmark Council
  • 7. BSBM Large Scale Experiments (still ongoing)
      New Aspects:
      • The Business Intelligence Use Case (BI)
      • Benchmark Rules
      • BSBM V3 Results
      • trying cluster versions
      SARA LISA cluster
      • experiments with up to 64 nodes
      VectorWise high-end server
      • 40-core machine with 1TB RAM
      Benchmarked at SARA and Vectorwise:
      System     Version    Vendor         URL
      4store     1.1.2      Garlik         http://4store.org/
      BigData    r4169      SYSTAP LLC     http://www.systap.com/bigdata.htm
      BigOwlim   3.4.3129   OntoText       http://www.ontotext.com/owlim/
      Jena TDB   0.8.9      openjena.org   http://www.openjena.org/TDB/
      Fuseki     0.1.0      openjena.org   http://openjena.org/wiki/Fuseki
      Virtuoso   7.0        OpenLink       http://virtuoso.openlinksw.com/
  • 8. Social Intelligence Benchmark
      • 14 dictionaries of real data
      • Facebook schema style
      • Realistic scenario simulation
      • Synthetic Generated Data
      • Linked Open Data
  • 9. Technical Work: Recycling (D2.4)
      Dynamic caching of intermediate query results
      • SPARQL problem: hard to index workload / expensive backward chaining
      Idea: compute once, re-use many times
  • 10. Technical Work: Virtuoso 7
      Major upcoming release V7, due in 2012
      • column store technology:
        * aggressive compression → more data fits in RAM
        * vectored execution → things run faster
      • elastic cluster implementation
        * partitions can migrate across nodes
      • bringing computation to the data
        * arbitrary recursive functions in the cluster
      • geospatial support
        * full openGIS support, R-tree backed, EWKT format
      • future enhancements
        * adaptive query optimization (CWI ROX)
        * re-use of intermediates (CWI recycling)
        * using SSDs as cache
  • 11. Next 6 months
      Virtuoso: sampled query optimizer
      • query optimization in SPARQL is difficult (no stats)
      • use adaptive, run-time query optimization with sampling
      MonetDB and SPARQL
      • First version in sight (cooperation with FORTH)
      • research tracks
        * RDF clustering on Characteristic Sets
        * correlated join path indexing
      LOD cache at 250B triples
      • what triples to use?
      • what hardware to use? (need 512GB RAM)
  • 12. Contact
      Address:
      Centrum Wiskunde & Informatica (CWI)
      Science Park 123
      1098 XG Amsterdam
      The Netherlands
      monetdb.cwi.nl
      Thanks for your attention!
  • 13. LOD2 Benchmark Auditing Service
      Benchmarking needs of SPARQL engine vendors:
      • vendors want to publish on their own timescale
      • using new or upcoming releases (not yet public)
      • using settings and hardware properly tuned to their solution
      • yet need credibility (is it fair?)
      Tournaments organized by one institution have
      • bad timing, wrong version, one more bug to fix, etc.
      • not the right hardware or settings
      • may become a legal liability once matters become more serious
      LOD2 should reach out to the SPARQL technical community and provide independent benchmark auditing services
      • start with BSBM → working on Auditing Rules Document
      • maybe other benchmarks later
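
The recycling idea from the "Technical Work: Recycling (D2.4)" slide (compute once, re-use many times) can be illustrated with a toy cache of intermediate results. This is only a minimal sketch under stated assumptions: the `Recycler` class, the `scan` helper, the plan keys and the sample triples are all hypothetical names made up for illustration, and the sketch does not reflect how MonetDB or Virtuoso actually implement recycling.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Recycler:
    """Toy cache of intermediate results, keyed by the sub-plan that produced them."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, List[Triple]] = {}
        self.hits = 0
        self.misses = 0

    def evaluate(self, plan_key: Hashable,
                 compute: Callable[[], List[Triple]]) -> List[Triple]:
        # Re-use the stored intermediate if this sub-plan was seen before.
        if plan_key in self._cache:
            self.hits += 1
            return self._cache[plan_key]
        # Otherwise compute it once and keep it for later queries.
        self.misses += 1
        result = compute()
        self._cache[plan_key] = result
        return result


def scan(triples: List[Triple], predicate: str) -> List[Triple]:
    """Hypothetical operator: select all triples with the given predicate."""
    return [t for t in triples if t[1] == predicate]


# Made-up sample data, for illustration only.
TRIPLES: List[Triple] = [
    ("u1", "knows", "u2"),
    ("u2", "knows", "u3"),
    ("u1", "name", "Alice"),
]

recycler = Recycler()
key = ("scan", "knows")
first = recycler.evaluate(key, lambda: scan(TRIPLES, "knows"))   # computed
second = recycler.evaluate(key, lambda: scan(TRIPLES, "knows"))  # re-used
```

A real recycler would also have to decide which intermediates are worth keeping and when to invalidate them after updates; the sketch leaves both out.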

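The "RDF clustering on Characteristic Sets" research track from the "Next 6 months" slide builds on a simple notion: the characteristic set of a subject is the set of predicates that occur with it, and subjects sharing the same set can be grouped together. The following is a minimal sketch of that grouping step only; the function name and the sample triples are assumptions made for illustration, and it says nothing about how the MonetDB/SPARQL work stores or exploits the clusters.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def characteristic_sets(triples: Iterable[Triple]) -> Dict[FrozenSet[str], List[str]]:
    """Group subjects by the set of predicates they occur with."""
    predicates_by_subject: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, _obj in triples:
        predicates_by_subject[subject].add(predicate)

    # Subjects with an identical predicate set land in the same cluster.
    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for subject, predicates in predicates_by_subject.items():
        clusters[frozenset(predicates)].append(subject)
    return dict(clusters)


# Made-up sample data, for illustration only.
sample = [
    ("u1", "name", "Alice"),
    ("u1", "knows", "u2"),
    ("u2", "name", "Bob"),
    ("u2", "knows", "u1"),
    ("post1", "creator", "u1"),
]
clusters = characteristic_sets(sample)
```

Here the two users share the characteristic set {name, knows} and end up in one cluster, while the post forms its own cluster with {creator}.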
Editor's Notes

  1. For the aforementioned reasons, we proposed an RDF and graph database benchmark, called the Social Intelligence Benchmark, that can exploit the advantages of RDF in graph representation. We aim to test graph database performance on a highly connected graph. As social networks are a high-profile use case for graph data management, we designed our benchmark around the scenarios of a social network. We try to generate data that is as realistic as possible, with correlations, and offer challenging queries over those correlations. Besides, since a very large amount of useful information is available in many linked open datasets, we exploit these resources by linking to them.
  2. Now, I will describe the data specification of SIB. As Facebook is the most popular social network, with more than 800 million active users, we take the schema style of Facebook as the baseline for designing SIB. To generate realistic data, we use 14 dictionaries that we built from real data. These dictionaries cover various domains, for example geographical information and personal names. SIB data is designed so that it can simulate realistic scenarios, including the real behavior of users and the characteristics of data distributions in social networks. As we mentioned before, our synthetic data is linked with well-known linked open data. Here, SIB is linked with DBpedia, one of the largest linked open datasets.
  3. I think most of us know Facebook and even have a Facebook account. The logical schema of our benchmark simulates the Facebook schema, in which a user can have many friends, with friendships between them. A user can provide profile information such as his name, where he is studying, and where he is living. He can also specify his current status, for example, being in a relationship with another user. The user can upload many photos, start a discussion by writing posts, and get a lot of comments from his friends.