WP2: Storing and Querying Very Large Knowledge Bases
Peter Boncz
LOD Cloud Hosted on the Knowledge Store Cluster
* major performance increases in OpenLink
* BSBM-v3 / BSBM-BI
* LOD2 Benchmark Auditing Service (BAS)
* new benchmarks: RDF-H + Social Intelligence Benchmark (SIB)
* dynamic cluster repartitioning
* MonetDB-OpenLink integration
* caching intermediates
* graph path processing
* entity ranking
* geographical indexing
WP2: Storing and Querying Very Large Knowledge Bases
Goal: enabling large-scale, feature-rich and enterprise-ready Linked Data management solutions
Database partners in LOD2:
* CWI: leading open-source analytics RDBMS
* OpenLink: leading Linked Data deployment platform
Technological excellence:
* creating and publishing metrics for choosing RDF solutions
* bringing column store technology for business intelligence to RDF
* ground-breaking database innovations for RDF stores (dynamic query optimization, adaptive caching of joins, optimized graph processing, cluster/cloud scalability)
WP2: Linked Open Data for Real in Your Apps
Business advantages:
* Enrich your application with (free and rich) Linked Open Data
* RDF store technology has 10x lower deployment costs than relational technology for ragged data
Technological flexibility:
* Deliver schema-last flexibility and inference at relational data warehouse cost and performance
* Grow as you go: the LOD2 platform dynamically adapts to your usage patterns and to the structure of your data
* Integrate, resolve and align anything: schema, instance identity
Rich features for complex applications:
* Advanced SPARQL and SQL query processing
* SPARQL and SQL federation
* Full-text and geospatial search
* Scale-out on clusters, replication
LOD Cloud Deliverables (OGL)
* D2.1.1 Initial (M3)
Original targets:
* D2.1.3 Intermediate (M15): 50B triples
* D2.1.4 Intermediate2 (M27): 250B triples
* D2.1.5 Final (M39): 1T triples
Activities:
* making it faster (adopting MonetDB principles):
  * column store and compression
  * vectored execution
* introducing multi-core features
* getting more data sets: crawling the web, NLP extraction of data
* benchmarking with synthetic data sets
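The MonetDB principles listed above (column store, compression, vectored execution) can be illustrated with a toy Python sketch. The classes `Dictionary` and `ColumnarTripleStore` are hypothetical and do not come from MonetDB or Virtuoso. The sketch assumes three techniques:

* RDF terms are dictionary-encoded to integers, which is a simple form of compression.
* Subjects, predicates and objects live in three separate columns.
* A predicate scan processes the P column one fixed-size vector at a time rather than one triple at a time.

Real column stores use typed arrays and much richer compression. This only shows the shape of the idea.

```python
from typing import Dict, Iterator, List, Tuple


class Dictionary:
    """Maps RDF terms (IRIs, literals) to dense integer IDs and back."""

    def __init__(self) -> None:
        self.term_to_id: Dict[str, int] = {}
        self.id_to_term: List[str] = []

    def encode(self, term: str) -> int:
        # Assign the next free ID the first time a term is seen.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, term_id: int) -> str:
        return self.id_to_term[term_id]


class ColumnarTripleStore:
    """Hypothetical toy store: triples kept as three parallel integer columns."""

    def __init__(self) -> None:
        self.dictionary = Dictionary()
        self.s: List[int] = []
        self.p: List[int] = []
        self.o: List[int] = []

    def add(self, s: str, p: str, o: str) -> None:
        self.s.append(self.dictionary.encode(s))
        self.p.append(self.dictionary.encode(p))
        self.o.append(self.dictionary.encode(o))

    def scan_predicate(self, predicate: str,
                       vector_size: int = 1024) -> Iterator[List[Tuple[int, int]]]:
        """Yields (subject_id, object_id) pairs for `predicate`, one vector at a time."""
        pid = self.dictionary.term_to_id.get(predicate)
        if pid is None:
            return
        n = len(self.p)
        for start in range(0, n, vector_size):
            end = min(start + vector_size, n)
            # Selection reads only the P column; S and O are fetched for hits only.
            hits = [i for i in range(start, end) if self.p[i] == pid]
            if hits:
                yield [(self.s[i], self.o[i]) for i in hits]
```

After adding a few triples, `list(store.scan_predicate("ex:knows"))` returns the matching (subject ID, object ID) pairs in batches. `store.dictionary.decode` maps the IDs back to terms.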
Task 2.1: State of the Art, Evaluation & Benchmarking
This task reviews the state of the art in RDF and relational analytics databases and creates a laboratory with the leading products of both categories installed. This laboratory serves as a testing and benchmarking resource for continuously measuring the project's progress against the baseline of the best products on the market.
Benchmarking in LOD2 serves two purposes:
1. measuring the relative cost of RDF versus equivalent relational functionality, and
2. measuring RDF performance in applications that are RDF's home terrain, e.g. integration of highly heterogeneous, "ragged" content, with alignment performed at preprocessing or run time by rules and machine learning approaches.
For the first case, we can use TPC-H and its star schema derivative (SSBM). For the second case, new benchmarks covering different functionality need to be developed.
Task 2.1: State of the Art, Evaluation & Benchmarking
The benchmarks will be developed primarily during the first year, with work on integration quality metrics extending into the second year. The benchmarks will be run, and the results published, at each project milestone.
Scaling to huge data sizes (e.g. a trillion triples) is expected to require a cluster, most feasibly a temporary deployment in a cloud system. The goal of the database work in LOD2 is to reduce the cost of such deployments as much as possible by devising techniques that reduce the memory requirements of large RDF deployments.
We currently envision deploying Oracle 11g R2, BigOWLIM, YARS, Vertica, AllegroGraph, VectorWise and MonetDB in the LOD2 benchmarking laboratory. As benchmarks we envision TPC-H, LUBM, UOBM, BSBM, SP2Bench and SSBM. As described above, we also propose creating a new benchmark patterned after social networking data.
Task 2.1: State of the Art, Evaluation & Benchmarking
D1.2 State of the Art Analysis (M3)
* Held a survey among RDF engine vendors (Jena TDB/SDB, 4store, BigOWLIM, OpenLink Virtuoso)
* Established contacts for future benchmarking activities
D2.1.2 State of the Art LOD Laboratory (M6)
* Installed engines at two sites (FUB, CWI)
* Ran initial experiments on BSBMv3
To follow:
* RDF-H (TPC-H as RDF)
* Social Intelligence Benchmark
* Benchmark Rules
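The slides describe RDF-H only as "TPC-H as RDF". Holding the same data in both forms is what lets the relative cost of RDF versus relational processing be measured. One plausible derivation is to turn each relational row into a resource, with one triple per column. Foreign-key columns become links to other resources rather than literals.

The Python sketch below assumes a hypothetical base IRI `http://example.org/rdfh/` and a hypothetical helper `row_to_ntriples`. It is not necessarily the mapping LOD2 used. The column names in the usage example follow TPC-H's `customer` table.

```python
from typing import Dict, List, Union

# Hypothetical base IRI; the actual RDF-H vocabulary is not given in these slides.
BASE = "http://example.org/rdfh/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"


def row_to_ntriples(table: str, key_column: str,
                    row: Dict[str, Union[int, str]],
                    foreign_keys: Dict[str, str]) -> List[str]:
    """Turns one relational row into N-Triples lines: one subject per row.

    foreign_keys maps a column name to the table it references, so that the
    value becomes an IRI (a link) instead of a literal.
    """
    subject = f"<{BASE}{table}/{row[key_column]}>"
    triples = [f"{subject} {RDF_TYPE} <{BASE}{table}> ."]
    for column, value in row.items():
        if column == key_column:
            continue  # the key is already encoded in the subject IRI
        predicate = f"<{BASE}{column}>"
        if column in foreign_keys:
            obj = f"<{BASE}{foreign_keys[column]}/{value}>"
        elif isinstance(value, int):
            obj = f'"{value}"^^{XSD_INTEGER}'
        else:
            # Escape backslashes and double quotes for an N-Triples string literal.
            obj = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
        triples.append(f"{subject} {predicate} {obj} .")
    return triples
```

For example, `row_to_ntriples("customer", "c_custkey", {"c_custkey": 1, "c_name": "Customer#000000001", "c_nationkey": 15}, {"c_nationkey": "nation"})` yields three triples: an `rdf:type` triple, a string literal for the name, and a link to the resource `nation/15`.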
BSBM V3 Results
Benchmarked at FUB:
* 4store 1.1.2 (Garlik), http://4store.org/
* BigData r4169 (SYSTAP LLC), http://www.systap.com/bigdata.htm
* BigOwlim 3.4.3129 (OntoText), http://www.ontotext.com/owlim/
* Jena TDB 0.8.9 (openjena.org), http://www.openjena.org/TDB/
* Fuseki 0.1.0 (openjena.org), http://openjena.org/wiki/Fuseki
* Virtuoso 7.0 (OpenLink), http://virtuoso.openlinksw.com/
Main new conclusions: we ran into several technical problems with the BI use case. To give the store vendors time to fix and optimize their stores, we are considering running the tests again in about three or four months. For the next test runs we will also modify query 4, because its quadratic complexity makes it scale poorly.
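The slides do not reproduce BSBM-BI query 4 itself. To show why quadratic complexity hurts scalability, consider a hypothetical query that compares all pairs of n items. It examines n(n-1)/2 pairs, so each doubling of the data size roughly quadruples the work.

```python
def pair_comparisons(n: int) -> int:
    """Distinct pairs examined by a naive all-pairs comparison over n items."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        # Each doubling of n roughly quadruples the number of comparisons.
        print(n, pair_comparisons(n))
```

This prints 499500, 1999000 and 7998000 comparisons for 1,000, 2,000 and 4,000 items. A query whose cost grows this way quickly dominates a benchmark run as the scale factor increases.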
LOD2 Benchmark Auditing Service
Benchmarking needs of SPARQL engine vendors:
* using new or upcoming releases (not yet public)
* using settings and hardware properly tuned to their solution
* yet they need credibility (is it fair?)
Tournaments organized by a single institution:
* may not use the right hardware or settings
* may become a legal liability once matters become more serious
LOD2 should reach out to the SPARQL technical community and provide independent benchmark auditing services
* maybe other benchmarks later
* queries are not always properly balanced, and their weights are not always thought out

LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases

* BSBM is a relational benchmark anyway
* test areas where RDF/SPARQL can truly excel
* test areas addressed by our future work, such as:
  * caching intermediates
  * graph path processing
  * entity ranking
  * geo
Upcoming Work
* D2.2 dynamic cluster repartitioning (M12)
* D2.3 integration MonetDB-OpenLink (M12)
* D2.4 caching intermediates (M18)
* D2.5 graph path processing (M24)
* D2.6 bulk processing (M24)
* D2.7 entity ranking (M30)
* D2.8 geographical indexing (M36)
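Graph path processing (D2.5) concerns queries that follow chains of links between resources. SPARQL 1.1 property paths such as `foaf:knows+` can test reachability, but they do not return the path itself.

The minimal Python sketch below shows the underlying operation: a breadth-first search over triples restricted to one predicate, returning a shortest path. The function name `shortest_path` and the in-memory triple list are assumptions for illustration, not LOD2 or Virtuoso code.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def shortest_path(triples: Iterable[Triple], predicate: str,
                  start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over edges labelled `predicate`; returns nodes on a shortest path."""
    adjacency: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            adjacency.setdefault(s, []).append(o)
    parent: Dict[str, Optional[str]] = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable via `predicate`
```

BFS is chosen here because, on unweighted edges, the first time it reaches the goal is along a shortest path. A store at LOD2 scale would run this over indexed, dictionary-encoded IDs rather than strings.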
* Data correlations included, e.g., regional correlation
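Regional correlation in a synthetic social network means that a user's attributes depend on where the user lives, rather than being drawn independently. The minimal Python sketch below illustrates this under stated assumptions:

* The country pools `NAMES_BY_COUNTRY` and `CITIES_BY_COUNTRY` are hypothetical.
* The generator function `generate_users` is also hypothetical.
* A seeded random generator makes the data set reproducible, which benchmark data sets generally need.

```python
import random
from typing import Dict, List

# Hypothetical value pools keyed by country; a real generator might derive them from DBpedia.
NAMES_BY_COUNTRY: Dict[str, List[str]] = {
    "Germany": ["Hans", "Anna", "Lukas"],
    "Egypt": ["Ahmed", "Mona", "Omar"],
}
CITIES_BY_COUNTRY: Dict[str, List[str]] = {
    "Germany": ["Berlin", "Munich", "Hamburg"],
    "Egypt": ["Cairo", "Alexandria", "Giza"],
}


def generate_users(n: int, seed: int = 42) -> List[Dict[str, object]]:
    """Generates n users whose name and city are correlated with their country."""
    rng = random.Random(seed)  # seeded, so the same seed yields the same data set
    countries = sorted(NAMES_BY_COUNTRY)
    users = []
    for user_id in range(n):
        country = rng.choice(countries)
        users.append({
            "id": user_id,
            "country": country,
            # Attributes come from the country's own pool, not from a global one.
            "name": rng.choice(NAMES_BY_COUNTRY[country]),
            "city": rng.choice(CITIES_BY_COUNTRY[country]),
        })
    return users
```

Because names and cities are conditioned on the country, queries that filter by location return realistically skewed results, which uncorrelated uniform data would not.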
Interactive Query Mix (2/2)
12. Return all posts about an event (e.g., unrest in Tunisia) in the 10 most recent days.
13. Show all posts about a specific location, e.g., Egypt, in the 10 most recent days (use information from DBpedia to identify the location, e.g., Cairo is the capital of Egypt and Tahrir Square is in Cairo).
14. Find the number of inactive users: all users activated at least 30 days ago who have not made any post, or all users who have not posted for 60 days.
15. Show all photos posted by friends of a user in which she was tagged.
16. Show a user's top-10 close friends: sort friendships by the number of photos in which the user and the friend are both tagged, then by the number of the user's tags for each friend.
17. Find the top-10 friends, or all friends of friends, who have interests in common with you (based on the similarity between the tags in your posts and the tags in their posts).
18. What are the current hottest events/problems? (Get the hashtags from posts and order them by their number of appearances in the 10 most recent days.)
19. Which area is the most active? (Order locations by their total number of posts in the 5 most recent days.)
20. Return the top-10 locations with the fastest growth in the number of users. (Count the people who joined before the last 10 days and those who joined during the last 10 days, then compute the growth rate.)
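The "top-10 close friends" query in the interactive mix is a two-key ranking. The Python sketch below shows that ranking over in-memory data, under these assumptions:

* The function name and the data layout are hypothetical.
* `photo_tags` holds, for each photo, the set of users tagged in it.
* `tag_events` holds one (tagger, tagged) pair per tag.
* "The number of the user's tags for each friend" is read as the number of tags the user placed on that friend.

In the benchmark itself this would be a SPARQL or SQL query. The code only illustrates the intended ordering.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_close_friends(user: str, friends: Set[str],
                      photo_tags: Iterable[Set[str]],
                      tag_events: Iterable[Tuple[str, str]],
                      k: int = 10) -> List[str]:
    """Ranks `user`'s friends by co-tagged photos, then by how often `user` tagged them."""
    co_tagged = Counter()
    for tagged in photo_tags:
        if user in tagged:
            # Count this photo for every friend tagged together with the user.
            for friend in tagged & friends:
                co_tagged[friend] += 1
    tags_by_user = Counter(target for tagger, target in tag_events
                           if tagger == user and target in friends)
    # Descending on both counts; the friend's ID breaks remaining ties deterministically.
    ranked = sorted(friends, key=lambda f: (-co_tagged[f], -tags_by_user[f], f))
    return ranked[:k]
```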
Update Query Mix
* Add/Remove tags
* Remove a user from a group
* Add/Remove tags in the photo
* Add/Remove a comment
6. Remove all tags of a user from the pictures or posts of her friends
7. Remove all friends of a user who have no interaction with her
8. Send a group invitation message to the top-10 close friends of a user (write a post on the wall of each of these users)
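The update query that removes all friends of a user who have no interaction with her can be sketched as follows. In an RDF store it would naturally be a SPARQL 1.1 Update `DELETE ... WHERE` with a `FILTER NOT EXISTS` condition. The Python sketch below shows the same logic over in-memory sets, under these assumptions:

* Friendship is symmetric.
* An "interaction" is any recorded (actor, target) event, such as a comment, like or tag, in either direction.
* The function name `remove_inactive_friends` is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def remove_inactive_friends(user: str, friends: Dict[str, Set[str]],
                            interactions: Iterable[Tuple[str, str]]) -> Set[str]:
    """Removes friendships of `user` with no recorded interaction in either direction.

    `friends` is updated in place (and assumed symmetric); returns the removed friends.
    """
    interacted = set()
    for actor, target in interactions:
        if actor == user:
            interacted.add(target)
        elif target == user:
            interacted.add(actor)
    removed = friends.get(user, set()) - interacted
    for friend in removed:
        # Delete both directions of the symmetric friendship edge.
        friends[user].discard(friend)
        friends.get(friend, set()).discard(user)
    return removed
```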
Analysis Query Mix
1. The fastest propagating ideas: the topic with the most users who have joined in the last day.
2. Wildfire: find the first mentions of a concept in the last day such that the concept was not mentioned before and occurs in more than 10% of new posts in groups involved with politics.
3. Product advertisement: where and when to advertise Hello Kitty?
4. Challengers: which fictional entities are challengers of Hello Kitty?
5. Potential clients: who are iPhone users or potential iPhone clients?
6. Associated products: which other products do people who consider or mention the iPhone also mention?
7. Product lifetime: when is the right time to release information about the new iPhone version?
8. Troublemakers and duplicates: find troublemakers and duplicated identities based on behavior patterns.
9. Application accounts: accounts in social networks created only for applications, e.g., for playing games.
10. Expert finding: find a user who is an expert in computer science and has friends who are experts in maths and physics.
11. New idol: the user whose fan page has the fastest increase in the number of members during the last 30 days.
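The "Wildfire" analysis query combines a novelty test with a frequency threshold. The Python sketch below illustrates both under these assumptions:

* The `Post` record, with its `group_topic` and `concepts` fields, is hypothetical. The concepts would come from extraction such as NLP.
* "Not mentioned before" means not mentioned in any post before the one-day window.
* The 10% threshold is computed over the posts made in politics groups during that window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Post:
    created: datetime
    group_topic: str          # hypothetical: topic of the group the post was made in
    concepts: FrozenSet[str]  # hypothetical: concepts extracted from the post text


def wildfire_concepts(posts: List[Post], now: datetime, topic: str = "politics",
                      threshold: float = 0.10) -> Set[str]:
    """Concepts first mentioned in the last day that occur in > threshold of new topic posts."""
    window_start = now - timedelta(days=1)
    seen_before: Set[str] = set()
    recent: List[Post] = []
    for post in posts:
        if post.created < window_start:
            seen_before |= post.concepts  # anything mentioned earlier is not new
        elif post.created <= now and post.group_topic == topic:
            recent.append(post)
    if not recent:
        return set()
    counts: Dict[str, int] = {}
    for post in recent:
        for concept in post.concepts - seen_before:
            counts[concept] = counts.get(concept, 0) + 1
    return {c for c, n in counts.items() if n / len(recent) > threshold}
```

The single pass that separates "before" from "in window" posts keeps the sketch linear in the number of posts. An RDF engine would instead express the novelty test as a `NOT EXISTS` over earlier posts.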