Virtuoso: The Prometheus of RDF-based Relational Data Management
SEMANTiCS 2014 Conference Keynote
By Orri Erling
Virtuoso Program Manager
OpenLink Software
Linked Data at Dawn
• The Promise and the Practice
• The Science of Speed
• The Structure which Is
• Ongoing Research
License CC-BY-SA 4.0 (International).
Linked Data Promises
• RDF is a generic, minimalistic model for describing things
• RDF has global identifiers, and data is self-describing
• URIs may be dereferenceable
• RDF is flexible to query; it does not force a single hierarchical view the way XML does
Linked Data Scenarios
• RDF is used because of
  - schema flexibility
  - global identifiers
• Inference, if present, is usually trivial:
  - subclass
  - sub-property
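Subclass and sub-property reasoning really is small. The following Python sketch (not Virtuoso code) materializes the two rules: a resource typed with a class is also typed with every superclass, and a triple using a property also holds for every super-property. The `ex:` names are invented sample data, and an engine could equally apply these rules at query time instead of storing the extra triples.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def ancestors(edges):
    """Map each node to all of its transitive ancestors under a child -> parent relation."""
    anc = defaultdict(set)
    for child, parent in edges:
        anc[child].add(parent)
    changed = True
    while changed:  # iterate until no new ancestor is found (fixpoint)
        changed = False
        for node in list(anc):
            extra = set()
            for parent in anc[node]:
                extra |= anc.get(parent, set())
            if not extra <= anc[node]:
                anc[node] |= extra
                changed = True
    return anc


def infer(triples):
    """Materialize the subclass and sub-property entailments of a set of triples."""
    class_anc = ancestors((s, o) for s, p, o in triples if p == SUBCLASS_OF)
    prop_anc = ancestors((s, o) for s, p, o in triples if p == SUBPROPERTY_OF)
    inferred = set(triples)
    # (s p o) and p subPropertyOf q  =>  (s q o)
    for s, p, o in triples:
        for q in prop_anc.get(p, ()):
            inferred.add((s, q, o))
    # (s rdf:type C) and C subClassOf D  =>  (s rdf:type D)
    for s, p, o in list(inferred):
        if p == RDF_TYPE:
            for d in class_anc.get(o, ()):
                inferred.add((s, RDF_TYPE, d))
    return inferred


facts = {
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:alice", "ex:hasMother", "ex:carol"),
    ("ex:hasMother", SUBPROPERTY_OF, "ex:hasParent"),
}
new_facts = infer(facts) - facts
for triple in sorted(new_facts):
    print(triple)
```

The only real work is the transitive closure of the two hierarchies; the rules themselves are single joins against it.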
Where Triples Come From
• Relational extracts or web content converted to, and stored as, triples
• NLP extraction
• New applications with RDF as the primary data model
• Running SPARQL against data in RDBs is possible, but it is rare and does not deliver the same flexibility
Linked Data Verticals and Patterns
• Publishing: tagging and annotations, evolving vocabularies
• Archives: self-description, long-term identifiers, many versions of schema
• Semantic search: structured, semi-structured, and full text, all in one
• Business intelligence: many sources, ease of adding sources, no six-month data warehouse schema change cycle
• E-science, often in life sciences: common interchange format, nano-publications, NLP extracts, different users cook their data differently, provenance
The Hopes and Perceptions
• The age of ad hoc
• Find insight in any data, when you need it, from any source, in any format
• No data warehouse planning cycles; make your own from the pieces you need, when you need them
• Still, data integration remains hard work; quality and coverage of sources vary
• Flexibility may be there, but are performance and scalability on par?
Yes, But ...
• Web and Big Data: everybody reinvents the triple. Self-description, long-term identifiers, and key-value pairs appear in many non-RDF use cases
• SPARQL and RDF would be the natural, standards-compliant choice if they beat SQL, information retrieval, custom big data, key-value, and map-reduce solutions
Is this intrinsic to linked data, or is it a lack of engineering?
• Linked data has unique advantages in breadth of coverage and expressivity, but performance must not lag behind.
What is the RDF Tax?
• 90% of bad performance comes from non-optimal query plans
• Some comes from indexing too much (e.g., SQL bulk load with no indices is 50x faster than the equivalent in RDF with everything indexed)
• Some comes from string operations on URIs and literals
• Some comes from having a join for every attribute
Vectoring and the right plans help, though
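One standard remedy for string operations on URIs is dictionary encoding: each IRI or literal is mapped to an integer once, at load time, so joins and comparisons touch only integers. The `TermDictionary` class below is a hypothetical minimal sketch of the idea, not Virtuoso's internal IRI representation; the FOAF `knows` IRI and the example.org subjects are sample data.

```python
class TermDictionary:
    """Hypothetical minimal dictionary: maps each IRI or literal to a dense integer ID."""

    def __init__(self):
        self._ids = {}     # term -> id
        self._terms = []   # id -> term

    def encode(self, term):
        term_id = self._ids.get(term)
        if term_id is None:
            term_id = len(self._terms)
            self._ids[term] = term_id
            self._terms.append(term)
        return term_id

    def decode(self, term_id):
        return self._terms[term_id]


def encode_triples(triples, dictionary):
    """Replace every term by its integer ID once, at load time."""
    return [tuple(dictionary.encode(t) for t in triple) for triple in triples]


def friends_of_friends(encoded, knows_id):
    """Join (a knows b) with (b knows c), comparing integers only, never strings."""
    knows = [(s, o) for s, p, o in encoded if p == knows_id]
    by_subject = {}
    for s, o in knows:
        by_subject.setdefault(s, []).append(o)
    return [(a, c) for a, b in knows for c in by_subject.get(b, [])]


d = TermDictionary()
KNOWS = "http://xmlns.com/foaf/0.1/knows"
data = [
    ("http://example.org/alice", KNOWS, "http://example.org/bob"),
    ("http://example.org/bob", KNOWS, "http://example.org/carol"),
]
encoded = encode_triples(data, d)
pairs = [(d.decode(a), d.decode(c)) for a, c in friends_of_friends(encoded, d.encode(KNOWS))]
print(pairs)
```

Strings are only touched at the edges: once when loading and once when decoding the final result.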
The Bane of the Triple
When data is stored as triples:
• There is still structure, but it is harder to exploit; schema re-emerges as correlations
• More joins mean more possible query plans and bigger errors in plan cost estimation
• More joining reduces locality
• Lack of schema causes needless indexing; data takes more space
• A URI for everything takes space and time
For the same workload, Virtuoso SQL can be 2–20x faster than Virtuoso SPARQL
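To make "a join for every attribute" concrete: fetching n properties of the same subject from triples means an n-way self-join on the subject, whereas a table returns the whole row in one lookup. The Python sketch below (hypothetical names, in-memory data, single-valued properties assumed) counts the joins needed for a three-property star pattern.

```python
from collections import defaultdict


def star_join(triples, props):
    """Evaluate the star pattern { ?s props[0] ?v0 . ?s props[1] ?v1 . ... } over triples.

    Each additional property costs one more join on ?s. For simplicity every property
    is assumed single-valued (a later value for the same subject overwrites an earlier one).
    """
    by_pred = defaultdict(dict)  # predicate -> {subject: object}
    for s, p, o in triples:
        by_pred[p][s] = o
    rows = {s: [o] for s, o in by_pred[props[0]].items()}
    joins = 0
    for p in props[1:]:
        joins += 1
        column = by_pred[p]
        # keep only subjects that also have property p (inner join on ?s)
        rows = {s: vals + [column[s]] for s, vals in rows.items() if s in column}
    return rows, joins


triples = [
    ("p1", "name", "Widget"), ("p1", "price", 10), ("p1", "vendor", "acme"),
    ("p2", "name", "Gadget"), ("p2", "price", 20),
]
rows, joins = star_join(triples, ["name", "price", "vendor"])
print(rows, joins)
```

Each of those joins is one more ordering decision for the optimizer and one more place where a cardinality estimate can go wrong.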
The Question is Raised
• LOD2 FP7, now ending: RDF performance parity with relational?
• SQL is the senior science; whoever ignores history is bound to repeat it
• Integral mastery of RDB science is a prerequisite, but do not forget the subtle twists of schema-less-ness
Virtuoso RDF Relational DBMS Leadership
• 2000–2006, v1.x–4.x: SQL row store with SQL federation and XML
• 2007–2008, v5.x–6.x: SPARQL; storage adapted for RDF quads, with more compression, bitmap indices, special data types, and RDF awareness in query optimization
• 2009, v6.x: scale-out, cluster-capable
• 2010–2013, v7.x: column store, vectored execution, 3x more space efficient, 10+x more speed
• 2013: Star Schema Benchmark with SPARQL at 100x the speed of MySQL SQL and 0.8x the speed of MonetDB SQL
• 2014: top-of-the-line SQL analytics, 500 Gtriples, structure awareness
Triples Done Right, so?
• Column-store techniques are a good fit; index-based triple storage does not get much better than this
• RAM-only, pointer-based techniques can be faster but cost 10–100x more to scale up
• To bring RDF to SQL parity, Virtuoso must first be on a level with the best in SQL
• TPC-H is the checklist for mastery of data warehousing and query optimization; whoever survives it shall not fear
• Parity is achieved when running with triples performs just like running with tables
Structure is Everywhere
CWI in LOD2:
• 90% of triples in Common Crawl fall into 20 tables
• All relational extractions are 100% tables
• Even DBpedia, which is unusually heterogeneous albeit not very large, is 90% covered by 500 tables
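Coverage figures like these can be approximated by grouping subjects by their characteristic set, i.e., the set of properties each subject carries, and counting how many triples the largest groups account for. The sketch below is a simplified stand-in, not the CWI analysis: it uses exact property sets only and does not merge similar sets, and the sample data is invented.

```python
from collections import Counter, defaultdict


def characteristic_sets(triples):
    """Map each subject to the (frozen) set of properties it carries."""
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}


def coverage_of_top_k(triples, k):
    """Fraction of all triples covered by the k property sets that account for the most triples."""
    cs = characteristic_sets(triples)
    triples_per_set = Counter(cs[s] for s, _, _ in triples)
    covered = sum(n for _, n in triples_per_set.most_common(k))
    return covered / len(triples)


data = [
    ("ann", "name", "Ann"), ("ann", "age", 31),
    ("bob", "name", "Bob"), ("bob", "age", 45),
    ("cy", "name", "Cy"), ("cy", "age", 28), ("cy", "email", "cy@example.org"),
    ("acme", "label", "ACME Corp"),
]
print(coverage_of_top_k(data, 1), coverage_of_top_k(data, 2))
```

Here one "table" ({name, age}) holds half the triples and two hold 7 of 8; on real data the interesting question is how small k can be for, say, 90% coverage.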
The Glorious Dawn: Structure is the Servant, not the Tyrant
• A set of subjects that all have the same single-valued properties is in fact a table
• So, store it as a table
• Allow exceptions, e.g., sometimes multiple values, different values in different graphs, extra properties, etc.
• If it is big, it has repeating structure
• All RDF semantics are preserved; any triple is possible, but the common ones are SQL-compact and SQL-fast
• With tables, query optimization returns to SQL complexity and is much more reliable
• So, more tricks from the SQL analytics bag become safe and applicable
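A minimal sketch of "store it as a table, allow exceptions", assuming an in-memory list of triples and a declared list of column properties (both hypothetical): single-valued values of declared properties become row cells, everything else stays as plain triples, and reconstructing the triples shows that nothing is lost. Graphs are ignored here, so the "different values in different graphs" case is not covered; this is not Virtuoso's structure-aware storage.

```python
from collections import defaultdict


def split_into_table(triples, columns):
    """Put single-valued values of the declared columns into table rows; keep the rest as triples."""
    values = defaultdict(lambda: defaultdict(list))  # subject -> predicate -> [objects]
    for s, p, o in triples:
        values[s][p].append(o)
    table, exceptions = {}, []
    for s, props in values.items():
        row = dict.fromkeys(columns)  # missing values stay None (SQL NULL)
        in_table = False
        for p, objs in props.items():
            if p in row and len(objs) == 1:
                row[p] = objs[0]
                in_table = True
            else:  # multi-valued or undeclared property: keep it as exception triples
                exceptions.extend((s, p, o) for o in objs)
        if in_table:
            table[s] = row
    return table, exceptions


def to_triples(table, exceptions):
    """Reconstruct the full set of triples from rows plus exceptions."""
    triples = set(exceptions)
    for s, row in table.items():
        triples.update((s, p, o) for p, o in row.items() if o is not None)
    return triples


data = [
    ("p1", "name", "Widget"), ("p1", "price", 10),
    ("p2", "name", "Gadget"), ("p2", "price", 20),
    ("p2", "tag", "new"), ("p2", "tag", "sale"),   # undeclared, multi-valued
    ("p3", "name", "A"), ("p3", "name", "B"),      # declared but multi-valued
]
table, exceptions = split_into_table(data, ["name", "price"])
print(table)
print(exceptions)
```

The regular part is stored column-wise and compactly; the irregular part costs no more than it would in a pure triple store.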
Gains from Structure Awareness
• 3+x load speed
• 2x better space efficiency
• SPARQL queries against regular data within 10–20% of SQL speed
• Just declare which properties tend to occur together; no strict schema-first as with SQL
• Later, self-configuration
The Cycle of Adventure
• Rebels: SQL is not cool, too rigid; drop ACID, go key-value, map-reduce; the triple is all there is; semantic web
• Pioneers: life on the frontier is hard; infrastructure is missing or bad
• Same everyday problems also in Utopia
• Recognizing the objective values, e.g., schema freedom and identifiers, no AI. Do the job, forget dogma
• Reconciliation: schema-first and schema-last converge in structure awareness
Present FP7 Research
• LDBC: transparency and relevance for graph DB and RDF performance
• GeoKnow: geodata is everywhere; how to carry the planet in your pocket
• LOD2: where no triple has gone before (and come back)
• Open PHACTS: a data platform for drug discovery
LDBC - Linked Data Benchmark Council 
• Reconciliation: some of the rebel thinking becomes mainstream, e.g., schema-first and schema-last converge in structure awareness
LDBC, Independent Industry Forum for Benchmarking
• The TPC for the frontiers of database
• Bootstrapped in the LDBC FP7 project; continues as an independent industry association
• OpenLink, Ontotext, Neo Technology, and Sparsity as founding members
• IBM, Oracle Labs, Systap, and SPARQL City have already joined
• DB superstars Peter Boncz and Thomas Neumann as founders and scientific leads
LDBC Benchmarks
Social Network
• Online: lookups, updates, analysis of the social environment
• Business Intelligence: spotting trends, key players, big queries
• Graph analytics: community detection, PageRank, graph metrics
Semantic Publishing
• Modeled after the BBC linked data portal: online lookups, drill-downs, and updates
GeoKnow: The Planet in your Pocket
Ms. Globe and Mr. Cube have a thing going on:
• Mr. Cube: Desiloization ... integrated metadata ... explicit semantics.
• Ms. Globe: I can feel it ... but are you man enough? ... you need to show me.
Planet Scale Roadmap
Jan 2014:
• Virtuoso SPARQL outperforms PostGIS in map lookups with planet-wide OpenStreetMap
• Virtuoso SQL adds 5x more power
Next: Jan 2015
• Parity between SPARQL and SQL via structure awareness
• Geospatial data clustering
• Graph analytics close to the data: Pregel, Giraph, etc., in the DB itself
• Adding a fine-grained geo dimension to the LDBC Social Network Benchmark
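Pregel and Giraph express graph algorithms as supersteps in which each vertex processes incoming messages and sends new ones along its edges. The Python sketch below writes PageRank in that style to show the model; it runs in a single process, and the function name, damping factor, and superstep count are illustrative assumptions, not the API of Pregel, Giraph, or Virtuoso.

```python
from collections import defaultdict


def pregel_pagerank(edges, num_vertices, supersteps=30, damping=0.85):
    """PageRank in the vertex-centric, message-passing style of Pregel.

    Vertices are 0..num_vertices-1. In each superstep every vertex sends
    rank / out_degree to its out-neighbours; a sum combiner merges the messages;
    then every vertex recomputes its rank from what it received. Rank held by
    vertices without out-edges is spread evenly so that ranks keep summing to 1.
    """
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(supersteps):
        inbox = [0.0] * num_vertices  # combined (summed) messages per vertex
        dangling = 0.0
        for v in range(num_vertices):  # "send" phase
            targets = out_edges.get(v, [])
            if targets:
                share = rank[v] / len(targets)
                for w in targets:
                    inbox[w] += share
            else:
                dangling += rank[v]
        rank = [  # "compute" phase
            (1 - damping) / num_vertices + damping * (inbox[v] + dangling / num_vertices)
            for v in range(num_vertices)
        ]
    return rank


ranks = pregel_pagerank([(0, 1), (2, 1), (1, 3), (3, 0)], num_vertices=4)
print([round(r, 3) for r in ranks])
```

Because each vertex only reads its own inbox, the send and compute phases partition naturally across machines, which is what makes the model attractive to run next to the data.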
The LOD2 scaling adventures
Experiments at CWI's SciLens cluster
• Jan 2013: 150 Gtriples (8 x 256 GB RAM)
• Aug 2014: 500 Gtriples (12 x 256 GB RAM)
• Some trillion-triple claims exist, but they do not detail any query workload
BSBM explore and BI workloads
• 10x speed gains for BI queries between 2013 and 2014
Bulk load at 6M triples/s
• All done in triples; structure awareness will go further still
Open PHACTS
Partners: 
Virtuoso Now
Snapshot of RDF Linked Data customers in the enterprise:
• Data.gov (U.S. Government Open Linked Data initiative)
• Bank of America
• Booz Allen Hamilton
• Northrop Grumman
• Elsevier
• French National Library
• Samsung
• Globo
• Daimler Benz
• Johnson & Johnson
• Bayer
• St. Jude Medical
• Fujitsu
• Syngenta
• and many more
Virtuoso Availability
• Most capabilities available as open source
• Commercial edition adds:
  - Cluster scale-out
  - SQL federation
  - Replication (SQL & RDF)
  - Advanced RDF security: ABAC & RBAC (ACLs)
  - Wide tables
  - and more
• Up-to-the-minute tech previews via v7fasttrack on GitHub, e.g., a superfast TPC-H implementation
Virtuoso Future
• Preview of a structure-aware RDF store in fall 2014 via v7fasttrack
Integrated graph analytics framework
• Embed complex graph algorithms, e.g., community detection and shortest path, inside SPARQL/SQL
• Comparison of SQL and SPARQL for big data analytics
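As an illustration of one such algorithm, the sketch below finds a shortest path by breadth-first search over the triples that use a single predicate. It runs in plain Python over an in-memory list; how Virtuoso embeds such algorithms inside SPARQL/SQL is not shown, and the `knows` data is made up.

```python
from collections import defaultdict, deque


def shortest_path(triples, predicate, start, goal):
    """Breadth-first search along `predicate` edges; returns a list of nodes, or None if unreachable."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            adjacency[s].append(o)
    previous = {start: None}  # node -> predecessor on the shortest path found so far
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk predecessors back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


data = [
    ("a", "knows", "b"), ("b", "knows", "c"),
    ("a", "knows", "d"), ("d", "knows", "e"), ("e", "knows", "c"),
]
path = shortest_path(data, "knows", "a", "c")
print(path)
```

BFS gives the path with the fewest hops; weighted edges would call for Dijkstra's algorithm instead.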
Linked Data Now
• Adoption across major industries
• Superior flexibility and time to solution
• Dramatic performance gains in the last 5 years
• Benchmarking will continue to drive progress, to the benefit of users and vendors alike
• Virtuoso SPARQL runs circles around most open source SQL: it beats MySQL on the Star Schema Benchmark (SSB) by 100x
• With structure awareness, SPARQL is set to match the best in SQL for data warehousing and OLTP
• Linked Data is no longer a long shot but a technology that makes sense
About OpenLink Software
OpenLink Software is a privately held company founded in 1992 by its President & CEO, Kingsley Idehen. The company is an industry-acclaimed technology innovator in the following areas:
• ODBC, JDBC, ADO.NET, and OLE DB compliant Data Access Drivers for Oracle, Microsoft SQL Server, Informix, Ingres, Sybase, Progress, MySQL, and PostgreSQL
• High-Performance & Scalable Multi-Model (Relational & Graph) Database Technology
• Data Integration Middleware (Data Virtualization Technology across a wide variety of Protocols & Formats)
• Socially-enhanced Distributed Collaborative Applications Platforms (Weblogs, Wikis, Feed Aggregation and Syndication, Web File Systems, Discussion Forums, etc.)
• Web Application Server Technology
• Linked Data Deployment & Management
• Identity Management
Office Locations 
USA 
OpenLink Software, Inc 
10 Burlington Mall Road 
Suite 265 
Burlington, MA 01803 
Tel.: +1 781 273 0900 
Fax: +1 781 229 8030 
UK 
OpenLink Software Ltd. 
Airport House 
Purley Way 
Croydon, Surrey CR0 0XZ 
Tel.: +44 (0)20 8681 7701 
Fax: +44 (0)20 8681 7702 
Additional Information 
Web Sites 
OpenLink Software 
YouID – Digital Identity Card (Certificate) Generator 
OpenLink Data Spaces – Semantically enhanced Personal & Enterprise Data Spaces & 
Collaboration Platform 
OpenLink Virtuoso - Hybrid Data Management, Integration, Application, and Identity Server 
Universal Data Access Drivers - High-Performance ODBC, JDBC, ADO.NET, and OLE-DB Drivers 
LDAP and NetID-TLS – How to use LDAP scheme URIs with NetID-TLS Authentication 
Social Media Data spaces 
http://www.openlinksw.com/weblog/oerling/ (Orri Erling weblog) 
http://kidehen.blogspot.com (Kingsley Idehen weblog) 
http://www.openlinksw.com/blog/~kidehen/ (Kingsley Idehen weblog) 
https://twitter.com/OpenLink (Twitter) 
Hashtags: #LinkedData #SemanticWeb #BigData #RDF (Anywhere). 

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Virtuoso, The Prometheus of RDF: SEMANTiCS 2014 Conference Keynote

  • 1. Virtuoso: The Prometheus of RDF-based Relational Data Management
    By Orri Erling, Virtuoso Program Manager, OpenLink Software
  • 2. Linked Data at Dawn
      - The Promise and the Practice
      - The Science of Speed
      - The Structure which Is
      - Ongoing Research
    License CC-BY-SA 4.0 (International).
  • 3. Linked Data Promises
      - RDF is a generic, minimalistic model for describing things
      - RDF has global identifiers, and data is self-describing
      - URIs may be dereferenceable
      - RDF is flexible to query; it does not force a single hierarchical view the way XML does
  • 4. Linked Data Scenarios
      - RDF is used because of
          - schema flexibility
          - global identifiers
      - Inference, if present, is usually trivial
          - Subclass
          - Sub-property
  • 5. Where Triples Come From
      - Relational extracts or web content are converted to and stored as triples
      - NLP extraction
      - New applications with RDF as the primary data model
      - Doing SPARQL against data in RDBs is possible, but it is rare and does not deliver the flexibility
  • 6. Linked Data Verticals and Patterns
      - Publishing: tagging and annotations, evolving vocabularies
      - Archives: self-description, long-term identifiers, many versions of schema
      - Semantic search: structured, semi-structured, and full text, all in one
      - Business intelligence: many sources, ease of adding sources, no 6-month data warehouse (DW) schema change cycle
      - E-science, often in life sciences: common interchange format, nano-publications, NLP extracts, different users cook their data differently, provenance
  • 7. The Hopes and Perceptions
      - The age of ad hoc
          - Find insight in any data, when you need it, from any source, in any format
          - No data warehouse planning cycles; make your own from the pieces you need, when you need them
      - Still, data integration remains hard work; quality and coverage of sources vary
      - Flexibility may be there, but are performance and scalability up to par?
  • 8. Yes, But ...
      - Web and Big Data: everybody reinvents the triple. Self-description, long-term identifiers, and key-value pairs appear in many non-RDF use cases
      - SPARQL and RDF would be the natural, standards-compliant choice if they did beat SQL, information retrieval, custom big data, key-value, and map-reduce solutions. Is this gap intrinsic to linked data, or is it a lack of engineering?
      - Linked data has unique advantages in breadth of coverage and expressivity, but performance must not lag behind.
  • 9. What is the RDF Tax?
      - 90% of bad performance comes from non-optimal query plans
      - Some comes from indexing too much (e.g., SQL bulk load with no indices is 50x faster than the equivalent in RDF with everything indexed)
      - Some comes from string operations on URIs and literals
      - Some comes from having a join for every attribute; vectoring and the right plans help, though
  • 10. The Bane of the Triple
    When data is stored as triples:
      - There is still structure, but it is harder to exploit; schema re-emerges as correlations
      - More joins make more query plans possible and cause bigger errors in plan cost estimation
      - More joining reduces locality
      - Lack of schema causes needless indexing; data takes more space
      - A URI for everything takes space and time
    For the same workload, Virtuoso SQL can be 2–20x faster than Virtuoso SPARQL
  • 11. The Question is Raised
      - LOD2 FP7, now ending: can RDF reach performance parity with relational?
      - SQL is the senior science. Whoever ignores history is bound to repeat it
      - Integral mastery of RDB science is a prerequisite, but do not forget the subtle twists of schema-lessness
  • 12. Virtuoso RDF Relational DBMS Leadership
      - 2000–2006, v1.x–4.x: SQL row store with SQL federation and XML
      - 2007–2008, v5.x–6.x: SPARQL, adapted for RDF quads with more compression, bitmap indices, special data types, and RDF awareness in query optimization
      - 2009, v6.x: Scale-out, cluster-capable
      - 2010–2013, v7.x: Column store, vectored execution, 3x more space efficient, 10+x more speed
      - 2013: Star Schema Benchmark with SPARQL: 100x the speed of MySQL with SQL, 0.8x the speed of MonetDB with SQL
      - 2014: Top-of-the-line SQL analytics, 500 Gtriples, structure awareness
  • 13. Triples Done Right, So?
      - Column-store techniques are a good fit; index-based triple storage does not get much better
      - RAM-only, pointer-based techniques can be faster but cost 10–100x more to scale up
      - To take RDF to SQL parity, Virtuoso must first be on a par with the best in SQL
      - TPC-H is the checklist for mastery of data warehousing and query optimization; whoever survives it need not fear
      - Parity is achieved when running with triples performs just like running with tables
  • 14. Structure is Everywhere
    CWI in LOD2:
      - 90% of triples in Common Crawl fall into 20 tables
      - All relational extractions are 100% tables
      - Even DBpedia, which is unusually heterogeneous (albeit not very large), is 90% covered by 500 tables
  • 15. The Glorious Dawn: Structure is the Servant, not the Tyrant
      - A set of subjects that all have the same single-valued properties is in fact a table
          - So, store it as a table
          - Allow exceptions, e.g., sometimes multiple values, different values in different graphs, extra properties, etc.
      - If it is big, it has repeating structure
      - All RDF semantics are preserved; any triple is possible, but the common ones are SQL-compact and SQL-fast
      - With tables, query optimization returns to SQL complexity and is much more reliable
          - So, more tricks from the SQL analytics bag become safe and applicable
  • 16. Gains from Structure Awareness
      - 3+x load speed
      - 2x more space efficiency
      - SPARQL queries against regular data within 10–20% of SQL speed
      - Just declare which properties tend to occur together; no strict schema-first as with SQL
      - Later, self-configuration
  • 17. The Cycle of Adventure
      - Rebels: SQL not cool, too rigid; drop ACID, go key-value, map-reduce; the triple is all there is; semantic web
      - Pioneers: life on the frontier is hard; infrastructure missing or bad
          - The same everyday problems exist in Utopia too
      - Recognizing the objective values, e.g., schema freedom and identifiers, no AI. Do the job, forget dogma
      - Reconciliation: schema-first and schema-last converge in structure awareness
  • 18. Present FP7 Research
      - LDBC: Transparency and Relevance for Graph DB, RDF performance
      - GeoKnow: GeoData is everywhere; how to carry the planet in your pocket
      - LOD2: Where no triple has gone before (and come back)
      - Open PHACTS: A Data Platform for Drug Discovery
  • 19. LDBC - Linked Data Benchmark Council
      - Rebels: SQL not cool, too rigid; drop ACID, go key-value, map-reduce; the triple is all there is; semantic web
      - Pioneers: life on the frontier is hard; infrastructure missing or bad
          - The same everyday problems exist in Utopia too
      - Recognizing the objective values, e.g., schema freedom and identifiers, no AI. Do the job, forget dogma
      - Reconciliation: some of the rebel thinking becomes mainstream, e.g., schema-first and schema-last converge in structure awareness
  • 20. LDBC, Independent Industry Forum for Benchmarking
      - The TPC for the frontiers of database
      - Bootstrapped in the LDBC FP7 project, continues as an independent industry association
      - OpenLink, Ontotext, Neo Technology, and Sparsity as founding members
      - IBM, Oracle Labs, Systap, and SPARQL City have already joined
      - DB superstars Peter Boncz and Thomas Neumann as founders and scientific lead
  • 21. LDBC Benchmarks
    Social Network
      - Online: lookups, updates, analysis of social environment
      - Business Intelligence: spotting trends, key players, big queries
      - Graph analytics: community detection, PageRank, graph metrics
    Semantic Publishing
      - Modeled after the BBC linked data portal: online lookups, drill-downs, and updates
  • 22. GeoKnow - The Planet in your Pocket
    Ms. Globe and Mr. Cube have a thing going on:
      - Mr. Cube: Desiloization ... integrated metadata ... explicit semantics.
      - Ms. Globe: I can feel it ... but are you man enough? ... you need to show me.
  • 23. Planet Scale Roadmap
    Jan 2014:
      - Virtuoso SPARQL outperforms PostGIS in map lookups with planet-wide OpenStreetMap
      - Virtuoso SQL adds 5x more power
  • 24. Next: Jan 2015
      - Parity between SPARQL and SQL via structure awareness
      - Geospatial data clustering
      - Graph analytics close to the data: Pregel, Giraph, etc., in the DB itself
      - Adding a fine-grained geo dimension to the LDBC Social Network Benchmark
  • 25. The LOD2 Scaling Adventures
    Experiments at CWI’s SciLens cluster
      - Jan 2013: 150 Gtriples (8 x 256 GB RAM)
      - Aug 2014: 500 Gtriples (12 x 256 GB RAM)
      - Some trillion-triple claims exist, but they do not detail any query workload
    BSBM explore and BI workloads
      - 10x speed gains for BI queries between 2013 and 2014
      - Bulk load at 6M triples/s
      - All done in triples; structure awareness will go further still
  • 26. Open PHACTS Partners
  • 27. Virtuoso Now
    Snapshot of RDF Linked Data customers in the enterprise:
      - Data.gov (U.S. Govt. Open Linked Data initiative)
      - Bank of America
      - Booz Allen Hamilton
      - Northrop Grumman
      - Elsevier
      - French National Library
      - Samsung
      - Globo
      - Daimler Benz
      - Johnson & Johnson
      - Bayer
      - St. Jude Medical
      - Fujitsu
      - Syngenta
      - and many more
  • 28. Virtuoso Availability
      - Most capabilities as open source
      - Commercial edition adds:
          - Cluster scale-out
          - SQL federation
          - Replication (SQL and RDF)
          - Advanced RDF security: ABAC and RBAC (ACLs)
          - Wide tables
          - and more
      - Up-to-the-minute tech previews via v7fasttrack on GitHub, e.g., a superfast TPC-H implementation
  • 29. Virtuoso Future
      - Preview of structure-aware RDF store in fall 2014 via v7fasttrack
      - Integrated graph analytics framework
          - Embed complex graph algorithms, e.g., community detection and shortest path, inside SPARQL/SQL
      - Comparison of SQL and SPARQL for big data analytics
  • 30. Linked Data Now
      - Adoption across major industries
      - Superior flexibility and time to solution
      - Dramatic performance gains in the last 5 years
      - Benchmarking will continue to drive progress, to the benefit of users and vendors alike
      - SPARQL runs circles around most open source SQL: Virtuoso SPARQL beats MySQL in SSB (Star Schema Benchmark) by 100x
      - With structure awareness, SPARQL is set to match the best in SQL for data warehousing and OLTP
      - Linked Data is no longer a long shot but a technology that makes sense
  • 31. About OpenLink Software
    OpenLink Software is a privately held company founded in 1992 by its President & CEO, Kingsley Idehen. The company is an industry-acclaimed technology innovator in the following areas:
      - ODBC, JDBC, ADO.NET, and OLE DB compliant Data Access Drivers for Oracle, Microsoft SQL Server, Informix, Ingres, Sybase, Progress, MySQL, and PostgreSQL
      - High-Performance & Scalable Multi-Model (Relational & Graph) Database Technology
      - Data Integration Middleware (Data Virtualization Technology across a wide variety of Protocols & Formats)
      - Socially-enhanced Distributed Collaborative Applications Platforms (Weblogs, Wikis, Feed Aggregation and Syndication, Web File Systems, Discussion Forums, etc.)
      - Web Application Server Technology
      - Linked Data Deployment & Management
      - Identity Management
  • 32. Office Locations
    USA: OpenLink Software, Inc., 10 Burlington Mall Road, Suite 265, Burlington, MA 01803. Tel.: +1 781 273 0900, Fax: +1 781 229 8030
    UK: OpenLink Software Ltd., Airport House, Purley Way, Croydon, Surrey CR0 0XZ. Tel.: +44 (0)20 8681 7701, Fax: +44 (0)20 8681 7702
  • 33. Additional Information
    Web Sites
      - OpenLink Software
      - YouID: Digital Identity Card (Certificate) Generator
      - OpenLink Data Spaces: Semantically enhanced Personal & Enterprise Data Spaces & Collaboration Platform
      - OpenLink Virtuoso: Hybrid Data Management, Integration, Application, and Identity Server
      - Universal Data Access Drivers: High-Performance ODBC, JDBC, ADO.NET, and OLE-DB Drivers
      - LDAP and NetID-TLS: How to use LDAP scheme URIs with NetID-TLS Authentication
    Social Media Data Spaces
      - http://www.openlinksw.com/weblog/oerling/ (Orri Erling weblog)
      - http://kidehen.blogspot.com (Kingsley Idehen weblog)
      - http://www.openlinksw.com/blog/~kidehen/ (Kingsley Idehen weblog)
      - https://twitter.com/OpenLink (Twitter)
      - Hashtags: #LinkedData #SemanticWeb #BigData #RDF (Anywhere)
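Slides 14 to 16 describe structure awareness: subjects that share the same single-valued properties form a table, and anything irregular is kept as exception triples, so all RDF semantics survive. The Python below is a minimal sketch of that idea, not Virtuoso's implementation. It assumes exact matching of predicate sets and a hypothetical `min_subjects` threshold for deciding which sets deserve a table; the function names `characteristic_sets`, `to_tables`, and `to_triples` are illustrative only.

```python
from collections import Counter, defaultdict


def characteristic_sets(triples):
    """Map each subject to the frozenset of predicates it uses."""
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}


def to_tables(triples, min_subjects=2):
    """Split triples into tables (one per frequent predicate set) plus exception triples.

    Assumption for this sketch: a predicate set becomes a table when at least
    `min_subjects` subjects share exactly that set of predicates.
    """
    cs = characteristic_sets(triples)
    counts = Counter(cs.values())
    by_subject = defaultdict(list)
    for t in triples:
        by_subject[t[0]].append(t)

    tables = defaultdict(list)  # frozenset of predicates -> [(subject, {predicate: object})]
    exceptions = []             # triples that do not fit any table column
    for s, ts in by_subject.items():
        preds = cs[s]
        if counts[preds] < min_subjects:
            exceptions.extend(ts)  # rare shape: keep as plain triples
            continue
        columns = {}
        for t in ts:
            _, p, o = t
            if p in columns:
                exceptions.append(t)  # extra value for a single-valued column
            else:
                columns[p] = o
        tables[preds].append((s, columns))
    return dict(tables), exceptions


def to_triples(tables, exceptions):
    """Rebuild the original triples, showing that no RDF information is lost."""
    out = list(exceptions)
    for rows in tables.values():
        for s, columns in rows:
            out.extend((s, p, o) for p, o in columns.items())
    return out


if __name__ == "__main__":
    data = [
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:alice", "foaf:age", "30"),
        ("ex:bob", "foaf:name", "Bob"),
        ("ex:bob", "foaf:age", "25"),
        ("ex:bob", "foaf:name", "Robert"),     # second name: becomes an exception
        ("ex:paper1", "dc:title", "Triples"),  # unique shape: stays a triple
    ]
    tables, exceptions = to_tables(data)
    for preds, rows in tables.items():
        print(sorted(preds), rows)
    print("exceptions:", exceptions)
    assert sorted(to_triples(tables, exceptions)) == sorted(data)
```

Grouping by exact predicate sets is the simplest possible rule. A real store would also have to handle near-identical sets, values that differ per graph, and schema drift over time, all of which this sketch ignores.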