LOD2 Webinar Series: Virtuoso 7

This webinar in the LOD2 webinar series presents Virtuoso 7: Virtuoso Column Store, Adaptive Techniques for RDF Graph Databases. In this webinar we discuss the application of column-store techniques to both graph (RDF) and relational data, for mixed workloads ranging from lookup to analytics.

Virtuoso is an innovative enterprise-grade multi-model data server for agile enterprises & individuals. It delivers an unrivaled platform-agnostic solution for data management, access, and integration. The unique hybrid server architecture of Virtuoso enables it to offer traditionally distinct server functionality within a single product.

If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!

http://lod2.eu/BlogPost/webinar-series


Usage Rights

CC Attribution-NoDerivs License


    LOD2 Webinar Series: Virtuoso 7 Presentation Transcript

    • LOD2 Webinar, 29.11.2011, http://lod2.eu: Creating Knowledge out of Interlinked Data
    • LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers. Coming from across 12 countries, the partners are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany. LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project shows the benefits in the scenarios of Media and Publishing, Corporate Data intranets, and eGovernment.
    • Once per month, the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle. Stay with us and learn more about acquisition, editing, composing, connected applications, and finally publishing Linked Open Data.
    • Virtuoso 7.0: Enabling Massively Scalable Big Data Analytics for RDF & SQL Data Management
      By Orri Erling, Virtuoso Program Manager, & Hugh Williams, Professional Services Manager
      Making Technology Work For You
      © 2012 OpenLink Software, All rights reserved.
    • Company Overview
    • OpenLink Company Overview
      OpenLink Software is a privately held company founded in 1992 by its President & CEO, Kingsley Idehen. The company is an industry-acclaimed technology innovator in the following areas:
        - ODBC, JDBC, ADO.NET, and OLE DB compliant Data Access Drivers for Oracle, SQL Server, Informix, Ingres, Sybase, Progress, MySQL, and PostgreSQL
        - High-Performance & Scalable Multi-Model (Relational & Graph) Database Technology
        - Data Integration Middleware (Data Virtualization Technology across a wide variety of Protocols & Formats)
        - Web Application Server Technology
        - Linked Data Deployment & Management
        - Socially-enhanced Distributed Collaborative Applications Platforms (Weblogs, Wikis, Feed Aggregation and Syndication, Web File Systems, Discussion Forums, etc.)
        - Identity Management
    • Products & Services: Software Products
        - OpenLink Universal Data Access Drivers (UDA): high-performance data access drivers for ODBC, JDBC, ADO.NET, and OLE DB that provide transparent access to enterprise databases
        - OpenLink Virtuoso: available in single-server and cluster editions that are deployed in cloud and/or enterprise modes
        - OpenLink Data Spaces Platform and Applications
        - OpenLink Ajax Toolkit
        - OpenLink Data Explorer
        - An Open Source Data Access SDK for ODBC
      All OpenLink products are delivered by download from the Internet (HTTP, FTP, etc.). Temporary licenses are issued upon download and may be extended as needed, on a case-by-case basis. Permanent licenses are issued once payment is received.
    • Products & Services: Professional and Support Services
        - OpenLink Product Support provides front-line email and phone support, web-based online support, and a variety of premium services such as phone, emergency, and onsite support.
        - Our support staff comprises individuals with extensive knowledge of data access, data migration, database administration, programming APIs, and other relevant skills.
        - Services are sold in either Standard "Bronze" or Premium "Platinum" support packages, with varying hours of availability, response times, etc.
        - We also offer Custom Development, Training, and other Consultancy services. These services can be offered on- or off-site. Expenses for travel, accommodation, food, etc., associated with on-site services are charged separately.
    • Customers
      OpenLink's installed base is in excess of 10,000 customers worldwide. Examples include:
        - Data.gov (U.S. Govt. Open Linked Data initiative)
        - Verizon
        - Raytheon
        - Bank of America
        - CGI Federal
        - Elsevier
        - French National Library
        - Globo
        - Scottish Government
        - St. Jude Medical
        - Barclays Bank
        - Wells Fargo
        - and many more
    • Office Locations
        - USA: OpenLink Software, Inc., 10 Burlington Mall Road, Suite 265, Burlington, MA 01803. Tel.: +1 781 273 0900; Fax: +1 781 229 8030
        - UK: OpenLink Software Ltd., Airport House, Purley Way, Croydon, Surrey CR0 0XZ. Tel.: +44 (0)20 8681 7701; Fax: +44 (0)20 8681 7702
    • Virtuoso Universal Server Overview
    • Situation Analysis
      Data is growing exponentially along the following dimensions:
        - Volume
        - Velocity
        - Variety
      All of this happens while the total hours in a day remain 24.
    • Product Value Proposition
      Enterprise and Individual Agility via Data Access, Integration, and Management, without compromising performance, scalability, security, and platform independence.
      "Virtuoso locks you into an experience (openness, performance, and scale), not the platform itself." -- Kingsley Idehen, Founder & CEO, OpenLink Software
    • Product Architecture: A high-performance, scalable, secure, and operating-system-independent server designed to handle contemporary challenges associated with standards-compliant data access, data integration, and data management.
    • Data Virtualization Middleware: An in-built middleware layer ("Sponger") for creating Transient & Persistent Views over Heterogeneous Data Sources.
    • Sophisticated Content Crawler: A DBMS-hosted Content Crawler that leverages loosely coupled binding to the Sponger Middleware component for transformation of unstructured and semi-structured data into Linked Data.
    • Core Platform behind LOD Cloud: Core Platform (Graph DBMS and Linked Data Deployment) behind DBpedia, many bubbles in the LOD Cloud, and the LOD Cloud cache itself.
    • Virtuoso Linked Data projects
        - DBpedia: public SPARQL endpoint over the DBpedia data (and international Chapters)
        - LOD Cloud Cache: public server hosting LOD cloud datasets
        - URIBurner: Linked Data generation & transformation service
        - Linked Geo Data: OpenStreetMap spatial data as Linked Data
        - Sindice: SPARQL endpoint behind its Semantic Web Index
        - Data.gov: US Government Linked Data
        - Health.data.gov: Clinical Quality Linked Data on health.data.gov
        - Seevl: Linked Data music discovery service
        - Bio2RDF: Life science data mapped to Linked Data
        - Neurocommons: Life science data mapped to Linked Data
        - MusicBrainz: MusicBrainz database published as Linked Data
        - Open PHACTS: DBpedia-like Linked Data Space for Pharma
        - Others: many others …
    • Powerful Standards Support: ODBC compliance enables use of client applications (e.g. Microsoft Access) as front-ends for Virtuoso, 3rd-party RDBMS engines, and the World Wide Web hosted Linked Open Data Cloud.
    • Powerful Standards Support (cont'd): ODBC & HTML5 compliance enables development of rich client apps that leverage the WebDB-ODBC bridge for accessing data across Virtuoso, 3rd-party RDBMS engines, and the World Wide Web hosted Linked Open Data Cloud.
    • Insight Discovery & Exploration: Native faceted browsing that enables multi-dimensional drill-downs via any browser
    • Insight Discovery & Exploration: Microsoft Silverlight or HTML5 based PivotViewer front-end for SPARQL and SPARQL-FED queries
    • Powerful SPARQL Query Service: Basic SPARQL endpoint for creating query definitions & sharing query results. Example: health.data.gov data directly from a Web browser.
    • Powerful SPARQL Query Builder: Use Query By Example (QBE) patterns to construct & share query results.
    • How Do I Get Going?
        - Download, install, and experience the power of coherent integration of disparate data sources, data access protocols, and data representation formats.
        - In a nutshell, commence exploitation of powerful business intelligence, socially enhanced collaboration, data virtualization, and entity analytics without writing a line of code!
        - Turn "Big Data" into exploitable "Smart Data" without compromise!
        - Will be integrated into the next release of the LOD2 Stack
    • Virtuoso 7.0
    • Flexible Big Data Challenge
        - Data agility is challenged by Volume, Velocity, and Variety
        - "Schema Last" is great, if the price is right
        - RDF and graphs promise powerful querying with the flexibility and scale of NoSQL key-value stores
        - Inference may be good for integration, if it can express the right things, beyond OWL
        - RDF data management technology must learn from the lessons of SQL RDBMS: everything applies
    • Virtuoso 7.0 Mission Statement
      Destruction of the following items as impediments to Big (Open) Linked Data exploitation:
        - Performance
        - Scalability
        - Platform Independence
        - Security & Privacy
        - Price
    • Virtuoso 7.0 & Big Data Myths
      Myths put to rest:
        - Scalable Open-Ended SPARQL Endpoints
        - Scalable Open-Ended Read-Write SPARQL Endpoints
        - Fine-grained Access Controls underlying Read-Only or Read-Write endpoints
    • Virtuoso Column Store Features
        - Supports SQL and SPARQL query languages
        - Compact column-wise storage
        - Vectored execution of commands
        - Shared-nothing scale-out for clusters
        - Powerful procedure language with parallel, distributed control structures
        - Full-text and geospatial indexes
    • Storage Engine
        - Freely mix column-wise and row-wise indices
        - All SQL and RDF data types natively supported; single execution engine for SQL/SPARQL
        - Column compression is 3x more space-efficient than row-wise compression for RDF
        - Column stores are not only for big scans: random access surpasses row-wise storage as soon as there is some locality
        - 9 B/quad with DBpedia, 7 B/quad with BSBM or RDF-H, 14 B/quad with web crawls (indices PSOG, POSG, SP, OP, GS; excluding literals)
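A minimal Python sketch of why column-wise storage compresses RDF well, assuming quads have already been dictionary-encoded to integer IDs. The sample quads and the two encodings used here (run-length for repetitive columns, delta for slowly rising ones) are illustrative stand-ins, not Virtuoso's actual on-disk format. Sorting in PSOG order makes the leading columns highly repetitive, which a column layout can exploit and a row layout cannot.

```python
from itertools import groupby

# Hypothetical quads, already mapped to integer IDs, laid out as (P, S, O, G).
QUADS = [
    (7, 100, 5001, 1), (7, 101, 5002, 1), (7, 102, 5003, 1),
    (9, 100, 6001, 1), (9, 101, 6001, 1), (9, 105, 6002, 1),
]

def run_length_encode(column):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def delta_encode(column):
    """Keep the first value, then store each value minus its predecessor."""
    return list(column[:1]) + [b - a for a, b in zip(column, column[1:])]

def delta_decode(deltas):
    """Rebuild the original column with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def to_columns(quads):
    """Sort in PSOG order, then store each tuple position as its own column."""
    ordered = sorted(quads)  # tuples compare P first, then S, O, G
    return [list(col) for col in zip(*ordered)]

if __name__ == "__main__":
    p, s, o, g = to_columns(QUADS)
    print("P (RLE):  ", run_length_encode(p))
    print("S (delta):", delta_encode(s))
    print("G (RLE):  ", run_length_encode(g))
```

Six P values shrink to two runs and the graph column to one, while the subject column becomes small deltas that a real engine could bit-pack.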
    • Execution Engine
        - Vectoring is not only for column stores
        - Vectoring turns a random access into a linear merge join if there is any locality: always a win, though mileage depends on run-time factors
        - Vectoring eliminates interpretation overhead and makes CPU-friendly code possible
        - Even with run-time data typing, vectoring allows use of type-specific operators on homogeneous data, e.g. arithmetic
        - Dynamically adjust vector size: a larger vector may not fit in cache but will get better locality for random access
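The merge-join effect of vectoring can be sketched in a few lines of Python. This is a generic illustration, not Virtuoso's executor: `index` is a hypothetical sorted list of keys standing in for an index, and the batch of lookup keys plays the role of a vector. Tuple-at-a-time lookups each start a fresh binary search over the whole index; the vectored version sorts the batch once and only ever searches forward from the previous hit.

```python
from bisect import bisect_left

def lookup_one_at_a_time(index, keys):
    """Tuple-at-a-time: an independent binary search over the whole index per key."""
    found = []
    for k in keys:
        i = bisect_left(index, k)
        found.append(i < len(index) and index[i] == k)
    return found

def lookup_vectored(index, keys):
    """Vector-at-a-time: sort the batch once, then search forward only.

    Each search starts where the previous one stopped, so a batch with
    locality becomes a single left-to-right pass, like a merge join.
    Results are returned in the caller's original key order.
    """
    order = sorted(range(len(keys)), key=keys.__getitem__)
    found = [False] * len(keys)
    pos = 0
    for slot in order:
        k = keys[slot]
        pos = bisect_left(index, k, lo=pos)  # never move backwards
        found[slot] = pos < len(index) and index[pos] == k
    return found
```

When the keys in a batch are close together, the forward-only searches touch neighbouring index positions, which is the locality the slide refers to.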
    • Graph Operations
        - Run-time computation plus caching instead of materialization
        - SPARQL/SQL extension for arbitrary transitive subqueries:
            - Flexible options for returning shortest paths, all paths, all/distinct reachable nodes, attributes of steps on paths, etc.
            - Efficient execution, searching the graph from both ends when looking for a path with both ends given
        - Query operators for RDF hierarchy traversal
        - Special query operator for OWL sameAs and IFP-based identity
        - Taking OWL sameAs / IFP identity into account for DISTINCT / GROUP BY
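Searching "from both ends" for a path whose endpoints are given is, in generic form, a bidirectional breadth-first search. The sketch below assumes a hypothetical edge list for a single transitive property (think `foaf:knows`); it shows the technique, not Virtuoso's transitive-subquery implementation. Expanding the smaller frontier one full level at a time keeps both searches shallow, which is where the savings come from.

```python
def _expand(frontier, adjacency, own_parents, other_parents):
    """Advance one BFS level; stop early as soon as we touch the other search."""
    next_frontier = []
    for node in frontier:
        for nxt in adjacency.get(node, ()):
            if nxt in own_parents:
                continue
            own_parents[nxt] = node
            if nxt in other_parents:
                return next_frontier, nxt
            next_frontier.append(nxt)
    return next_frontier, None

def _join(meet, parents_fwd, parents_bwd):
    """Stitch the forward half-path and the backward half-path at `meet`."""
    path, node = [], meet
    while node is not None:          # walk back towards the start
        path.append(node)
        node = parents_fwd[node]
    path.reverse()
    node = parents_bwd[meet]
    while node is not None:          # walk on towards the goal
        path.append(node)
        node = parents_bwd[node]
    return path

def shortest_path_bidirectional(edges, start, goal):
    """Return one shortest directed path start -> goal as a list, or None."""
    if start == goal:
        return [start]
    forward, backward = {}, {}
    for s, o in edges:
        forward.setdefault(s, []).append(o)
        backward.setdefault(o, []).append(s)
    # Parent maps double as the visited sets of the two searches.
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = [start], [goal]
    while frontier_fwd and frontier_bwd:
        if len(frontier_fwd) <= len(frontier_bwd):
            frontier_fwd, meet = _expand(frontier_fwd, forward, parents_fwd, parents_bwd)
        else:
            frontier_bwd, meet = _expand(frontier_bwd, backward, parents_bwd, parents_fwd)
        if meet is not None:
            return _join(meet, parents_fwd, parents_bwd)
    return None
```

For example, with edges a→b→c→d→e plus a shortcut a→x→d, the search from `"a"` to `"e"` returns `["a", "x", "d", "e"]`.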
    • Query Optimization Challenges
        - Typical SQL statistics do not help
        - Need to measure data cardinalities starting from the constants in the query
        - Need to sample fanout predicate by predicate, as needed
        - Predicate and class hierarchies are easy to handle in sampling
        - sameAs or IFP inference voids all guesses
        - Is a hash join worthwhile? Its high setup cost means one must be sure of the cardinalities first
    • Deep Sampling
        - Everything is a join, so sampling must also do joins
        - As the candidate plan grows, the cost model executes all the operations on a sample of the data
        - Actual cardinality and locality are known, even when search conditions are correlated
        - With high confidence in the cost model, hash join plans become safe and attractive
        - Even though there is an indexed access path for everything, a scan can be better because it produces results in order; one needs to be sure of the selectivity before taking that risk
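Deep sampling in miniature: instead of multiplying per-predicate statistics, run the join on a sample and measure the fanout that actually results. This Python sketch assumes a toy in-memory triple list and a hypothetical two-step join `(?s p1 ?o) . (?o p2 ?x)`; it illustrates the idea on the slide, not Virtuoso's cost model.

```python
import random
from collections import defaultdict

def build_index(triples):
    """(predicate, subject) -> list of objects, i.e. a PSO-style lookup."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(p, s)].append(o)
    return index

def exact_two_hop(triples, p1, p2):
    """Exact size of the join (?s p1 ?o) . (?o p2 ?x), for comparison."""
    index = build_index(triples)
    return sum(len(index.get((p2, o), ())) for s, p, o in triples if p == p1)

def estimate_two_hop(triples, p1, p2, sample_size=100, seed=0):
    """Estimate the same join size by executing the second step on a sample.

    Because the join really runs on sampled rows, skew and correlation between
    p1 and p2 show up in the measured fanout instead of being guessed.
    """
    index = build_index(triples)
    first = [(s, o) for s, p, o in triples if p == p1]
    if not first:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(first, min(sample_size, len(first)))
    produced = sum(len(index.get((p2, o), ())) for _, o in sample)
    # Scale the observed fanout from the sample up to all rows of step one.
    return produced * len(first) / len(sample)
```

When the sample covers every row of the first step the estimate is exact; with a partial sample it is an unbiased estimate whose accuracy depends on the sample size.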
    • Elastic Cluster
        - Data is partitioned by key; different indices may have different partition keys
        - Partitions may split and migrate between servers
        - Partitions may be kept in duplicate for fault tolerance/load balancing
        - Actual access statistics drive partition splitting and placement
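A small Python sketch of key partitioning with per-index partition keys, duplicate copies, access statistics and migration. Everything specific here is an assumption for illustration: the 16 logical slices, the crc32 hash, the round-robin placement, and the choice of partitioning PSOG on S and POSG on O. Mapping keys to a fixed set of slices, and slices to hosts, means a partition can migrate by changing the placement map without rehashing any keys.

```python
import zlib
from collections import Counter

N_SLICES = 16  # hypothetical number of logical partitions ("slices")

# Hypothetical choice: quads are (S, P, O, G); PSOG is partitioned on S, POSG on O.
PARTITION_COLUMN = {"PSOG": 0, "POSG": 2}

def slice_of(key):
    """Map a partition key to a logical slice with a stable hash (crc32)."""
    return zlib.crc32(repr(key).encode("utf-8")) % N_SLICES

class Cluster:
    def __init__(self, hosts, replicas=2):
        # Place every slice on `replicas` distinct hosts, round robin.
        if replicas > len(hosts):
            raise ValueError("need at least as many hosts as replicas")
        self.placement = {
            sl: [hosts[(sl + r) % len(hosts)] for r in range(replicas)]
            for sl in range(N_SLICES)
        }
        self.access_counts = Counter()  # per-slice access statistics

    def hosts_for(self, index_name, quad):
        """Return the hosts holding the slice this index assigns to `quad`."""
        sl = slice_of(quad[PARTITION_COLUMN[index_name]])
        self.access_counts[sl] += 1
        return self.placement[sl]

    def hottest_slice(self):
        """Most-accessed slice so far: a candidate to split or migrate."""
        return self.access_counts.most_common(1)[0][0]

    def migrate(self, sl, old_host, new_host):
        """Move one copy of a slice to another host; only the map changes."""
        self.placement[sl] = [new_host if h == old_host else h for h in self.placement[sl]]
```

Because each index picks its own partition column, two quads with the same subject always land together in PSOG, while POSG groups them by object instead.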
    • Optimizing for Cluster
        - Vectored execution is natural in a cluster, since single-tuple messages are not an option
        - Keep the maximum number of operations in flight at all times; always send long messages
        - Fully distributed query coordination:
            - Any node can service a client request. Correlated subqueries and stored procedures may execute anywhere, with arbitrary parallelism and recursion between partitions
            - On a single shared-memory box, the cluster is approximately even with single-process multithreading: low overhead
            - 1.8x more throughput in BSBM BI when going from 1 to 2 machines
            - Distributed stored procedures: send the procedure to the data, as in map-reduce, except that there are no limits on cross-partition calling/recursion
            - Choice of transactional and auto-commit update semantics: atomic operations are possible without a global transaction
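Why single-tuple messages are not an option: grouping a vector of keys by destination turns one round trip per key into at most one per partition. The sketch below is generic Python, with a hypothetical `serve` callback standing in for the remote partition; it is not Virtuoso's cluster protocol.

```python
import zlib
from collections import defaultdict

def partition_of(key, n_partitions):
    """Stable hash partitioning (crc32), so every node agrees on placement."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions

def batch_by_partition(keys, n_partitions):
    """Group a vector of keys by partition, remembering each key's position."""
    batches = defaultdict(list)
    for position, key in enumerate(keys):
        batches[partition_of(key, n_partitions)].append((position, key))
    return dict(batches)

def scatter_gather(keys, n_partitions, serve):
    """Send one message per partition instead of one per key.

    `serve(partition, keys)` is a hypothetical stand-in for the remote call;
    it must return one answer per key, in the order given. This loop is
    sequential for clarity; a real coordinator would keep all of these
    requests in flight at once.
    """
    results = [None] * len(keys)
    messages = 0
    for partition, batch in batch_by_partition(keys, n_partitions).items():
        answers = serve(partition, [key for _, key in batch])
        messages += 1
        for (position, _), answer in zip(batch, answers):
            results[position] = answer
    return results, messages
```

With 1,000 keys and 4 partitions, this sends at most 4 long messages instead of 1,000 short ones, and the answers come back in the original key order.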
    • Cluster Architecture Diagrams
    • LOD Cache
        - 55 billion triples in the LOD cache, with only 384 GB of RAM and 2 TB of disk
        - 2 x 384 GB of RAM, 4 TB SSD
        - Most of Linked Open Data and Web Crawls
        - http://lod.openlinksw.com
        - http://lod.openlinksw.com/sparql
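The public SPARQL endpoint listed above can be queried with nothing but the Python standard library, using the W3C SPARQL 1.1 Protocol (the query passed as the `query` parameter of an HTTP GET, JSON results requested via the `Accept` header). The endpoint URL is taken from the slide and may no longer be online; `build_request` and `run_query` are illustrative helper names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as listed on the slide; it may no longer be online.
ENDPOINT = "http://lod.openlinksw.com/sparql"

def build_request(query, endpoint=ENDPOINT):
    """SPARQL 1.1 Protocol 'query via GET', asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Execute a SELECT query and flatten each binding to {variable: value}."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        data = json.load(resp)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]
```

For example, `run_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")` returns a list of dicts keyed by variable name, provided the endpoint is reachable.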
    • Independent Benchmark Report from CWI: Berlin SPARQL Benchmark
        - 50 Billion triples: source file size 2.8 TB; compressed source file size 240 GB; source data files per loader node 30 GB; final database file size 1.8 TB; load time 10h 54s
        - 150 Billion triples: source file size 8.5 TB; compressed source file size 728 GB; source data files per loader node 91 GB; final database file size 5.6 TB; load time n/a
    • Store Comparisons Summary: Exploration-oriented queries (QMpH), Berlin SPARQL Benchmark
      Dataset sizes: 100 Million Triples, 200 Million Triples, 1 Billion Triples
        - Virtuoso 6: 37,678.319; 32,969.006; 8,984.789
        - Virtuoso 7: 47,178.820; 27,933.682
    • Store Comparisons Summary: Business Intelligence-oriented queries (QMpH), Berlin SPARQL Benchmark
      Dataset sizes: 10 Million Triples, 100 Million Triples, 1 Billion Triples
        - Virtuoso 6: 431.465; 35.342; 2.383
        - Virtuoso 7: 996.795; 75.236
    • Store Comparisons Summary: Exploration-oriented queries (Cluster Edition) (QMpH), Berlin SPARQL Benchmark
        - Virtuoso 7, 10 Billion Triples: 2,360.210
        - Virtuoso 7, 50 Billion Triples: 4,253.157
        - Virtuoso 7, 150 Billion Triples: 2,090.574
    • Store Comparisons Summary: Business Intelligence-oriented queries (Cluster Edition) (QMpH), Berlin SPARQL Benchmark
        - Virtuoso 7, 10 Billion Triples: 13.078
        - Virtuoso 7, 50 Billion Triples: 0.964
        - Virtuoso 7, 150 Billion Triples: 0.285
    • Future Work
        - Complete deep sampling: enhanced query optimization plans
        - Run TPC-H and TPC-DS in SQL and their 1:1 translation in SPARQL, demonstrating SPARQL performance as near to SQL as possible
    • Additional Information
        - OpenLink Software
            - OpenLink Software: www.openlinksw.com
            - OpenLink Virtuoso: virtuoso.openlinksw.com
            - Universal Data Access: uda.openlinksw.com
        - Social Media Data Spaces
            - http://virtuoso.openlinksw.com/blog/ (weblog)
            - https://plus.google.com/112399767740508618350/posts (Google+)
            - https://twitter.com/OpenLink (Twitter)
            - http://www.linkedin.com/company/openlink-software (LinkedIn)
            - Hashtag: #LinkedData (Anywhere)
    • EU-FP7 LOD2 WP6 – 25.-26.03.2013
      LOD2 Stack Usability Survey 2013: w.surveygizmo.com/s3/1188229/LOD2-Stack-Usability-Survey-2013