Stardog 1.1: Easier, Smarter, Faster RDF Database
A talk from Semtech NYC 2012 about Stardog 1.1, the forthcoming release that adds SPARQL 1.1 and user-defined rules.

Published in: Technology
  • 1. Stardog 1.1: An Easier, Smarter, Faster RDF Database
      • Michael Grove, Clark & Parsia LLC
      • @mikegrovesoft, @stardog_db, @candp
  • 2. About C&P
      • We build semantic technology tools for enterprise solutions
      • Proud bootstrappers since 2005
      • Offices in DC and Cambridge, MA
      • Government & enterprise customers
  • 3. What is Stardog?
      • a pure Java RDF database
      • full-service, feature rich
      • focus on query performance
      • standards compliant
      • scalable (up first, out next)
  • 4. History
      • Development started summer 2010
      • Stardog 0.5 alpha - 2 May 2011
      • Stardog 1.0 final - 19 June 2012
          • Total of 32 releases, ~500 tickets, hundreds of emails on the mailing list
      • Stardog 1.0.7 presently
      • Stardog 1.1 real soon now...
  • 5. stardog.com Easier.
  • 6. What is easy?
      • What’s “easy” in an RDF database?
          • Configuration
          • Maintenance
          • User Experience
          • i.e., rationally predictable
      • Easier for whom? Not a simple question.
  • 7. Configuration
      • Convention, not configuration
      • “Quick Start” is shortest page in the docs
      • 4 steps to querying
      • Predictable, sane defaults throughout
      • Adapted to Java, Unix, Semtech cultures
          • Culture is key to convention
      • Very good (!) documentation
  • 8. Maintenance
      • Nothing is easier than doing nothing
          • RDF & OWL are ideally schema flexible
          • Job scheduler: search, indexes, etc.
          • Data migration tools since < 1.0
          • Multi-tenancy, online & offline DBs
          • Just add data... Automatic data quality*
      • NoSQL == Anti-jobs program for DBAs
  • 9. Except that...
      • Every DB has to be admin’d & maintained
      • Matter of degree, not kind
      • Stardog Enterprise Server Management
          • audit logging
          • JMX monitoring
          • web console
          • online backups (coming soon!)
  • 10. User Experience
      • Client-server & Embeddable
      • Jena, Sesame, SNARL, HTTP
      • SPARQL query simplifications
      • ACID transactions
      • Idiomatic Java & Unix interfaces
          • Great CLI & shell…
          • Windows has gotten much better! :>
      • Rich security model
  • 11. stardog.com Smarter.
  • 12. Okay...that’s BS.
      • “Smarter” is market speak
      • But Stardog 1.1 has rich feature set
          • Reasoning, including UDR
          • Integrity Constraint Validation (ICV)
          • Semantic Search
          • Security
          • Spring
          • Linked Data Platform
  • 13. Reasoning
      • OWL 2 DL, QL, EL, and RL
      • Query-time, no materialization
      • Only pay for what you eat
      • Embarrassingly parallel in part
      • Pellet 3 embedded for OWL 2 DL schema reasoning only
      • Very flexible re: NGs & schemas
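One common way to get query-time reasoning without materialization is to expand the query against the schema and leave the stored data untouched. The sketch below illustrates that general idea for subclass reasoning only. It is not Stardog's implementation, and the class names, data, and function names are all invented for the example.

```python
# Toy illustration of query-time reasoning via query expansion.
# The schema, data, and helper names are hypothetical.
SUBCLASS_OF = {
    "Manager": "Employee",   # Manager is a subclass of Employee
    "Employee": "Person",    # Employee is a subclass of Person
}

DATA = {
    ("alice", "type", "Manager"),
    ("bob", "type", "Employee"),
    ("carol", "type", "Person"),
    ("acme", "type", "Company"),
}


def subclasses_of(cls):
    """Return cls plus every class that is transitively a subclass of it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def instances_of(cls):
    """Answer 'who is a cls?' by expanding the query; DATA is never modified."""
    wanted = subclasses_of(cls)
    return sorted(s for (s, p, o) in DATA if p == "type" and o in wanted)


print(instances_of("Person"))    # ['alice', 'bob', 'carol']
print(instances_of("Employee"))  # ['alice', 'bob']
```

No inferred triples are ever written, so a query that does not ask for reasoning pays nothing for it, which is one reading of "only pay for what you eat".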
  • 14. User-defined Rules
      • New in 1.1!
      • Using SWRL syntax
          • Including all SWRL builtins
          • Which are also available to SPARQL
      • Recently added new individual builtin
          • Create new individuals in your rules
          • Beware of non-termination!
      • Executed at query time like everything else
  • 15. ICV?
      • Integrity Constraint Validation
      • Automated data quality
      • Closed world semantics
      • Transactional
      • High-level & declarative
      • ICs can be OWL, SWRL, or SPARQL
  • 16. Example...
      Only employees who are US citizens can work on projects that receive funding from a US government agency.
      Class: Project and (receivesFundsFrom some USGovAgency)
      SubClassOf: inverse(worksOn) only (Employee and nationality value "US")
      More examples:
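To make the closed-world reading of that constraint concrete, here is a minimal sketch that checks it against a small in-memory set of triples. The data, the `has` helper, and the `violations` function are all hypothetical, and this is not how Stardog evaluates constraints. It only shows what counts as a violation.

```python
# Closed-world check of the slide's constraint over toy triples.
# All names and data below are invented for illustration.
TRIPLES = {
    ("apollo", "type", "Project"),
    ("apollo", "receivesFundsFrom", "nasa"),
    ("nasa", "type", "USGovAgency"),
    ("alice", "type", "Employee"),
    ("alice", "nationality", "US"),
    ("alice", "worksOn", "apollo"),
    ("bob", "type", "Employee"),
    ("bob", "worksOn", "apollo"),  # no nationality recorded for bob
}


def has(s, p, o):
    """True if the exact triple is present in the data."""
    return (s, p, o) in TRIPLES


def violations():
    """Return (person, project) pairs that break the constraint."""
    # Projects that receive funds from some US government agency.
    us_funded = {
        s for (s, p, o) in TRIPLES
        if p == "receivesFundsFrom"
        and has(s, "type", "Project")
        and has(o, "type", "USGovAgency")
    }
    bad = []
    for (person, p, project) in TRIPLES:
        if p == "worksOn" and project in us_funded:
            ok = has(person, "type", "Employee") and has(person, "nationality", "US")
            if not ok:
                bad.append((person, project))
    return sorted(bad)


print(violations())  # [('bob', 'apollo')]
```

Under ordinary open-world OWL semantics, bob's missing nationality would simply be unknown. Under closed-world semantics the absence itself is reported as a violation.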
  • 17. Semantic Search
      • Uses Waldo, our deep adaptation of Lucene
      • Text index from RDF literals
      • Search for resources or literals
      • Integrated with SPARQL query evaluation
      • Auto-managed search indexes
  • 18. Security
      • Rich security model
      • Based on standard RBAC model
      • Applies at database-level
      • Will extend to Named Graphs in 1.x
      • Easy CLI admin tools (& Java API)
  • 19. Spring
      • Love it or not, Spring isn’t going away
      • Support Batch, Data Import, etc.
      • Open Source: clark-parsia/spring-stardog
      • Developed by an early adopter who needed it; supported/maintained by C&P
  • 20. Linked Data
      • Stardog fills a hole in our Linked Data Platform
      • HTML5, pure JS, client side web framework (based on backbone.js)
      • Linked Data publishing suite
      • Stardog Linked Data Catalog... Enterprise Linked Data management app
  • 21. stardog.com Faster.
  • 22. Finally...
      • Now we can talk about something that’s objective, context-free, and measurable
      • Yes!
      • But no… #include <std_disclaim.h>
          • Your data & your queries are the only things that really matter
  • 23. That said...
      • Two de facto benchmarks for SPARQL:
          • BSBM, OLTP-style, query mixes per hour (QMpH · 25)
          • SP2B, OLAP-style (torture test), set of queries within a timeout, T, at a data size D
  • 24. SP2B
      • Stardog completes SP2B at 5M, 10M, and 25M (except q5a)
      • No other RDF database completes > 5M. (As of the most recent report. Things change.)
      • Considerable performance differential
      • Pushing this out to 100M+ in 1.x
  • 25. BSBM
      • A throughput test, primarily. Not necessarily simple queries
      • On a modest machine, 255 clients, 10M triples, we sustain 7m queries per hour (277k QMpH)
      • At 100M, 255 clients, we sustain 3m queries per hour (125k QMpH)
      • Among the top 2 or 3 RDF DBs for BSBM performance
      • We will tackle BSBM BI next...
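The two ways of quoting BSBM throughput can be reconciled with a line of arithmetic. The sketch below assumes 25 queries per query mix, as implied by the "QMpH · 25" note on slide 23. The function name is made up for the example.

```python
# Convert BSBM query mixes per hour (QMpH) to individual queries per hour.
# Assumption: one query mix contains 25 queries (from "QMpH · 25" on slide 23).
QUERIES_PER_MIX = 25


def queries_per_hour(qmph, queries_per_mix=QUERIES_PER_MIX):
    """Individual queries per hour for a given number of query mixes per hour."""
    return qmph * queries_per_mix


print(queries_per_hour(277_000))  # 6925000, roughly the "7m" quoted at 10M triples
print(queries_per_hour(125_000))  # 3125000, roughly the "3m" quoted at 100M triples
```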
  • 26. Data Loading
      • Two indexing modes
      • Triples only indexing
          • Faster loading, slower NG query
          • Up to 250,000 triples per second
      • Quads indexing
          • Slower loading, faster NG query
          • Up to 150,000 triples per second
      • More improvements coming in the future
          • Customized RDF parser
          • Will look at user-defined index subsets
  • 27. What’s new in 1.1
      • Aforementioned user defined rules
      • But most notably, SPARQL 1.1
          • Our most requested feature in a survey
      • Oh, we also made it faster
  • 28. SPARQL 1.1
      • Latest revision of the SPARQL query language
      • Put off implementing until spec finalized
          • It’s still in flux, but we decided to go for it
      • Adds useful new features to SPARQL
          • Aggregates, grouping, sub-query, negation
          • Oh, and the entailment regimes
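As a rough illustration of those features, the sketch below holds a SPARQL 1.1 query that uses an aggregate (`COUNT`), grouping (`GROUP BY` with `HAVING`), and negation (`FILTER NOT EXISTS`), and builds a SPARQL Protocol GET URL for it without sending anything. The endpoint URL, the `ex:` vocabulary, and the `build_query_url` helper are all hypothetical and do not describe any particular server's API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute the query URL of your own SPARQL server.
ENDPOINT = "http://example.org/sparql"

# Count staff per project, skipping cancelled projects (negation) and
# keeping only projects with more than one person (grouping + aggregate).
# The ex: vocabulary is invented for this example.
QUERY = """
PREFIX ex: <http://example.org/ns#>
SELECT ?project (COUNT(?person) AS ?staff)
WHERE {
  ?person ex:worksOn ?project .
  FILTER NOT EXISTS { ?project ex:status ex:Cancelled }
}
GROUP BY ?project
HAVING (COUNT(?person) > 1)
ORDER BY DESC(?staff)
"""


def build_query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)

# Round-trip check: the query survives URL encoding unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)  # True
```

None of `COUNT`, `GROUP BY`, `HAVING`, or `FILTER NOT EXISTS` is part of SPARQL 1.0, which is why such a query needs a SPARQL 1.1 engine.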
  • 29. SPARQL 1.1
      • Rewrite of query planner & engine for 1.0.5
          • Changes needed to support SPARQL 1.1
          • Tested by users for the past 3 releases
      • With great power comes great responsibility...
          • New features are not without cost
          • Query planning & optimization more crucial than ever
          • Majority of development time
  • 30. Roadmap
      1. Transitivity & equality
      2. GeoSPARQL
      3. Web Console
      4. Statement identifiers
      5. Stored procedures & database triggers
      6. “Stardocs”: doc/blob storage & NLP analytics
      7. Graph Traversals, Algorithms & query langs
      8. Statistical inference & machine learning
      9. Stardog 2.0: Distributed Cluster Super Cloud Thingie!
  • 31. Summary: Easier. Smarter. Faster. Pick all three!
  • 32. stardog.com Thanks!
  • 33. stardog.com Licensing
  • 34. Feature Rich
      • Support for RDFS, OWL2 profiles (EL, RL, QL) & OWL2 DL via schema only queries
      • Semantic Search
      • ICV
      • Transactions
      • Rich security model
      • Support for major APIs
          • Jena & Sesame, and our own SNARL
          • SPARQL HTTP protocol, Graph Store protocol
          • Also includes a CLI & Shell environment