Stardog talk-dc-march-17

Stardog is a fast, scalable, lightweight RDF database for complex SPARQL queries. It features OWL 2 reasoning, transactions, a robust security layer, integrity constraint validation via Pellet 3, and world-class support.

Transcript

  • 1. Kendall Clark, CEO, Clark & Parsia, LLC
    Thursday, March 17, 2011
  • 2. About C&P
    • We build semantic technology infrastructure and enterprise solutions
    • Pellet, the leading OWL reasoner
    • POPS Expertise Location system
    • Bootstrapped since 2005
    • Offices in DC and Cambridge, MA
    • Government & enterprise customers
    • First talk ever was at LOC in 2005 :)
  • 4. TL;DR?
    • Java RDF database (“quad store”), no native code
    • Freemium model:
      • enterprise & community editions
      • OEM
    • Performance for complex SPARQL queries
    • Best available reasoning support
  • 5. NoSQL and SemWeb
    • SemWeb is schemaless and schema-rich
    • As agile as NoSQL stores
    • More expressive than SQL
    • Standards-based
    • Graph DBs are all ad hoc
    • Query language and, you know, joins
    • Do you really want to write map-reduce programs... only?! We sure don’t!
  • 6. Why another RDF DB?
    • We’re scratching our own itch: fast queries for integration & decision-support apps
      • aimed at the DB/reasoner “tweener” space
      • operationally agile
    • There’s a hole in the market; or: markets are normal distributions (probably)
    • Gives us a complete semantic application platform
  • 7. Commercial Market
    • 6 products
    • Technically homogeneous:
      • Sagan-like scale obsession
      • Mostly ad hoc reasoning
      • Weak performance on complex queries
      • Ho-hum feature sets & integrations
    • See http://bit.ly/92P8eN for more
  • 8. Stardog 1.0: Overview
    • Fast
    • Lightweight
    • Rich API support
    • Logical & statistical inference
    • Transactions
    • Full-text search
    • Graph algorithms and path language
    • Awesome mascot!
  • 9. Fast? No, Really Fast!
    • First design goal: performance of complex SPARQL query evaluation on a single machine, in the default configuration
    • Next: total queries per second (throughput)
    • In-memory mode available when needed
    • Early testing is promising: fastest RDF DB on the SP2B benchmark, often several times faster than the other stores tested
  • 10. Performance
    • Do your own testing; the only queries that matter are yours. Don’t trust, test.
    • It’s not ready till it’s very, very fast.
    • Flatten the RDF performance tax
    • About 256 GB of RAM for ~2B triples in main-memory mode, i.e., a ~$20k Dell box
    • When in doubt: Add. More. RAM.
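The sizing figure on slide 10 implies a per-triple memory budget. A back-of-the-envelope sketch, assuming decimal gigabytes (10^9 bytes; the slide does not say) and attributing the whole 256 GB to the ~2B triples, indexes and overhead included. The helper names are hypothetical.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_triple(ram_bytes: int, triples: int) -> float:
    """Average memory footprint per triple, indexes and overhead included."""
    return ram_bytes / triples


def ram_needed_gb(triples: int, per_triple_bytes: float) -> float:
    """RAM in decimal GB for a dataset at a fixed per-triple footprint."""
    return triples * per_triple_bytes / GB


per_triple = bytes_per_triple(256 * GB, 2 * 10**9)
print(per_triple)                              # 128.0
print(ram_needed_gb(500 * 10**6, per_triple))  # 64.0
```

Read with binary gibibytes instead, the same figure is about 137 bytes per triple. Either way, scaling linearly to other dataset sizes is itself an assumption.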
  • 11. Scalability
    • Stardog 1.0: scale up
      • Disk-based joins for very large intermediate structures
      • Triple compression
      • Ideally efficient on-disk indices
    • Stardog 2.0: scale out (shared-disk cluster)
    • We think it’s easier to scale a fast DB than to speed up a scalable one...
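Slide 11 lists triple compression without saying how it is done. One common technique in RDF stores is dictionary encoding: each IRI or literal is replaced by a dense integer ID, so indexes hold small integer tuples instead of strings, and a sorted index supports range scans. The sketch below assumes that technique; it is not a description of Stardog's actual on-disk format, and all names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

IdTriple = Tuple[int, int, int]


class Dictionary:
    """Bidirectional map between RDF terms (as strings) and dense integer IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._terms: List[str] = []

    def encode(self, term: str) -> int:
        # Assign the next free ID the first time a term is seen.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, term_id: int) -> str:
        return self._terms[term_id]


def build_spo_index(triples: List[Tuple[str, str, str]], d: Dictionary) -> List[IdTriple]:
    """Encode every term, then sort so all triples of a subject are contiguous."""
    return sorted((d.encode(s), d.encode(p), d.encode(o)) for s, p, o in triples)


def triples_for_subject(index: List[IdTriple], subject_id: int) -> List[IdTriple]:
    """Range scan: binary-search the contiguous block for one subject."""
    lo = bisect_left(index, (subject_id,))      # (s,) sorts before every (s, p, o)
    hi = bisect_left(index, (subject_id + 1,))  # first triple of the next subject
    return index[lo:hi]


d = Dictionary()
index = build_spo_index([
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:alice", "foaf:name", '"Alice"'),
], d)
print(index)  # [(0, 1, 2), (0, 3, 4), (2, 1, 0)]
alice = triples_for_subject(index, d.encode("ex:alice"))
print([d.decode(o) for _s, _p, o in alice])  # ['ex:bob', '"Alice"']
```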
  • 12. Lightweight
    • ~34 KLOC for the core system, ~10 KLOC of tests (1,034 unit tests)
    • Trivially simple installation:
      • copy the JAR & restart the servlet container
      • if you’ve ever used Sesame...
    • May run embedded or client-server, in main-memory or disk-backed mode, in any combination
  • 13. Interfaces
    • SNARL (Stardog Native API for RDF Language)
    • Avro RPC, especially the low-level TCP transport (coming soon...), for Java & non-Java clients
    • Sesame & Jena
    • SPARQL Protocol (HTTP)
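For the SPARQL Protocol interface on slide 13, a client needs nothing Stardog-specific: the W3C protocol sends a query as the URL-encoded `query` parameter of an HTTP GET (or in a POST body). A minimal sketch using only Python's standard library; the endpoint URL is hypothetical, and the request is built but not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://example.org/stardog/mydb/query"  # hypothetical endpoint URL


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


q = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
req = sparql_get_request(ENDPOINT, q)
print(req.get_method(), req.full_url)

# The query survives the URL-encoding round trip unchanged.
assert parse_qs(urlsplit(req.full_url).query)["query"] == [q]
```

Against a real endpoint, `urllib.request.urlopen(req)` would send the request; the same shape works for any SPARQL Protocol server.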
  • 14. Logical Inference
    1. OWL 2 QL, EL, and RL “query-time” reasoning
       • No materialization (so: fast bulk loading)
       • Reasoning enabled per query
    2. OWL 2 DL reasoning via Pellet 3.0
       • in-memory, schema reasoning
    3. Integrity constraint validation via OWL 2
    4. User-defined & SWRL rules
  • 15. OWL Validation of RDF
    • Use OWL ontologies to validate RDF instance data in Stardog
    • May be used as a guard on database modifications (so, if the resulting data is invalid, the transaction fails)
    • W3C Member Submission to formalize this approach; stay tuned
    • See http://clarkparsia.com/pellet/icv/ for details
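Slide 15's idea of validation as a transaction guard can be sketched as follows. Under OWL's usual open-world reading, an `:Employee` with no `:worksFor` value is not an error; an integrity-constraint reading treats the missing value as a violation. The sketch hard-codes one such constraint over plain Python sets; the constraint, vocabulary, and function names are hypothetical, and this is not the Pellet ICV API.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


class ConstraintViolation(Exception):
    """Raised when a change set would leave the data invalid."""


def violations(data: Set[Triple], cls: str, prop: str) -> Set[str]:
    """Closed-world check of 'every instance of cls has some value for prop'.

    Returns the instances of cls that have no prop triple at all."""
    instances = {s for s, p, o in data if p == "rdf:type" and o == cls}
    has_value = {s for s, p, _o in data if p == prop}
    return instances - has_value


def commit(data: Set[Triple], additions: Set[Triple], removals: Set[Triple],
           cls: str, prop: str) -> Set[Triple]:
    """Apply a change set only if the resulting data satisfies the constraint."""
    candidate = (data - removals) | additions
    bad = violations(candidate, cls, prop)
    if bad:
        raise ConstraintViolation(f"{sorted(bad)} lack a {prop} value")
    return candidate


db = {("ex:ann", "rdf:type", ":Employee"), ("ex:ann", ":worksFor", "ex:acme")}

# Adding an employee without an employer is rejected; db is left unchanged.
try:
    commit(db, {("ex:bob", "rdf:type", ":Employee")}, set(), ":Employee", ":worksFor")
    rejected = False
except ConstraintViolation:
    rejected = True

# Adding the employee together with an employer passes the guard.
db = commit(db, {("ex:bob", "rdf:type", ":Employee"), ("ex:bob", ":worksFor", "ex:acme")},
            set(), ":Employee", ":worksFor")
print(rejected, len(db))  # True 4
```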
  • 16. OWL 2 Support
    • Stardog 1.0: query-time, query-rewriting reasoner for SPARQL entailment regimes
    • Will support all of OWL 2 QL, EL, and RL, with exceptions:
      • limited support for datatype reasoning
        • i.e., won’t support user-defined datatypes
        • will depend on customer demand
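The query-rewriting approach on slides 14 and 16 returns inferred answers without materializing inferences at load time. A much-simplified sketch that handles only subclass axioms (an RDFS-like fragment, far smaller than OWL 2 QL): the pattern `?x rdf:type C` is rewritten into a union of patterns, one per subclass of C, and the union is evaluated against the stored triples. The `reasoning` flag mirrors the per-query switch; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
TBox = Dict[str, Set[str]]  # class -> its direct subclasses


def subclasses(tbox: TBox, cls: str) -> Set[str]:
    """cls plus every class below it in the (transitive) subclass hierarchy."""
    seen, stack = {cls}, [cls]
    while stack:
        for sub in tbox.get(stack.pop(), set()):
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


def rewrite_type_pattern(tbox: TBox, cls: str) -> List[Triple]:
    """Rewrite the pattern (?x rdf:type cls) into a union of type patterns."""
    return [("?x", "rdf:type", c) for c in sorted(subclasses(tbox, cls))]


def instances_of(data: Set[Triple], tbox: TBox, cls: str, reasoning: bool = True) -> Set[str]:
    """Evaluate the (possibly rewritten) query directly over the stored triples."""
    patterns = rewrite_type_pattern(tbox, cls) if reasoning else [("?x", "rdf:type", cls)]
    wanted = {(p, o) for _x, p, o in patterns}
    return {s for s, p, o in data if (p, o) in wanted}


tbox = {":Person": {":Employee"}, ":Employee": {":Manager"}}
data = {
    ("ex:ann", "rdf:type", ":Manager"),
    ("ex:bob", "rdf:type", ":Person"),
    ("ex:cat", "rdf:type", ":Animal"),
}
print(rewrite_type_pattern(tbox, ":Person"))
print(sorted(instances_of(data, tbox, ":Person")))                   # ['ex:ann', 'ex:bob']
print(sorted(instances_of(data, tbox, ":Person", reasoning=False)))  # ['ex:bob']
```

Nothing about `ex:ann` being a `:Person` is ever stored; the answer comes entirely from the rewritten query, which is why bulk loading stays cheap under this approach.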
  • 17. Statistical Inference
    • Corleone is a machine learning system for RDF and OWL
    • Optimized for Stardog
    • Multiple classification & clustering algorithms
    • Clusters (similarity) and classifies (predicts) by RDF class & individual
    • Machine learning must still be tuned; no magic bullets
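Slide 17 does not say which algorithms Corleone uses. As a generic illustration of similarity-based classification over RDF, this sketch describes each individual by its set of (predicate, object) pairs, measures similarity with the Jaccard index, and predicts a class via 1-nearest-neighbour. The technique and names are assumptions, not Corleone's method.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # a (predicate, object) pair describing an individual


def jaccard(a: Set[Feature], b: Set[Feature]) -> float:
    """Similarity in [0, 1]: shared features over all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_class(target: Set[Feature],
                  labeled: Dict[str, Tuple[Set[Feature], str]]) -> str:
    """1-nearest-neighbour: the class of the most similar labeled individual."""
    _features, cls = max(labeled.values(), key=lambda fc: jaccard(target, fc[0]))
    return cls


labeled = {
    "ex:ann": ({(":worksFor", "ex:acme"), (":hasDegree", ":PhD")}, ":Researcher"),
    "ex:bob": ({(":worksFor", "ex:shop"), (":manages", "ex:store1")}, ":Manager"),
}
new = {(":hasDegree", ":PhD"), (":worksFor", "ex:uni")}
print(jaccard(new, labeled["ex:ann"][0]))  # 1 shared feature out of 3 in total
print(predict_class(new, labeled))         # :Researcher
```

As the slide warns, this still needs tuning in practice: the feature choice and similarity measure drive the results.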
  • 18. Transactions
    • Supports optional ACID transactions on database mutations
    • Two-phase commit based on the Java Transaction API
    • Transactional writes are 2x to 8x slower, depending on many variables
    • Writes may be asynchronous & queued
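Slide 18 mentions two-phase commit via the Java Transaction API. The protocol itself is simple to sketch: a coordinator asks every participant to prepare and vote, and commits only if all vote yes; otherwise every participant rolls back. This sketch is Python with hypothetical class names rather than JTA's interfaces, and it omits the durable logging and crash recovery a real coordinator needs.

```python
from typing import List


class Participant:
    """A resource enlisted in a distributed transaction (hypothetical)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: stage the changes and vote yes/no.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit on every participant, or on none of them."""
    # Phase 1: collect votes; all() stops at the first "no".
    all_yes = all(p.prepare() for p in participants)
    # Phase 2: broadcast the decision to every participant.
    for p in participants:
        if all_yes:
            p.commit()
        else:
            p.rollback()
    return all_yes


ok = [Participant("index"), Participant("dictionary")]
bad = [Participant("index"), Participant("dictionary", can_commit=False), Participant("search")]
print(two_phase_commit(ok), [p.state for p in ok])    # True ['committed', 'committed']
print(two_phase_commit(bad), [p.state for p in bad])  # False ['aborted', 'aborted', 'aborted']
```

The extra prepare round trip (and, in a real system, its forced log writes) is one reason transactional writes cost more than non-transactional ones.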
  • 19. Search
    • Indexes RDF individuals and literals
    • Results are 2-tuples: (URI or value, score)
    • Based on Lucene: very fast, very scalable
    • Can use 1 of 6 algorithms to partition RDF individuals from a graph
      • via a SPARQL DESCRIBE hook
    • Will be integrated with SPARQL syntax...
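Slide 19's search returns (URI or value, score) 2-tuples. This sketch shows that result shape with a toy TF-IDF score over literal values keyed by the resource they describe. Lucene's actual scoring is more elaborate, so treat the formula and names as assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def search(literals: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Return (resource, score) 2-tuples for matching literals, best first.

    Score: sum over query terms of tf * log(1 + N/df), a toy TF-IDF."""
    docs = {res: Counter(tokenize(text)) for res, text in literals.items()}
    n = len(docs)
    # Document frequency: how many literals contain each term.
    df = Counter(term for counts in docs.values() for term in counts)
    results = []
    for res, tf in docs.items():
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in tokenize(query) if tf[t])
        if score > 0:
            results.append((res, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


literals = {
    "ex:stardog": "Stardog is a fast RDF database",
    "ex:pellet": "Pellet is an OWL reasoner",
    "ex:empire": "Empire maps Java objects to RDF",
}
for res, score in search(literals, "RDF database"):
    print(res, round(score, 3))  # ex:stardog ranks above ex:empire
```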
  • 20. RDF as Graph
    • SPARQL isn’t ideal for every use case
    • Graph algorithm processing on RDF treated purely as a graph
    • Stardog supports Gremlin, the de facto standard among graph database query languages
    • Gremlin makes graph algorithms easy to write
    • More optimized Gremlin support for 1.0
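Treating RDF "purely as a graph", as slide 20 puts it, means ignoring predicates and viewing each triple as a directed edge from subject to object. A breadth-first shortest-path search over that view is a typical graph algorithm of the kind Gremlin traversals express; this sketch is plain Python, not Gremlin, and the names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def shortest_path(triples: List[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search, treating each triple (s, p, o) as an edge s -> o.

    Predicates are ignored: the RDF is viewed purely as a directed graph."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for s, _p, o in triples:
        adjacency[s].append(o)
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to start, then reverse.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


triples = [
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:b", "foaf:knows", "ex:c"),
    ("ex:a", "foaf:knows", "ex:d"),
    ("ex:d", "foaf:knows", "ex:c"),
    ("ex:c", "foaf:knows", "ex:e"),
]
print(shortest_path(triples, "ex:a", "ex:e"))  # ['ex:a', 'ex:b', 'ex:c', 'ex:e']
print(shortest_path(triples, "ex:e", "ex:a"))  # None
```

Expressing "shortest chain of `foaf:knows` links of any length" in SPARQL 1.0 is awkward, which is the kind of use case the slide has in mind.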
  • 21. Architecture diagram, component labels:
    • Implementations: Sesame, Jena, Empire
    • Stardog API: HTTP API, Native API, Avro API
    • Stardog Core SPI; Runtime; Transactions
    • Stardog RDF; Query Exec; Plan API; Query Rewriting/Optimizer; Reasoning; Plan Filter API; Index API; SPI
    • CP Util, IO Util, Stardog Util, Sesame Ext
  • 22. Status
    • Stardog 0.4.6 alpha released to alpha testers on 15 March 2011
    • It feels damn good to ship code, even if it’s just an alpha! :)
    • Weekly updates until the beta period starts, then bimonthly updates until the 1.0 release
  • 23. The Private Beta
    • Doin’ it old school: private beta, invitation only
    • Helps us keep commercial focus
    • ~1 April to 30 May
    • Email kendall@clarkparsia.com if you’re interested: give name, org, area of interest, etc.
    • Rolling releases, new features, bug fixes, etc.
    • ~90 organizations signed up for the beta so far
  • 24. Roadmap
    • 1.0 in mid-summer
    • SPARQL 1.1, MRMW
    • Stored procedures in any JVM language
    • Shiro-based security layer
    • Native OWL 2 RL reasoner
    • Provenance API
    • Graph algorithms & an RDF path language
    • Continuous performance improvements
  • 25. Thanks! Questions?
    • http://stardog.com/
    • http://twitter.com/stardog_db
    • http://clarkparsia.com/
    • http://twitter.com/candp