Stardog talk-dc-march-17

Stardog is a fast, scalable, lightweight RDF database for complex SPARQL queries. It features OWL 2 reasoning, transactions, a robust security layer, integrity constraint validation via Pellet 3, and world-class support.

Published in: Technology

  1. Kendall Clark, CEO, Clark & Parsia, LLC
     Thursday, March 17, 2011
  2. About C&P
     • We build semantic technology infrastructure and enterprise solutions
     • Pellet, the leading OWL reasoner
     • POPS Expertise Location system
     • Bootstrapped since 2005
     • Offices in DC and Cambridge, MA
     • Government & enterprise customers
     • First talk ever was at LOC in 2005 :)
  3. [Slide with no transcribed text]
  4. TLDR?
     • Java RDF database (“quad store”) (no native code)
     • Freemium model:
       • enterprise & community editions
       • OEM
     • Performance for complex SPARQL queries
     • Best available reasoning support
  5. NoSQL and SemWeb
     • Semweb is schemaless and schema-rich
       • As agile as NoSQL stores
       • More expressive than SQL
       • Standards based
     • Graph DBs are all ad hoc
     • Query Language and, you know, joins
     • Do you really want to write map-reduce programs...only?! We sure don’t...!
  6. Why another RDF DB?
     • We’re scratching our itch for fast query for integration & decision support apps
     • aimed at db-reasoner “tweener” space
     • operationally agile
     • There’s a hole in the market; or: markets are normal distributions (probably)
     • Gives us a complete semantic application platform
  7. Commercial Market
     • 6 products
     • Technically homogeneous:
       • Sagan-like scale obsession
       • Mostly ad hoc reasoning
       • Weak perf on complex queries
       • Ho-hum feature sets & integrations
     • See http://bit.ly/92P8eN for more
  8. Stardog 1.0: Overview
     • Fast
     • Lightweight
     • Rich API support
     • Logical & statistical inference
     • Transactions
     • Full-text search
     • Graph algorithms and path language
     • awesome mascot!
  9. Fast? No, Really Fast!
     • First design goal in Stardog is performance of complex SPARQL query eval on a single machine in the default configuration
     • Next, total queries per second
     • In-memory mode available, when needed
     • Early testing is promising: fastest RDF DB on SP2B benchmark. Often several times faster.
  10. Performance
      • Do yr own testing; the only queries that matter are yours; don’t trust, test.
      • It’s not ready till it’s very, very fast.
      • Flatten the RDF performance tax
      • About 256 GB for ~2B triples in main-memory mode, i.e., $20k Dell box.
      • When in doubt: Add. More. RAM.
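
The sizing figure on the Performance slide implies a per-triple memory budget. A back-of-the-envelope check, assuming decimal gigabytes and exactly 2 billion triples (the slide says "about" and "~", so the result is only a rough figure):

```python
# Back-of-the-envelope check of the slide's sizing claim.
# Assumptions (not from the slides): 1 GB = 10**9 bytes, exactly 2 billion triples.

def bytes_per_triple(total_bytes: int, triple_count: int) -> float:
    """Average main-memory footprint per triple."""
    return total_bytes / triple_count

ram_bytes = 256 * 10**9      # "about 256 GB"
triples = 2 * 10**9          # "~2B triples"

per_triple = bytes_per_triple(ram_bytes, triples)
print(f"{per_triple:.0f} bytes per triple")  # 128 bytes per triple
```

Reading "GB" as binary gibibytes instead pushes the figure to roughly 137 bytes per triple, so the order of magnitude is the useful takeaway.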
  11. Scalability
      • Stardog 1.0: scale up
        • Disk-based joins for very large intermediate structures
        • Triples compression
        • Ideally efficient on-disk indices
      • Stardog 2.0: scale out (shared-disk cluster)
      • We think it’s easier to scale a fast DB than to speed up a scalable one...
  12. Lightweight
      • ~34 KLOC for core system, ~10 KLOC of tests (1034 unit tests)
      • Trivially simple installation:
        • copy JAR & restart servlet container
      • If you’ve ever used Sesame...
      • May run: embedded, client-server; main memory or disk-backed modes; any combination of these
  13. Interfaces
      • SNARL (Stardog Native API for RDF Language)
      • Avro RPC, esp. the low-level TCP transport (coming soon...), for Java & non-Java
      • Sesame & Jena
      • SPARQL Protocol (HTTP)
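
Of the interfaces listed, the SPARQL Protocol is the one any HTTP client can use. A minimal sketch of building a protocol-style GET request URL follows. The endpoint address is a placeholder, since the slides do not give Stardog's URL layout, and no request is actually sent:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the slides do not specify a Stardog URL layout.
ENDPOINT = "http://example.org/sparql"

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL: the query text travels
    URL-encoded in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, query)
print(url)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"]
print(decoded == [query])
```

Because the protocol is plain HTTP, the same URL works from non-Java clients without any Stardog-specific library.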
  14. Logical Inference
      1. OWL 2 QL, EL, and RL “query-time” reasoning
         • No materialization (so: fast bulk loading)
         • reasoning enabled per-query
      2. OWL 2 DL reasoning via Pellet 3.0
         • in-memory, schema reasoning
      3. Integrity Constraint Validation via OWL 2
      4. user-defined & SWRL rules
  15. OWL validation of RDF
      • Use OWL ontologies to validate RDF instance data in Stardog.
      • May be used as a guard to database modifications (so, if resulting data is invalid, transaction fails).
      • W3C Member Submission to formalize this approach; stay tuned for details.
      • See http://clarkparsia.com/pellet/icv/ for details
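
The "guard" idea on the validation slide can be sketched in a few lines: apply the change to a copy, validate, and only then accept it. The constraint format and every name below are invented for illustration. Stardog expresses constraints as OWL axioms, which this toy does not model:

```python
# Toy sketch of constraint validation guarding a write.
# Hypothetical constraint format: (class, required property).

class ConstraintViolation(Exception):
    pass

# Constraint: every individual of this class must have this property.
CONSTRAINTS = [("Employee", "worksFor")]

def validate(triples: set) -> None:
    """Raise ConstraintViolation if any constraint is broken."""
    for cls, prop in CONSTRAINTS:
        members = {s for s, p, o in triples if p == "rdf:type" and o == cls}
        for m in sorted(members):
            if not any(s == m and p == prop for s, p, o in triples):
                raise ConstraintViolation(f"{m} is a {cls} without {prop}")

def commit(db: set, additions: set) -> set:
    """Return the new database state, or raise and leave db untouched."""
    candidate = db | additions      # work on a copy
    validate(candidate)             # guard: invalid data fails the write
    return candidate

db = {("alice", "rdf:type", "Employee"), ("alice", "worksFor", "acme")}
db = commit(db, {("bob", "rdf:type", "Employee"), ("bob", "worksFor", "acme")})

try:
    db = commit(db, {("carol", "rdf:type", "Employee")})
except ConstraintViolation as err:
    print("rejected:", err)   # carol has no worksFor triple
```

The rejected write leaves `db` exactly as it was, which is the behaviour the slide describes as "transaction fails".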
  16. OWL 2 Support
      • Stardog 1.0: query-time, query rewriting reasoner for SPARQL entailment regimes
      • It will support all of OWL 2 QL, EL, and RL, with exceptions:
        • limited support for datatypes reasoning
        • i.e., won’t support user-defined datatypes
        • will depend on customer demand
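
Query-time reasoning by rewriting means the stored data is never expanded with inferred triples; the query is widened instead. The toy below shows that idea on a subclass hierarchy only. It is not Stardog's algorithm, and the schema, data, and function names are all made up:

```python
# Toy illustration of query rewriting over a subclass hierarchy.
# Only asserted facts are stored; nothing is materialized.

SUBCLASS_OF = {            # child -> parent (schema)
    "Manager": "Employee",
    "Employee": "Person",
}

TYPES = [                  # asserted (individual, class) facts only
    ("alice", "Manager"),
    ("bob", "Employee"),
    ("carol", "Person"),
    ("rex", "Dog"),
]

def rewrite(cls: str) -> set:
    """Expand a class into itself plus all of its transitive subclasses."""
    expanded = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in expanded and child not in expanded:
                expanded.add(child)
                changed = True
    return expanded

def instances_of(cls: str, reasoning: bool) -> list:
    """Answer 'who has type cls?'; reasoning is switched on per query."""
    wanted = rewrite(cls) if reasoning else {cls}
    return sorted(ind for ind, c in TYPES if c in wanted)

print(instances_of("Person", reasoning=False))  # ['carol']
print(instances_of("Person", reasoning=True))   # ['alice', 'bob', 'carol']
```

Since the data itself is untouched, loading stays cheap and each query can opt in or out of reasoning, matching the "no materialization" and "per-query" points on the Logical Inference slide.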
  17. Statistical Inference
      • Corleone is a machine learning system for RDF and OWL
      • Optimized for Stardog
      • Multiple classifier & cluster algorithms
      • Clusters (similarity) and classifies (predicts) by RDF class & individual
      • Machine learning must still be tuned; no magic bullets
  18. Transactions
      • Supports optional ACID transactions on database mutations
      • 2-phase commit based on Java Transaction API
      • Tx’d writes 2x to 8x slower, depending on lots of variables
      • Writes may be asynchronous & queued
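
For readers unfamiliar with two-phase commit, the protocol shape is: ask every participant to prepare, then commit only on a unanimous yes. The sketch below uses hypothetical participant names and is neither Stardog's nor the Java Transaction API's code:

```python
# Minimal two-phase commit sketch with in-memory participants.

class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"

def two_phase_commit(participants: list) -> bool:
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2a: unanimous yes, so everyone commits.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: any no vote rolls everyone back.
    for p in participants:
        p.rollback()
    return False

good = [Participant("index"), Participant("log")]
mixed = [Participant("index"), Participant("log", can_commit=False)]
ok = two_phase_commit(good)
bad = two_phase_commit(mixed)
print(ok, bad)  # True False
```

The extra round trip in phase 1 is one plausible contributor to the write slowdown the slide mentions, though the slides do not break the cost down.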
  19. Search
      • Indexes RDF individuals and literals
      • Results are 2-tuples (url|value, score)
      • Based on Lucene: very fast, very scalable
      • Can use 1 of 6 algorithms to partition RDF individuals from a graph
        • via SPARQL DESCRIBE hook
      • Will be integrated with SPARQL syntax...
  20. RDF as Graph
      • SPARQL isn’t ideal for every use case
      • Graph algorithm processing on RDF purely as a graph
      • Stardog supports Gremlin, the ad hoc standard for graph database query languages
      • Gremlin makes graph algorithms easy to write
      • More optimized Gremlin support for 1.0
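
Treating RDF "purely as a graph" means reading each triple as a directed edge from subject to object and running ordinary graph algorithms over it. A minimal sketch with breadth-first shortest path follows. It is plain Python rather than Gremlin, and the sample triples are invented:

```python
from collections import deque

# Each triple (s, p, o) is read as a directed edge s -> o;
# predicate semantics are ignored. Sample data is made up.
TRIPLES = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("a", "worksWith", "d"),
    ("d", "knows", "c"),
    ("c", "knows", "e"),
]

def shortest_path(triples, start, goal):
    """Breadth-first search; returns a shortest node path or None."""
    edges = {}
    for s, _p, o in triples:
        edges.setdefault(s, []).append(o)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path(TRIPLES, "a", "e"))  # ['a', 'b', 'c', 'e']
```

A path query of unknown length like this is awkward in SPARQL 1.0, which is the kind of use case the slide has in mind.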
  21. Implementations
      [Architecture diagram; box labels in reading order: Sesame, Jena, Empire, Stardog API, HTTP API, Native API, Avro API, Stardog Core SPI Runtime Transactions Stardog RDF Query Exec Plan API Query Rewriting/ Optimizer Reasoning Plan Filter API Index API SPI CP Util IO Util Stardog Util Sesame Ext]
  22. Status
      • Stardog 0.4.6 alpha release to alpha testers on 15 March 2011
      • It feels damn good to ship code, even if it’s just an alpha! :)
      • Weekly updates till beta period starts, then bimonthly updates till 1.0 release
  23. The Private Beta
      • Doin’ it old school: private beta, invitation only
      • Helps us keep commercial focus
      • ~1 April to 30 May
      • kendall@clarkparsia.com if yr interested: give name, org, area of interest, etc.
      • Rolling releases, new features, bug fixes, etc.
      • ~90 organizations signed up for beta so far
  24. Roadmap
      • 1.0 in mid-Summer
      • SPARQL 1.1, MRMW
      • stored procedures in any JVM lang
      • Shiro-based security layer
      • native OWL 2 RL reasoner
      • provenance API
      • graph algorithms & an RDF path language
      • performance improvements continuously
  25. Thanks! Questions?
      • http://stardog.com/
      • http://twitter.com/stardog_db
      • http://clarkparsia.com/
      • http://twitter.com/candp