Stardog talk-dc-march-17
 


Stardog is a fast, scalable, lightweight RDF database for complex SPARQL queries. It features OWL 2 reasoning, transactions, a robust security layer, integrity constraint validation via Pellet 3, and world-class support.


    Stardog talk-dc-march-17 Presentation Transcript

    • Kendall Clark, CEO, Clark & Parsia, LLC (Thursday, March 17, 2011)
    • About C&P
      • We build semantic technology infrastructure and enterprise solutions
      • Pellet, the leading OWL reasoner
      • POPS Expertise Location system
      • Bootstrapped since 2005
      • Offices in DC and Cambridge, MA
      • Government & enterprise customers
      • First talk ever was at LOC in 2005 :)
    • TLDR?
      • Java RDF database (“quad store”) (no native code)
      • Freemium model:
        • enterprise & community editions
        • OEM
      • Performance for complex SPARQL queries
      • Best available reasoning support
    • NoSQL and SemWeb
      • Semweb is schemaless and schema-rich
      • As agile as NoSQL stores
      • More expressive than SQL
      • Standards based
      • Graph DBs are all ad hoc
      • Query Language and, you know, joins
      • Do you really want to write map-reduce programs...only?! We sure don’t...!
    • Why another RDF DB?
      • We’re scratching our itch for fast query for integration & decision support apps
      • aimed at db-reasoner “tweener” space
      • operationally agile
      • There’s a hole in the market; or: markets are normal distributions (probably)
      • Gives us a complete semantic application platform
    • Commercial Market
      • 6 products
      • Technically homogeneous:
        • Sagan-like scale obsession
        • Mostly ad hoc reasoning
        • Weak perf on complex queries
        • Ho-hum feature sets & integrations
      • See http://bit.ly/92P8eN for more
    • Stardog 1.0: Overview
      • Fast
      • Lightweight
      • Rich API support
      • Logical & statistical inference
      • Transactions
      • Full-text search
      • Graph algorithms and path language
      • awesome mascot!
    • Fast? No, Really Fast!
      • First design goal in Stardog is performance of complex SPARQL query eval on a single machine in the default configuration
      • Next, total queries per second
      • In-memory mode available, when needed
      • Early testing is promising: fastest RDF DB on SP2B benchmark. Often several times faster.
    • Performance
      • Do yr own testing; the only queries that matter are yours; don’t trust, test.
      • It’s not ready till it’s very, very fast.
      • Flatten the RDF performance tax
      • About 256 GB for ~2B triples in main-memory mode, i.e., $20k Dell box.
      • When in doubt: Add. More. RAM.
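As a back-of-envelope reading of the sizing figure above (about 256 GB for ~2B triples), the average memory budget per triple can be worked out as below. The function name is ours, and the slide does not say whether it means decimal gigabytes or binary gibibytes, so both readings are shown as assumptions.

```python
def bytes_per_triple(ram_bytes: float, triples: float) -> float:
    """Average main-memory budget per triple, in bytes."""
    return ram_bytes / triples


TRIPLES = 2e9  # "~2B triples" from the slide

# Assumption 1: "GB" means 10**9 bytes.
decimal_budget = bytes_per_triple(256 * 10**9, TRIPLES)
# Assumption 2: "GB" means 2**30 bytes (GiB).
binary_budget = bytes_per_triple(256 * 2**30, TRIPLES)

print(f"{decimal_budget:.0f} bytes/triple (decimal GB)")
print(f"{binary_budget:.0f} bytes/triple (binary GiB)")
```

Either way the figure works out to roughly 130 bytes per triple, which would have to cover indices and overhead as well as the triple itself.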
    • Scalability
      • Stardog 1.0: scale up
      • Disk-based joins for very large intermediate structures
      • Triples compression
      • Ideally efficient on-disk indices
      • Stardog 2.0: scale out (shared-disk cluster)
      • We think it’s easier to scale a fast DB than to speed up a scalable one...
    • Lightweight
      • ~34 KLOC for core system, ~10 KLOC of tests (1034 unit tests)
      • Trivially simple installation:
        • copy JAR & restart servlet container
      • If you’ve ever used Sesame...
      • May run: embedded, client-server; main memory or disk-backed modes; any combination of these
    • Interfaces
      • SNARL (Stardog Native API for RDF Language)
      • Avro RPC, esp. the low-level TCP transport (coming soon...), for Java & non-Java
      • Sesame & Jena
      • SPARQL Protocol (HTTP)
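For the SPARQL Protocol (HTTP) interface listed above, a client-side request can be sketched roughly as follows. In the W3C SPARQL Protocol, a query sent via GET carries the query text in the `query` parameter. The endpoint URL and the helper name `build_sparql_get_url` are hypothetical; the slides do not give Stardog's actual endpoint path.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint, for illustration only; not a documented Stardog URL.
ENDPOINT = "http://localhost:8080/example-db/query"
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"

url = build_sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Any HTTP client could then fetch that URL; the sketch stops at URL construction so that it runs without a server.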
    • Logical Inference
      1. OWL 2 QL, EL, and RL “query-time” reasoning
        • No materialization (so: fast bulk loading)
        • reasoning enabled per-query
      2. OWL 2 DL reasoning via Pellet 3.0
        • in-memory, schema reasoning
      3. Integrity Constraint Validation via OWL 2
      4. user-defined & SWRL rules
    • OWL validation of RDF
      • Use OWL ontologies to validate RDF instance data in Stardog.
      • May be used as a guard to database modifications (so, if resulting data is invalid, transaction fails).
      • W3C Member Submission to formalize this approach; stay tuned for details.
      • See http://clarkparsia.com/pellet/icv/ for details
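The "guard" idea above (if the resulting data is invalid, the transaction fails) can be illustrated with a toy in-memory store. This is only a sketch of the pattern, not Stardog's or Pellet's API: `GuardedStore`, `InvalidDataError`, and the example constraint are all hypothetical names, and the constraint is a hand-written closed-world check rather than one derived from an OWL ontology.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Constraint = Callable[[Set[Triple]], List[str]]  # returns violation messages


class InvalidDataError(Exception):
    """Raised when a commit would leave the data in an invalid state."""


def every_employee_has_employer(data: Set[Triple]) -> List[str]:
    """Closed-world check: each :Employee must have a :worksFor value."""
    employees = {s for s, p, o in data if p == "rdf:type" and o == ":Employee"}
    employed = {s for s, p, _o in data if p == ":worksFor"}
    return [f"{e} has no :worksFor value" for e in sorted(employees - employed)]


class GuardedStore:
    """Toy triple store that validates the would-be state before committing."""

    def __init__(self, constraints: Iterable[Constraint]) -> None:
        self.data: Set[Triple] = set()
        self.constraints = list(constraints)

    def commit(self, additions: Iterable[Triple], removals: Iterable[Triple] = ()) -> None:
        # Build the candidate state without touching the committed data.
        candidate = (self.data - set(removals)) | set(additions)
        violations = [v for check in self.constraints for v in check(candidate)]
        if violations:
            raise InvalidDataError("; ".join(violations))
        self.data = candidate


store = GuardedStore([every_employee_has_employer])
store.commit([(":alice", "rdf:type", ":Employee"), (":alice", ":worksFor", ":acme")])
try:
    store.commit([(":bob", "rdf:type", ":Employee")])  # no :worksFor, so rejected
except InvalidDataError as err:
    print("rejected:", err)
```

The rejected commit leaves the store unchanged, which is the behavior the slide describes for a failed transaction.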
    • OWL 2 Support
      • Stardog 1.0: query-time, query rewriting reasoner for SPARQL entailment regimes
      • It will support all of OWL 2 QL, EL, and RL, with exceptions:
        • limited support for datatypes reasoning
        • i.e., won’t support user-defined datatypes
        • will depend on customer demand
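To make "query-time, query rewriting" reasoning concrete, here is a deliberately small sketch that handles only subclass axioms: a type query is expanded into a union over the class and all its subclasses, and the stored data is never materialized. It is our own illustration under those assumptions, not Stardog's rewriter; all names and the sample schema are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def subclasses_of(cls: str, direct_subclasses: Dict[str, Set[str]]) -> Set[str]:
    """Return cls plus every direct or indirect subclass of it."""
    result, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for child in direct_subclasses.get(current, set()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def rewrite_type_query(cls: str, direct_subclasses: Dict[str, Set[str]]) -> List[str]:
    """Rewrite the pattern '?x rdf:type cls' into the classes to match (a UNION)."""
    return sorted(subclasses_of(cls, direct_subclasses))


def answer(cls: str, data: List[Triple], direct_subclasses: Dict[str, Set[str]]) -> List[str]:
    """Evaluate the rewritten query against the raw, unmaterialized data."""
    targets = set(rewrite_type_query(cls, direct_subclasses))
    return sorted(s for s, p, o in data if p == "rdf:type" and o in targets)


# Hypothetical schema: Manager is a subclass of Employee, Employee of Person.
SCHEMA = {":Person": {":Employee"}, ":Employee": {":Manager"}}
DATA = [
    (":alice", "rdf:type", ":Manager"),
    (":bob", "rdf:type", ":Person"),
    (":acme", "rdf:type", ":Company"),
]

print(rewrite_type_query(":Person", SCHEMA))
print(answer(":Person", DATA, SCHEMA))
```

Because the inference happens in the query rather than in the data, loading stays a plain bulk insert, which matches the "no materialization" point made on the Logical Inference slide.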
    • Statistical Inference
      • Corleone is a machine learning system for RDF and OWL
      • Optimized for Stardog
      • Multiple classifier & cluster algorithms
      • Clusters (similarity) and classifies (predicts) by RDF class & individual
      • Machine learning must still be tuned; no magic bullets
    • Transactions
      • Supports optional ACID transactions on database mutations
      • 2-phase commit based on Java Transaction API
      • Tx’d writes 2x to 8x slower, depending on lots of variables
      • Writes may be asynchronous & queued
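The two-phase commit protocol mentioned above can be sketched generically as a coordinator that first collects votes and only then commits. This is a protocol illustration in plain Python, not the Java Transaction API and not Stardog's implementation; `Participant` and `two_phase_commit` are hypothetical names.

```python
from typing import List


class Participant:
    """A resource that can vote on, then commit or roll back, a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the change can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes; otherwise roll back."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2a: unanimous yes, so commit
            p.commit()
        return True
    for p in participants:  # phase 2b: at least one no, so roll back all
        p.rollback()
    return False


good = [Participant("index"), Participant("log")]
bad = [Participant("index"), Participant("log", can_commit=False)]
ok = two_phase_commit(good)
failed = two_phase_commit(bad)
print(ok, [p.state for p in good])
print(failed, [p.state for p in bad])
```

The extra round of voting is one plausible source of the write slowdown the slide quotes, though the slide itself attributes it only to "lots of variables".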
    • Search
      • Indexes RDF individuals and literals
      • Results are 2-tuples (url|value, score)
      • Based on Lucene: very fast, very scalable
      • Can use 1 of 6 algorithms to partition RDF individuals from a graph
      • via SPARQL DESCRIBE hook
      • Will be integrated with SPARQL syntax...
    • RDF as Graph
      • SPARQL isn’t ideal for every use case
      • Graph algorithm processing on RDF purely as a graph
      • Stardog supports Gremlin, the ad hoc standard for graph database query languages
      • Gremlin makes graph algorithms easy to write
      • More optimized Gremlin support for 1.0
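To show what "RDF purely as a graph" can mean in practice, the sketch below treats each triple as a directed edge from subject to object and runs a breadth-first shortest-path search. It is written in plain Python rather than Gremlin, and the data and function name are made up for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def shortest_path(triples: Iterable[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over triples treated as directed edges s -> o."""
    neighbours: Dict[str, List[str]] = {}
    for s, _p, o in triples:
        neighbours.setdefault(s, []).append(o)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is not reachable from start


# Hypothetical data: a small ":knows" network.
TRIPLES = [
    (":a", ":knows", ":b"),
    (":b", ":knows", ":c"),
    (":c", ":knows", ":e"),
    (":a", ":knows", ":d"),
    (":d", ":knows", ":e"),
]

print(shortest_path(TRIPLES, ":a", ":e"))
print(shortest_path(TRIPLES, ":c", ":a"))
```

Path queries like this are awkward to express in SPARQL 1.0, which is the kind of use case the slide has in mind.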
    • [Architecture diagram; labels in extraction order: Implementations, Sesame, Jena, Empire, Stardog API, HTTP API, Native API, Avro API, Stardog Core, SPI, Runtime, Transactions, Stardog RDF, Query Exec, Plan API, Query Rewriting/Optimizer, Reasoning, Plan, Filter API, Index API, SPI, CP Util, IO Util, Stardog Util, Sesame Ext]
    • Status
      • Stardog 0.4.6 alpha release to alpha testers on 15 March 2011
      • It feels damn good to ship code, even if it’s just an alpha! :)
      • Weekly updates till beta period starts, then bimonthly updates till 1.0 release
    • The Private Beta
      • Doin’ it old school: private beta, invitation only
      • Helps us keep commercial focus
      • ~1 April to 30 May
      • kendall@clarkparsia.com if yr interested: give name, org, area of interest, etc.
      • Rolling releases, new features, bug fixes, etc
      • ~90 organizations signed up for beta so far
    • Roadmap
      • 1.0 in mid-Summer
      • SPARQL 1.1, MRMW
      • stored procedures in any JVM lang
      • Shiro-based security layer
      • native OWL 2 RL reasoner
      • provenance API
      • graph algorithms & an RDF path language
      • performance improvements continuously
    • Thanks! Questions?
      • http://stardog.com/
      • http://twitter.com/stardog_db
      • http://clarkparsia.com/
      • http://twitter.com/candp