MPTStore: A Fast, Scalable, and Stable Resource Index


  1. MPTStore: A Fast, Scalable, and Stable Resource Index
    Aaron Birkland and Chris Wilper
    Open Repositories 2007, San Antonio, TX
  2. Background: RDF in Fedora
    • A natural fit:
      • Object-object relationships
      • Object properties
      • Exposure to services (as a graph)
    • Resource Index introduced:
      • Fedora 2.0 (January ’05)
  3. Background: RDF in Fedora
    • Challenges
      • Scalability
        • Few triplestores designed for 100M+ triples
      • Performance
        • Jena vs. Kowari (Jena: out of memory)
        • Kowari vs. Sesame Native (Sesame: slow complex queries)
      • Stability
        • Frequent “rebuilds”
  4. Motivation: The NSDL Use Case
    • The NSDL has a moderately large repository
      • 4.7 million objects
      • 250 million triples
  5. Motivation: The NSDL Use Case
    • The NSDL has a moderately large repository
      • 4.7 million objects
      • 250 million triples
    • ...and has a large volume of writes
      • Driven by periodic OAI harvests
      • Primarily a mix of ingests and datastream modifications
      • Highly concurrent reads and writes
  6. Motivation: The NSDL Use Case
    • Additionally, NSDL has data model constraints that must be enforced
      • Existential/referential constraints on objects (e.g. “foreign key” constraints)
      • Uniqueness constraints on some object properties
  7. Motivation: The NSDL Use Case
    • These constraints primarily center around RELS-EXT content:
      • Relationships to other NSDL objects (forming a graph)
      • Literal value properties for a particular object itself
  8. <foxml:datastream ID="RELS-EXT" ...>
         ...
         <example:id>PLUGH-XYZZY</example:id>
         <example:objectType>Resource</example:objectType>
         <example:memberOf rdf:resource="info:fedora/demo:73"/>
       </foxml:datastream>
       ...
    • example:id: must be globally unique
    • The object referenced by example:memberOf (info:fedora/demo:73):
      1) must exist
      2) must be 'Active'
      3) must be objectType 'Aggregation'
  9. Motivation: The NSDL Use Case
    • No suitable constraint enforcement mechanisms exist in Fedora itself
    • Our approach:
      • Enforce content model in middleware
      • Serialize access where we have to
      • Query RI before ingest or modify
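The "query RI before ingest or modify" step amounts to a handful of lookups against the RI. As a minimal sketch, assuming one table per predicate with subject and object columns, here is how the three RELS-EXT constraints from slide 8 might be checked using Python's built-in sqlite3 module. This is not the NSDL middleware: the table names t_id, t_state and t_objecttype, the function check_new_member, the sample ids, and the use of bare strings such as 'Active' and 'Aggregation' in place of real RDF terms are all hypothetical simplifications.

```python
import sqlite3

# Hypothetical per-predicate tables standing in for the RI.
# Real RI values would be URIs and literals; plain strings keep the sketch short.
SCHEMA = """
CREATE TABLE t_id         (s TEXT NOT NULL, o TEXT NOT NULL);
CREATE TABLE t_state      (s TEXT NOT NULL, o TEXT NOT NULL);
CREATE TABLE t_objecttype (s TEXT NOT NULL, o TEXT NOT NULL);
"""


def check_new_member(conn, new_id, parent):
    """Return the constraint violations for a proposed (id, memberOf) pair."""
    problems = []
    # Uniqueness: no existing object may already carry this id literal.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM t_id WHERE o = ?", (new_id,)).fetchone()
    if count > 0:
        problems.append("id is not globally unique")
    # Existence and state: the parent must exist and be Active.
    state = conn.execute(
        "SELECT o FROM t_state WHERE s = ?", (parent,)).fetchone()
    if state is None:
        problems.append("parent does not exist")
    elif state[0] != "Active":
        problems.append("parent is not Active")
    # Type: the parent must be an Aggregation.
    otype = conn.execute(
        "SELECT o FROM t_objecttype WHERE s = ?", (parent,)).fetchone()
    if otype is None or otype[0] != "Aggregation":
        problems.append("parent is not an Aggregation")
    return problems


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data: demo:73 is an active Aggregation.
conn.execute("INSERT INTO t_state VALUES ('info:fedora/demo:73', 'Active')")
conn.execute("INSERT INTO t_objecttype VALUES ('info:fedora/demo:73', 'Aggregation')")
conn.execute("INSERT INTO t_id VALUES ('info:fedora/demo:10', 'FROTZ-1')")

print(check_new_member(conn, "PLUGH-XYZZY", "info:fedora/demo:73"))  # []
print(check_new_member(conn, "FROTZ-1", "info:fedora/demo:99"))
```

These checks are only meaningful if the RI already reflects every earlier write.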
  10. The Challenge
    • Querying the RI to determine correct repository state proved to be the most difficult aspect.
      • To achieve acceptable performance with Kowari, triple writes are buffered and executed in large, infrequent chunks
      • Triples waiting in these buffers are invisible to outside queries
  11. The Challenge
    • Possible solution:
      • Flush the buffer after every write operation
    • New problem:
      • Flushing updates with Kowari is very expensive: multiple seconds per operation, which was incompatible with the NSDL's processing volume
    • This was a real showstopper...
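Why buffering defeats such checks can be shown with a toy Python model of a store that applies writes in chunks. It is a hypothetical illustration, not Kowari's write path: the class BufferedTripleStore and its flush_threshold parameter are invented for this sketch. A triple that has been added but not yet flushed is invisible to query(), so a uniqueness check that runs in that window would wrongly pass. Setting flush_threshold=1 models the "flush after every write" option, which is correct but, with Kowari, too slow.

```python
class BufferedTripleStore:
    """Toy store that applies writes in chunks; only flushed triples are queryable."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self._visible = set()  # triples that queries can see
        self._buffer = []      # triples written but not yet flushed

    def add(self, s, p, o):
        self._buffer.append((s, p, o))
        if len(self._buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The step that, per the slides, cost seconds per operation with Kowari.
        self._visible.update(self._buffer)
        self._buffer.clear()

    def query(self, s=None, p=None, o=None):
        """Match a triple pattern against flushed triples; None is a wildcard."""
        return [t for t in self._visible
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


store = BufferedTripleStore(flush_threshold=1000)
store.add("info:fedora/demo:1", "example:id", "PLUGH-XYZZY")
before = store.query(o="PLUGH-XYZZY")  # empty: the triple is still buffered
store.flush()
after = store.query(o="PLUGH-XYZZY")   # now visible

eager = BufferedTripleStore(flush_threshold=1)  # flush after every write
eager.add("info:fedora/demo:2", "example:id", "FROTZ-1")
```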
  12. The Challenge
    • Other difficulties the NSDL had with Kowari:
      • RI corruption under concurrent use
      • RI corruption with abnormal shutdowns
      • Scalability: performance became noticeably worse as the repository grew
      • Steep memory requirements
  13. The Challenge
    • Searching for a solution...
      • Other triple stores (e.g. Jena, Sesame) had been considered for Fedora in the past but were rejected for various reasons
      • An RDBMS seemed attractive: efficient transactions, very stable, generally speedy
      • The “one big table” paradigm did not seem to give us the desired scalability in initial tests
  14. Our Solution
    • Mapped predicate tables
      • One table per predicate, containing indexed 'subject' and 'object' values
      • A mapping table containing metadata that maps each predicate URI to a particular DB table
  15. Triples (t1):
        s                       o
        <info:fedora/demo:1>    <info:fedora/demo:3>
        <info:fedora/demo:2>    <info:fedora/demo:4>
      Predicate Mapping (tmap):
        p                                              pkey
        <info:fedora/fedora-def:model#disseminates>    1
        <http://ns.example.org/rels#memberOf>          2
  16. Our Solution
    • Benefits:
      • Low cost adds and deletes
      • Queries with known predicates are very fast
      • Complex queries benefit from the RDBMS planner having finer-grained statistics and query plans
      • Flexible data partitioning
  17. Our Solution
    • Disadvantages:
      • Need to manage the predicate-to-table mapping
      • Complex queries require more effort to formulate
      • With a naïve approach, simple queries with an unbound predicate scale linearly with the number of predicates
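The naive unbound-predicate case can be sketched as follows (Python/sqlite3; the function query_subject and the sample tables are hypothetical, not MPTStore code). Because a pattern like (subject, ?p, ?o) does not say which table to read, the generated SQL needs one SELECT branch per row of tmap, joined with UNION ALL, so its size and cost grow linearly with the number of mapped predicates.

```python
import sqlite3


def query_subject(conn, subject):
    """Naive (subject, ?p, ?o) query: one SELECT per mapped predicate table."""
    mapped = conn.execute("SELECT pkey, p FROM tmap ORDER BY pkey").fetchall()
    if not mapped:
        return [], ""
    # pkey comes from tmap as an integer, so t{pkey} is safe to interpolate.
    sql = " UNION ALL ".join(
        f"SELECT ? AS p, o FROM t{pkey} WHERE s = ?" for pkey, _ in mapped)
    # Each branch binds its predicate URI and the subject, in that order.
    params = [value for _, p in mapped for value in (p, subject)]
    return conn.execute(sql, params).fetchall(), sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmap (pkey INTEGER PRIMARY KEY, p TEXT NOT NULL UNIQUE);
INSERT INTO tmap VALUES (1, 'info:fedora/fedora-def:model#disseminates');
INSERT INTO tmap VALUES (2, 'http://ns.example.org/rels#memberOf');
CREATE TABLE t1 (s TEXT NOT NULL, o TEXT NOT NULL);
CREATE TABLE t2 (s TEXT NOT NULL, o TEXT NOT NULL);
INSERT INTO t1 VALUES ('info:fedora/demo:1', 'info:fedora/demo:3');
INSERT INTO t2 VALUES ('info:fedora/demo:1', 'info:fedora/demo:73');
INSERT INTO t2 VALUES ('info:fedora/demo:5', 'info:fedora/demo:73');
""")
rows, sql = query_subject(conn, "info:fedora/demo:1")
print(sql.count("SELECT"))  # 2: one branch per mapped predicate
```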
  18. Our Solution
    • Observations:
      • The total number of distinct predicates is much lower than the number of distinct subjects or objects; the NSDL has ~50
      • Unbound predicate queries are less common
      • The NSDL workload is heavily biased towards a high volume of writes and simple queries
  19. Our Solution
    • Enter MPTStore
      • Java library that handles all mapping and accounting behind the scenes
      • API for performing triple writes and queries
      • Translates queries from a particular language (e.g. SPO, SPARQL) into SQL statements
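As a rough illustration of query translation, here is a small translator for conjunctions of triple patterns with bound predicates, in Python with sqlite3. It is a hypothetical sketch and not MPTStore's translator or its actual query language support: variables are written ?x, each pattern becomes an aliased predicate table, repeated variables become join conditions, and constants become bound parameters. The objectType predicate URI, the predicate-to-table dictionary, and the plain-string literal 'Aggregation' are assumptions made for the example.

```python
import sqlite3


def translate(patterns, pred_tables):
    """Translate a conjunction of (s, p, o) patterns into (sql, params, vars).

    Terms starting with '?' are variables; predicates must be bound.
    pred_tables maps predicate URI -> table name (trusted, e.g. from tmap).
    Returns None if a predicate has no table, since no triple can match.
    """
    froms, where, params = [], [], []
    first_ref = {}  # variable -> column where it first occurs
    for i, (s, p, o) in enumerate(patterns):
        if p.startswith("?"):
            raise ValueError("this sketch only handles bound predicates")
        if p not in pred_tables:
            return None
        alias = f"a{i}"
        froms.append(f"{pred_tables[p]} {alias}")
        for col, term in (("s", s), ("o", o)):
            ref = f"{alias}.{col}"
            if not term.startswith("?"):
                where.append(f"{ref} = ?")  # constant: bound parameter
                params.append(term)
            elif term in first_ref:
                where.append(f"{ref} = {first_ref[term]}")  # shared variable: join
            else:
                first_ref[term] = ref
    names = list(first_ref)
    select = ", ".join(first_ref[v] for v in names) or "1"
    sql = f"SELECT {select} FROM {', '.join(froms)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params, names


MEMBER_OF = "http://ns.example.org/rels#memberOf"
OBJECT_TYPE = "http://ns.example.org/rels#objectType"  # assumed URI

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t2 (s TEXT NOT NULL, o TEXT NOT NULL);
CREATE TABLE t3 (s TEXT NOT NULL, o TEXT NOT NULL);
INSERT INTO t2 VALUES ('demo:10', 'demo:73'), ('demo:11', 'demo:80');
INSERT INTO t3 VALUES ('demo:73', 'Aggregation'), ('demo:80', 'Resource');
""")
# Which objects are members of an Aggregation?
sql, params, names = translate(
    [("?x", MEMBER_OF, "?agg"), ("?agg", OBJECT_TYPE, "Aggregation")],
    {MEMBER_OF: "t2", OBJECT_TYPE: "t3"})
print(sql)  # SELECT a0.s, a0.o FROM t2 a0, t3 a1 WHERE a1.s = a0.o AND a1.o = ?
rows = conn.execute(sql, params).fetchall()
```

Each additional pattern adds one table and at least one join condition, which is one reason complex queries take more effort to formulate against this layout.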
  20. Our Solution
    • Designed to expose transaction/connection semantics
      • Calling code must provide the JDBC connection used for adding and querying triples
      • This gives a clear path to using advanced transactional capabilities offered by the JDBC driver (such as XA)
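The benefit of caller-supplied connections can be sketched with Python's sqlite3 standing in for JDBC; XA and distributed transactions are out of scope here. The helper names add_triple, count_with_object and ingest_with_unique_id, and the table t_id, are hypothetical. Because the write helper uses the caller's connection and never commits, the caller can run a constraint check and the write in one transaction: the new triple is visible on that connection immediately, and a failure rolls it back.

```python
import sqlite3


def add_triple(conn, table, s, o):
    """Write using the caller's connection; committing is the caller's job."""
    conn.execute(f"INSERT INTO {table} (s, o) VALUES (?, ?)", (s, o))


def count_with_object(conn, table, o):
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE o = ?", (o,)).fetchone()[0]


def ingest_with_unique_id(conn, subject, id_value):
    """Check uniqueness and write in one transaction owned by the caller.

    Serializing concurrent writers (e.g. in middleware) is still the caller's job.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        if count_with_object(conn, "t_id", id_value) > 0:
            raise ValueError(f"duplicate id {id_value}")
        add_triple(conn, "t_id", subject, id_value)
        # The new triple is already visible to queries on this connection.
        return count_with_object(conn, "t_id", id_value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_id (s TEXT NOT NULL, o TEXT NOT NULL)")
visible = ingest_with_unique_id(conn, "info:fedora/demo:1", "PLUGH-XYZZY")

try:
    ingest_with_unique_id(conn, "info:fedora/demo:2", "PLUGH-XYZZY")
except ValueError as err:
    print(err)  # duplicate id PLUGH-XYZZY

# A failure after the write rolls the uncommitted triple back.
try:
    with conn:
        add_triple(conn, "t_id", "info:fedora/demo:3", "FROTZ-1")
        raise RuntimeError("hypothetical validation failure")
except RuntimeError:
    pass
```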
  21. Results
    • MPTStore performance well suited to NSDL use case
      • Adds and modifies were significantly faster than with Kowari, and were unaffected by database size
      • SPO queries were on par with Kowari in the common (unbound) case
  22. Results
    • Bonus
      • The NSDL team was very familiar with RDBMS administration: performance tuning, backups, etc.
      • Stored data is transparent and “hackable”: ad-hoc SQL queries and analysis are relatively simple
  23. Results
    • Fedora Bonus
      • The ability to analyze the database easily helped us track down bugs in our own middleware (which also improved performance with Kowari)
  24. Fast, Immediate Updates
    • Graph shows average milliseconds per datastream modification
    • MPTStore achieves virtually the same performance whether or not updates are buffered
    • Complete test details are in the Fedora 2.2 documentation
  25. RI: Future Directions
    • External Resource Index
      • Event-based (JMS) updates to external triplestore
        • Analogous to GSearch index updates
        • May be asynchronous
        • May index other datastreams
      • Make full use of triplestore capabilities without compromising the core repository
        • Inference (e.g. krule, RACER)
        • Native APIs
  26. RI: Future Directions
    • Internal (Synchronous) Resource Index
      • Assumption: XA Transactions.
      • Option A: MPTStore Only
        • Pro: Simple, synchronous, JDBC (no need for middleware)
        • Con: Basic queries (no iTQL, maybe SPARQL-Lite)
      • Option B: Mulgara or MPTStore
        • Pro: Richer queries when using Mulgara (iTQL)
        • Con: Complexity (need for XA-aware middleware?)
  27. Thank You
    • More Information
      • http://mptstore.sourceforge.net/
      • http://www.fedora.info/download/2.2/
      • http://tripletest.sourceforge.net/
