Oak, the architecture of Apache Jackrabbit 3
Apache Jackrabbit is just about to reach the 3.0 milestone based on a new architecture called Oak. Based on concepts like eventual consistency and multi-version concurrency control, and borrowing ideas from distributed version control systems and cloud-scale databases, the Oak architecture is a major leap ahead for Jackrabbit. This presentation describes the Oak architecture and shows what it means for the scalability and performance of modern content applications. Changes to existing Jackrabbit functionality are described and the migration process is explained.

Resources
• http://jackrabbit.apache.org/oak/
• Docs
  • http://jackrabbit.apache.org/oak/docs/
• Code
  • https://svn.apache.org/repos/asf/jackrabbit/oak/trunk/
  • https://github.com/apache/jackrabbit-oak
• Builds
  • http://ci.apache.org/builders/oak-trunk/
  • https://travis-ci.org/apache/jackrabbit-oak
Outline
• Tree model
• Updating the tree
• Refresh and garbage collection
• Concurrency and conflicts
• Interlude: Implementations
• Replicas and sharding
• Access control
• Comparing revisions
• Commit hooks
• Observers
• Search
• Big picture
Tree model

Paths as identifiers
[Figure: root node with children a and d; a has children b and c]
• /
• /a
• /a/b
• /a/c
• /d
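To make "paths as identifiers" concrete, here is a minimal Java sketch. The `Node` class and the `resolve`/`collectPaths` helpers are hypothetical and are not Oak's API. The sketch only shows that every node in the example tree is addressed by walking its path from the root, one name at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PathsAsIdentifiers {
    // Hypothetical minimal node: just a sorted map of named children.
    static final class Node {
        final Map<String, Node> children;

        Node(Map<String, Node> children) {
            this.children = new TreeMap<>(children);
        }
    }

    // The example tree: the root has children a and d; a has children b and c.
    static Node exampleTree() {
        Node a = new Node(Map.of("b", new Node(Map.of()), "c", new Node(Map.of())));
        return new Node(Map.of("a", a, "d", new Node(Map.of())));
    }

    // Resolve an absolute path such as "/a/b" by walking one name at a time
    // from the root. Returns null if some segment does not exist.
    static Node resolve(Node root, String path) {
        Node current = root;
        for (String name : path.split("/")) {
            if (name.isEmpty()) {
                continue; // skip the empty segment before the leading "/"
            }
            current = current.children.get(name);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Collect the path of every node, parents before children.
    static void collectPaths(Node node, String path, List<String> out) {
        out.add(path);
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            String childPath = path.equals("/") ? "/" + e.getKey() : path + "/" + e.getKey();
            collectPaths(e.getValue(), childPath, out);
        }
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        collectPaths(exampleTree(), "/", paths);
        System.out.println(paths);                                  // [/, /a, /a/b, /a/c, /d]
        System.out.println(resolve(exampleTree(), "/a/x") == null); // true
    }
}
```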
Updating the tree
[Figure: revisions r1 and r2 with HEAD; the nodes /d and /a/c as seen in r1 and in r2]
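One way to read the update slide is as copy-on-write over immutable trees. In that reading, a commit builds a new root revision, and unchanged subtrees such as /d are shared with the previous revision. The Java sketch below assumes this reading. `Node` and `update` are hypothetical names, and the values "old" and "new" stand in for whatever changed at /a/c.

```java
import java.util.HashMap;
import java.util.Map;

class CopyOnWrite {
    // Hypothetical immutable node: a value plus named children.
    static final class Node {
        final String value;
        final Map<String, Node> children;

        Node(String value, Map<String, Node> children) {
            this.value = value;
            this.children = Map.copyOf(children); // immutable snapshot
        }
    }

    // Revision r1: / -> {a -> {b, c}, d}, with /a/c holding "old".
    static Node exampleR1() {
        Node a = new Node("", Map.of("b", new Node("", Map.of()), "c", new Node("old", Map.of())));
        return new Node("", Map.of("a", a, "d", new Node("", Map.of())));
    }

    // Return a new root in which the node at `path` holds `newValue`. Only the
    // nodes on the way from the root to that node are copied; every other
    // subtree is shared with the previous revision. Assumes a non-root path
    // that already exists.
    static Node update(Node root, String path, String newValue) {
        return update(root, path.substring(1).split("/"), 0, newValue);
    }

    private static Node update(Node node, String[] names, int depth, String newValue) {
        if (depth == names.length) {
            return new Node(newValue, node.children); // the changed node itself
        }
        Map<String, Node> children = new HashMap<>(node.children);
        Node child = node.children.get(names[depth]);
        children.put(names[depth], update(child, names, depth + 1, newValue));
        return new Node(node.value, children); // fresh copy of an ancestor
    }

    public static void main(String[] args) {
        Node r1 = exampleR1();
        Node r2 = update(r1, "/a/c", "new"); // HEAD would now point to r2

        System.out.println(r1.children.get("d") == r2.children.get("d")); // true: /d is shared
        System.out.println(r1.children.get("a").children.get("c").value); // old
        System.out.println(r2.children.get("a").children.get("c").value); // new
    }
}
```

Because r1 is never mutated, readers that still hold r1 keep seeing a consistent tree while HEAD moves on.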
Refresh and garbage collection

[Figure: refresh]
[Figure: garbage]
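Read together, the refresh and garbage slides suggest a model, which the sketch below assumes. A session keeps seeing its base revision until it refreshes to HEAD. Once neither HEAD nor any session references an old revision, nodes reachable only from that revision can be reclaimed. The `Session` class and the mark-and-sweep pass are hypothetical illustrations. They do not show how either Oak backend actually collects garbage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RefreshAndGc {
    // Hypothetical immutable node with named children.
    static final class Node {
        final Map<String, Node> children;

        Node(Map<String, Node> children) {
            this.children = Map.copyOf(children);
        }
    }

    // Hypothetical session: it keeps reading from its base revision until refreshed.
    static final class Session {
        Node base;

        Session(Node base) {
            this.base = base;
        }

        void refresh(Node head) {
            base = head; // move the session's view to the latest HEAD
        }
    }

    // Mark: every node reachable from a retained root (compared by identity).
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Node> stack = new ArrayList<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.remove(stack.size() - 1);
            if (live.add(n)) {
                stack.addAll(n.children.values());
            }
        }
        return live;
    }

    // Sweep: stored nodes that were not marked are garbage.
    static List<Node> garbage(List<Node> store, List<Node> roots) {
        Set<Node> live = mark(roots);
        List<Node> dead = new ArrayList<>();
        for (Node n : store) {
            if (!live.contains(n)) {
                dead.add(n);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        Node d = new Node(Map.of()); // shared by both revisions
        Node oldA = new Node(Map.of("c", new Node(Map.of())));
        Node newA = new Node(Map.of("c", new Node(Map.of())));
        Node r1 = new Node(Map.of("a", oldA, "d", d));
        Node r2 = new Node(Map.of("a", newA, "d", d)); // HEAD
        List<Node> store = List.of(r1, oldA, oldA.children.get("c"), d, r2, newA, newA.children.get("c"));

        Session session = new Session(r1);
        System.out.println(garbage(store, List.of(r2, session.base)).size()); // 0: r1 still in use
        session.refresh(r2);
        System.out.println(garbage(store, List.of(r2, session.base)).size()); // 3: r1, old /a, old /a/c
    }
}
```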
Concurrency and conflicts

[Figure: revision r1 with two concurrent successors, r2a and r2b]
[Figure: r2a and r2b merged into revision r3]
Conflict handling strategies
a. Fully serialized commits
   • fail on conflict, no concurrent updates
b. Partially serialized commits
   • fail on conflict, concurrent conflict-free updates
c. Partial merge logic
   • conflict markers, manual conflict resolution
d. Full merge logic
   • conflicting changes may be lost
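The merge of r2a and r2b into r3 can be sketched as a three-way merge against their common base r1, with fail-on-conflict semantics as in strategies a and b. This is a simplified, hypothetical Java illustration over a flat map of child names to contents, not the merge logic of any Oak backend. The contents "1" and "2" are made up so that r2a modifies "a" and removes "b", while r2b modifies "d" and adds "e".

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

class ThreeWayMerge {
    static final class ConflictException extends RuntimeException {
        ConflictException(String name) {
            super("conflicting changes to \"" + name + "\"");
        }
    }

    // Merge two revisions that were both created from `base`. Each map models
    // the children of one node as name -> content; a missing key means the
    // child does not exist in that revision. A child changed differently on
    // both sides is a conflict, and the whole merge fails.
    static Map<String, String> merge(Map<String, String> base,
                                     Map<String, String> ours,
                                     Map<String, String> theirs) {
        Set<String> names = new HashSet<>(base.keySet());
        names.addAll(ours.keySet());
        names.addAll(theirs.keySet());

        Map<String, String> result = new TreeMap<>();
        for (String name : names) {
            String b = base.get(name);
            String o = ours.get(name);
            String t = theirs.get(name);
            String merged;
            if (Objects.equals(o, t)) {
                merged = o;                        // both sides agree (or neither changed it)
            } else if (Objects.equals(o, b)) {
                merged = t;                        // only "theirs" changed it
            } else if (Objects.equals(t, b)) {
                merged = o;                        // only "ours" changed it
            } else {
                throw new ConflictException(name); // both changed it, differently
            }
            if (merged != null) {
                result.put(name, merged);          // null means the child was removed
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> r1 = Map.of("a", "1", "b", "1", "d", "1");
        Map<String, String> r2a = Map.of("a", "2", "d", "1");
        Map<String, String> r2b = Map.of("a", "1", "b", "1", "d", "2", "e", "1");
        System.out.println(merge(r1, r2a, r2b)); // {a=2, d=2, e=1}
    }
}
```

Strategies c and d would replace the `throw` with a conflict marker or with a rule that picks one side.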
Interlude: implementations

MicroKernel/NodeStore
• Implementation of the tree/revision model
Responsible for:
• Clustering
• Sharding
• Caching
• Conflict handling
• etc.
Not responsible for:
• Type validation
• Access control
• Search
• Versioning
• etc.
Current implementations

                         DocumentMK                  TarMK (SegmentMK)
Persistence backends     MongoDB, JDBC (WIP)         Local FS (tar files)
Conflict handling        Partial serialization       Full serialization
Clustering               MongoDB clustering          Simple failover
Sharding                 MongoDB sharding            N/A
Single-node performance  Moderate                    High
Key use cases            Large deployments (>1TB),   Small/medium deployments,
                         concurrent writes           mostly read
Replicas and sharding

Replicas and caches
[Figure: master copy, full replica, cache]

Sharding strategies
[Figure panels: by path; by level; by hash; with caching]
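The slide only names the three sharding strategies. The Java sketch below shows one plausible interpretation of each as a function from a path to a shard number. The method names, the explicit subtree-to-shard mapping, and the use of `String.hashCode` are all assumptions made for illustration.

```java
import java.util.Map;

class ShardingStrategies {
    // Number of names in a path: "/" -> 0, "/a" -> 1, "/a/b" -> 2.
    static int depth(String path) {
        return path.equals("/") ? 0 : path.split("/").length - 1;
    }

    // By path: a whole top-level subtree lives on one shard, chosen through an
    // explicit mapping; the root and unmapped subtrees go to shard 0.
    static int byPath(String path, Map<String, Integer> subtreeToShard) {
        if (path.equals("/")) {
            return 0;
        }
        String topLevelName = path.split("/")[1];
        return subtreeToShard.getOrDefault(topLevelName, 0);
    }

    // By level: all nodes at the same depth share a shard.
    static int byLevel(String path, int shards) {
        return depth(path) % shards;
    }

    // By hash: nodes are spread out regardless of where they sit in the tree.
    static int byHash(String path, int shards) {
        return Math.floorMod(path.hashCode(), shards);
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = Map.of("a", 0, "d", 1);
        for (String path : new String[] {"/", "/a", "/a/b", "/a/c", "/d"}) {
            System.out.printf("%-5s path=%d level=%d hash=%d%n",
                    path, byPath(path, mapping), byLevel(path, 2), byHash(path, 2));
        }
    }
}
```

Under these assumptions, sharding by path keeps a subtree on one shard. That makes subtree reads cheap but can leave the load uneven. Sharding by hash balances the load at the cost of scattering a subtree across shards.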
Access control

Accessible paths
• /
• /a/b
• /d
Existentialism
• All (syntactically valid) paths can be traversed
• But the identified node might not exist
• For example, with only /, /a/b and /d accessible:
  root.getChildNode("a").exists() -> false
  root.getChildNode("a").getChildNode("b").exists() -> true!
• Implemented as a decorator over the MK
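The existence rule can be sketched as a decorator in Java. The method names `getChildNode()` and `exists()` come from the slide. The `NodeState` interface and the `StoredNode` and `SecureNode` classes are hypothetical simplifications. The stored tree and the readable paths (/, /a/b, /d) follow the example.

```java
import java.util.Set;

class Existentialism {
    // Hypothetical minimal node interface with the two calls used on the slide.
    interface NodeState {
        NodeState getChildNode(String name);
        boolean exists();
    }

    static String childPath(String parent, String name) {
        return parent.equals("/") ? "/" + name : parent + "/" + name;
    }

    // Underlying tree, modeled as the set of stored paths. Any name can be
    // traversed; the resulting node simply may not exist.
    static final class StoredNode implements NodeState {
        private final Set<String> storedPaths;
        private final String path;

        StoredNode(Set<String> storedPaths, String path) {
            this.storedPaths = storedPaths;
            this.path = path;
        }

        public NodeState getChildNode(String name) {
            return new StoredNode(storedPaths, childPath(path, name));
        }

        public boolean exists() {
            return storedPaths.contains(path);
        }
    }

    // Access control as a decorator: traversal always succeeds, but a node only
    // exists for this caller if it is stored AND its path is readable.
    static final class SecureNode implements NodeState {
        private final NodeState delegate;
        private final Set<String> readablePaths;
        private final String path;

        SecureNode(NodeState delegate, Set<String> readablePaths, String path) {
            this.delegate = delegate;
            this.readablePaths = readablePaths;
            this.path = path;
        }

        public NodeState getChildNode(String name) {
            return new SecureNode(delegate.getChildNode(name), readablePaths, childPath(path, name));
        }

        public boolean exists() {
            return readablePaths.contains(path) && delegate.exists();
        }
    }

    // The tree / -> {a -> {b, c}, d}, where only /, /a/b and /d are accessible.
    static NodeState exampleRoot() {
        Set<String> stored = Set.of("/", "/a", "/a/b", "/a/c", "/d");
        Set<String> readable = Set.of("/", "/a/b", "/d");
        return new SecureNode(new StoredNode(stored, "/"), readable, "/");
    }

    public static void main(String[] args) {
        NodeState root = exampleRoot();
        System.out.println(root.getChildNode("a").exists());                   // false
        System.out.println(root.getChildNode("a").getChildNode("b").exists()); // true
    }
}
```

Because traversal never fails, code above this layer does not need special cases for hidden ancestors. It only checks `exists()` on the node it actually wants.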
Comparing revisions

What changed?

Content diff
• Tells what changed between two content trees
• Cornerstone of most higher-level functionality
  • validation
  • indexing
  • observation
  • etc.
[Figure: revision r1, concurrent revisions r2a and r2b, and their merge r3]
Examples
• r1 -> r3: "a" modified, "b" removed, "d" modified, "e" added
• r1 -> r2a: "a" modified, "b" removed
• r1 -> r2b: "d" modified, "e" added
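A content diff can be sketched as a comparison of the children of two revisions. The Java below is a hypothetical, non-recursive simplification that models each child as a name mapped to its contents. The contents of r1, r2a, r2b and r3 are invented so that the diffs reproduce the examples listed on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class ContentDiff {
    // Report, in name order, which children were added, removed, or modified
    // between two revisions of the same node. Each map models a child as
    // name -> content; a full diff would also recurse into modified children.
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        TreeSet<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        List<String> changes = new ArrayList<>();
        for (String name : names) {
            String b = before.get(name);
            String a = after.get(name);
            if (b == null) {
                changes.add("\"" + name + "\" added");
            } else if (a == null) {
                changes.add("\"" + name + "\" removed");
            } else if (!b.equals(a)) {
                changes.add("\"" + name + "\" modified");
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Hypothetical contents chosen to match the slide's examples.
        Map<String, String> r1 = Map.of("a", "1", "b", "1", "d", "1");
        Map<String, String> r2a = Map.of("a", "2", "d", "1");
        Map<String, String> r2b = Map.of("a", "1", "b", "1", "d", "2", "e", "1");
        Map<String, String> r3 = Map.of("a", "2", "d", "2", "e", "1");

        System.out.println(diff(r1, r3));  // ["a" modified, "b" removed, "d" modified, "e" added]
        System.out.println(diff(r1, r2a)); // ["a" modified, "b" removed]
        System.out.println(diff(r1, r2b)); // ["d" modified, "e" added]
    }
}
```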
Commit hooks

If this changed, commit this instead
Commit hooks
• Given the before and after states of a commit, a hook can:
  • fail the commit, or
  • pass the commit unmodified, or
  • pass the commit with modifications
• Key plugin mechanism in Oak
• All configured hooks are applied in sequence
• Used to implement much of the higher-level functionality
• Often implemented using a content diff
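The hook contract can be sketched in Java as a function from (before, after) to the state to commit. The `Hook` interface, `CommitFailedException`, and the two example hooks are hypothetical and deliberately simpler than Oak's own CommitHook interface. The point is the three possible outcomes and the sequential chaining.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CommitHooks {
    // Hypothetical, simplified hook contract (not Oak's actual interface):
    // given the before and after states, return the state to commit,
    // possibly modified, or throw to fail the commit.
    interface Hook {
        Map<String, String> process(Map<String, String> before, Map<String, String> after);
    }

    static final class CommitFailedException extends RuntimeException {
        CommitFailedException(String message) {
            super(message);
        }
    }

    // All configured hooks run in sequence; each sees the previous hook's output.
    static Map<String, String> applyHooks(List<Hook> hooks,
                                          Map<String, String> before,
                                          Map<String, String> after) {
        Map<String, String> state = after;
        for (Hook hook : hooks) {
            state = hook.process(before, state);
        }
        return state;
    }

    // Trigger-like hook: passes the commit with a modification (a default value).
    static final Hook DEFAULT_TITLE = (before, after) -> {
        if (after.containsKey("title")) {
            return after;
        }
        Map<String, String> modified = new TreeMap<>(after);
        modified.put("title", "untitled");
        return modified;
    };

    // Validation hook: passes the commit unmodified, or fails it.
    static final Hook NO_EMPTY_VALUES = (before, after) -> {
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (e.getValue().isEmpty()) {
                throw new CommitFailedException("empty value for " + e.getKey());
            }
        }
        return after;
    };

    public static void main(String[] args) {
        List<Hook> hooks = List.of(DEFAULT_TITLE, NO_EMPTY_VALUES);
        System.out.println(applyHooks(hooks, Map.of(), Map.of("body", "text"))); // {body=text, title=untitled}
        try {
            applyHooks(hooks, Map.of(), Map.of("body", ""));
        } catch (CommitFailedException e) {
            System.out.println("commit failed: " + e.getMessage()); // commit failed: empty value for body
        }
    }
}
```

Hook order matters here: the validator runs after the default-value hook, so it checks the state that would actually be committed.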
Examples
• All kinds of validation
  • node types, access control, references, etc.
• Trigger-like functionality
  • autocreated content, default values, etc.
• In-content index updates
• etc.
Types of hooks

                      CommitHook   Editor      Validator
Content diff          Optional     Always      Always
Can modify commit     Yes          Yes         No
Programming model     Simple       Callbacks   Callbacks
Performance impact    High         Medium      Low
Observers

Observers
• Given the before and after states, an observer can:
  • observe what changed in the content tree
• Invoked after the commit, unlike commit hooks
• Always asynchronous for changes from other cluster nodes
• Depending on the backend, can be synchronous for changes on the local cluster node
• Often implemented using a content diff
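The key difference from commit hooks is that observers run after the commit and may run asynchronously. That difference can be sketched in Java as follows. `Store`, `Observer`, and the single background thread used for notifications are assumptions made for illustration. They do not model Oak's cluster-aware observation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class Observers {
    // Hypothetical observer contract: it only sees what changed, after the fact.
    interface Observer {
        void contentChanged(Map<String, String> before, Map<String, String> after);
    }

    // Hypothetical store that completes each commit first and only then
    // notifies observers, asynchronously, on one background thread.
    static final class Store {
        private Map<String, String> head = Map.of();
        private final List<Observer> observers = new CopyOnWriteArrayList<>();
        private final ExecutorService notifier = Executors.newSingleThreadExecutor();

        void addObserver(Observer observer) {
            observers.add(observer);
        }

        synchronized void commit(Map<String, String> newHead) {
            Map<String, String> before = head;
            head = Map.copyOf(newHead); // the commit is complete at this point
            Map<String, String> after = head;
            for (Observer observer : observers) {
                // Observers cannot veto or modify the commit; they run later.
                notifier.execute(() -> observer.contentChanged(before, after));
            }
        }

        // Stop accepting notifications and wait for pending ones to finish.
        void shutdown() {
            notifier.shutdown();
            try {
                notifier.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.addObserver((before, after) ->
                System.out.println("changed: " + before + " -> " + after));
        store.commit(Map.of("a", "1"));
        store.shutdown();
    }
}
```

The single-thread executor delivers notifications in commit order. A slow observer delays later notifications but never delays the commit itself.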
Examples
• JCR Observation
• External index updates
• Cache invalidation
• Logging
• etc.
Search

[Figure: query engine with several pluggable parsers and indexes; sample query fragments: SELECT, WHERE x=y, /a//*]
Query processing steps
1. Parsing
   a. Select matching parser
   b. Parse the query string
2. Execution
   a. Estimate cost per index
   b. Select index with the least cost estimate
   c. Execute the query against the index
3. Post-processing
   a. Filter results on access control and additional constraints
   b. Apply sorting, grouping, faceting, etc.
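Steps 2 and 3 can be sketched in Java as cost-based index selection followed by post-filtering. The `Index` interface, the traversal fallback, the single-property index, and the cost numbers are hypothetical. Parsing (step 1) is skipped by passing the condition x=y directly, and the readable paths reuse the access-control example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class QueryEngine {
    // Hypothetical index contract: estimate a cost for "property = value",
    // then return the matching paths.
    interface Index {
        String name();
        double cost(String property, String value);
        List<String> query(String property, String value);
    }

    // Fallback that traverses all content: always usable, one cost unit per node.
    static Index traversal(Map<String, Map<String, String>> content) {
        return new Index() {
            public String name() { return "traversal"; }
            public double cost(String property, String value) { return content.size(); }
            public List<String> query(String property, String value) {
                return content.entrySet().stream()
                        .filter(e -> value.equals(e.getValue().get(property)))
                        .map(Map.Entry::getKey)
                        .collect(Collectors.toList());
            }
        };
    }

    // Property index over one property: cheap, but only usable for that property.
    static Index propertyIndex(String indexed, Map<String, Map<String, String>> content) {
        Map<String, List<String>> pathsByValue = new HashMap<>();
        content.forEach((path, props) -> {
            String v = props.get(indexed);
            if (v != null) {
                pathsByValue.computeIfAbsent(v, k -> new ArrayList<>()).add(path);
            }
        });
        return new Index() {
            public String name() { return "property:" + indexed; }
            public double cost(String property, String value) {
                return property.equals(indexed) ? 1 : Double.POSITIVE_INFINITY;
            }
            public List<String> query(String property, String value) {
                return pathsByValue.getOrDefault(value, List.of());
            }
        };
    }

    // Steps 2a-2b: estimate the cost per index and pick the cheapest.
    static Index selectIndex(List<Index> indexes, String property, String value) {
        return indexes.stream()
                .min(Comparator.comparingDouble(i -> i.cost(property, value)))
                .orElseThrow();
    }

    // Step 2c, then step 3: query the chosen index, filter on access control, sort.
    static List<String> execute(List<Index> indexes, String property, String value,
                                Predicate<String> canRead) {
        return selectIndex(indexes, property, value).query(property, value).stream()
                .filter(canRead)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> content = Map.of(
                "/a", Map.of("x", "y"),
                "/a/b", Map.of("x", "y"),
                "/d", Map.of("x", "z"));
        List<Index> indexes = List.of(traversal(content), propertyIndex("x", content));
        Predicate<String> canRead = Set.of("/", "/a/b", "/d")::contains;

        System.out.println(selectIndex(indexes, "x", "y").name()); // property:x
        System.out.println(execute(indexes, "x", "y", canRead));   // [/a/b]
    }
}
```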
Index implementations
• Property index
• Reference index
• Lucene index
  • in-content
  • local file system
• Solr index
  • embedded
  • external
Big picture

[Figure: layers MicroKernel, Oak Core and Oak JCR; interfaces NodeStore API, Oak API and JCR API; plugins]
Questions?