/path/to/content - the Apache Jackrabbit content repository


Looking for a database where user profiles and image galleries are equally at home, one that comes with built-in full-text search, fine-grained access control, flexible schemas, versioning and many other advanced features? Take a look at Apache Jackrabbit, the Java-based content repository that combines the best parts of file systems and databases. This introductory presentation covers Apache Jackrabbit and its hierarchical content model, and shows how it can serve as a foundation for modern content-based applications.

Published in: Software

  1. 1. /path/to/content the Apache Jackrabbit content repository
  2. 2. Outline
     • Repository model
     • Property and node types
     • Sessions and namespaces
     • References and versioning
     • Search and observation
     • Access control
     • Persistence and clustering
     • Deployment and configuration
     • Questions?
  3. 3. Repository model
  4. 4. Repository structure Repository Workspace A Workspace B Workspace C /jcr:system
  5. 5. … or more commonly Repository default workspace /jcr:system
  6. 6. Workspace structure
     /                (root node)
       jcr:system     (/jcr:system)
       a              (node /a)
         b            (node /a/b)
         c            (node /a/c)
  7. 7. Node structure
     Property name     Type     Value
     jcr:primaryType   Name     nt:unstructured
     jcr:mixinTypes    Name[]   mix:referenceable
     jcr:uuid          String   c6d27a10-bf23-11e3-b…
     title             String   My new node
     author            String   Jukka Zitting
     Child nodes: foo, bar, baz[1], baz[2]
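The node structure above (named properties plus ordered child nodes, with same-name siblings addressed as baz[1] and baz[2]) can be sketched with a toy in-memory tree. This is only an illustration: `ToyNode` and `NodeStructureDemo` are hypothetical names, not the `javax.jcr.Node` interface or anything Jackrabbit ships, and the sketch assumes that a path step without an index means index 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory node for illustration; NOT the javax.jcr.Node interface. */
final class ToyNode {
    final String name;
    final Map<String, Object> properties = new LinkedHashMap<>();
    final List<ToyNode> children = new ArrayList<>(); // child order is preserved

    ToyNode(String name) {
        this.name = name;
    }

    ToyNode addChild(String childName) {
        ToyNode child = new ToyNode(childName);
        children.add(child);
        return child;
    }

    /** Resolves a relative path such as "a/baz[2]"; returns null if nothing matches. */
    ToyNode resolve(String relPath) {
        ToyNode current = this;
        for (String step : relPath.split("/")) {
            if (step.isEmpty()) {
                continue;
            }
            String stepName = step;
            int index = 1; // assumption: no index means the first same-name sibling
            int bracket = step.indexOf('[');
            if (bracket >= 0 && step.endsWith("]")) {
                stepName = step.substring(0, bracket);
                index = Integer.parseInt(step.substring(bracket + 1, step.length() - 1));
            }
            ToyNode match = null;
            int seen = 0;
            for (ToyNode child : current.children) {
                if (child.name.equals(stepName) && ++seen == index) {
                    match = child;
                    break;
                }
            }
            if (match == null) {
                return null;
            }
            current = match;
        }
        return current;
    }
}

final class NodeStructureDemo {
    /** Builds a small tree resembling the slide: /a with children foo, bar, baz, baz. */
    static ToyNode build() {
        ToyNode root = new ToyNode("");
        ToyNode a = root.addChild("a");
        a.properties.put("jcr:primaryType", "nt:unstructured");
        a.properties.put("title", "My new node");
        a.addChild("foo");
        a.addChild("bar");
        a.addChild("baz").properties.put("title", "first baz");
        a.addChild("baz").properties.put("title", "second baz");
        return root;
    }

    public static void main(String[] args) {
        ToyNode root = build();
        System.out.println(root.resolve("a/baz[2]").properties.get("title"));
    }
}
```

In a real repository the same lookup goes through a session rather than a hand-built tree.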
  8. 8. Property and node types
  9. 9. Common property types
     Property type           Used for                        Examples
     String                  Short to medium-sized text      “foo”, “This paragraph…”
     Binary                  Binary data and long text       PNG, PDF, “This book…”
     Name                    Node and property names         “nt:folder”, “content”
     Path                    Node and property paths         “/jcr:system”, “/etc/map”
     Boolean, Long, Double   Scalar data                     true, 0, -2846, 3.14, NaN
     Date                    ISO 8601 timestamp              2014-04-08T12:00:00.000Z
     Reference               Graph structures                c6d27a10-bf23-11e3-b…
  10. 10. Multi-valued properties
      • Zero or more values
        • Limit at around 10-100k values, depending on size of values
      • All values must be of the same type
      • Duplicates allowed
      • No “null” values
        • Automatically removed
      • Order is preserved
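The rules for multi-valued properties can be made concrete with a small sketch. `ToyMultiValue.normalize` is a hypothetical helper written for this illustration, not a Jackrabbit or JCR call; it assumes "same type" can be approximated by comparing Java classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the multi-valued property rules; not part of any JCR API. */
final class ToyMultiValue {
    /**
     * Drops nulls, keeps order and duplicates, and rejects mixed types.
     * Comparing Java classes is a simplification of JCR property types.
     */
    static List<Object> normalize(Object... values) {
        List<Object> result = new ArrayList<>();
        Class<?> type = null;
        for (Object value : values) {
            if (value == null) {
                continue; // "null" values are removed automatically
            }
            if (type == null) {
                type = value.getClass();
            } else if (type != value.getClass()) {
                throw new IllegalArgumentException("all values must be of the same type");
            }
            result.add(value); // order preserved, duplicates allowed
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("a", null, "b", "a"));
    }
}
```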
  11. 11. Common node types
      nt:base
        - jcr:primaryType: Name
        - jcr:mixinTypes: Name[]
      nt:unstructured
        - * (any properties OK)
        + * (any child nodes OK)
      oak:Unstructured (w/o order)
        - * (any properties OK)
        + * (any child nodes OK)
      mix:referenceable
        - jcr:uuid: String
      mix:versionable
        - …
      mix:lockable
        - …
  12. 12. Common node types, cont.
      nt:hierarchyNode (abstract)
        - jcr:created: Date
      nt:file
        + jcr:content
      nt:folder
        + *: nt:hierarchyNode
      nt:resource
        - jcr:data: Binary
        - jcr:mimeType: String
        - jcr:lastModified: Date
  13. 13. Example
      Site                    nt:unstructured
        form                  nt:folder
          style.css           nt:file
          logo.png            nt:file
        function              nt:folder
          jquery.js           nt:file
        Blog                  nt:unstructured
          Post 1              nt:unstructured
            attachment.pdf    nt:file
          Post 2              nt:unstructured
            Comment 1         nt:unstructured
  14. 14. Sessions and namespaces
  15. 15. Sessions workspace Session Session Session Session Session
  16. 16. Session
      • All content access goes through a session
      • Sessions are created with an authenticated login() call
      • Session-based authorization of reads, writes and other operations
      • Tracking of transient changes
      • Atomic save()
      • Not thread-safe!
        • for concurrent operations, use multiple sessions
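The idea of transient changes and an atomic save() can be sketched with a toy model in which each session buffers its own writes until it saves. `ToyWorkspace` and `ToySession` are hypothetical classes for illustration only; they do not reproduce `javax.jcr.Session`, authentication or authorization, and "atomic" here simply means that all buffered changes are applied under one lock.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy persisted state shared by all sessions; not a Jackrabbit class. */
final class ToyWorkspace {
    private final Map<String, String> persisted = new HashMap<>();

    synchronized String read(String path) {
        return persisted.get(path);
    }

    /** Applies a whole batch of changes under one lock, so readers see all or none. */
    synchronized void apply(Map<String, String> changes) {
        persisted.putAll(changes);
    }

    ToySession login() {
        return new ToySession(this);
    }
}

/** Toy session: buffers changes until save(). Like real sessions, use one per thread. */
final class ToySession {
    private final ToyWorkspace workspace;
    private final Map<String, String> pending = new HashMap<>();

    ToySession(ToyWorkspace workspace) {
        this.workspace = workspace;
    }

    void setProperty(String path, String value) {
        pending.put(path, value); // transient: visible to this session only
    }

    String getProperty(String path) {
        return pending.containsKey(path) ? pending.get(path) : workspace.read(path);
    }

    boolean hasPendingChanges() {
        return !pending.isEmpty();
    }

    void save() {
        workspace.apply(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        ToyWorkspace workspace = new ToyWorkspace();
        ToySession writer = workspace.login();
        ToySession reader = workspace.login();
        writer.setProperty("/a/title", "My new node");
        System.out.println("before save: " + reader.getProperty("/a/title"));
        writer.save();
        System.out.println("after save: " + reader.getProperty("/a/title"));
    }
}
```

Giving each thread its own session, as the slide recommends, keeps the pending-change buffer from being shared between threads.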
  17. 17. Namespaces
      • The repository has a set of prefix -> URI namespace mappings
          jcr:  http://www.jcp.org/jcr/1.0
          nt:   http://www.jcp.org/jcr/nt/1.0
          mix:  http://www.jcp.org/jcr/mix/1.0
          xml:  http://www.w3.org/XML/1998/namespace
          etc.
      • Used to prevent naming conflicts between different clients
      • Each session can override (non-default) mappings locally
        • designed for cases like XML imports, etc.
        • in practice seldom used, and often not recommended
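How a prefix mapping resolves a name, and what a session-local override changes, can be sketched as follows. `ToyNamespaceView` is a hypothetical class for illustration and not the JCR namespace registry; the `{uri}localName` output is used here as a convenient expanded notation, and unprefixed names are simply left unchanged as a simplification.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy view of prefix -> URI mappings; one instance per "session" in this sketch. */
final class ToyNamespaceView {
    private final Map<String, String> prefixToUri = new HashMap<>();

    ToyNamespaceView() {
        // Mappings taken from the slide above.
        prefixToUri.put("jcr", "http://www.jcp.org/jcr/1.0");
        prefixToUri.put("nt", "http://www.jcp.org/jcr/nt/1.0");
        prefixToUri.put("mix", "http://www.jcp.org/jcr/mix/1.0");
    }

    /** Local override: the URI becomes reachable under the new prefix only. */
    void remap(String newPrefix, String uri) {
        if (!prefixToUri.containsValue(uri)) {
            throw new IllegalArgumentException("unknown namespace URI: " + uri);
        }
        if (prefixToUri.containsKey(newPrefix)) {
            throw new IllegalArgumentException("prefix already in use: " + newPrefix);
        }
        prefixToUri.values().remove(uri); // drop the old prefix for this URI
        prefixToUri.put(newPrefix, uri);
    }

    /** Turns "nt:file" into "{http://www.jcp.org/jcr/nt/1.0}file". */
    String expand(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name; // no prefix: left unchanged in this sketch
        }
        String uri = prefixToUri.get(name.substring(0, colon));
        if (uri == null) {
            throw new IllegalArgumentException("unknown prefix in " + name);
        }
        return "{" + uri + "}" + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        ToyNamespaceView view = new ToyNamespaceView();
        view.remap("nodetype", "http://www.jcp.org/jcr/nt/1.0");
        System.out.println(view.expand("nodetype:file"));
    }
}
```

The sketch shows why remapping rarely pays off: two sessions end up writing the same name differently, while the underlying URI-based name is identical.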
  18. 18. References and versioning
  19. 19. mix:referenceable - jcr:uuid = c6d27a10-bf23-11e3-b1b6-0800200c9a66 References - seeAlso = c6d27a10-bf23-11e3-b…
  20. 20. References, cont. • hard references • enforced integrity; target cannot be removed • least flexibility; think twice before using • weak references • remain valid across moves/renames • paths, names, URLs, etc. • no backreferences
  21. 21. mix:versionable Versioning checkin
  22. 22. Versioning, cont.
      • To make a node versionable, add the mix:versionable mixin
        • scope of “versionability” determined by node types (OPV)
      • A checkin freezes a piece of content and makes a copy of it in the version history
      • A checkout unfreezes the content and allows it to be modified
      • A restore goes back in time to a previously checked-in version
      • A merge combines changes from another workspace with those made in this workspace
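The checkin, checkout and restore cycle can be sketched with a toy model. `ToyVersionable` is a hypothetical class for illustration, not the JCR versioning API: it versions a single flat property map, identifies versions by list index, ignores merge and OPV, and assumes the node is left checked in after a restore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy versionable node with a linear history; not a Jackrabbit class. */
final class ToyVersionable {
    private final Map<String, String> properties = new HashMap<>();
    private final List<Map<String, String>> history = new ArrayList<>();
    private boolean checkedOut = true;

    void setProperty(String name, String value) {
        if (!checkedOut) {
            throw new IllegalStateException("node is checked in; call checkout() first");
        }
        properties.put(name, value);
    }

    String getProperty(String name) {
        return properties.get(name);
    }

    /** Freezes the node and stores a copy in the history; returns the version index. */
    int checkin() {
        history.add(new HashMap<>(properties));
        checkedOut = false;
        return history.size() - 1;
    }

    /** Unfreezes the node so it can be modified again. */
    void checkout() {
        checkedOut = true;
    }

    /** Replaces the current state with a previously checked-in version. */
    void restore(int version) {
        properties.clear();
        properties.putAll(history.get(version));
        checkedOut = false; // assumption of this sketch: restored content stays frozen
    }

    public static void main(String[] args) {
        ToyVersionable page = new ToyVersionable();
        page.setProperty("title", "Draft");
        int first = page.checkin();
        page.checkout();
        page.setProperty("title", "Final");
        page.checkin();
        page.restore(first);
        System.out.println(page.getProperty("title"));
    }
}
```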
  23. 23. Search and observation
  24. 24. Search examples
      // find all PDF files within this workspace, most recent first
      SELECT * FROM [nt:file] WHERE [jcr:mimeType] = 'application/pdf' ORDER BY [jcr:lastModified] DESC
      // find all content about Christmas within my blog
      /jcr:root/sites/myblog//*[jcr:contains(., 'Christmas')]
  25. 25. Search
      • By default all content is indexed
        • Configurable per repository
      • Support for full text search
        • Also binaries indexed with automatic text extraction
      • Full access control of search results
      • However:
        • Limited join support/performance
        • No facets or aggregate queries
  26. 26. Observation
      • An observation listener can choose to receive events
        • on changes of specified types
        • on changes at or below a specified path
        • on changes at nodes with specified identifiers
        • on changes at nodes of specified types
      • The events are delivered in asynchronous callbacks
        • Remember the non-thread-safety of sessions!
      • Often used to maintain a cache of expensive-to-compute data
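Two of the filters listed above, event type and "at or below a specified path", can be sketched with a toy dispatcher. `ToyObservation`, its `Listener` interface and its flag values are hypothetical and exist only for this illustration; they are not the JCR observation API. Dispatch is synchronous here to keep the sketch short, whereas the slide notes that real events arrive in asynchronous callbacks.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy event dispatcher with type and path filters; not the JCR observation API. */
final class ToyObservation {
    // Bit flags for this sketch only (hypothetical values).
    static final int NODE_ADDED = 1;
    static final int PROPERTY_CHANGED = 2;

    interface Listener {
        void onEvent(int type, String path);
    }

    private static final class Registration {
        final Listener listener;
        final int eventTypes;
        final String basePath;

        Registration(Listener listener, int eventTypes, String basePath) {
            this.listener = listener;
            this.eventTypes = eventTypes;
            this.basePath = basePath;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    void addListener(Listener listener, int eventTypes, String basePath) {
        registrations.add(new Registration(listener, eventTypes, basePath));
    }

    /** True for the base path itself and its descendants, but not for "/contentious". */
    static boolean isAtOrBelow(String path, String basePath) {
        if (basePath.equals("/")) {
            return true;
        }
        return path.equals(basePath) || path.startsWith(basePath + "/");
    }

    /** Delivers the event to every listener whose filters match. */
    void dispatch(int type, String path) {
        for (Registration r : registrations) {
            if ((r.eventTypes & type) != 0 && isAtOrBelow(path, r.basePath)) {
                r.listener.onEvent(type, path);
            }
        }
    }

    public static void main(String[] args) {
        ToyObservation observation = new ToyObservation();
        observation.addListener(
                (type, path) -> System.out.println("invalidate cache for " + path),
                NODE_ADDED | PROPERTY_CHANGED, "/content");
        observation.dispatch(PROPERTY_CHANGED, "/content/blog/post1");
    }
}
```

A cache-maintaining listener like the one in `main` would typically only record which paths are stale and leave repository access to a session owned by its own thread.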
  27. 27. Access control
  28. 28. Access control
      • Fine-grained, ACL-based access control
      • Applies to all content accesses
        • Writes
        • Reads
        • Search
        • Observation
        • etc.
      • Support for custom privileges
        • e.g. an “execute” privilege
  29. 29. Persistence and clustering
  30. 30. Persistence managers Repository Workspace A Workspace B Workspace C /jcr:system Persistence Manager 1 Persistence Manager 2 Persistence Manager 3 Persistence Manager 4
  31. 31. Persistence alternatives
      • Embedded Database PM: Derby, H2
      • External Database PM: PostgreSQL, Oracle, etc.
  32. 32. Data store Repository Workspace A /jcr:system Persistence Manager 1 Persistence Manager 2 Data Store
  33. 33. Data store alternatives
      • File Data Store: Local FS, NFS
      • Database Data Store: PostgreSQL, Oracle, etc.
      • S3 Data Store: S3
  34. 34. Clustering Persistence Manager PostgreSQL, Oracle, etc. Repository Persistence Manager Repository Persistence Manager Repository
  35. 35. Deployment and configuration
  36. 36. Deployment packages
      • jackrabbit-webapp
        • basic web interface (still no content browser/editor)
        • exposes the repository through JNDI, WebDAV, RMI
      • jackrabbit-standalone
        • runnable jar
        • jackrabbit-webapp plus embedded Jetty
        • basic tooling: backup/migration, CLI, etc.
      • jackrabbit-jca
        • designed for full J2EE environments
        • support for managed transactions
  37. 37. Embedded deployment
      • jackrabbit-core plus all dependencies
        • Maven recommended
        • slf4j used for logging
      • Full control over the repository
      • Extra work to make the repository externally manageable
  38. 38. Repository configuration
      • repository.xml
        • main repository configuration file
        • security, clustering, data store, /jcr:system, etc.
      • workspace.xml
        • configuration of each workspace
        • persistence manager, search index, etc.
        • automatically created based on template in repository.xml
      • indexing_configuration.xml
        • optional, customizes the search index
      • see http://jackrabbit.apache.org/jackrabbit-configuration.html
  39. 39. Questions?