/path/to/content
the Apache Jackrabbit content repository
Outline
• Repository model
• Property and node types
• Sessions and namespaces
• References and versioning
• Search and observation
• Access control
• Persistence and clustering
• Deployment and configuration
• Questions?
Repository model
Repository structure
Repository
Workspace A Workspace B Workspace C /jcr:system
… or more commonly
Repository
default workspace /jcr:system
Workspace structure
Root node /
Node /a
Node /a/b
Node /a/c
Node /jcr:system
Node structure
Property name    Type    Value
jcr:primaryType  Name    nt:unstructured
jcr:mixinTypes   Name[]  mix:referenceable
jcr:uuid         String  c6d27a10-bf23-11e3-b…
title            String  My new node
author           String  Jukka Zitting
Child nodes
foo, bar, baz[1], baz[2]
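The child list above uses index suffixes such as baz[2] to tell same-name siblings apart. A minimal sketch of splitting a path into name and index pairs, using only the JDK. `PathSegments` and `Segment` are hypothetical helpers for illustration, not part of the JCR or Jackrabbit API, and the sketch assumes that a missing suffix means index 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JCR API: splits an absolute path
// such as "/bar/baz[2]" into (name, index) segments.
class PathSegments {
    static final class Segment {
        final String name;
        final int index; // same-name sibling index, 1-based

        Segment(String name, int index) {
            this.name = name;
            this.index = index;
        }

        @Override
        public String toString() {
            return name + "[" + index + "]";
        }
    }

    static List<Segment> parse(String absPath) {
        if (!absPath.startsWith("/")) {
            throw new IllegalArgumentException("not an absolute path: " + absPath);
        }
        List<Segment> segments = new ArrayList<>();
        for (String part : absPath.split("/")) {
            if (part.isEmpty()) {
                continue; // leading slash, or the root path "/"
            }
            int open = part.indexOf('[');
            if (open < 0) {
                segments.add(new Segment(part, 1)); // no suffix is read as [1]
            } else {
                if (!part.endsWith("]")) {
                    throw new IllegalArgumentException("bad index in: " + part);
                }
                int index = Integer.parseInt(part.substring(open + 1, part.length() - 1));
                segments.add(new Segment(part.substring(0, open), index));
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(parse("/foo"));        // [foo[1]]
        System.out.println(parse("/bar/baz[2]")); // [bar[1], baz[2]]
    }
}
```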
Property and node types
Common property types
Property type          Used for                     Examples
String                 Short to medium-sized text   “foo”, “This paragraph…”
Binary                 Binary data and long text    PNG, PDF, “This book…”
Name                   Node and property names      “nt:folder”, “content”
Path                   Node and property paths      “/jcr:system”, “/etc/map”
Boolean, Long, Double  Scalar data                  true, 0, -2846, 3.14, NaN
Date                   ISO 8601 timestamp           2014-04-08T12:00:00.000Z
Reference              Graph structures             c6d27a10-bf23-11e3-b…
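To make the scalar and Date rows of the table concrete, here is a small JDK-only sketch that turns the example texts into Java values. `PropertyValueExamples.parse` is a hypothetical helper, and mapping Date to `java.time.OffsetDateTime` is a choice made for this sketch; the JCR API itself exposes dates as `java.util.Calendar`.

```java
import java.time.OffsetDateTime;

// Hypothetical helper for illustration; not part of the JCR API.
class PropertyValueExamples {
    // Converts the textual form of a value into a Java object, based on
    // the property type names used in the table above.
    static Object parse(String type, String text) {
        switch (type) {
            case "Boolean":
                return Boolean.parseBoolean(text);
            case "Long":
                return Long.parseLong(text);
            case "Double":
                return Double.parseDouble(text); // also accepts "NaN"
            case "Date":
                return OffsetDateTime.parse(text); // ISO 8601 with offset or "Z"
            default:
                return text; // String, Name, Path, Reference stay as text here
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Boolean", "true"));
        System.out.println(parse("Long", "-2846"));
        System.out.println(parse("Double", "NaN"));
        System.out.println(parse("Date", "2014-04-08T12:00:00.000Z"));
    }
}
```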
Multi-valued properties
• Zero or more values
• Practical limit of around 10k to 100k values, depending on the size of the values
• All values must be of the same type
• Duplicates allowed
• No “null” values
  • nulls are automatically removed
• Order is preserved
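The rules above can be sketched as a small normalization step. This is a JDK-only model, not the JCR `setProperty` implementation; `MultiValueSketch.normalize` is a hypothetical name, and comparing Java classes stands in for the property type check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the multi-valued property rules listed above.
class MultiValueSketch {
    // Drops nulls, keeps order and duplicates, and rejects mixed types.
    static List<Object> normalize(Object... values) {
        List<Object> result = new ArrayList<>();
        Class<?> type = null;
        for (Object value : values) {
            if (value == null) {
                continue; // null values are removed
            }
            if (type == null) {
                type = value.getClass();
            } else if (type != value.getClass()) {
                throw new IllegalArgumentException(
                        "mixed types: " + type.getSimpleName()
                        + " and " + value.getClass().getSimpleName());
            }
            result.add(value); // duplicates allowed, order preserved
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("a", null, "a", "b")); // [a, a, b]
        System.out.println(normalize());                     // []
    }
}
```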
Common node types
nt:base
- jcr:primaryType: Name
- jcr:mixinTypes: Name[]
nt:unstructured
- * (any properties OK)
+ * (any child nodes OK)
oak:Unstructured (w/o order)
- * (any properties OK)
+ * (any child nodes OK)
mix:referenceable
- jcr:uuid: String
mix:versionable
- …
mix:lockable
- …
Common node types, cont.
nt:hierarchyNode (abstract)
- jcr:created: Date
nt:file
+ jcr:content
nt:folder
+ *: nt:hierarchyNode
nt:resource
- jcr:data: Binary
- jcr:mimeType: String
- jcr:lastModified: Date
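The child node rules above ("+ *" on nt:unstructured, "+ *: nt:hierarchyNode" on nt:folder) can be sketched as a type check. This is a simplified JDK-only model: `NodeTypeSketch` is hypothetical, it only knows the types named on these slides, and it assumes a single supertype per type, which real node types are not limited to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of node type checks; not the JCR API.
class NodeTypeSketch {
    // type -> direct supertype, limited to the types named above
    private static final Map<String, String> SUPERTYPE = new HashMap<>();
    static {
        SUPERTYPE.put("nt:hierarchyNode", "nt:base");
        SUPERTYPE.put("nt:file", "nt:hierarchyNode");
        SUPERTYPE.put("nt:folder", "nt:hierarchyNode");
        SUPERTYPE.put("nt:unstructured", "nt:base");
        SUPERTYPE.put("nt:resource", "nt:base");
    }

    // True if type is wanted itself or inherits from it.
    static boolean isNodeType(String type, String wanted) {
        for (String t = type; t != null; t = SUPERTYPE.get(t)) {
            if (t.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    static boolean canAddChild(String parentType, String childType) {
        if (parentType.equals("nt:unstructured")) {
            return true; // + * (any child nodes OK)
        }
        if (parentType.equals("nt:folder")) {
            return isNodeType(childType, "nt:hierarchyNode"); // + *: nt:hierarchyNode
        }
        throw new IllegalArgumentException("parent type not modeled: " + parentType);
    }

    public static void main(String[] args) {
        System.out.println(canAddChild("nt:folder", "nt:file"));         // true
        System.out.println(canAddChild("nt:folder", "nt:unstructured")); // false
    }
}
```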
Example
Site (nt:unstructured)
form (nt:folder)
style.css (nt:file)
logo.png (nt:file)
function (nt:folder)
jquery.js (nt:file)
Blog (nt:unstructured)
Post 1 (nt:unstructured)
attachment.pdf (nt:file)
Post 2 (nt:unstructured)
Comment 1 (nt:unstructured)
Sessions and namespaces
Sessions
Diagram: one workspace, six sessions
• All content access goes through a session
• Sessions are created with an authenticated login() call
• Session-based authorization of reads, writes and other operations
• Tracking of transient changes
• Atomic save()
• Not thread-safe!
  • for concurrent operations, use multiple sessions
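The transient-change and atomic-save behaviour described above can be modeled in a few lines. This is a JDK-only sketch: the `Workspace` and `Session` classes here are hypothetical stand-ins, not the `javax.jcr` interfaces, and authentication and authorization are left out. Each thread logs in for its own session, as the last bullet recommends.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model for illustration; not the javax.jcr API.
class SessionModel {
    static final class Workspace {
        private final Map<String, String> persisted = new HashMap<>();

        Session login() {
            return new Session(this);
        }

        synchronized String read(String path) {
            return persisted.get(path);
        }

        // Applies a whole change set while holding the lock, so other
        // sessions see either none or all of the changes.
        synchronized void apply(Map<String, String> changes) {
            persisted.putAll(changes);
        }
    }

    static final class Session {
        private final Workspace workspace;
        private final Map<String, String> pending = new LinkedHashMap<>();

        Session(Workspace workspace) {
            this.workspace = workspace;
        }

        // Transient change: visible to this session only until save().
        void setProperty(String path, String value) {
            pending.put(path, value);
        }

        String getProperty(String path) {
            return pending.containsKey(path) ? pending.get(path) : workspace.read(path);
        }

        boolean hasPendingChanges() {
            return !pending.isEmpty();
        }

        void save() {
            workspace.apply(pending);
            pending.clear();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Workspace workspace = new Workspace();
        Runnable writer = () -> {
            Session session = workspace.login(); // one session per thread
            String thread = Thread.currentThread().getName();
            session.setProperty("/a/" + thread, "written by " + thread);
            session.save();
        };
        Thread t1 = new Thread(writer, "t1");
        Thread t2 = new Thread(writer, "t2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(workspace.login().getProperty("/a/t1")); // written by t1
    }
}
```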
Namespaces
• The repository has a set of prefix -> URI namespace mappings
    jcr: http://www.jcp.org/jcr/1.0
    nt: http://www.jcp.org/jcr/nt/1.0
    mix: http://www.jcp.org/jcr/mix/1.0
    xml: http://www.w3.org/XML/1998/namespace
    etc.
• Used to prevent naming conflicts between different clients
• Each session can override (non-default) mappings locally
  • designed for cases like XML imports, etc.
  • in practice seldom used, and often not recommended
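A sketch of what the mappings do: a prefixed name resolves to a URI plus a local name, so a session-local remapping changes the prefix but not the name's identity. JDK only; `NamespaceSketch` and its methods are hypothetical, and the `my` prefix with its `http://example.com/my/1.0` URI is an invented custom namespace used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of prefix -> URI resolution; not the JCR API.
class NamespaceSketch {
    // The repository-wide mappings listed above.
    static Map<String, String> repositoryMappings() {
        Map<String, String> m = new HashMap<>();
        m.put("jcr", "http://www.jcp.org/jcr/1.0");
        m.put("nt", "http://www.jcp.org/jcr/nt/1.0");
        m.put("mix", "http://www.jcp.org/jcr/mix/1.0");
        m.put("xml", "http://www.w3.org/XML/1998/namespace");
        return m;
    }

    // Expands "prefix:local" to "{uri}local". Names without a prefix are
    // treated as being in the empty namespace.
    static String expand(String name, Map<String, String> mappings) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return "{}" + name;
        }
        String prefix = name.substring(0, colon);
        String uri = mappings.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("unknown prefix: " + prefix);
        }
        return "{" + uri + "}" + name.substring(colon + 1);
    }

    // Session-local override: a copy in which the URI gets a new prefix.
    static Map<String, String> remap(Map<String, String> mappings, String newPrefix, String uri) {
        Map<String, String> local = new HashMap<>(mappings);
        local.values().remove(uri); // drop the old prefix for this URI
        local.put(newPrefix, uri);
        return local;
    }

    public static void main(String[] args) {
        Map<String, String> repo = repositoryMappings();
        repo.put("my", "http://example.com/my/1.0"); // invented custom namespace
        Map<String, String> local = remap(repo, "ex", "http://example.com/my/1.0");
        System.out.println(expand("jcr:primaryType", repo));
        System.out.println(expand("my:title", repo).equals(expand("ex:title", local))); // true
    }
}
```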
References and versioning
References
Target node (mix:referenceable)
- jcr:uuid = c6d27a10-bf23-11e3-b1b6-0800200c9a66
Referring node
- seeAlso = c6d27a10-bf23-11e3-b…
References, cont.
• hard references
  • enforced integrity; target cannot be removed
  • least flexibility; think twice before using
• weak references
  • remain valid across moves/renames
• paths, names, URLs, etc.
  • no backreferences
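The difference between enforced and non-enforced references can be sketched with a small in-memory model. JDK only; `ReferenceModel` is hypothetical and uses plain string ids in place of jcr:uuid values. In the real API a violated hard reference surfaces as a `javax.jcr.ReferentialIntegrityException`; the model uses `IllegalStateException` instead.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of hard and weak references; not the JCR API.
class ReferenceModel {
    private final Set<String> nodes = new HashSet<>();
    private final Map<String, Set<String>> hardRefs = new HashMap<>(); // source -> targets
    private final Map<String, Set<String>> weakRefs = new HashMap<>(); // source -> targets

    void addNode(String id) {
        nodes.add(id);
    }

    boolean exists(String id) {
        return nodes.contains(id);
    }

    void addHardReference(String source, String target) {
        hardRefs.computeIfAbsent(source, k -> new HashSet<>()).add(target);
    }

    void addWeakReference(String source, String target) {
        weakRefs.computeIfAbsent(source, k -> new HashSet<>()).add(target);
    }

    // Backreferences: which nodes hold a hard reference to the target?
    Set<String> hardReferrers(String target) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : hardRefs.entrySet()) {
            if (e.getValue().contains(target)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Hard references block removal of the target; weak references do not.
    void remove(String id) {
        Set<String> referrers = hardReferrers(id);
        if (!referrers.isEmpty()) {
            throw new IllegalStateException(id + " is still referenced by " + referrers);
        }
        nodes.remove(id);
        hardRefs.remove(id);
        weakRefs.remove(id);
    }

    public static void main(String[] args) {
        ReferenceModel model = new ReferenceModel();
        model.addNode("post");
        model.addNode("author");
        model.addWeakReference("post", "author");
        model.remove("author"); // allowed: the weak reference now dangles
        System.out.println(model.exists("author")); // false
    }
}
```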
Versioning
mix:versionable
checkin
Versioning, cont.
• To make a node versionable, add the mix:versionable mixin
  • scope of “versionability” determined by node types (OPV, on-parent-version)
• A checkin freezes a piece of content and makes a copy of it in the version history
• A checkout unfreezes the content and allows it to be modified
• A restore goes back in time to a previously checked-in version
• A merge combines changes from another workspace with those made in this workspace
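The checkin, checkout and restore cycle above can be sketched as a small state machine. JDK only; `VersioningModel` is a hypothetical single-node model that identifies versions by list index, ignores merge and OPV, and leaves the node checked in after a restore as a modeling choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-node model of checkin/checkout/restore; not the JCR API.
class VersioningModel {
    private Map<String, String> properties = new HashMap<>();
    private final List<Map<String, String>> history = new ArrayList<>();
    private boolean checkedOut = true; // starts out modifiable

    void setProperty(String name, String value) {
        if (!checkedOut) {
            throw new IllegalStateException("node is checked in");
        }
        properties.put(name, value);
    }

    String getProperty(String name) {
        return properties.get(name);
    }

    boolean isCheckedOut() {
        return checkedOut;
    }

    // Freezes the content and stores a copy in the version history.
    // Returns the index of the resulting version.
    int checkin() {
        if (checkedOut) {
            history.add(Collections.unmodifiableMap(new HashMap<>(properties)));
            checkedOut = false;
        }
        return history.size() - 1;
    }

    // Unfreezes the content so it can be modified again.
    void checkout() {
        checkedOut = true;
    }

    // Goes back to a previously checked-in version.
    void restore(int version) {
        properties = new HashMap<>(history.get(version));
        checkedOut = false; // modeling choice: restored content stays frozen
    }

    public static void main(String[] args) {
        VersioningModel node = new VersioningModel();
        node.setProperty("title", "first draft");
        int first = node.checkin();
        node.checkout();
        node.setProperty("title", "second draft");
        node.checkin();
        node.restore(first);
        System.out.println(node.getProperty("title")); // first draft
    }
}
```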
Search and observation
Search examples
// find all PDF files within this workspace, most recent first
// (matches the nt:resource content nodes, which carry jcr:mimeType and jcr:lastModified)
SELECT * FROM [nt:resource]
WHERE [jcr:mimeType] = 'application/pdf'
ORDER BY [jcr:lastModified] DESC
// find all content about Christmas within my blog
/jcr:root/sites/myblog//*[jcr:contains(., 'Christmas')]
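When the MIME type in the first query comes from user input, the string literal needs quoting. A minimal JDK-only sketch that builds the statement text, assuming that a single quote inside a JCR-SQL2 string literal is escaped by doubling it. `Sql2Sketch` is a hypothetical helper, and it selects from nt:resource because that is the type the node type slides list jcr:mimeType and jcr:lastModified on.

```java
// Hypothetical helper that only builds query text; it does not run a query.
class Sql2Sketch {
    // Wraps a value in single quotes, doubling any embedded single quote.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds the "PDF files, most recent first" style query for any MIME type.
    static String byMimeType(String mimeType) {
        return "SELECT * FROM [nt:resource]"
                + " WHERE [jcr:mimeType] = " + quote(mimeType)
                + " ORDER BY [jcr:lastModified] DESC";
    }

    public static void main(String[] args) {
        System.out.println(byMimeType("application/pdf"));
        System.out.println(quote("O'Reilly")); // 'O''Reilly'
    }
}
```

With a real session, the resulting string would be handed to the query manager; bind variables via `Query.bindValue` avoid the quoting step altogether.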
Search
• By default all content is indexed
• Configurable per repository
• Support for full text search
• Binaries are also indexed, with automatic text extraction
• Search results are fully access controlled
• However:
  • Limited join support/performance
  • No facets or aggregate queries
Observation
• An observation listener can choose to receive events
  • on changes of specified types
  • on changes at or below a specified path
  • on changes at nodes with specified identifiers
  • on changes at nodes of specified types
• The events are delivered in asynchronous callbacks
• Remember that sessions are not thread-safe!
• Often used to maintain a cache of expensive-to-compute data
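The first two filter criteria and the asynchronous delivery can be sketched with the JDK alone. `ObservationSketch` is hypothetical; its constants reuse the numeric values of `javax.jcr.observation.Event`, the path check is a simplification of the real listener semantics, and a single-thread executor stands in for the repository's event dispatcher.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of event filtering and async delivery; not the JCR API.
class ObservationSketch {
    // Same numeric values as the constants in javax.jcr.observation.Event.
    static final int NODE_ADDED = 1;
    static final int NODE_REMOVED = 2;
    static final int PROPERTY_ADDED = 4;
    static final int PROPERTY_REMOVED = 8;
    static final int PROPERTY_CHANGED = 16;

    // True if the event type is in the wanted bitmask and the event path
    // is at or below basePath.
    static boolean matches(int wantedTypes, String basePath, int eventType, String eventPath) {
        if ((wantedTypes & eventType) == 0) {
            return false;
        }
        if (basePath.equals("/")) {
            return true;
        }
        return eventPath.equals(basePath) || eventPath.startsWith(basePath + "/");
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        int wanted = NODE_ADDED | PROPERTY_CHANGED;
        String[][] changes = { {"1", "/a/b"}, {"2", "/a/c"}, {"16", "/x/title"} };
        for (String[] change : changes) {
            int type = Integer.parseInt(change[0]);
            String path = change[1];
            if (matches(wanted, "/a", type, path)) {
                // The callback runs on the dispatcher thread, not the caller's.
                dispatcher.submit(() -> System.out.println(
                        "event " + type + " at " + path
                        + " on " + Thread.currentThread().getName()));
            }
        }
        dispatcher.shutdown();
        dispatcher.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Because the callback runs on another thread, a listener in a real application should use its own session rather than share the one that registered it.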
Access control
Access control
• Fine-grained, ACL-based access control
• Applies to all content accesses
  • Writes
  • Reads
  • Search
  • Observation
  • etc.
• Support for custom privileges
  • e.g. an “execute” privilege
Persistence and clustering
Persistence managers
Repository
Workspace A: Persistence Manager 1
Workspace B: Persistence Manager 2
Workspace C: Persistence Manager 3
/jcr:system: Persistence Manager 4
Persistence alternatives
Embedded Database PM: Derby, H2
External Database PM: PostgreSQL, Oracle, etc.
Data store
Repository
Workspace A: Persistence Manager 1
/jcr:system: Persistence Manager 2
Data Store
Data store alternatives
File Data Store: Local FS, NFS
Database Data Store: PostgreSQL, Oracle, etc.
S3 Data Store: S3
Clustering
Repository + Persistence Manager
Repository + Persistence Manager
Repository + Persistence Manager
Shared database: PostgreSQL, Oracle, etc.
Deployment and configuration
Deployment packages
• jackrabbit-webapp
  • basic web interface (still no content browser/editor)
  • exposes the repository through JNDI, WebDAV, RMI
• jackrabbit-standalone
  • runnable jar
  • jackrabbit-webapp plus embedded Jetty
  • basic tooling: backup/migration, CLI, etc.
• jackrabbit-jca
  • designed for full J2EE environments
  • support for managed transactions
Embedded deployment
• jackrabbit-core plus all dependencies
• Maven recommended
• slf4j used for logging
• Full control over the repository
• Extra work to make the repository externally manageable
Repository configuration
• repository.xml
  • main repository configuration file
  • security, clustering, data store, /jcr:system, etc.
• workspace.xml
  • configuration of each workspace
  • persistence manager, search index, etc.
  • automatically created based on template in repository.xml
• indexing_configuration.xml
  • optional, customizes the search index
• see http://jackrabbit.apache.org/jackrabbit-configuration.html
Questions?
