2. Features
Current status & roadmap
Design (how we use Infinispan)
Best practices
Q&A
3. ModeShape 3
An elastic in-memory hierarchical database
with queries, transactions, events & more
4. Elastic
• Add more processes to increase storage
capacity and/or throughput
– No master, no slaves
– Data is rebalanced as needed
– Optionally separate database engine from storage
processes
• Fault tolerant
– Processes can fail without loss of data
– Cross-data center distribution (in near future)
5. Hierarchical
• Organize the data into a tree structure that
reflects how the data is accessed & used
– Navigation to related data
– Still have references and queries
• Many scenarios have natural hierarchies
6. Strongly consistent
• ACID
– Atomic, Consistent, Isolated, Durable
– Already familiar to most developers
– Easy to reason about code
• XA-aware
– Participate in user transactions
– Work with Java EE
7. Why not eventually-consistent?
• In eventually-consistent databases
– changes made by one client will eventually (but not
immediately) be propagated to all processes
– other clients won’t see latest data right away, yet can still
make other changes
– there may be multiple versions of a particular piece of data
• Can be ideal for some scenarios
– read-heavy and/or best-effort
• Applications that update data may need to
– expect inconsistencies (and/or multiple versions)
– specify conflict strategies
– resolve conflicts (inconsistencies)
8. In-memory
• Memory is really fast (and cheap)
• Why not keep all data in memory?
– practical limits to memory on particular machines
– memory isn’t shared between machines
– data stored in memory isn’t durable
– no queries, structure, or transactions
• ModeShape
– distributes multiple copies of data across the
combined memory of many machines
– can even persist data to disk or DB (if really needed)
– can still use queries, structure and transactions
– is fast
9. Queries
• Find the data independently of the hierarchy
• Use SQL-like language
SELECT * FROM [car:Car] WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006
SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
  WHERE PATH() LIKE $path
SELECT file.*,content.* FROM [nt:file] AS file
  JOIN [nt:resource] AS content ON ISCHILDNODE(content,file)
  WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'
SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
  WHERE PATH() IN (
    SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
    WHERE [vdb:version] <= $maxVersion
    AND CONTAINS([vdb:description],'xml OR xml maybe')
  )
10. With or without schema
(Figure: a spectrum from STRICT ENFORCEMENT to NO ENFORCEMENT)
• Choose how much schema is enforced
– define patterns for values and structure
– use different patterns for different parts of the database
– change the patterns over time
– use the “best” levels of schema validation
– evolve as necessary
11. Binary storage
• Separate storage for BINARY values
– content keyed by SHA-1
– property value stored with the node contains the SHA-1, which is resolved as needed
– content always buffered
• Option per repository
– File system
– Transient (temp directory)
– JDBC database
– MongoDB
– Infinispan (separate caches)
– custom
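As a rough illustration of content keyed by SHA-1, the sketch below keeps binary content in a map under its SHA-1 digest, so a node property only needs to hold the 40-character key. `Sha1KeyedStore` and its methods are hypothetical names for this example and are not ModeShape's binary store SPI; a real store would also stream large values rather than hold them in byte arrays.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory store for illustration; not ModeShape's implementation. */
class Sha1KeyedStore {
    private final Map<String, byte[]> contentByKey = new ConcurrentHashMap<>();

    /** Stores the content and returns its SHA-1 key as 40 hex characters. */
    String put(byte[] content) {
        String key = sha1Hex(content);
        // Identical content hashes to the same key, so it is kept only once.
        contentByKey.putIfAbsent(key, content.clone());
        return key;
    }

    /** Resolves a key back to a copy of its content, or null if the key is unknown. */
    byte[] get(String key) {
        byte[] content = contentByKey.get(key);
        return content == null ? null : content.clone();
    }

    /** Number of distinct binary values held. */
    int size() {
        return contentByKey.size();
    }

    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the key is derived from the content, saving the same file under two different nodes stores the bytes once and both properties resolve to the same entry.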
12. Sequencing
• Automatically extract structured content
– save BINARY or STRING values
– path rules & MIME types determine which sequencer is run
– output stored in repository at configurable location
• Sequencers
– CND or query
– DDL
– text (fixed width, delimited)
– Microsoft Office™
– Java (source & class)
– ZIP (and JAR/WAR/EAR)
– XML, XSD, and WSDL
– Teiid VDBs
– audio (MP3)
– images
– custom Sequencers
(Figure: 1) upload, 2) notify, 3) derive and store, 4) navigate)
13. Federation
(reintroduced in 3.1)
• Access data in external systems
– external data projected as nodes
with properties and node types
– supports read and optional write
– same validation rules
• Connector options
– File system (3.1)
– Local git (3.2)
– Database (3.2)
– Database metadata (3.2)
– Local repository (3.2)
– External JCR repository (3.2)
– custom
(Figure: external source A and external source B projected into the repository)
14. Monitoring
• Measure statistics for a variety of metrics
– Total counts: active sessions, queries, workspaces,
locks, listeners, events in queue, sequencing
operations in queue
– Increment counts: events sent to listeners, nodes
changed, saves, nodes sequenced
• Results
– Metrics measured every 5 seconds
– Results are aggregated into windows that show
statistics (min, max, median, variance, stdev, sample
count) during last minute, hour, day, week, year
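With one measurement every 5 seconds, the last-minute window holds 12 samples. The sketch below shows one way the listed statistics could be computed for such a window. `WindowStatistics` is a hypothetical helper, not ModeShape's monitoring API, and it assumes population variance (dividing by the sample count); the slides do not say which variance ModeShape reports.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not ModeShape's monitoring API. */
final class WindowStatistics {
    final int sampleCount;
    final double min;
    final double max;
    final double median;
    final double variance;
    final double stdev;

    WindowStatistics(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("a window needs at least one sample");
        }
        // Sort a copy so the caller's array is left untouched.
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;

        sampleCount = n;
        min = sorted[0];
        max = sorted[n - 1];
        // Middle value for odd counts, average of the two middle values for even counts.
        median = (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

        double sum = 0.0;
        for (double s : sorted) {
            sum += s;
        }
        double mean = sum / n;

        double squaredDeviations = 0.0;
        for (double s : sorted) {
            squaredDeviations += (s - mean) * (s - mean);
        }
        // Assumption: population variance (divide by n, not n - 1).
        variance = squaredDeviations / n;
        stdev = Math.sqrt(variance);
    }
}
```

Longer windows (hour, day, week, year) could be built the same way from the samples, or from the summaries, of the shorter windows beneath them.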
16. JCR 2.0
• Standard Java API (JSR-283)
– javax.jcr packages
– programmatically access, find, update, query content
– commonly needed features: events, versioning, etc.
– hierarchical tree of nodes, nodes have properties,
property values can reference other nodes
(Figure: content repositories draw on databases (query, integrity, schema, transactions, events) and file systems (read, write, streams, locking, hierarchy, access control, search), and add unstructured content and versioning)
17. Extended JCR API
• Extended JCR interfaces
– additional node type management methods
– additional event types
– additional Binary value methods (hash)
– additional JCR-QOM language objects
– cancel queries
– sequencer and text SPIs
– monitoring API
18. JDBC API
• Use JDBC driver or data source
– connect to local or remote repository
– issue JCR-SQL2 queries
– access database metadata
• Enables existing applications to access content
– ad hoc query tools
– reporting systems
19. RESTful API
• Access content over HTTP
– POST, PUT, GET, DELETE methods
– JSON representations
– Single or subtree of nodes with properties
– Streams large BINARY values
– Register node types
– Execute queries
• Deployed as WAR file
– Same app server in which ModeShape is deployed
– Handles multiple repositories
20. WebDAV API
• Exposes content as files and directories
– nt:file nodes exposed as files
– nt:folder nodes exposed as directories
– other nodes exposed as directories
• Mount repository on file system
– Treat as external drive
– Upload files and folders into repository
• Deployed as WAR file
– Same app server in which ModeShape is deployed
– Handles multiple repositories
22. ModeShape 3 and Infinispan
Single process
(Figure: one ModeShape instance on a local Infinispan cache, with data written to a persistent store)
23. ModeShape 3 and Infinispan
Small cluster
(Figure: three ModeShape instances exchanging events, each on a replicated Infinispan cache; the caches exchange data and write to a persistent store)
24. ModeShape 3 and Infinispan
Moderate single- or multi-site cluster
(Figure: several ModeShape instances exchanging events, each on a distributed Infinispan cache; the caches exchange data)
25. ModeShape 3 and Infinispan
Large single- or multi-site cluster
(Figure: several ModeShape instances exchanging events, all reading and writing data in a separate Infinispan data grid)
27. Deploying ModeShape in AS7
• Simple installation
– simply unzip into existing AS7 installation
– includes “standalone-modeshape.xml” that contains a
variety of ready-to-run sample repositories
• ModeShape subsystem for AS7
– use AS7 tools to define 1+ repositories
– each repository is independently configured
– update repository configuration while running
– (re)uses Infinispan and JGroups subsystems
– clustering is built-in
– perform management and monitoring operations
28. Sample AS7 configuration
<subsystem xmlns="urn:jboss:domain:modeshape:3.0">
<repository name="sample" />
</subsystem>
– Each “repository” fragment defines a repository
– Multiple are supported
29. Sample AS7 configuration
(more thorough)
<subsystem xmlns="urn:jboss:domain:modeshape:3.0">
  <!-- Multiple 'repository' elements are allowed -->
  <repository name="sample"
              cache-name="sample" cache-container="modeshape"
              jndi-name="jcr/local/sample"
              enable-monitoring="true"
              default-workspace="default" allow-workspace-creation="true"
              security-domain="modeshape-security"
              anonymous-roles="readonly,readwrite,admin" anonymous-username="&lt;anonymous&gt;"
              use-anonymous-upon-failed-authentication="false">
    <workspaces>
      <!-- 0 or more workspaces can be predefined. At the moment, these are just names.
           But we may want to specify content or something else, so create element for each. -->
      <workspace name="predefinedWorkspace1" />
      <workspace name="predefinedWorkspace2" />
      <workspace name="predefinedWorkspace3" />
    </workspaces>
    <indexing thread-pool="modeshape-workers" batch-size="-1" reader-strategy="shared" mode="sync"
              async-thread-pool-size="1" async-max-queue-size="1" >
      <analyzer classname="org.apache.lucene.analysis.standard.StandardAnalyzer" module="" />
      <jms-master-backend connection-factory-jndi-name="" queue-jndi-name=""/>
    </indexing>
    <file-master-index-storage rebuild-upon-startup="ifMissing" format="LUCENE_CURRENT"
                               path="modeshape/sample/indexes" relative-to="jboss.server.data.dir"
                               access-type="auto" locking-strategy="native"
                               source-path="/var/lib/modeshape/index/" refresh-in-seconds="3600"/>
    <file-binary-storage min-value-size="4096" path="modeshape/sample/binaries"
                         relative-to="jboss.server.data.dir" />
    <sequencers>
      <!-- 0 or more sequencers -->
      <sequencer name="Java Source" classname="java"
                 path-expression="/files/(*.java)[/jcr:content] => /java/$1"/>
    </sequencers>
  </repository>
</subsystem>
30. ModeShape repositories in JBoss AS7
(Figure: repositories inside JBoss AS 7 are reached over JCR, JDBC, HTTP/REST, and WebDAV by web apps, EJBs, MDBs, etc., and by clients written in Java, Python, PHP, Ruby, and JavaScript)
33. ModeShape 3 (part 1 of 2)
• Much faster
– Order of magnitude faster, or more
– Way higher write concurrency (equivalent to node-level locking)
– Thread-safe implementations
– Memory is the new disk
– Internal caches and lazy loading
– Faster resolution of references and back-references
• Massively larger repository sizes
– Millions of nodes, or more
– Flat hierarchies (>>10K children under 1 parent)
– Very large files, without consuming heap
• More deployment options
– Large clusters
– High availability
– Multiple sites
– Cloud
34. ModeShape 3 (part 2 of 2)
• Easily embedded
– Lightweight, multi-repository engine
– Hot deployment and configuration of repositories
– Windowed metrics
• JBoss AS 7 integration
– Provides lightweight, on-demand JCR subsystem
– Hot deployment and configuration of repositories
– Management of domains (clusters and groups)
– Monitoring and alerting (via RHQ/JON)
• Participate in JTA transactions
– Enabling easy use of JCR in EJB, MDB, CDI, etc.
• Simpler SPIs
– Sequencers, text extractors, security providers, binary stores, and
connectors
35. Under the ModeShape 3 covers
• Use best-of-breed technology
– Infinispan: cache, key-value store and data grid
– Hibernate Search: indexing
– JGroups: clustering events
– JBoss AS7: small, fast, clusterable, manageable, cloud
– Others: RESTEasy, PicketLink, etc.
• Design techniques
– Simplify, simplify, simplify!
– Use immutability first, otherwise write concurrent code
– Cache data (especially immutable)
– Share more data between sessions
– Plan for eventual consistency
– Remove layers
– Use sequences (lazily load data, benefits large collections)
– JSON/BSON documents optimized for in-memory usage
38. ModeShape 3 and Infinispan
Using different caches for different purposes
• JCR sessions hold their changes in memory; will use Infinispan caches that (can) overflow to disk
• Shared, transient Infinispan caches for each workspace, caching node representations and expiring entries based on events
• Each node state stored in Infinispan cache as 1 or more JSON/BSON documents
• Configure Infinispan store as needed
(Figure: a ModeShape repository over Binary Storage (Infinispan), Content Storage (Infinispan), and External Systems)
40. Best practices (1 of 2)
• Build structure first, then node types
– most important to get your node structure right
– it will change over time anyway, so don’t define the node types too soon
• Use mixin node types and mixins
– where possible define sets of properties as mixins
– use in primary types and dynamically add to nodes
• Limit use of same-name-siblings
– useful when required, but can be expensive and difficult to use (i.e., paths change)
• Prefer hierarchies
– moderate numbers of child nodes, use multiple levels if necessary
• Store files and folders with ‘nt:file’ and ‘nt:folder’
– use them wherever appropriate; not for all binary data, though!
• Verify features are enabled
– improves portability and safety with configuration changes
• Import and export
– avoid document view; use system view wherever possible
41. Best practices (2 of 2)
• Prefer JCR-SQL2 and JCR-QOM over other query languages
– by far the richest and most useful
– do this even when it appears the queries are more complicated
• Only Repository is thread-safe; no other APIs are
– don’t share sessions
– don’t share anything between sessions
• Register all listeners in special long-lived sessions
– do nothing else with these sessions, however (Session is not thread-safe)
– get off the notification thread ASAP, using work queues where necessary
• Create new sessions rather than reusing a pool of sessions
– Sessions are intended to be as lightweight as possible
– Create a session, use it, log out (even web applications and services!)
• Avoid deprecated APIs
– either perform poorly or are a bad idea; besides, they’ll be removed eventually
• Use Session.save() not Node.save()
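The advice to get off the notification thread using work queues can be sketched as follows. To keep the example free of dependencies, `HandOffListener` and its `onEvent(String)` method are hypothetical stand-ins: a real listener would implement `javax.jcr.observation.EventListener`, copy what it needs out of each event, and hand that to the queue. The executor's internal queue plays the role of the work queue here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for a JCR event listener; not part of javax.jcr. */
class HandOffListener implements AutoCloseable {
    // A single worker preserves event order; its internal queue is the work queue.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> handler;

    HandOffListener(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Called on the notification thread: enqueue and return, no slow work here. */
    void onEvent(String changedPath) {
        worker.submit(() -> handler.accept(changedPath));
    }

    /** Stops accepting events and waits up to 5 seconds for queued work to finish. */
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
        }
    }
}
```

Any repository work done by the handler should open its own session, use it, and log out, since the listener's session is not safe to share with the worker thread.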