ModeShape 3 overview

Introduction to ModeShape 3
November 28, 2012

Randall Hauch
@rhauch

Features

Current status & roadmap

Design (how we use Infinispan)

Best practices

Q&A


ModeShape 3
An elastic in-memory hierarchical database
with queries, transactions, events & more


Elastic
• Add more processes to increase storage
capacity and/or throughput
– No master, no slaves
– Data is rebalanced as needed
– Optionally separate database engine from storage
processes
• Fault tolerant
– Processes can fail without loss of data
– Cross-data center distribution (in the near future)


Hierarchical
• Organize the data into a tree structure that
reflects how the data is accessed & used
– Navigation to related data
– Still have references and queries
• Many scenarios have natural hierarchies
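To make the navigation idea concrete, here is a minimal sketch of an in-memory tree in which related data is reached by walking a relative path. TreeNode and its methods are hypothetical illustrations, not ModeShape or JCR classes; in JCR itself the equivalent navigation is done with calls such as Node.getNode(relPath).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical tree node for illustration only (not a ModeShape or JCR class):
// each node has named children and a simple property map.
final class TreeNode {
    final String name;
    final Map<String, TreeNode> children = new LinkedHashMap<>();
    final Map<String, Object> properties = new LinkedHashMap<>();

    TreeNode(String name) {
        this.name = name;
    }

    // Returns the child with this name, creating it if it does not exist yet.
    TreeNode child(String childName) {
        return children.computeIfAbsent(childName, TreeNode::new);
    }

    // Walks a relative path such as "cars/Toyota/Prius" one segment at a time;
    // returns null as soon as a segment is missing. Empty segments are skipped,
    // so a leading "/" is tolerated.
    TreeNode resolve(String relativePath) {
        TreeNode current = this;
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

Grouping data by how it is accessed (here, by maker and then model) means a lookup is a short walk down the tree rather than a search over everything.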


Strongly consistent
• ACID
– Atomic, Consistent, Isolated, Durable
– Already familiar to most developers
– Easy to reason about code
• XA-aware
– Participate in user transactions
– Work with Java EE


Why not eventually-consistent?

• In eventually-consistent databases
– changes made by one client will eventually (but not
immediately) be propagated to all processes
– other clients won’t see the latest data right away, yet can still make other changes
– there may be multiple versions of a particular piece of data
• Can be ideal for some scenarios
– read-heavy and/or best-effort
• Applications that update data may need to
– expect inconsistencies (and/or multiple versions)
– specify conflict strategies
– resolve conflicts (inconsistencies)
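The "specify conflict strategies" burden can be made concrete with one common strategy, last-writer-wins, which an application on an eventually-consistent store might have to implement itself. This is a hypothetical sketch: Version and LastWriterWins are invented names, not part of ModeShape or any particular database.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical: one replica's copy of a value, stamped when it was written.
record Version(String value, long timestamp, String replicaId) {}

// Hypothetical last-writer-wins strategy: the newest timestamp wins, and ties
// are broken by replica id so every process picks the same winner.
final class LastWriterWins {
    static Version resolve(List<Version> versions) {
        return versions.stream()
                .max(Comparator.comparingLong(Version::timestamp)
                        .thenComparing(Version::replicaId))
                .orElseThrow(); // an empty list means there is nothing to resolve
    }
}
```

Note that the losing writes are silently discarded, which is exactly the kind of behavior an application must accept or work around when the database does not provide strong consistency.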

In-memory
• Memory is really fast (and cheap)
• Why not keep all data in memory?
– practical limits to memory on particular machines
– memory isn’t shared between machines
– data stored in memory isn’t durable
– no queries, structure, or transactions
• ModeShape
– distributes multiple copies of data across the
combined memory of many machines
– can even persist data to disk or DB (if really needed)
– can still use queries, structure and transactions
– is fast

Queries
• Find the data independently of the hierarchy
• Use SQL-like language
SELECT * FROM [car:Car] WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006

SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
WHERE PATH() LIKE $path

SELECT file.*,content.* FROM [nt:file] AS file
JOIN [nt:resource] AS content ON ISCHILDNODE(content,file)
WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'

SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
WHERE PATH() IN (
SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
WHERE [vdb:version] <= $maxVersion
AND CONTAINS([vdb:description],'xml OR xml maybe')
)
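The first query relies on LIKE wildcards. ModeShape evaluates queries itself; the sketch below only illustrates the wildcard rules defined for JCR-SQL2 in JCR 2.0 ('%' matches any run of characters, '_' matches exactly one, '\' escapes the next character) by translating a pattern into a Java regular expression. LikePattern is a hypothetical helper, not a ModeShape class.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating JCR-SQL2 LIKE semantics via java.util.regex.
final class LikePattern {
    // '%' -> any sequence of characters, '_' -> exactly one character,
    // '\' -> the next character is literal; everything else is quoted literally.
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                regex.append(Pattern.quote(String.valueOf(like.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets '%' and '_' match line breaks inside multi-line values too.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The whole value must match the pattern, as with SQL LIKE.
    static boolean matches(String value, String like) {
        return toRegex(like).matcher(value).matches();
    }
}
```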


With or without schema

[Figure: a scale ranging from STRICT ENFORCEMENT to NO ENFORCEMENT]

• Choose how much schema is enforced
– define patterns for values and structure
– use different patterns for different parts of the database
– change the patterns over time
– use the “best” levels of schema validation
– evolve as necessary


Binary storage
• Separate storage for BINARY values
– content keyed by SHA-1
– property value stored with the node contains the SHA-1; the content is resolved as needed
– content always buffered
• Option per repository
– File system
– Transient (temp directory)
– JDBC database
– MongoDB
– Infinispan (separate caches)
– custom
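Keying content by SHA-1 means a node's property only needs to hold the hash, and identical content is stored once. The sketch below is a hypothetical in-memory store, not ModeShape's binary store API: Sha1BinaryStore and its methods are invented names, and a real store would stream content to files, a database, or a cache rather than hold byte arrays.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory binary store: content is keyed by its SHA-1 hash.
final class Sha1BinaryStore {
    private final Map<String, byte[]> contentByKey = new ConcurrentHashMap<>();

    // Computes the lowercase hex SHA-1 of the content.
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            return HexFormat.of().formatHex(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Stores the content (once per distinct hash) and returns the key
    // that the node's property value would hold.
    String store(byte[] content) {
        String key = sha1Hex(content);
        contentByKey.putIfAbsent(key, content.clone());
        return key;
    }

    // Resolves a key back to content when the value is actually read; null if unknown.
    byte[] resolve(String key) {
        byte[] content = contentByKey.get(key);
        return content == null ? null : content.clone();
    }

    int size() {
        return contentByKey.size();
    }
}
```

Storing the same bytes twice returns the same key and leaves only one entry, which is the main benefit of content-addressed storage.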

Sequencing
• Automatically extract structured content
– save BINARY or STRING values
– path rules & MIME types determine which sequencer is run
– output stored in repository at configurable location
• Sequencers
– CND
– DDL
– text (fixed width, delimited)
– Microsoft Office™
– Java (source & class)
– ZIP (and JAR/WAR/EAR)
– XML, XSD, and WSDL
– Teiid VDBs
– audio (MP3)
– images
– custom sequencers
[Figure: 1) upload, 2) notify, 3) derive and store, 4) navigate or query]

Federation
(reintroduced in 3.1)

• Access data in external systems
– external data projected as nodes
with properties and node types
– supports read and optional write
– same validation rules
• Connector options
– File system (3.1)
– Local git (3.2)
– Database (3.2)
– Database metadata (3.2)
– Local repository (3.2)
– External JCR repository (3.2)
– custom
[Figure: a repository projecting content from external source A and external source B]

Monitoring
• Measure statistics for a variety of metrics
– Total counts: active sessions, queries, workspaces,
locks, listeners, events in queue, sequencing
operations in queue
– Increment counts: events sent to listeners, nodes
changed, saves, nodes sequenced
• Results
– Metrics measured every 5 seconds
– Results are aggregated into windows that show
statistics (min, max, median, variance, stdev, sample
count) during last minute, hour, day, week, year
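The windowed aggregation can be sketched as a fixed-size rolling window over periodic samples. This is a hypothetical illustration, not ModeShape's monitoring implementation: MetricWindow is an invented name, the window keeps the last N samples (12 samples taken every 5 seconds would span one minute), and it reports population variance, which is an assumption since the slide does not say which variance is used.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical rolling window holding the most recent `capacity` samples.
final class MetricWindow {
    private final int capacity;
    private final Deque<Long> samples = new ArrayDeque<>();

    MetricWindow(int capacity) {
        this.capacity = capacity;
    }

    // Adds a sample, dropping the oldest one once the window is full.
    void addSample(long value) {
        if (samples.size() == capacity) {
            samples.removeFirst();
        }
        samples.addLast(value);
    }

    record Stats(long min, long max, double median, double variance, double stdDev, int count) {}

    // Summarizes the samples currently in the window.
    Stats stats() {
        long[] sorted = samples.stream().mapToLong(Long::longValue).sorted().toArray();
        int n = sorted.length;
        if (n == 0) {
            return new Stats(0, 0, 0, 0, 0, 0);
        }
        double mean = Arrays.stream(sorted).average().orElse(0);
        double median = (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        // Population variance: mean squared deviation from the mean.
        double variance = Arrays.stream(sorted).mapToDouble(v -> (v - mean) * (v - mean)).sum() / n;
        return new Stats(sorted[0], sorted[n - 1], median, variance, Math.sqrt(variance), n);
    }
}
```

Longer windows (hour, day, week, year) could be built the same way from the summaries of shorter ones rather than from raw samples.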


JCR 2.0
• Standard Java API (JSR-283)
– javax.jcr packages
– programmatically access, find, update, query content
– commonly needed features: events, versioning, etc.
– hierarchical tree of nodes, nodes have properties,
property values can reference other nodes

[Figure: content repositories combine features of databases and file systems: query, read/write, streams, integrity, locking, schema, hierarchy, transactions, access control, events, search, unstructured content, and versioning]

Extended JCR API
• Extended JCR interfaces
– additional node type management methods
– additional event types
– additional Binary value methods (hash)
– additional JCR-QOM language objects
– cancel queries
– sequencer and text SPIs
– monitoring API


JDBC API
• Use JDBC driver or data source
– connect to local or remote repository
– issue JCR-SQL2 queries
– access database metadata
• Enables existing applications to access content
– ad hoc query tools
– reporting systems

RESTful API
• Access content over HTTP
– POST, PUT, GET, DELETE methods
– JSON representations
– Single or subtree of nodes with properties
– Streams large BINARY values
– Register node types
– Execute queries

• Deployed as WAR file
– Same app server in which ModeShape is deployed
– Handles multiple repositories
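As a rough sketch of what an HTTP client does against this API, the code builds (but does not send) a JSON GET request for one node using the standard java.net.http client. The URL layout "/modeshape-rest/{repository}/{workspace}/items{path}", the host, port, and the RestItems class are assumptions for illustration; check the REST service documentation for your ModeShape version before relying on them.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpRequest;

// Hypothetical helper; the URL layout below is an assumption, not a documented guarantee.
final class RestItems {
    // Builds a GET request asking for the node at `nodePath` as JSON.
    static HttpRequest getItem(String host, int port, String repository, String workspace, String nodePath) {
        try {
            // The multi-argument URI constructor percent-encodes illegal characters in the path.
            URI uri = new URI("http", null, host, port,
                    "/modeshape-rest/" + repository + "/" + workspace + "/items" + nodePath, null, null);
            return HttpRequest.newBuilder(uri)
                    .header("Accept", "application/json")
                    .GET()
                    .build();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid repository path", e);
        }
    }
}
```

Sending it would be a matter of `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running server.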


WebDAV API
• Exposes content as files and directories
– nt:file nodes exposed as files
– nt:folder nodes exposed as directories
– other nodes exposed as directories

• Mount repository on file system
– Treat as external drive
– Upload files and folders into repository

• Deployed as WAR file
– Same app server in which ModeShape is deployed
– Handles multiple repositories

ModeShape 3 and Infinispan
Single process

[Figure: one ModeShape process using a local Infinispan cache, which stores data in a persistent store]

Small cluster

[Figure: several ModeShape processes exchanging events, each with its own replicated Infinispan cache; data is replicated between the caches and written to a shared persistent store]

Moderate single- or multi-site cluster

[Figure: many ModeShape processes exchanging events, each with its own distributed Infinispan cache; data is distributed across the caches]

Large single- or multi-site cluster

[Figure: many ModeShape processes exchanging events and storing data in a separate Infinispan data grid]

Deploying ModeShape in AS7
• Simple installation
– unzip into an existing AS7 installation
– includes “standalone-modeshape.xml”, which contains a variety of ready-to-run sample repositories
• ModeShape subsystem for AS7
– use AS7 tools to define 1+ repositories
– each repository is independently configured
– update repository configuration while running
– (re)uses Infinispan and JGroups subsystems
– clustering is built-in
– perform management and monitoring operations

Sample AS7 configuration

    <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
      <repository name="sample" />
    </subsystem>

– Each “repository” fragment defines a repository
– Multiple repositories are supported

Sample AS7 configuration
(more thorough)
    <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
      <!-- Each "repository" element defines one repository -->
      <repository name="sample"
                  cache-name="sample" cache-container="modeshape"
                  jndi-name="jcr/local/sample"
                  enable-monitoring="true"
                  default-workspace="default" allow-workspace-creation="true"
                  security-domain="modeshape-security"
                  anonymous-roles="readonly,readwrite,admin" anonymous-username="&lt;anonymous&gt;"
                  use-anonymous-upon-failed-authentication="false">
        <workspaces>
          <!-- Workspaces listed here are predefined in the repository -->
          <workspace name="predefinedWorkspace1" />
        </workspaces>

        <indexing thread-pool="modeshape-workers" batch-size="-1" reader-strategy="shared" mode="sync"
                  async-thread-pool-size="1" async-max-queue-size="1" >
          <analyzer classname="org.apache.lucene.analysis.standard.StandardAnalyzer" module="" />
          <jms-master-backend connection-factory-jndi-name="" queue-jndi-name=""/>
        </indexing>
        <file-master-index-storage rebuild-upon-startup="ifMissing" format="LUCENE_CURRENT"
                            path="modeshape/sample/indexes" relative-to="jboss.server.data.dir"
                            access-type="auto" locking-strategy="native"
                            source-path="/var/lib/modeshape/index/" refresh-in-seconds="3600"/>
        <file-binary-storage min-value-size="4096" path="modeshape/sample/binaries"
                 relative-to="jboss.server.data.dir" />
        <sequencers>
          <!-- Sequence *.java files directly under /files; write output to /java/{file name} -->
          <sequencer name="Java Source" classname="java"
                     path-expression="/files/(*.java)[/jcr:content] => /java/$1"/>
        </sequencers>
      </repository>
    </subsystem>

ModeShape repositories in JBoss AS7

[Figure: repositories running inside JBoss AS 7, accessed through JCR and JDBC by web apps, EJBs, MDBs, etc.; through HTTP/REST by clients written in PHP, Ruby, JavaScript, Java, and Python; and through WebDAV]

Current Status & Roadmap


ModeShape releases

Development shifted to 3.x in October 2011


ModeShape 3 (part 1 of 2)

• Much faster
– Order of magnitude faster, or more
– Much higher write concurrency (equivalent to node-level locking)
– Thread-safe implementations
– Memory is the new disk
– Internal caches and lazy loading
– Faster resolution of references and back-references
• Massively larger repository sizes
– Millions of nodes, or more
– Flat hierarchies (>>10K children under 1 parent)
– Very large files, without consuming heap
• More deployment options
– Large clusters
– High availability
– Multiple sites
– Cloud

ModeShape 3 (part 2 of 2)

• Easily embedded
– Lightweight, multi-repository engine
– Hot deployment and configuration of repositories
– Windowed metrics
• JBoss AS 7 integration
– Provides lightweight, on-demand JCR subsystem
– Hot deployment and configuration of repositories
– Management of domains (clusters and groups)
– Monitoring and alerting (via RHQ/JON)
• Participate in JTA transactions
– Enabling easy use of JCR in EJB, MDB, CDI, etc.
• Simpler SPIs
– Sequencers, text extractors, security providers, binary stores, and
connectors


Under the ModeShape 3 covers
• Use best-of-breed technology
– Infinispan: cache, key-value store and data grid
– Hibernate Search: indexing
– JGroups: clustering events
– JBoss AS7: small, fast, clusterable, manageable, cloud
– Others: RESTEasy, PicketLink, etc.
• Design techniques
– Simplify, simplify, simplify!
– Use immutability first, otherwise write concurrent code
– Cache data (especially immutable)
– Share more data between sessions
– Plan for eventual consistency
– Remove layers
– Use sequences (lazily load data, benefits large collections)
– JSON/BSON documents optimized for in-memory usage


Design
(How ModeShape uses Infinispan)


ModeShape 3
Basic architecture

[Figure: a ModeShape repository with a JCR layer on top of a storage layer made up of binary storage, content storage, and external systems]

Using different caches for different purposes

[Figure: a ModeShape repository whose binary storage and content storage are each backed by Infinispan, alongside external systems]
• JCR sessions hold their changes in memory; they will use Infinispan caches that (can) overflow to disk
• Shared, transient (Infinispan) caches for each workspace cache node representations and expire entries based on events
• Each node state is stored in an Infinispan cache as 1 or more JSON/BSON documents
• Configure the Infinispan store as needed

Best practices (1 of 2)
• Build structure first, then node types
– most important to get your node structure right
– it will change over time anyway, so don’t define the node types too soon
• Use mixin node types
– where possible define sets of properties as mixins
– use in primary types and dynamically add to nodes
• Limit use of same-name-siblings
– useful when required, but can be expensive and difficult to use (paths change as siblings are reordered or removed)
• Prefer hierarchies
– moderate numbers of child nodes, use multiple levels if necessary
• Store files and folders with ‘nt:file’ and ‘nt:folder’
– use them wherever appropriate; not for all binary data, though!
• Verify features are enabled
– improves portability and safety with configuration changes
• Import and export
– avoid document view; use system view wherever possible
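One common way to keep child counts moderate when IDs arrive in large numbers is to add intermediate levels derived from a hash of the ID. The sketch below is a hypothetical scheme, not a ModeShape feature: BucketedPaths is an invented name, and using two levels of two SHA-1 hex characters (at most 256 children per intermediate node) is an arbitrary choice. It also assumes the ID is already a valid node name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper: spreads many children across two extra levels,
// each named by two hex characters of the id's SHA-1 hash (max 256 per level).
final class BucketedPaths {
    static String pathFor(String parentPath, String id) {
        String hex;
        try {
            byte[] hash = MessageDigest.getInstance("SHA-1")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            hex = HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        // e.g. parent "/users" and id "abc" -> "/users/a9/99/abc"
        return parentPath + "/" + hex.substring(0, 2) + "/" + hex.substring(2, 4) + "/" + id;
    }
}
```

Because the bucket names depend only on the ID, any process can compute a node's path without a query, while no single node accumulates a flat list of millions of children.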


Best practices (2 of 2)
• Prefer JCR-SQL2 and JCR-QOM over other query languages
– by far the richest and most useful
– do this even when it appears the queries are more complicated
• Only Repository is thread-safe; no other APIs are
– don’t share sessions
– don’t share anything between sessions
• Register all listeners in special long-lived sessions
– do nothing else with these sessions (Session is not thread-safe)
– get off the notification thread ASAP, using work queues where necessary
• Create new sessions rather than reusing a pool of sessions
– Sessions are intended to be as lightweight as possible
– Create a session, use it, and log out (even in web applications and services!)
• Avoid deprecated APIs
– either perform poorly or are a bad idea; besides, they’ll be removed eventually
• Use Session.save(), not Node.save()
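The advice to get off the notification thread quickly can be sketched with a plain work queue. HandOffListener is a hypothetical class standing in for a real javax.jcr.observation.EventListener (whose onEvent method receives an EventIterator); here it receives a changed path as a String so the example runs without a repository. The callback only enqueues work, and a single worker thread processes events in arrival order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical listener: the notification thread only enqueues work,
// and a separate worker thread does the (possibly slow) processing.
final class HandOffListener {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> processed = new CopyOnWriteArrayList<>();

    // Called on the notification thread; returns immediately after queuing.
    void onEvent(String changedPath) {
        worker.execute(() -> process(changedPath));
    }

    // Runs on the worker thread. A real handler would open its own
    // short-lived session here instead of reusing the listener's session.
    private void process(String changedPath) {
        processed.add(changedPath);
    }

    // Stops accepting events, waits for queued work, and returns what was processed.
    List<String> shutdownAndDrain() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                worker.shutdownNow();
            }
        } catch (InterruptedException e) {
            worker.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return List.copyOf(processed);
    }
}
```

Using one worker thread keeps events in order; a larger pool would increase throughput at the cost of ordering.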
