Introduction to
ModeShape 3
November 28, 2012

Randall Hauch
@rhauch
Features

    Current status & roadmap

    Design (how we use Infinispan)

    Best practices

    Q&A




ModeShape 3
    An elastic in-memory hierarchical database
     with queries, transactions, events & more




Elastic
    • Add more processes to increase storage
      capacity and/or throughput
      –   No master, no slaves
      –   Data is rebalanced as needed
      –   Optionally separate database engine from storage
          processes
    • Fault tolerant
      –   Processes can fail without loss of data
      –   Cross-data center distribution (in near future)




Hierarchical
    • Organize the data into a tree structure that
      reflects how the data is accessed & used
      –   Navigation to related data
      –   Still have references and queries
    • Many scenarios have natural hierarchies




Strongly consistent
    • ACID
     –   Atomic, Consistent, Isolated, Durable
     –   Already familiar to most developers
     –   Easy to reason about code
    • XA-aware
     –   Participate in user transactions
     –   Work with Java EE




Why not eventually-consistent?

    • In eventually-consistent databases
     –   changes made by one client will eventually (but not
         immediately) be propagated to all processes
     –   other clients won’t see latest data right away, yet can still
         make other changes
     –   there may be multiple versions of a particular piece of data
    • Can be ideal for some scenarios
     –   read-heavy and/or best-effort
    • Applications that update data may need to
     –   expect inconsistencies (and/or multiple versions)
     –   specify conflict strategies
     –   resolve conflicts (inconsistencies)
In-memory
    • Memory is really fast (and cheap)
    • Why not keep all data in memory?
     –   practical limits to memory on particular machines
     –   memory isn’t shared between machines
     –   data stored in memory isn’t durable
     –   no queries, structure, or transactions
    • ModeShape
     –   distributes multiple copies of data across the
         combined memory of many machines
     –   can even persist data to disk or DB (if really needed)
     –   can still use queries, structure and transactions
     –   is fast
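
The idea of keeping several copies of each entry across the combined memory of many machines can be sketched roughly as follows. This is only an illustration under assumptions: the class `OwnerSelector`, the member names, and the hash-then-successors placement are hypothetical and are not Infinispan's actual hashing implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: choose which cluster members hold copies of a key. */
class OwnerSelector {
    private final List<String> members;   // cluster members, in a fixed order
    private final int numOwners;          // number of copies kept per key

    OwnerSelector(List<String> members, int numOwners) {
        if (numOwners < 1 || numOwners > members.size()) {
            throw new IllegalArgumentException("numOwners must be between 1 and the member count");
        }
        this.members = new ArrayList<>(members);
        this.numOwners = numOwners;
    }

    /** Returns numOwners distinct members: a primary picked by hash, then its successors. */
    List<String> ownersOf(String key) {
        int start = Math.floorMod(key.hashCode(), members.size());
        List<String> owners = new ArrayList<>(numOwners);
        for (int i = 0; i < numOwners; i++) {
            owners.add(members.get((start + i) % members.size()));
        }
        return owners;
    }
}
```

With two owners per key, any single process can fail without the entry being lost, which is the fault-tolerance property described on the Elastic slide.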
Queries
    • Find the data independently of the hierarchy
    • Use SQL-like language
      SELECT * FROM [car:Car] WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006


      SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
      WHERE PATH() LIKE $path


      SELECT file.*,content.* FROM [nt:file] AS file
      JOIN [nt:resource] AS content ON ISCHILDNODE(content,file)
      WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'


      SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
      WHERE PATH() IN (
          SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
          WHERE [vdb:version] <= $maxVersion
          AND CONTAINS([vdb:description],'xml OR xml maybe')
      )

With or without schema

     [Diagram: a scale running from STRICT ENFORCEMENT to NO ENFORCEMENT]


     • Choose how much schema is enforced
      –   define patterns for values and structure
      –   use different patterns for different parts of the database
      –   change the patterns over time
      –   use the “best” levels of schema validation
      –   evolve as necessary




Binary storage
     • Separate storage for BINARY values
       –   content keyed by its SHA-1 hash
       –   the property value stored with the node holds only the SHA-1 key;
           the content is resolved as needed
       –   content always buffered
     • Option per repository
       –   File system
       –   Transient (temp directory)
       –   JDBC database
       –   MongoDB
       –   Infinispan (separate caches)
       –   custom
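
Keying content by SHA-1 can be sketched as a small content-addressed store. This is a minimal illustration, not ModeShape's binary store: the class `Sha1BinaryStore` and its in-memory map are assumptions made for the example; only `java.security.MessageDigest` is a real API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a content-addressed binary store keyed by SHA-1. */
class Sha1BinaryStore {
    private final Map<String, byte[]> contentByKey = new HashMap<>();

    /** Computes the lowercase hexadecimal SHA-1 digest of the content. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** Stores the content once and returns the key a node property would hold. */
    String store(byte[] content) {
        String key = sha1Hex(content);
        contentByKey.putIfAbsent(key, content.clone());
        return key;
    }

    /** Resolves a key back to its content, or null if the key is unknown. */
    byte[] resolve(String key) {
        byte[] content = contentByKey.get(key);
        return content == null ? null : content.clone();
    }

    /** Number of distinct contents held. */
    int size() {
        return contentByKey.size();
    }
}
```

Because the key is derived from the bytes, storing identical content twice yields the same key and a single stored copy.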
Sequencing
     • Automatically extract structured content
       –   save BINARY or STRING values
       –   path rules & MIME types determine which sequencer is run
       –   output stored in repository at configurable location
     • Sequencers
       –   CND
       –   DDL
       –   text (fixed width, delimited)
       –   Microsoft Office™
       –   Java (source & class)
       –   ZIP (and JAR/WAR/EAR)
       –   XML, XSD, and WSDL
       –   Teiid VDBs
       –   audio (MP3)
       –   images
       –   custom

     [Diagram: 1) upload, 2) notify sequencers, 3) derive and store, 4) navigate or query]
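
The way a path rule selects content and decides where the output goes can be sketched like this. It is a simplified stand-in: it uses a plain Java regular expression with one capturing group instead of ModeShape's own path-expression syntax, and the class `PathRule` is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: map a changed property path to a sequencer output path. */
class PathRule {
    private final Pattern input;   // regular expression with exactly one capturing group
    private final String output;   // output template; "$1" stands for the captured group

    PathRule(String inputRegex, String outputTemplate) {
        this.input = Pattern.compile(inputRegex);
        this.output = outputTemplate;
    }

    /** Returns the output location if the path matches, or empty if the rule does not apply. */
    Optional<String> outputFor(String path) {
        Matcher m = input.matcher(path);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(output.replace("$1", m.group(1)));
    }
}
```

For example, `new PathRule("/files/(.*\\.java)/jcr:content", "/java/$1")` would send the output for `/files/src/Main.java/jcr:content` to `/java/src/Main.java` and ignore paths that do not end in `.java`.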
Federation
                            (reintroduced in 3.1)

 • Access data in external systems
     –   external data projected as nodes
         with properties and node types
     –   supports read and optional write
     –   same validation rules
 • Connector options
     –   File system (3.1)
     –   Local git (3.2)
     –   Database (3.2)
     –   Database metadata (3.2)
     –   Local repository (3.2)
     –   External JCR repository (3.2)
     –   custom

 [Diagram: a repository projecting content from External source A and External source B]
Monitoring
     • Measure statistics for a variety of metrics
       –   Total counts: active sessions, queries, workspaces,
           locks, listeners, events in queue, sequencing
           operations in queue
       –   Increment counts: events sent to listeners, nodes
           changed, saves, nodes sequenced
     • Results
       –   Metrics measured every 5 seconds
       –   Results are aggregated into windows that show
           statistics (min, max, median, variance, stdev, sample
           count) during last minute, hour, day, week, year
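
The per-window statistics listed above can be computed along these lines. This is a sketch under assumptions: the class `WindowStats` is hypothetical, and population variance is used because the slides do not say which variance definition ModeShape applies.

```java
import java.util.Arrays;

/** Hypothetical sketch: summary statistics for one window of metric samples. */
class WindowStats {
    final int count;
    final double min, max, median, variance, stdev;

    WindowStats(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        count = sorted.length;
        min = sorted[0];
        max = sorted[count - 1];
        int mid = count / 2;
        // Odd count: middle value; even count: mean of the two middle values.
        median = (count % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
        double sum = 0;
        for (double s : sorted) {
            sum += s;
        }
        double mean = sum / count;
        double squares = 0;
        for (double s : sorted) {
            squares += (s - mean) * (s - mean);
        }
        variance = squares / count;   // population variance (an assumption)
        stdev = Math.sqrt(variance);
    }
}
```

With one measurement every 5 seconds, the last-minute window holds 12 samples.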


Public APIs




JCR 2.0
     • Standard Java API (JSR-283)
      –   javax.jcr packages
      –   programmatically access, find, update, query content
      –   commonly needed features: events, versioning, etc.
      –   hierarchical tree of nodes, nodes have properties,
          property values can reference other nodes

     [Diagram: word cloud relating databases, file systems, and content repositories: query, integrity, schema, transactions, read, write, streams, locking, hierarchy, access control, events, search, unstructured, versioning]

Extended JCR API
     • Extended JCR interfaces
      –   additional node type management methods
      –   additional event types
      –   additional Binary value methods (hash)
      –   additional JCR-QOM language objects
      –   cancel queries
      –   sequencer and text SPIs
      –   monitoring API




JDBC API
     • Use JDBC driver or data source
       – connect to local or remote repository
       – issue JCR-SQL2 queries
       – access database metadata
     • Enables existing applications to access content
       – ad hoc query tools
       – reporting systems
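
Issuing a JCR-SQL2 query over JDBC might look like the sketch below. Only the `java.sql` calls are standard; the `jdbc:jcr:jndi:` URL form and the JNDI name used in the usage note are assumptions that should be checked against the ModeShape JDBC driver documentation, and the driver itself must be on the classpath for the query method to run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: run a JCR-SQL2 query through plain JDBC. */
class JdbcQuerySketch {

    /** Builds a URL for a repository registered in JNDI (the URL form is an assumption). */
    static String jndiUrl(String jndiName) {
        return "jdbc:jcr:jndi:" + jndiName;
    }

    /** Runs the query and returns the first column of every row as text. */
    static List<String> firstColumn(String url, String jcrSql2) throws SQLException {
        List<String> values = new ArrayList<>();
        // try-with-resources closes the result set, statement, and connection in order.
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(jcrSql2)) {
            while (rows.next()) {
                values.add(rows.getString(1));
            }
        }
        return values;
    }
}
```

A call such as `firstColumn(jndiUrl("jcr/local/sample"), "SELECT [jcr:primaryType] FROM [nt:file]")` is the kind of access an ad hoc query tool or reporting system would perform.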




RESTful API
     • Access content over HTTP
      –   POST, PUT, GET, DELETE methods
      –   JSON representations
      –   Single node or subtree of nodes, with properties
      –   Streams large BINARY values
      –   Register node types
      –   Execute queries


     • Deployed as WAR file
      –   Same app server in which ModeShape is deployed
      –   Handles multiple repositories

WebDAV API
     • Exposes content as files and directories
       –   nt:file nodes exposed as files
       –   nt:folder nodes exposed as directories
       –   other nodes exposed as directories

     • Mount repository on file system
       –   Treat as external drive
       –   Upload files and folders into repository

     • Deployed as WAR file
       –   Same app server in which ModeShape is deployed
       –   Handles multiple repositories
Deployment options




ModeShape 3 and Infinispan
     [Diagram: single process. ModeShape uses a local Infinispan cache, which writes its data to a persistent store.]

ModeShape 3 and Infinispan
     [Diagram: small cluster. Three ModeShape processes exchange events; each has a replicated Infinispan cache; the caches exchange data and write to a persistent store.]

ModeShape 3 and Infinispan
     [Diagram: moderate single- or multi-site cluster. ModeShape processes exchange events; each has a distributed Infinispan cache; the caches exchange data.]

ModeShape 3 and Infinispan
     [Diagram: large single- or multi-site cluster. ModeShape processes exchange events and read and write data in a separate Infinispan data grid.]

ModeShape AS7 kit




Deploying ModeShape in AS7
     • Simple installation
       – simply unzip into existing AS7 installation
       – includes “standalone-modeshape.xml” that contains a
         variety of ready-to-run sample repositories
     • ModeShape subsystem for AS7
       – use AS7 tools to define one or more repositories
       – each repository is independently configured
       – update repository configuration while running
       – (re)uses Infinispan and JGroups subsystems
       – clustering is built-in
       – perform management and monitoring operations



Sample AS7 configuration

         <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
           <repository name="sample" />
         </subsystem>



          –   Each “repository” fragment defines a repository
          –   Multiple are supported




Sample AS7 configuration
                                          (more thorough)
         <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
           <!-- Multiple 'repository' elements are allowed -->
           <repository name="sample"
                       cache-name="sample" cache-container="modeshape"
                       jndi-name="jcr/local/sample"
                       enable-monitoring="true"
                       default-workspace="default" allow-workspace-creation="true"
                       security-domain="modeshape-security"
                       anonymous-roles="readonly,readwrite,admin" anonymous-username="<anonymous>"
                       use-anonymous-upon-failed-authentication="false">
             <workspaces>
               <!-- 0 or more workspaces can be predefined. At the moment, these are just names.
                    But we may want to specify content or something else, so create element for each. -->
               <workspace name="predefinedWorkspace1" />
               <workspace name="predefinedWorkspace2" />
               <workspace name="predefinedWorkspace3" />
             </workspaces>
             
             <indexing thread-pool="modeshape-workers" batch-size="-1" reader-strategy="shared" mode="sync"
                       async-thread-pool-size="1" async-max-queue-size="1" >
               <analyzer classname="org.apache.lucene.analysis.standard.StandardAnalyzer" module="" />
               <jms-master-backend connection-factory-jndi-name="" queue-jndi-name=""/>
             </indexing>
             <file-master-index-storage rebuild-upon-startup="ifMissing" format="LUCENE_CURRENT"
                                 path="modeshape/sample/indexes" relative-to="jboss.server.data.dir"
                                 access-type="auto" locking-strategy="native"
                                 source-path="/var/lib/modeshape/index/" refresh-in-seconds="3600"/>
             <file-binary-storage min-value-size="4096" path="modeshape/sample/binaries"
                      relative-to="jboss.server.data.dir" />
             <sequencers>
               <!-- 0 or more sequencers -->
               <sequencer name="Java Source" classname="java"
                          path-expression="/files/(*.java)[/jcr:content] => /java/$1"/>
             </sequencers>
           </repository>
         </subsystem>
ModeShape repositories in JBoss AS7

     [Diagram: ModeShape repositories inside JBoss AS 7. Web apps, EJBs, MDBs, etc. reach them through JCR and JDBC; PHP, Ruby, JavaScript, Java, and Python clients reach them through HTTP/REST; WebDAV is also exposed.]

Current Status & Roadmap




ModeShape releases




     Development shifted to 3.x in October 2011

ModeShape 3 (part 1 of 2)

     • Much faster
       –   Order of magnitude faster, or more
       –   Way higher write concurrency (equivalent to node-level locking)
       –   Thread-safe implementations
       –   Memory is the new disk
       –   Internal caches and lazy loading
       –   Faster resolution of references and back-references
     • Massively larger repository sizes
       –   Millions of nodes, or more
       –   Flat hierarchies (>>10K children under 1 parent)
       –   Very large files, without consuming heap
     • More deployment options
       –   Large clusters
       –   High availability
       –   Multiple sites
       –   Cloud

ModeShape 3 (part 2 of 2)

     • Easily embedded
       –   Lightweight, multi-repository engine
       –   Hot deployment and configuration of repositories
       –   Windowed metrics
     • JBoss AS 7 integration
       –   Provides lightweight, on-demand JCR subsystem
       –   Hot deployment and configuration of repositories
       –   Management of domains (clusters and groups)
       –   Monitoring and alerting (via RHQ/JON)
     • Participate in JTA transactions
       –   Enabling easy use of JCR in EJB, MDB, CDI, etc.
     • Simpler SPIs
       –   Sequencers, text extractors, security providers, binary stores, and
           connectors

Under the ModeShape 3 covers
     • Use best-of-breed technology
       –   Infinispan: cache, key-value store and data grid
       –   Hibernate Search: indexing
       –   JGroups: clustering events
       –   JBoss AS7: small, fast, clusterable, manageable, cloud
       –   Others: RESTEasy, PicketLink, etc.
     • Design techniques
       –   Simplify, simplify, simplify!
       –   Use immutability first, otherwise write concurrent code
       –   Cache data (especially immutable)
       –   Share more data between sessions
       –   Plan for eventual consistency
       –   Remove layers
       –   Use sequences (lazily load data, benefits large collections)
       –   JSON/BSON documents optimized for in-memory usage


Design
     (How ModeShape uses Infinispan)




ModeShape 3
     [Diagram: basic architecture. A ModeShape Repository has a JCR layer on top of a storage layer made up of binary storage, content storage, and external systems.]

ModeShape 3 and Infinispan
     Using different caches for different purposes

     [Diagram: ModeShape Repository with Infinispan-backed session and workspace caches above Binary Storage (Infinispan), Content Storage (Infinispan), and External Systems]

     • JCR sessions hold their changes in memory; will use Infinispan
       caches that (can) overflow to disk
     • Shared, transient Infinispan caches for each workspace, caching
       node representations and expiring entries based on events
     • Each node state stored in Infinispan cache as 1 or more
       JSON/BSON documents
     • Configure Infinispan store as needed

Best Practices




Best practices (1 of 2)
     • Build structure first, then node types
       –   most important to get your node structure right
       –   it will change over time anyway, so don’t define the node types too soon
     • Use mixin node types
       –   where possible define sets of properties as mixins
       –   use in primary types and dynamically add to nodes
     • Limit use of same-name-siblings
       –   useful when required, but can be expensive and difficult to use (e.g., paths change)
     • Prefer hierarchies
       –   moderate numbers of child nodes, use multiple levels if necessary
     • Store files and folders with ‘nt:file’ and ‘nt:folder’
       –   use them wherever appropriate; not for all binary data, though!
     • Verify features are enabled
       –   improves portability and safety with configuration changes
     • Import and export
       –   avoid document view; use system view wherever possible
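
The "prefer hierarchies" advice can be sketched as a helper that spreads many items over intermediate levels instead of placing them all under one parent. This is one possible scheme only: the class `BucketedPath`, the two-character buckets, and the `_` padding are assumptions, and the sketch also assumes names contain only characters that are valid in node names.

```java
import java.util.Locale;

/** Hypothetical sketch: derive a multi-level path so no parent has too many children. */
class BucketedPath {

    /** Builds parent/ab/cd/name from the first four characters of the name. */
    static String of(String parent, String name) {
        String key = name.toLowerCase(Locale.ROOT);
        StringBuilder path = new StringBuilder(parent);
        // Two levels of two-character buckets; short names are padded with '_'.
        for (int level = 0; level < 2; level++) {
            path.append('/');
            for (int i = level * 2; i < level * 2 + 2; i++) {
                path.append(i < key.length() ? key.charAt(i) : '_');
            }
        }
        return path.append('/').append(name).toString();
    }
}
```

Bucketing on a hash of the name instead of its leading characters would spread items more evenly when many names share a prefix.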

Best practices (2 of 2)
     • Prefer JCR-SQL2 and JCR-QOM over other query languages
       –   by far the richest and most useful
       –   do this even when it appears the queries are more complicated
     • Only Repository is thread-safe; no other APIs are
       –   don’t share sessions
       –   don’t share anything between sessions
     • Register all listeners in special long-lived sessions
       –   do nothing else with these sessions, however (Session is not thread-safe)
       –   get off the notification thread ASAP, using work queues where necessary
     • Create new sessions rather than reusing a pool of sessions
       –   Sessions are intended to be as lightweight as possible
       –   Create a session, use it, log out (even web applications and services!)
     • Avoid deprecated APIs
       –   either perform poorly or are a bad idea; besides, they’ll be removed eventually
     • Use Session.save() not Node.save()
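
Getting off the notification thread quickly can be sketched with a work queue. The class `EventHandoff` and its `String` event payload are hypothetical; in a real application the callback would be the `onEvent` method of a `javax.jcr.observation.EventListener`, and the worker would open its own session rather than share the listener's.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a listener that hands events to a worker and returns at once. */
class EventHandoff {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger processed = new AtomicInteger();

    /** Called on the notification thread; only enqueues the work. */
    void onEvent(String path) {
        worker.execute(() -> process(path));
    }

    /** Runs on the worker thread; slow handling belongs here. */
    private void process(String path) {
        processed.incrementAndGet();
    }

    /** Stops accepting events, waits for queued work, and returns the processed count. */
    int shutdownAndCount() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

The single-threaded executor keeps events in arrival order; a larger pool would trade ordering for throughput.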
Questions?





  • 2. Features; Current status & roadmap; Design (how we use Infinispan); Best practices; Q&A
  • 3. ModeShape 3: An elastic in-memory hierarchical database with queries, transactions, events & more
  • 4. Elastic • Add more processes to increase storage capacity and/or throughput – No master, no slaves – Data is rebalanced as needed – Optionally separate database engine from storage processes • Fault tolerant – Processes can fail without loss of data – Cross-data center distribution (in near future)
  • 5. Hierarchical • Organize the data into a tree structure that reflects how the data is accessed & used – Navigation to related data – Still have references and queries • Many scenarios have natural hierarchies
  • 6. Strongly consistent • ACID – Atomic, Consistent, Isolated, Durable – Already familiar to most developers – Easy to reason about code • XA-aware – Participate in user transactions – Work with Java EE
  • 7. Why not eventually-consistent? • In eventually-consistent databases – changes made by one client will eventually (but not immediately) be propagated to all processes – other clients won’t see latest data right away, yet can still make other changes – there may be multiple versions of a particular piece of data • Can be ideal for some scenarios – read-heavy and/or best-effort • Applications that update data may need to – expect inconsistencies (and/or multiple versions) – specify conflict strategies – resolve conflicts (inconsistencies)
  • 8. In-memory • Memory is really fast (and cheap) • Why not keep all data in memory? – practical limits to memory on particular machines – memory isn’t shared between machines – data stored in memory isn’t durable – no queries, structure, or transactions • ModeShape – distributes multiple copies of data across the combined memory of many machines – can even persist data to disk or DB (if really needed) – can still use queries, structure and transactions – is fast
  • 9. Queries • Find the data independently of the hierarchy • Use SQL-like language
      SELECT * FROM [car:Car] WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006
      SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file] WHERE PATH() LIKE $path
      SELECT file.*,content.* FROM [nt:file] AS file JOIN [nt:resource] AS content ON ISCHILDNODE(content,file) WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'
      SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file] WHERE PATH() IN ( SELECT [vdb:originalFile] FROM [vdb:virtualDatabase] WHERE [vdb:version] <= $maxVersion AND CONTAINS([vdb:description],'xml OR xml maybe') )
  • 10. With or without schema [Diagram: spectrum from NO ENFORCEMENT to STRICT ENFORCEMENT] • Choose how much schema is enforced – define patterns for values and structure – use different patterns for different parts of the database – change the patterns over time – use the “best” levels of schema validation – evolve as necessary
  • 11. Binary storage • Separate storage for BINARY values – content keyed by SHA-1 – property value stored with node contains SHA-1 and resolved as needed – content always buffered • Option per repository – File system – Transient (temp directory) – JDBC database – MongoDB – Infinispan (separate caches) – custom
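The "content keyed by SHA-1" idea on slide 11 can be sketched in plain Java. This is only an illustration of content-addressed storage, not ModeShape's binary store: the class `Sha1BinaryStoreSketch` and its methods are hypothetical names, and the in-memory map stands in for whichever backing store a repository is configured with.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not ModeShape's actual binary store.
class Sha1BinaryStoreSketch {
    // Each distinct piece of content is held once, under its SHA-1 key.
    private final Map<String, byte[]> contentByKey = new ConcurrentHashMap<>();

    /** Stores the content and returns its SHA-1 key as lowercase hex. */
    String put(byte[] content) {
        String key = sha1Hex(content);
        contentByKey.putIfAbsent(key, content.clone());
        return key;
    }

    /** Resolves a key back to the content, or null if the key is unknown. */
    byte[] get(String key) {
        byte[] content = contentByKey.get(key);
        return content == null ? null : content.clone();
    }

    /** Number of distinct pieces of content held. */
    int size() {
        return contentByKey.size();
    }

    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        Sha1BinaryStoreSketch store = new Sha1BinaryStoreSketch();
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        // A node property would hold only this key; the bytes live in the store.
        String key = store.put(data);
        store.put(data); // same content, same key, no second copy
        System.out.println(key + " stored " + store.size() + " time(s)");
    }
}
```

Because the key is derived from the content, storing the same bytes twice yields the same key and a single stored copy, which is one reason a node only needs to carry the hash.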
  • 12. Sequencing • Automatically extract structured content – save BINARY or STRING values – path rules & MIME types determine which sequencer is run – output stored in repository at configurable location • Sequencers – CND – DDL – text (fixed width, delimited) – Microsoft Office™ – Java (source & class) – ZIP (and JAR/WAR/EAR) – XML, XSD, and WSDL – Teiid VDBs – audio (MP3) – images – custom [Diagram: 1) upload, 2) notify, 3) derive and store, 4) navigate or query]
  • 13. Federation (reintroduced in 3.1) • Access data in external systems – external data projected as nodes with properties and node types – supports read and optional write – same validation rules • Connector options – File system (3.1) – Local git (3.2) – Database (3.2) – Database metadata (3.2) – Local repository (3.2) – External JCR repository (3.2) – custom [Diagram labels: External source A, External source B]
  • 14. Monitoring • Measure statistics for a variety of metrics – Total counts: active sessions, queries, workspaces, locks, listeners, events in queue, sequencing operations in queue – Increment counts: events sent to listeners, nodes changed, saves, nodes sequenced • Results – Metrics measured every 5 seconds – Results are aggregated into windows that show statistics (min, max, median, variance, stdev, sample count) during last minute, hour, day, week, year
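The window statistics listed on slide 14 can be sketched as follows. This is a minimal illustration, not ModeShape's monitoring code: `WindowStatsSketch` is a hypothetical class, and the choice of population variance (dividing by the sample count) is an assumption, since the slide does not say which definition is used. With one measurement every 5 seconds, a one-minute window would hold 12 samples.

```java
import java.util.Arrays;

// Hypothetical illustration; not ModeShape's monitoring implementation.
class WindowStatsSketch {
    final int count;
    final double min;
    final double max;
    final double median;
    final double variance; // population variance (an assumption)
    final double stdev;

    WindowStatsSketch(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("a window needs at least one sample");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        count = sorted.length;
        min = sorted[0];
        max = sorted[count - 1];
        int mid = count / 2;
        // Odd count: middle element; even count: mean of the two middle elements.
        median = (count % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;

        double sum = 0.0;
        for (double s : sorted) {
            sum += s;
        }
        double mean = sum / count;
        double squaredDeviations = 0.0;
        for (double s : sorted) {
            squaredDeviations += (s - mean) * (s - mean);
        }
        variance = squaredDeviations / count;
        stdev = Math.sqrt(variance);
    }

    public static void main(String[] args) {
        // Made-up sample values for one window of a single metric.
        WindowStatsSketch stats = new WindowStatsSketch(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println("count=" + stats.count + " min=" + stats.min + " max=" + stats.max
                + " median=" + stats.median + " variance=" + stats.variance + " stdev=" + stats.stdev);
    }
}
```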
  • 16. JCR 2.0 • Standard Java API (JSR-283) – javax.jcr packages – programmatically access, find, update, query content – commonly needed features: events, versioning, etc. – hierarchical tree of nodes, nodes have properties, property values can reference other nodes [Figure labels: databases, file systems, content repositories; features: query, read, write, streams, integrity, locking, schema, hierarchy, transactions, access control, events, search, unstructured, versioning]
  • 17. Extended JCR API • Extended JCR interfaces – additional node type management methods – additional event types – additional Binary value methods (hash) – additional JCR-QOM language objects – cancel queries – sequencer and text SPIs – monitoring API
  • 18. JDBC API • Use JDBC driver or data source – connect to local or remote repository – issue JCR-SQL2 queries – access database metadata • Enables existing applications to access content – ad hoc query tools – reporting systems
  • 19. RESTful API • Access content over HTTP – POST, PUT, GET, DELETE methods – JSON representations – Single node or subtree of nodes with properties – Streams large BINARY values – Register node types – Execute queries • Deployed as WAR file – Same app server in which ModeShape is deployed – Handles multiple repositories
  • 20. WebDAV API • Exposes content as files and directories – nt:file nodes exposed as files – nt:folder nodes exposed as directories – other nodes exposed as directories • Mount repository on file system – Treat as external drive – Upload files and folders into repository • Deployed as WAR file – Same app server in which ModeShape is deployed – Handles multiple repositories
  • 22. ModeShape 3 and Infinispan: Single process [Diagram: one ModeShape process; Infinispan cache (local); data; Persistent Store]
  • 23. ModeShape 3 and Infinispan: Small cluster [Diagram: three ModeShape processes exchanging events; each with an Infinispan cache (replicated); data; Persistent Store]
  • 24. ModeShape 3 and Infinispan: Moderate single- or multi-site cluster [Diagram: ModeShape processes exchanging events; each with Infinispan (distributed); data]
  • 25. ModeShape 3 and Infinispan: Large single- or multi-site cluster [Diagram: ModeShape processes exchanging events; data; Infinispan data grid]
  • 27. Deploying ModeShape in AS7 • Simple installation – simply unzip into existing AS7 installation – includes “standalone-modeshape.xml” that contains a variety of ready-to-run sample repositories • ModeShape subsystem for AS7 – use AS7 tools to define 1+ repositories – each repository is independently configured – update repository configuration while running – (re)uses Infinispan and JGroups subsystems – clustering is built-in – perform management and monitoring operations
  • 28. Sample AS7 configuration
      <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
        <repository name="sample" />
      </subsystem>
    – Each “repository” fragment defines a repository – Multiple are supported
  • 29. Sample AS7 configuration (more thorough)
      <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
        <!-- Multiple 'repository' elements are allowed -->
        <repository name="sample"
                    cache-name="sample" cache-container="modeshape"
                    jndi-name="jcr/local/sample"
                    enable-monitoring="true"
                    default-workspace="default" allow-workspace-creation="true"
                    security-domain="modeshape-security"
                    anonymous-roles="readonly,readwrite,admin" anonymous-username="&lt;anonymous&gt;"
                    use-anonymous-upon-failed-authentication="false">
          <workspaces>
            <!-- 0 or more workspaces can be predefined. At the moment, these are just names.
                 But we may want to specify content or something else, so create element for each. -->
            <workspace name="predefinedWorkspace1" />
            <workspace name="predefinedWorkspace2" />
            <workspace name="predefinedWorkspace3" />
          </workspaces>
          <indexing thread-pool="modeshape-workers" batch-size="-1" reader-strategy="shared" mode="sync"
                    async-thread-pool-size="1" async-max-queue-size="1">
            <analyzer classname="org.apache.lucene.analysis.standard.StandardAnalyzer" module="" />
            <jms-master-backend connection-factory-jndi-name="" queue-jndi-name=""/>
          </indexing>
          <file-master-index-storage rebuild-upon-startup="ifMissing" format="LUCENE_CURRENT"
                                     path="modeshape/sample/indexes" relative-to="jboss.server.data.dir"
                                     access-type="auto" locking-strategy="native"
                                     source-path="/var/lib/modeshape/index/" refresh-in-seconds="3600"/>
          <file-binary-storage min-value-size="4096" path="modeshape/sample/binaries"
                               relative-to="jboss.server.data.dir" />
          <sequencers>
            <!-- 0 or more sequencers -->
            <sequencer name="Java Source" classname="java"
                       path-expression="/files/(*.java)[/jcr:content] => /java/$1"/>
          </sequencers>
        </repository>
      </subsystem>
  • 30. ModeShape repositories in JBoss AS7 [Diagram labels: Web Apps, EJBs, MDBs, etc.; JCR; JDBC; HTTP/REST; WebDAV; PHP, Ruby, JavaScript, Java, Python; Repositories; JBoss AS 7]
  • 31. Current Status & Roadmap
  • 32. ModeShape releases: Development shifted to 3.x in October 2011
  • 33. ModeShape 3 (part 1 of 2) • Much faster – Order of magnitude faster, or more – Way higher write concurrency (equivalent to node-level locking) – Thread-safe implementations – Memory is the new disk – Internal caches and lazy loading – Faster resolution of references and back-references • Massively larger repository sizes – Millions of nodes, or more – Flat hierarchies (>>10K children under 1 parent) – Very large files, without consuming heap • More deployment options – Large clusters – High availability – Multiple sites – Cloud
  • 34. ModeShape 3 (part 2 of 2) • Easily embedded – Lightweight, multi-repository engine – Hot deployment and configuration of repositories – Windowed metrics • JBoss AS 7 integration – Provides lightweight, on-demand JCR subsystem – Hot deployment and configuration of repositories – Management of domains (clusters and groups) – Monitoring and alerting (via RHQ/JON) • Participate in JTA transactions – Enabling easy use of JCR in EJB, MDB, CDI, etc. • Simpler SPIs – Sequencers, text extractors, security providers, binary stores, and connectors
  • 35. Under the ModeShape 3 covers • Use best-of-breed technology – Infinispan: cache, key-value store and data grid – Hibernate Search: indexing – JGroups: clustering events – JBoss AS7: small, fast, clusterable, manageable, cloud – Others: RESTEasy, PicketLink, etc. • Design techniques – Simplify, simplify, simplify! – Use immutability first, otherwise write concurrent code – Cache data (especially immutable) – Share more data between sessions – Plan for eventual consistency – Remove layers – Use sequences (lazily load data, benefits large collections) – JSON/BSON documents optimized for in-memory usage
  • 36. Design (How ModeShape uses Infinispan)
  • 37. ModeShape 3: Basic architecture [Diagram: ModeShape Repository; JCR layer; Storage layer with Binary Storage, Content Storage, and External Systems]
  • 38. ModeShape 3 and Infinispan: Using different caches for different purposes – JCR sessions hold their changes in memory; will use Infinispan caches that (can) overflow to disk – Shared, transient Infinispan caches for each workspace, caching node representations and expiring entries based on events – Each node state stored in Infinispan cache as 1 or more JSON/BSON documents – Configure Infinispan store as needed [Diagram: ModeShape Repository; Binary Storage (Infinispan); Content Storage (Infinispan); External Systems]
  • 40. Best practices (1 of 2) • Build structure first, then node types – most important to get your node structure right – it will change over time anyway, so don’t define the node types too soon • Use mixin node types and mixins – where possible define sets of properties as mixins – use in primary types and dynamically add to nodes • Limit use of same-name-siblings – useful when required, but can be expensive and difficult to use (i.e., paths change) • Prefer hierarchies – moderate numbers of child nodes, use multiple levels if necessary • Store files and folders with ‘nt:file’ and ‘nt:folder’ – use it wherever appropriate; not for all binary data, though! • Verify features are enabled – improves portability and safety with configuration changes • Import and export – avoid document view; use system view wherever possible
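The "prefer hierarchies ... use multiple levels if necessary" advice on slide 40 can be illustrated with one common technique: deriving intermediate path segments from an identifier so that no single parent accumulates a very large number of children. The slides do not prescribe this scheme; `HierarchicalPathSketch`, `shardedPath`, and the example paths are hypothetical.

```java
// Hypothetical illustration of spreading many items over multiple levels.
class HierarchicalPathSketch {
    /**
     * Builds a path such as /customers/a1/b2/a1b2c3d4 by using the leading
     * characters of the id as intermediate levels.
     */
    static String shardedPath(String root, String id, int levels, int segmentLength) {
        if (id.length() < levels * segmentLength) {
            throw new IllegalArgumentException("id is too short for the requested levels");
        }
        StringBuilder path = new StringBuilder(root);
        for (int i = 0; i < levels; i++) {
            // Each level takes the next segmentLength characters of the id.
            path.append('/').append(id, i * segmentLength, (i + 1) * segmentLength);
        }
        return path.append('/').append(id).toString();
    }

    public static void main(String[] args) {
        System.out.println(shardedPath("/customers", "a1b2c3d4", 2, 2));
    }
}
```

With two levels of two hexadecimal characters each, items are spread over up to 256 × 256 intermediate folders, which keeps the number of children per parent moderate as the data grows.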
  • 41. Best practices (2 of 2) • Prefer JCR-SQL2 and JCR-QOM over other query languages – by far the richest and most useful – do this even when it appears the queries are more complicated • Only Repository is thread-safe; no other APIs are – don’t share sessions – don’t share anything between sessions • Register all listeners in special long-lived sessions – do nothing else with these sessions, however (Session is not thread-safe) – get off the notification thread ASAP, using work queues where necessary • Create new sessions rather than reusing a pool of sessions – Sessions are intended to be as lightweight as possible – Create a session, use it, log out (even web applications and services!) • Avoid deprecated APIs – they either perform poorly or are a bad idea; besides, they’ll be removed eventually • Use Session.save() not Node.save()
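The advice on slide 41 to get off the notification thread as soon as possible can be sketched with a plain Java work queue. This is a minimal sketch that deliberately avoids the javax.jcr types so it stays self-contained: `ListenerHandoffSketch` and `onEvent` are hypothetical stand-ins for a real listener callback, and the string paths stand in for real events.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; a real listener would receive JCR events instead of strings.
class ListenerHandoffSketch {
    // A single worker thread drains the queue, so events are handled in arrival order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> processed = new CopyOnWriteArrayList<>();

    /** Stands in for the callback invoked on the notification thread: enqueue and return. */
    void onEvent(String changedPath) {
        worker.execute(() -> process(changedPath));
    }

    /** The slow work happens here, on the worker thread. */
    private void process(String changedPath) {
        processed.add("handled " + changedPath);
    }

    /** Stops accepting events, waits for queued work to finish, and returns the results. */
    List<String> shutdownAndGetProcessed() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        ListenerHandoffSketch listener = new ListenerHandoffSketch();
        listener.onEvent("/files/a.txt");
        listener.onEvent("/files/b.txt");
        System.out.println(listener.shutdownAndGetProcessed());
    }
}
```

The callback only enqueues work, so the notifying thread is released immediately. Following the slide's other advice, a worker that needs to read or write content would create its own session, use it, and log out rather than touching the listener's long-lived session.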