ModeShape 3 overview



  1. Introduction to ModeShape 3. November 28, 2012. Randall Hauch (@rhauch)
  2. Features; Current status & roadmap; Design (how we use Infinispan); Best practices; Q&A
  3. ModeShape 3: an elastic in-memory hierarchical database with queries, transactions, events & more
  4. Elastic
     • Add more processes to increase storage capacity and/or throughput
       – No master, no slaves
       – Data is rebalanced as needed
       – Optionally separate database engine from storage processes
     • Fault tolerant
       – Processes can fail without loss of data
       – Cross-data center distribution (in near future)
  5. Hierarchical
     • Organize the data into a tree structure that reflects how the data is accessed & used
       – Navigation to related data
       – Still have references and queries
     • Many scenarios have natural hierarchies
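
To make the idea of navigating a hierarchy concrete, here is a minimal sketch of a tree of named nodes with path-based lookup. It uses only the JDK; the class `TreeNodeSketch` and its methods are hypothetical names for illustration and are not part of the JCR or ModeShape APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory tree node; illustrates hierarchy and navigation only. */
final class TreeNodeSketch {
    final String name;
    final TreeNodeSketch parent;
    private final Map<String, TreeNodeSketch> children = new LinkedHashMap<>();

    private TreeNodeSketch(String name, TreeNodeSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates the root node, whose path is "/". */
    static TreeNodeSketch root() {
        return new TreeNodeSketch("", null);
    }

    /** Returns the child with this name, creating it if it does not exist yet. */
    TreeNodeSketch addChild(String childName) {
        return children.computeIfAbsent(childName, n -> new TreeNodeSketch(n, this));
    }

    /** Builds the absolute path by walking up to the root. */
    String path() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.path();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }

    /** Navigates a slash-separated relative path; returns null if any segment is missing. */
    TreeNodeSketch resolve(String relativePath) {
        TreeNodeSketch current = this;
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        TreeNodeSketch root = TreeNodeSketch.root();
        TreeNodeSketch prius = root.addChild("cars").addChild("hybrid").addChild("prius");
        System.out.println(prius.path());
    }
}
```

Related data sits next to each other in the tree, so reaching a sibling or parent is a short walk rather than a separate lookup.
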
  6. Strongly consistent
     • ACID
       – Atomic, Consistent, Isolated, Durable
       – Already familiar to most developers
       – Easy to reason about code
     • XA-aware
       – Participate in user transactions
       – Work with Java EE
  7. Why not eventually-consistent?
     • In eventually-consistent databases
       – changes made by one client will eventually (but not immediately) be propagated to all processes
       – other clients won’t see latest data right away, yet can still make other changes
       – there may be multiple versions of a particular piece of data
     • Can be ideal for some scenarios
       – read-heavy and/or best-effort
     • Applications that update data may need to
       – expect inconsistencies (and/or multiple versions)
       – specify conflict strategies
       – resolve conflicts (inconsistencies)
  8. In-memory
     • Memory is really fast (and cheap)
     • Why not keep all data in memory?
       – practical limits to memory on particular machines
       – memory isn’t shared between machines
       – data stored in memory isn’t durable
       – no queries, structure, or transactions
     • ModeShape
       – distributes multiple copies of data across the combined memory of many machines
       – can even persist data to disk or DB (if really needed)
       – can still use queries, structure and transactions
       – is fast
  9. Queries
     • Find the data independently of the hierarchy
     • Use SQL-like language

       SELECT * FROM [car:Car]
       WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006

       SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy] FROM [nt:file]
       WHERE PATH() LIKE $path

       SELECT file.*, content.* FROM [nt:file] AS file
       JOIN [nt:resource] AS content ON ISCHILDNODE(content, file)
       WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'

       SELECT [jcr:primaryType], [jcr:created], [jcr:createdBy] FROM [nt:file]
       WHERE PATH() IN (
         SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
         WHERE [vdb:version] <= $maxVersion
         AND CONTAINS([vdb:description], 'xml OR xml maybe')
       )
  10. With or without schema
      (Diagram: a scale running from strict enforcement to no enforcement)
      • Choose how much schema is enforced
        – define patterns for values and structure
        – use different patterns for different parts of the database
        – change the patterns over time
        – use the “best” levels of schema validation
        – evolve as necessary
  11. Binary storage
      • Separate storage for BINARY values
        – content keyed by SHA-1
        – property value stored with node contains SHA-1 and resolved as needed
        – content always buffered
      • Option per repository
        – File system
        – Transient (temp directory)
        – JDBC database
        – MongoDB
        – Infinispan (separate caches)
        – custom
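
The "content keyed by SHA-1" idea can be sketched as a small content-addressed store: the node would keep only the hex digest, and the bytes are looked up by that key when needed. This is a minimal illustration using the JDK's `MessageDigest`; the class `Sha1BinaryStoreSketch` and its methods are hypothetical and do not represent ModeShape's binary store SPI.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical content-addressed store; illustrates keying binary content by SHA-1. */
final class Sha1BinaryStoreSketch {
    private final Map<String, byte[]> contentByKey = new HashMap<>();

    /** Returns the lowercase hex SHA-1 digest of the content. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }

    /** Stores the content once per distinct digest and returns the key a node would keep. */
    String store(byte[] content) {
        String key = sha1Hex(content);
        contentByKey.putIfAbsent(key, content.clone());
        return key;
    }

    /** Resolves a key back to its content, or returns null if the key is unknown. */
    byte[] resolve(String key) {
        byte[] content = contentByKey.get(key);
        return content == null ? null : content.clone();
    }

    /** Number of distinct contents held. */
    int size() {
        return contentByKey.size();
    }

    public static void main(String[] args) {
        Sha1BinaryStoreSketch store = new Sha1BinaryStoreSketch();
        String key = store.store("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(key + " -> " + store.resolve(key).length + " bytes");
    }
}
```

One consequence of keying by digest is that identical content stored twice occupies a single entry.
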
  12. Sequencing
      • Automatically extract structured content
        – save BINARY or STRING values
        – path rules & MIME types determine which sequencer is run
        – output stored in repository at configurable location
      • Sequencers
        – CND
        – DDL
        – text (fixed width, delimited)
        – Microsoft Office™
        – Java (source & class)
        – ZIP (and JAR/WAR/EAR)
        – XML, XSD, and WSDL
        – Teiid VDBs
        – audio (MP3)
        – images
        – custom
      (Diagram, sequencing flow: 1) upload, 2) notify, 3) derive and store, 4) navigate or query)
  13. Federation (reintroduced in 3.1)
      • Access data in external systems
        – external data projected as nodes with properties and node types
        – supports read and optional write
        – same validation rules
      • Connector options
        – File system (3.1)
        – Local git (3.2)
        – Database (3.2)
        – Database metadata (3.2)
        – Local repository (3.2)
        – External JCR repository (3.2)
        – custom
      (Diagram labels: External source A, External source B)
  14. Monitoring
      • Measure statistics for a variety of metrics
        – Total counts: active sessions, queries, workspaces, locks, listeners, events in queue, sequencing operations in queue
        – Increment counts: events sent to listeners, nodes changed, saves, nodes sequenced
      • Results
        – Metrics measured every 5 seconds
        – Results are aggregated into windows that show statistics (min, max, median, variance, stdev, sample count) during last minute, hour, day, week, year
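
As an illustration of the windowed statistics listed above, the sketch below summarizes one window of samples (for example, the twelve 5-second measurements that make up a minute). The class `WindowStatisticsSketch` is a hypothetical name and is not ModeShape's monitoring API, and the use of population variance is an assumption, since the slide does not say which form is reported.

```java
import java.util.Arrays;

/** Hypothetical summary of one window of samples; illustrative only. */
final class WindowStatisticsSketch {
    final int count;
    final long min;
    final long max;
    final double median;
    final double mean;
    final double variance; // population variance (assumed; the slide does not specify)
    final double stdev;

    private WindowStatisticsSketch(long[] sorted) {
        count = sorted.length;
        min = sorted[0];
        max = sorted[count - 1];
        int mid = count / 2;
        // Odd count: the middle sample; even count: the average of the two middle samples.
        median = (count % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
        double sum = 0;
        for (long sample : sorted) {
            sum += sample;
        }
        mean = sum / count;
        double squaredDeviations = 0;
        for (long sample : sorted) {
            squaredDeviations += (sample - mean) * (sample - mean);
        }
        variance = squaredDeviations / count;
        stdev = Math.sqrt(variance);
    }

    /** Summarizes the given samples; at least one sample is required. */
    static WindowStatisticsSketch of(long... samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("a window needs at least one sample");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return new WindowStatisticsSketch(sorted);
    }

    public static void main(String[] args) {
        WindowStatisticsSketch window = WindowStatisticsSketch.of(2, 4, 4, 4, 5, 5, 7, 9);
        System.out.println("median=" + window.median + " stdev=" + window.stdev);
    }
}
```
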
  15. Public APIs
  16. JCR 2.0
      • Standard Java API (JSR-283)
        – javax.jcr packages
        – programmatically access, find, update, query content
        – commonly needed features: events, versioning, etc.
        – hierarchical tree of nodes, nodes have properties, property values can reference other nodes
      (Diagram keywords: databases, file systems, content repositories; query, read, write, streams, integrity, locking, schema, hierarchy, transactions, access control, events, search, unstructured, versioning)
  17. Extended JCR API
      • Extended JCR interfaces
        – additional node type management methods
        – additional event types
        – additional Binary value methods (hash)
        – additional JCR-QOM language objects
        – cancel queries
        – sequencer and text SPIs
        – monitoring API
  18. JDBC API
      • Use JDBC driver or data source
        – connect to local or remote repository
        – issue JCR-SQL2 queries
        – access database metadata
      • Enables existing applications to access content
        – ad hoc query tools
        – reporting systems
  19. RESTful API
      • Access content over HTTP
        – POST, PUT, GET, DELETE methods
        – JSON representations
        – Single or subtree of nodes with properties
        – Streams large BINARY values
        – Register node types
        – Execute queries
      • Deployed as WAR file
        – Same app server in which ModeShape is deployed
        – Handles multiple repositories
  20. WebDAV API
      • Exposes content as files and directories
        – nt:file nodes exposed as files
        – nt:folder nodes exposed as directories
        – other nodes exposed as directories
      • Mount repository on file system
        – Treat as external drive
        – Upload files and folders into repository
      • Deployed as WAR file
        – Same app server in which ModeShape is deployed
        – Handles multiple repositories
  21. Deployment options
  22. ModeShape 3 and Infinispan: single process
      (Diagram: one ModeShape instance over a local Infinispan cache, with data in a persistent store)
  23. ModeShape 3 and Infinispan: small cluster
      (Diagram: three ModeShape instances exchanging events, each over a replicated Infinispan cache, with data in a persistent store)
  24. ModeShape 3 and Infinispan: moderate single- or multi-site cluster
      (Diagram: several ModeShape instances exchanging events, each over a distributed Infinispan cache holding data)
  25. ModeShape 3 and Infinispan: large single- or multi-site cluster
      (Diagram: several ModeShape instances exchanging events, with data held in an Infinispan data grid)
  26. ModeShape AS7 kit
  27. Deploying ModeShape in AS7
      • Simple installation
        – simply unzip into existing AS7 installation
        – includes “standalone-modeshape.xml” that contains a variety of ready-to-run sample repositories
      • ModeShape subsystem for AS7
        – use AS7 tools to define 1+ repositories
        – each repository is independently configured
        – update repository configuration while running
        – (re)uses Infinispan and JGroups subsystems
        – clustering is built-in
        – perform management and monitoring operations
  28. Sample AS7 configuration

        <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
          <repository name="sample" />
        </subsystem>

      – Each “repository” fragment defines a repository
      – Multiple are supported
  29. Sample AS7 configuration (more thorough)

        <subsystem xmlns="urn:jboss:domain:modeshape:3.0">
          <!-- Multiple repository elements are allowed -->
          <repository name="sample"
                      cache-name="sample" cache-container="modeshape"
                      jndi-name="jcr/local/sample"
                      enable-monitoring="true"
                      default-workspace="default" allow-workspace-creation="true"
                      security-domain="modeshape-security"
                      anonymous-roles="readonly,readwrite,admin" anonymous-username="&lt;anonymous&gt;"
                      use-anonymous-upon-failed-authentication="false">
            <workspaces>
              <!-- 0 or more workspaces can be predefined. At the moment, these are just names.
                   But we may want to specify content or something else, so create element for each. -->
              <workspace name="predefinedWorkspace1" />
              <workspace name="predefinedWorkspace2" />
              <workspace name="predefinedWorkspace3" />
            </workspaces>
            <indexing thread-pool="modeshape-workers" batch-size="-1" reader-strategy="shared" mode="sync"
                      async-thread-pool-size="1" async-max-queue-size="1">
              <analyzer classname="org.apache.lucene.analysis.standard.StandardAnalyzer" module="" />
              <jms-master-backend connection-factory-jndi-name="" queue-jndi-name="" />
            </indexing>
            <file-master-index-storage rebuild-upon-startup="ifMissing" format="LUCENE_CURRENT"
                                       path="modeshape/sample/indexes" relative-to=""
                                       access-type="auto" locking-strategy="native"
                                       source-path="/var/lib/modeshape/index/" refresh-in-seconds="3600" />
            <file-binary-storage min-value-size="4096" path="modeshape/sample/binaries"
                                 relative-to="" />
            <sequencers>
              <!-- 0 or more sequencers -->
              <sequencer name="Java Source" classname="java"
                         path-expression="/files/(*.java)[/jcr:content] => /java/$1" />
            </sequencers>
          </repository>
        </subsystem>
  30. ModeShape repositories in JBoss AS7
      (Diagram labels: Web apps, EJBs, MDBs, etc.; JCR; JDBC; HTTP/REST; WebDAV; PHP, Ruby, JavaScript, Java, Python; Repositories; JBoss AS 7)
  31. Current Status & Roadmap
  32. ModeShape releases
      • Development shifted to 3.x in October 2011
  33. ModeShape 3 (part 1 of 2)
      • Much faster
        – Order of magnitude faster, or more
        – Way higher write concurrency (equivalent to node-level locking)
        – Thread-safe implementations
        – Memory is the new disk
        – Internal caches and lazy loading
        – Faster resolution of references and back-references
      • Massively larger repository sizes
        – Millions of nodes, or more
        – Flat hierarchies (>>10K children under 1 parent)
        – Very large files, without consuming heap
      • More deployment options
        – Large clusters
        – High availability
        – Multiple sites
        – Cloud
  34. ModeShape 3 (part 2 of 2)
      • Easily embedded
        – Lightweight, multi-repository engine
        – Hot deployment and configuration of repositories
        – Windowed metrics
      • JBoss AS 7 integration
        – Provides lightweight, on-demand JCR subsystem
        – Hot deployment and configuration of repositories
        – Management of domains (clusters and groups)
        – Monitoring and alerting (via RHQ/JON)
      • Participate in JTA transactions
        – Enabling easy use of JCR in EJB, MDB, CDI, etc.
      • Simpler SPIs
        – Sequencers, text extractors, security providers, binary stores, and connectors
  35. Under the ModeShape 3 covers
      • Use best-of-breed technology
        – Infinispan: cache, key-value store and data grid
        – Hibernate Search: indexing
        – JGroups: clustering events
        – JBoss AS7: small, fast, clusterable, manageable, cloud
        – Others: RESTEasy, PicketLink, etc.
      • Design techniques
        – Simplify, simplify, simplify!
        – Use immutability first, otherwise write concurrent code
        – Cache data (especially immutable)
        – Share more data between sessions
        – Plan for eventual consistency
        – Remove layers
        – Use sequences (lazily load data, benefits large collections)
        – JSON/BSON documents optimized for in-memory usage
  36. Design (How ModeShape uses Infinispan)
  37. ModeShape 3: basic architecture
      (Diagram: a ModeShape Repository with a JCR layer on top of a storage layer; the storage layer comprises binary storage, content storage, and external systems)
  38. ModeShape 3 and Infinispan: using different caches for different purposes
      (Diagram: a ModeShape Repository over Infinispan-backed binary storage, content storage, and external systems, annotated as follows)
      • JCR sessions hold their changes in memory; will use Infinispan caches that (can) overflow to disk
      • Shared, transient Infinispan caches for each workspace, caching node representations and expiring entries based on events
      • Each node state stored in Infinispan cache as 1 or more JSON/BSON documents
      • Configure Infinispan store as needed
  39. Best Practices
  40. Best practices (1 of 2)
      • Build structure first, then node types
        – most important to get your node structure right
        – it will change over time anyway, so don’t define the node types too soon
      • Use mixin node types and mixins
        – where possible define sets of properties as mixins
        – use in primary types and dynamically add to nodes
      • Limit use of same-name-siblings
        – useful when required, but can be expensive and difficult to use (i.e., paths change)
      • Prefer hierarchies
        – moderate numbers of child nodes, use multiple levels if necessary
      • Store files and folders with ‘nt:file’ and ‘nt:folder’
        – use it wherever appropriate; not for all binary data, though!
      • Verify features are enabled
        – improves portability and safety with configuration changes
      • Import and export
        – avoid document view; use system view wherever possible
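
One way to follow the "prefer hierarchies" advice is to spread nodes over several levels instead of placing them all under one parent. The sketch below derives a year/month/day path for a node from a date; the class `BucketedPathSketch`, the `/orders` root, and the date-based layout are hypothetical choices for illustration, not something the slides prescribe.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper that spreads nodes across year/month/day levels. */
final class BucketedPathSketch {
    private BucketedPathSketch() {
    }

    /** Builds a path such as /orders/2012/11/28/order-42 so no single parent grows too large. */
    static String pathFor(String root, LocalDate date, String nodeName) {
        return String.format(Locale.ROOT, "%s/%04d/%02d/%02d/%s",
                root, date.getYear(), date.getMonthValue(), date.getDayOfMonth(), nodeName);
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/orders", LocalDate.of(2012, 11, 28), "order-42"));
    }
}
```

Each level then holds a moderate number of children (at most 12 months per year and 31 days per month), and only the leaf level grows with the data volume of a single day.
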
  41. Best practices (2 of 2)
      • Prefer JCR-SQL2 and JCR-QOM over other query languages
        – by far the richest and most useful
        – do this even when it appears the queries are more complicated
      • Only Repository is thread-safe; no other APIs are
        – don’t share sessions
        – don’t share anything between sessions
      • Register all listeners in special long-lived sessions
        – do nothing else with these sessions, however (Session is not thread-safe)
        – get off the notification thread ASAP, using work queues where necessary
      • Create new sessions rather than reusing a pool of sessions
        – Sessions are intended to be as lightweight as possible
        – Create a session, use it, log out (even web applications and services!)
      • Avoid deprecated APIs
        – either perform poorly or are a bad idea; besides, they’ll be removed eventually
      • Use not
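
The advice to get off the notification thread quickly can be sketched with a plain JDK work queue: the callback only enqueues the event, and a worker thread does the real processing. The class `ListenerHandoffSketch` and its methods are hypothetical; in a real application the callback would be a JCR event listener, and the worker would create its own session, use it, and log out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical listener that hands events to a worker instead of processing them inline. */
final class ListenerHandoffSketch {
    private final ExecutorService workers = Executors.newSingleThreadExecutor();
    private final List<String> processed = new CopyOnWriteArrayList<>();

    /** Called on the notification thread: enqueue the work and return immediately. */
    void onEvent(String changedPath) {
        workers.submit(() -> process(changedPath));
    }

    /** Runs on the worker thread; a real handler would open a short-lived session here. */
    private void process(String changedPath) {
        processed.add(changedPath);
    }

    /** Stops accepting events and waits for queued work; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutMillis) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Snapshot of the paths processed so far, in processing order. */
    List<String> processed() {
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        ListenerHandoffSketch listener = new ListenerHandoffSketch();
        listener.onEvent("/files/a.txt");
        listener.onEvent("/files/b.txt");
        listener.shutdownAndWait(5000);
        System.out.println(listener.processed());
    }
}
```

A single worker thread keeps events in arrival order; a larger pool would trade that ordering for throughput.
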
  42. Questions?