Climbing the Beanstalk: Scaling Java Persistence to the Cloud
Gordon Yorke
JPA 2.1 Expert Group, EclipseLink Architecture Council
EclipseLink Project
- Provides JPA, JAXB, SDO, DBWS, and EIS persistence services
- Open source Eclipse project
- Project led by Oracle
- Founded by Oracle with the contribution of the full TopLink source code and tests
- Based upon a product with 12+ years of commercial usage
Improving the System
- Reached the limits of current resources
- Want to improve system throughput, latency, and user experience
- Optimize the application
- Plan for concurrency
- Scale the data tier / scale the mid-tier
Optimization Steps
- Performance evaluation environment: profiling tools, regression testing
- Focus on efficient access to data: lazy loading, fetch groups, JOIN FETCH / batch reading, caching
- Heap usage / persistence context efficiency: load only what you need into an EntityManager; synchronization
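The data-access tactics above can be sketched in JPA. The following is a minimal illustration, assuming a hypothetical Order entity with a lazily loaded customer relationship; the entity names and query are assumptions, not from the deck.

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

public class FetchExamples {
    // Hypothetical model: iterating orders and touching each lazy
    // o.getCustomer() would issue one extra SELECT per order (the
    // classic N+1 problem). JOIN FETCH loads orders and their
    // customers in a single query instead.
    List<Order> fetchWithJoin(EntityManager em) {
        TypedQuery<Order> q = em.createQuery(
            "SELECT o FROM Order o JOIN FETCH o.customer WHERE o.total > :min",
            Order.class);
        q.setParameter("min", 100);
        return q.getResultList();
    }

    // EclipseLink alternative: batch reading retrieves the related
    // customers in one follow-up query rather than a join.
    List<Order> fetchWithBatchHint(EntityManager em) {
        TypedQuery<Order> q = em.createQuery(
            "SELECT o FROM Order o WHERE o.total > :min", Order.class);
        q.setParameter("min", 100);
        q.setHint("eclipselink.batch", "o.customer");
        return q.getResultList();
    }
}
```

Which of the two is faster depends on the data shape; batch reading tends to win when each customer is shared by many orders.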
Concurrency
- Plan for volatile vs. static data
- Transaction isolation: handle multiple processes updating the same data
- Locking
  - Optimistic locking: for data that tends to be static
  - Pessimistic locking: for volatile data; it can be more efficient to lock the database than to respond to optimistic lock failures
- Currency requirements
Locking
- Optimistic vs. pessimistic
- Optimistic types: version counter, timestamp, changed fields
- JPA LockModeType values:
  - OPTIMISTIC (READ)
  - OPTIMISTIC_FORCE_INCREMENT (WRITE)
  - PESSIMISTIC_READ
  - PESSIMISTIC_WRITE
  - PESSIMISTIC_FORCE_INCREMENT
- Optimistic locking cooperates with pessimistic locking
- Multiple places to specify a lock, depending upon need: Query, EntityManager
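A sketch of the places a lock mode can be specified, assuming a hypothetical Account entity that carries an @Version field (required for the optimistic modes):

```java
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;
import javax.persistence.TypedQuery;

public class LockingExamples {
    void examples(EntityManager em, long id) {
        // Pessimistic write lock taken at find time
        // (typically SELECT ... FOR UPDATE on the database).
        Account acct = em.find(Account.class, id,
                               LockModeType.PESSIMISTIC_WRITE);

        // Optimistic lock on a managed entity: the version is checked
        // at commit; FORCE_INCREMENT also bumps the version even if
        // the entity itself was not changed.
        em.lock(acct, LockModeType.OPTIMISTIC_FORCE_INCREMENT);

        // A lock mode can also be set directly on a query.
        TypedQuery<Account> q = em.createQuery(
            "SELECT a FROM Account a WHERE a.id = :id", Account.class);
        q.setParameter("id", id);
        q.setLockMode(LockModeType.PESSIMISTIC_READ);
        Account locked = q.getSingleResult();
    }
}
```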
Scaling Data / Data Affinity
- Leverage middle-tier affinity feature sets
  - App servers that associate clients to database connections
  - Group common users onto the same infrastructure
- Partitioning functionality: spread the data set across multiple databases
  - @Partitioned
  - @RangePartitioning / @HashPartitioning: partition access to a database cluster by the range or hash of a field value from the object
  - @PinnedPartitioning: pins requests to a single connection pool/node
  - @Partitioning: custom rules / implementation
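A hedged sketch of the hash-partitioning annotations, assuming a hypothetical Customer entity and connection pools named node1 and node2; the attribute names follow the EclipseLink documentation, but verify them against your EclipseLink version.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.HashPartitioning;
import org.eclipse.persistence.annotations.Partitioned;

// Route Customer rows across two connection pools by the hash of CUST_ID.
// Queries that include the partition column go to a single node; others
// may have to be sent to all partitions.
@Entity
@HashPartitioning(
    name = "CustomerByIdHash",
    partitionColumn = @Column(name = "CUST_ID"),
    connectionPools = {"node1", "node2"})
@Partitioned("CustomerByIdHash")
public class Customer {
    @Id
    @Column(name = "CUST_ID")
    private long id;

    private String name;
}
```

The connection pools themselves would be defined in the persistence unit or server configuration; the pool names here are placeholders.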
Entity Caching
- Caching alleviates demands on the database
- The more cache used, the bigger the benefit when scaling the middle tier
- Configuration
  - JPA: @Cacheable, <shared-cache-mode/>
  - EclipseLink: @Cache allows fine tuning of advanced EclipseLink extensions (size, type of cache)
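A minimal configuration sketch combining the JPA annotation with the EclipseLink extension, assuming a hypothetical Product entity; the size and expiry values are illustrative only.

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.Cache;
import org.eclipse.persistence.annotations.CacheType;

@Entity
@Cacheable(true)               // JPA: opt this entity into the shared cache
@Cache(type = CacheType.SOFT,  // EclipseLink: soft references survive GC pressure
       size = 10000,           // maximum number of cached instances
       expiry = 300000)        // time-to-live invalidation after 5 minutes (ms)
public class Product {
    @Id
    private long id;

    private String name;
}
```

The expiry attribute implements the time-to-live invalidation policy; a time-of-day policy and direct invalidation are also available through the same extension set.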
Considerations with Cache Concurrency
- Stale data: gauge volatility; third-party data updates
- Refreshing policy: when and how often is your application going to refresh?
- Cache invalidation policies: time to live, time of day, direct invalidation
- Distributed caching
Cache Co-ordination
- For small and medium clusters
- Can greatly reduce the number of optimistic lock conflicts and the need for pessimistic locks
- Creation and/or modification of an Entity results in a message to all other nodes
  - Update nodes with changes, or invalidate the cache; unneeded updates are ignored
- Easy to configure sub-clusters
- Configuration through persistence unit properties
- Multiple communication mechanisms supported: JMS, RMI, MDB
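The persistence unit configuration might look like the following persistence.xml fragment. The property names come from the EclipseLink reference; the JNDI names and channel value are placeholders, so check them against your deployment.

```xml
<persistence-unit name="scaled-pu">
  <properties>
    <!-- Choose the transport: jms, rmi, etc. -->
    <property name="eclipselink.cache.coordination.protocol" value="jms"/>
    <!-- Placeholder JNDI names for the JMS topic and factory -->
    <property name="eclipselink.cache.coordination.jms.topic"
              value="jms/EclipseLinkTopic"/>
    <property name="eclipselink.cache.coordination.jms.factory"
              value="jms/EclipseLinkTopicConnectionFactory"/>
    <!-- Nodes sharing a channel form a sub-cluster -->
    <property name="eclipselink.cache.coordination.channel"
              value="orders-subcluster"/>
  </properties>
</persistence-unit>
```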
Caching in the Grid Very large deployments can tax cache co-ordination. Cost of coordinating 1 concurrent update per node is n 2  as all nodes must be informed— cost of communication and processing may eventually exceed value of caching   EclipseLink has hooks allowing you to leverage distributed caches Distributed caches reduce demand for database and spread data requirements out across your application grid
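The n² claim can be made concrete with a little arithmetic: under cache co-ordination each of the n nodes broadcasts its update to the other n-1 nodes, while a distributed grid only sends each update to the primary and backup owner(s).

```java
public class CoordinationCost {
    // Cache co-ordination: one concurrent update on each of the n nodes
    // means every node messages the other n-1, i.e. n * (n - 1) messages,
    // which grows on the order of n^2.
    static long coordinationMessages(int nodes) {
        return (long) nodes * (nodes - 1);
    }

    // Distributed grid: each update travels only to the primary owner and
    // its backup(s), so the total stays linear in the number of nodes.
    static long gridMessages(int nodes, int backups) {
        return (long) nodes * (1 + backups);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 16, 64}) {
            System.out.printf("n=%d coordination=%d grid=%d%n",
                n, coordinationMessages(n), gridMessages(n, 1));
        }
    }
}
```

At 4 nodes the difference is modest (12 vs. 8 messages); at 64 nodes it is 4032 vs. 128, which is why co-ordination stops paying off in very large clusters.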
TopLink Grid
- Oracle-developed EclipseLink integration with the Coherence Data Grid
- Allows Java developers to simply and transparently leverage the power of the Coherence data grid
- Combines the simplicity of application development using the standard Java Persistence API (JPA) with the scalability and distributed processing power of Oracle's Coherence Data Grid
- Supports a 'JPA on the Grid' architecture: EclipseLink JPA applications use Coherence as a shared (L2) cache replacement, with configuration for more advanced usage
- Entities are stored in the grid, not data rows
Grid Architecture
- Each Entity class can be configured independently as one of:
  - Grid Cache
  - Grid Read
  - Grid Entity
Grid Cache
- Coherence as a shared (L2) cache replacement
- Ensures all nodes have a coherent view of data; the database is always right
- The shared cache is always right: Entities read, modified, or created are available to all cluster members
- Updates no longer cost n², as not all members are messaged; the minimum communication is to the primary and backup nodes
- The Coherence cache size is the sum of the available heap of all members; a larger cache enables longer tenure and a better cache hit rate
- Can be used with existing applications and all EclipseLink performance features without altering application results
Grid Read
- All reads (both primary-key and non-primary-key) are executed against the grid (by default)
- For Entities that typically:
  - Need to be highly available
  - Must have updates written synchronously to the database; the database is the system of record
- Features:
  - The database is always correct: it is committed before the grid is updated
  - Supports all EclipseLink performance features, including batch writing, parameter binding, stored procedures, and statement ordering
  - High-performance parallel JPQL query execution
  - Can optionally be used with a CacheLoader
Grid Entity
- The same as the Grid Read configuration except that all reads and writes are executed against the grid, not the database
- Coherence is effectively the "system of record", as all Entity queries are directed to it rather than the database
- For Entities that typically may have updates written asynchronously to the database (if a CacheStore is configured)
- Features:
  - Can optionally be used with a CacheStore to update the database
  - The database will not be up to date until Coherence flushes changes through the CacheStore
  - Will not benefit from EclipseLink performance features such as batch writing
Summary
- Consider performance from the beginning
- Be able to measure/profile performance
- Prepare for concurrency
- Consider partitioning your system
- Use partitioning/affinity support to scale the data tier
- Use distributed caching to scale the mid-tier
Editor's Notes

  • #8 Australia
  • #16 The Grid Read configuration is a little different from the Grid Cache configuration in that rather than just caching objects in Coherence we start to execute queries for those objects against Coherence, for both primary key and non-primary key queries. In the previous configuration, Grid Cache, only primary key queries were executed against Coherence; in the Grid Read configuration all queries, both primary and non-primary key, are redirected to Coherence. This configuration is useful for entities that have to be highly available: being in Coherence, they can be found very rapidly without a database round-trip. It is also useful for entities whose changes must be written synchronously to the database. In this configuration EclipseLink does all the writing, so you get the advantage of batch writing, JTA transaction integration, and other write optimizations and features, and you are guaranteed that your database is correct. Each transaction runs, the database is updated synchronously, and once the transaction has committed, Coherence is updated. Database failures can occur in this configuration, for example optimistic lock exceptions; if a failure does occur the transaction rolls back and the changes are not applied to Coherence, so this configuration is suitable when database transaction failures are possible. The characteristics of this configuration: the database is always correct, because it is committed before the cache is updated; all the write performance features of EclipseLink are available; and because all reads are redirected into the grid you get the benefit of high-performance parallel query processing. You can also optionally configure a CacheLoader so that primary key queries against Coherence can load an object from the database if it is not in the grid.
  • #17 Grid Entity is a further incremental change on top of the Grid Read configuration. In this case all reads and all writes are executed against Coherence, which is effectively the system of record: all queries are redirected to it instead of the database. You may have a database behind Coherence, but EclipseLink will treat Coherence as the data source. This configuration makes sense for entities that need to be highly available and can be written asynchronously to a backing database through a CacheStore. With Coherence write-behind, changes can be flushed to the database asynchronously at intervals, so the database will not be up to date until Coherence flushes any pending writes. If you are using write-behind, then in the period between the EclipseLink transaction commit and the flush of those changes the database will be out of sync with the cache, and third-party applications that access the database may read stale data. This is nothing new for Coherence developers working with a database and using write-behind. This configuration cannot benefit from all of the EclipseLink write optimizations available in the other two configurations.