Climbing the beanstalk

Scaling JPA applications or deploying them to flexible resources can be a challenge. How do I scale, what is the impact on caching, and how can I reuse resources? In this talk we will work through these challenges with real examples using JPA and EclipseLink, exploring where and when to apply best practices and the many features available for caching, scalability, resource sharing, and elastic deployments.

  • The Grid Read configuration differs from the Grid Cache configuration in that, rather than just caching objects in Coherence, we execute queries for those objects against Coherence. This applies to both primary-key and non-primary-key queries. In the previous Grid Cache configuration, only primary-key queries were executed against Coherence; in the Grid Read configuration all queries, both primary-key and non-primary-key, are redirected to Coherence. This configuration is useful for entities that have to be highly available: being in Coherence, they can be found very rapidly without a database round-trip. It is also useful for entities whose changes must be written synchronously to the database. In this configuration EclipseLink does all the writing, so you get the advantage of batch writing, JTA transaction integration, and other write optimizations and features, and you are guaranteed that your database is correct. Each transaction runs, the database is updated synchronously, and once the transaction has committed Coherence is updated. Database failures can still occur in this configuration, for example optimistic lock exceptions. If a failure does occur, the transaction rolls back and the changes are not applied to Coherence, so this configuration is suitable when database transaction failures can occur. The characteristics of this configuration: the database is always correct (it is committed before the cache is updated); all the write performance features of EclipseLink are available; because all reads are redirected into the grid, you get the benefit of high-performance parallel query processing; and you can optionally configure a CacheLoader so that primary-key queries against Coherence can load an object from the database if it is not in the grid.
  • Grid Entity is a further incremental change on top of the previous Grid Read configuration. In this case all reads and all writes are executed against Coherence. In this configuration, Coherence is effectively the system of record: all queries are redirected to it instead of the database. You may have a database behind Coherence, but EclipseLink will treat Coherence as the data source. This configuration makes sense for entities that need to be highly available and can be written asynchronously to a backing database through a CacheStore. With Coherence write-behind, changes can be flushed to the database asynchronously at intervals. The database will not be up to date until Coherence flushes any pending writes. So if you are using write-behind, in the period between the EclipseLink transaction commit and the flush of those changes the database will be out of sync with the cache, and third-party applications that access the database may read stale data. This is nothing new for Coherence developers working with a database and using write-behind. This configuration cannot benefit from all of the EclipseLink write optimizations available in the other two configurations.
  • Transcript

    • 1. Climbing the Beanstalk: Scaling Java Persistence to the Cloud. Gordon Yorke, JPA 2.1 Expert Group, EclipseLink Architecture Council
    • 2. EclipseLink Project
      • Provides JPA, JAXB, SDO, DBWS, and EIS persistence services
      • Open source Eclipse project
      • Project led by Oracle
      • Founded by Oracle with the contribution of full TopLink source code and tests
      • Based upon product with 12+ years of commercial usage
    • 3. Improving the system
      • Reached the limits of current resources
      • Want to improve system
        • throughput
        • latency
        • user experience
      • Optimized the application
      • Plan for Concurrency
      • Scale data tier / Scale mid-tier
    • 4. Optimization Steps
      • Performance evaluation environment
        • Profiling tools
        • Regression testing
      • Focus
        • Efficient access to data
          • Lazy loading
          • Fetch Groups
          • JOIN FETCH / Batch Reading
          • Caching
        • Heap usage / Persistence Context efficiency
          • Loading only what you need into an EntityManager
        • Synchronization
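The efficient-access techniques above can be sketched with JPA plus an EclipseLink query hint. The Order and Customer entities are hypothetical, and the fragment assumes an open EntityManager `em`:

```java
@Entity
public class Order {
    @Id Long id;

    // Lazy loading: the related Customer is fetched only on first access
    @ManyToOne(fetch = FetchType.LAZY)
    Customer customer;
}

// JOIN FETCH: load orders and their customers in one query (avoids N+1 selects)
List<Order> orders = em.createQuery(
        "SELECT o FROM Order o JOIN FETCH o.customer", Order.class)
    .getResultList();

// EclipseLink batch reading: one extra query loads all related customers
List<Order> batched = em.createQuery("SELECT o FROM Order o", Order.class)
    .setHint("eclipselink.batch", "o.customer")
    .getResultList();
```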
    • 5. Concurrency
      • Plan for volatile vs static data
      • Transaction isolation
      • Handle multiple processes updating same data
        • Locking
          • Optimistic locking
            • Data that tends to be static
          • Pessimistic locking
            • Volatile data
              • Can be more efficient to lock the database than to recover from optimistic lock failures
            • Data currency (freshness) requirements
    • 6. Locking
      • Optimistic vs Pessimistic
      • Optimistic types
        • Version counter, time stamp, changed fields
      • JPA LockMode values:
        • OPTIMISTIC (READ)
      • Optimistic locking cooperates with pessimistic locking
      • Multiple places to specify a lock (depends upon need)
        • Query
        • EntityManager
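In JPA these locking choices look like the following sketch; the Account entity is hypothetical and `em` is an open EntityManager:

```java
@Entity
public class Account {
    @Id Long id;

    // Optimistic locking: version counter checked at commit;
    // a timestamp or changed-fields policy can be used instead
    @Version int version;
}

// Lock specified on the EntityManager (OPTIMISTIC is the JPA 2.0 name for READ)
Account a = em.find(Account.class, id, LockModeType.OPTIMISTIC);

// Lock specified on a Query; PESSIMISTIC_WRITE locks the row in the database
Account hot = em.createQuery(
        "SELECT a FROM Account a WHERE a.id = :id", Account.class)
    .setParameter("id", id)
    .setLockMode(LockModeType.PESSIMISTIC_WRITE)
    .getSingleResult();
```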
    • 7. Scaling Data / Data Affinity
      • Leverage middle tier affinity feature sets
        • App servers that associate clients to database connections
        • Group common users to same infrastructure
      • Partitioning functionality
        • Spread data set across multiple databases
          • @Partitioned
            • @RangePartitioning / @HashPartitioning
              • Partitions access to a database cluster by the hash of a field value from the object.
            • @PinnedPartitioning
              • Pins requests to a single connection pool/node.
            • @Partitioning
              • Custom rules / implementation
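A hedged illustration of the idea behind @HashPartitioning: requests are routed to one of several connection pools by hashing a field value. EclipseLink's actual hash algorithm and API may differ; the pool names here are placeholders.

```java
import java.util.List;

public class HashRouting {
    // Route a field value to one of N connection pools by hashing it.
    // The same value always maps to the same pool, which is what gives
    // partitioned queries their data affinity.
    static String route(Object fieldValue, List<String> pools) {
        int index = Math.floorMod(fieldValue.hashCode(), pools.size());
        return pools.get(index);
    }

    public static void main(String[] args) {
        List<String> pools = List.of("node1", "node2", "node3");
        // Deterministic routing: repeated lookups for one tenant hit one node
        String first = route("tenant-42", pools);
        String second = route("tenant-42", pools);
        System.out.println(first.equals(second)); // prints "true"
    }
}
```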
    • 8. Entity Caching
      • Caching will alleviate demands on database
      • More cache used == bigger benefit when scaling middle tier
      • Configuration
        • JPA
          • @Cacheable
          • <shared-cache-mode/>
        • EclipseLink
          • @Cache
            • Allows fine tuning of advanced EclipseLink extensions
              • Size, type of cache
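A configuration sketch combining the JPA and EclipseLink options above; the Product entity and the size value are illustrative, and with `<shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>` in persistence.xml only @Cacheable entities are cached:

```java
@Entity
@Cacheable                     // JPA: opt this entity into the shared cache
@Cache(type = CacheType.SOFT,  // EclipseLink extension: cache type...
       size = 10_000)          // ...and maximum number of cached instances
public class Product {
    @Id Long id;
    String name;
}
```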
    • 9. Considerations with cache concurrency
      • Stale data
        • Gauge volatility
        • 3rd-party data updates
        • Refreshing policy
          • When and how often is your application going to refresh
        • Cache Invalidation policies
          • Time To Live, Time of Day
        • Direct Invalidation
        • Distributed Caching
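A minimal, self-contained illustration of the Time-To-Live decision that an invalidation policy such as EclipseLink's @Cache(expiry = ...) applies; the helper and values here are hypothetical:

```java
public class TimeToLiveCheck {
    // Time-To-Live policy: an entry is stale once its age reaches the TTL.
    static boolean isStale(long cachedAtMillis, long ttlMillis, long nowMillis) {
        return nowMillis - cachedAtMillis >= ttlMillis;
    }

    public static void main(String[] args) {
        long ttl = 60_000L; // invalidate entries after one minute
        System.out.println(isStale(0L, ttl, 30_000L)); // still fresh: false
        System.out.println(isStale(0L, ttl, 61_000L)); // expired: true
    }
}
```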
    • 10. Cache Co-ordination
      • For small and medium clusters
      • Can greatly reduce number of Optimistic Lock conflicts
      • Reduce the need for Pessimistic Locks
      • Creation and/or modification of Entity results in message to all other nodes
        • Update nodes with changes
        • Invalidate cache
        • Unneeded updates ignored
        • Easy to configure sub-clusters
      • Configuration through persistence unit properties
        • Multiple communication mechanisms supported
          • JMS, RMI, MDB
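A hedged sketch of enabling JMS-based cache coordination through persistence-unit properties at factory creation time; the persistence-unit name, topic, and URL values are placeholders:

```java
// Persistence-unit properties can also go in persistence.xml
Map<String, Object> props = new HashMap<>();
props.put("eclipselink.cache.coordination.protocol", "jms");
props.put("eclipselink.cache.coordination.jms.topic", "jms/EclipseLinkTopic");
props.put("eclipselink.cache.coordination.jms.host", "t3://server:7001/");

EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("myPU", props);
```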
    • 11. Caching in the Grid
      • Very large deployments can tax cache co-ordination.
        • Cost of coordinating one concurrent update per node is n² as all nodes must be informed; the cost of communication and processing may eventually exceed the value of caching
      • EclipseLink has hooks allowing you to leverage distributed caches
      • Distributed caches reduce demand for database and spread data requirements out across your application grid
    • 12. TopLink Grid
      • Oracle developed EclipseLink integration with Coherence Data Grid
      • TopLink Grid allows Java developers to simply and transparently leverage the power of the Coherence data grid
      • TopLink Grid combines:
        • the simplicity of application development using the Java standard Java Persistence API (JPA) with
        • the scalability and distributed processing power of Oracle’s Coherence Data Grid.
      • Supports 'JPA on the Grid' Architecture
        • EclipseLink JPA applications using Coherence as a shared (L2) cache replacement along with configuration for more advanced usage
      • Entities are stored in the Grid not data rows
    • 13. Grid Architecture
      • Each Entity class can be configured independently as one of:
        • Grid Cache
        • Grid Read
        • Grid Entity
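Each configuration is applied per entity class via EclipseLink's @Customizer annotation. The TopLink Grid customizer class names below follow the TopLink Grid documentation, but verify them against your version; the Trade entity is hypothetical:

```java
// Grid Read: reads go to Coherence, writes go through EclipseLink to the DB
@Entity
@Customizer(oracle.eclipselink.coherence.integrated.config.GridReadCustomizer.class)
public class Trade { /* ... */ }

// Alternatives, matching the three configurations above:
//   GridCacheCustomizer  - Coherence as shared (L2) cache replacement
//   GridEntityCustomizer - Coherence as system of record for reads and writes
```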
    • 14. Grid Cache
      • Coherence as Shared (L2) Cache Replacement
      • Ensures all nodes have coherent view of data.
        • Database is always right
        • Shared Cache is always right— Entities read, modified, or created are available to all cluster members.
      • Updates no longer cost n² as not all members are messaged; minimum communication is to the primary and backup nodes.
      • Coherence cache size is the sum of the available heap of all members—larger cache size enables longer tenure and better cache hit rate
      • Can be used with existing applications and all EclipseLink performance features without altering application results
    • 15. Grid Read
      • In the Grid Read configuration, all reads (both pk and non-pk) are executed against the grid (by default).
      • For Entities that typically:
        • Need to be highly available
        • Must have updates written synchronously to the database; database is system of record
      • Features:
        • Database is always correct—committed before grid updated
        • Supports all EclipseLink performance features (including batch writing, parameter binding, stored procedures, and statement ordering).
        • High-performance parallel JPQL query execution
        • Can be optionally used with CacheLoader.
    • 16. Grid Entity
      • The Grid Entity configuration is the same as the Grid Read configuration except that all reads and writes are executed against the grid, not the database.
      • Coherence is effectively the "system of record" as all Entity queries are directed to it rather than the database.
      • For Entities that typically:
        • May have updates written asynchronously to the database (if CacheStore configured)
      • Features:
        • Can be optionally used with CacheStore to update the database.
        • Database will not be up to date until Coherence flushes changes through CacheStore
        • Will not benefit from EclipseLink performance features such as batch writing
    • 17. Summary
      • Consider performance from the beginning
      • Be able to measure/profile performance
      • Prepare for concurrency
      • Consider partitioning your system
      • Use Partitioning/Affinity support to scale the data tier
      • Use distributed caching to scale the mid-tier