Series Overview Data Access with JPA Distributed Caching with Coherence Message Driven and Web Services with Spring RESTful Web Services with JAX-RS and JavaScript UI with jQuery Troubleshooting and Tuning ©2010 Oracle Corporation
Next Session: JMS and Web Services with Spring Learn how to: Use Spring with WebLogic JMS Use Spring to create Web Services on WebLogic
Coherence, TopLink Grid JPA, and WebLogic James Bayer WebLogic Server Product Management
Agenda Coherence Overview TopLink Grid – JPA + Coherence Oracle Parcel Service Example WebLogic Server and Coherence Integration
<Insert Picture Here> Coherence Overview
<Insert Picture Here> “ A  Data Grid  is a system composed of multiple servers that work together to manage information and related operations - such as computations - in a  distributed environment .”
Coherence Clustering: Tangosol Clustered Messaging Protocol (TCMP) Completely asynchronous yet ordered messaging built on UDP multicast/unicast Truly peer-to-peer: equal responsibility for both producing and consuming the services of the cluster Self-healing: quorum-based diagnostics Linearly scalable mesh architecture with TCP-like features Messaging throughput scales to the network infrastructure.
Coherence Clustering: The Cluster Service Transparent ,  dynamic  and  automatic  cluster membership management Clustered Consensus:   All members  in the cluster understand the topology of the  entire grid  at  all times . Crowdsourced  member  health diagnostics
Coherence Clustering: The Coherence Hierarchy One Cluster  (i.e. “singleton”) Under the cluster there are  any number of uniquely named Services  (e.g. caching service) Underneath each caching service  there are any number of uniquely named Caches
Data Management: Partitioned Caching Extreme Scalability:  Automatically, dynamically and transparently partitions the data set across the members of the grid.  Pros: Linear scalability of data capacity  Processing power scales with data capacity. Fixed cost per data access Cons: Cost Per Access:  High percentage chance that each data access will go across the wire. Primary Use: Large in-memory storage environments Parallel processing environments
Data Management: Partitioned Fault Tolerance Automatically, dynamically and transparently manages the fault tolerance of your data. Backups are guaranteed to be on a separate physical machine from the primary. Backup responsibilities for one node’s data are shared amongst the other nodes in the grid.
Data Management: Cache Client/Cache Server Partitioning can be controlled on a  member by member basis . A  member is either responsible for an equal partition of the data or not  (“storage enabled” vs. “storage disabled”) Cache Client  – typically the application instances Cache Servers  – typically stand-alone JVMs responsible for storage and data processing only.
Data Management: Near Caching Extreme Scalability &  Performance  The best of both worlds between the Replicated and Partitioned topologies. Most recently/frequently used data is stored locally. Pros: All of the same Pros as the Partitioned topology plus… High percentage chance data is local to request. Cons: Cost Per Update:  There is a cost associated with each update to a piece of data that is stored locally on other nodes. Primary Use: Large in-memory storage environments with likelihood of repetitive data access.
Data Management: Data Affinity The ability to  associate objects across caches  guaranteeing they are located  on the same member . Typical Use Case:  Parent Child relationships
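In code, the parent–child affinity described above is typically expressed by having the child's cache key implement Coherence's KeyAssociation interface. A minimal sketch, assuming an Order/OrderLine parent–child pair (all class and field names here are illustrative, not from the slides):

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Key for an OrderLine entry. getAssociatedKey() returns the parent Order's
// key, so Coherence places each line in the same partition (and therefore on
// the same member) as its parent Order.
public class OrderLineKey implements KeyAssociation, Serializable {
    private final long orderId;   // parent Order key
    private final int lineNumber;

    public OrderLineKey(long orderId, int lineNumber) {
        this.orderId = orderId;
        this.lineNumber = lineNumber;
    }

    @Override
    public Object getAssociatedKey() {
        // All keys returning the same associated key are co-located.
        return orderId;
    }

    // equals()/hashCode() omitted for brevity, but required for cache keys.
}
```

With keys like this, a grid-side operation that touches an Order and its lines never crosses the wire between members.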
<Insert Picture Here> Data Processing Options
Data Processing: Events - JavaBean Event Model Listen to all events for all keys ENTRY_DELETED ENTRY_INSERTED ENTRY_UPDATED NamedCache cache = CacheFactory.getCache("myCache"); cache.addMapListener(listener);
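The listener registered above is a plain JavaBean-style callback. A minimal sketch of one (the cache name and printouts are illustrative):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MapListener;

// Logs every insert, update and delete observed on "myCache".
public class LoggingListener implements MapListener {
    @Override
    public void entryInserted(MapEvent e) {
        System.out.println("inserted " + e.getKey() + " -> " + e.getNewValue());
    }

    @Override
    public void entryUpdated(MapEvent e) {
        System.out.println("updated " + e.getKey()
                + ": " + e.getOldValue() + " -> " + e.getNewValue());
    }

    @Override
    public void entryDeleted(MapEvent e) {
        System.out.println("deleted " + e.getKey());
    }

    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("myCache");
        cache.addMapListener(new LoggingListener());
    }
}
```

Overloads of addMapListener also accept a key or a Filter to narrow which events are delivered.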
Data Processing: Parallel Query
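Parallel queries are expressed with Coherence Filters, which every storage-enabled member evaluates against its own partition of the data in parallel. A sketch, assuming a "people" cache of objects exposing getCity() and getAge() (names are illustrative):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.Filter;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.AndFilter;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.GreaterFilter;
import java.util.Set;

public class QueryDemo {
    public static void main(String[] args) {
        NamedCache people = CacheFactory.getCache("people");

        // Each member evaluates the filter over its own entries in parallel;
        // only matching entries travel over the wire.
        Filter filter = new AndFilter(
                new EqualsFilter("getCity", "Bonn"),
                new GreaterFilter("getAge", 21));

        Set results = people.entrySet(filter);
        System.out.println(results.size() + " matches");

        // An index lets members answer the filter without deserializing
        // every value on every query.
        people.addIndex(new ReflectionExtractor("getCity"), false, null);
    }
}
```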
Data Processing: Continuous Query Cache
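A Continuous Query Cache combines a Filter query with event delivery: it materializes the result set locally and then keeps it in sync as the grid changes. A sketch, assuming an "orders" cache whose values expose getStatus() (names are illustrative):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.ContinuousQueryCache;
import com.tangosol.util.filter.EqualsFilter;

public class CqcDemo {
    public static void main(String[] args) {
        NamedCache orders = CacheFactory.getCache("orders");

        // A local, live view of all OPEN orders: populated by the filter up
        // front, then maintained by events as entries change in the grid.
        ContinuousQueryCache openOrders = new ContinuousQueryCache(
                orders, new EqualsFilter("getStatus", "OPEN"));

        // Reads against the view are local and always current.
        System.out.println("open orders: " + openOrders.size());
    }
}
```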
Data Processing: Invocable Map
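InvocableMap lets you ship the processing to the data: an EntryProcessor executes atomically on the member that owns the key, avoiding a read-modify-write round trip. A sketch of a counter increment (cache and key names are illustrative; a real processor must also be serializable for the wire, e.g. via POF):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

public class IncrementProcessor extends AbstractProcessor {
    @Override
    public Object process(InvocableMap.Entry entry) {
        // Runs on the member owning the key, under that entry's lock.
        Integer current = (Integer) entry.getValue();
        int next = (current == null ? 0 : current) + 1;
        entry.setValue(next);
        return next;
    }

    public static void main(String[] args) {
        NamedCache counters = CacheFactory.getCache("counters");
        Object newValue = counters.invoke("page-hits", new IncrementProcessor());
        System.out.println("counter is now " + newValue);
    }
}
```

invokeAll(filter, processor) applies the same processor to every matching entry, in parallel across the grid.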
Data Processing: Triggers
<Insert Picture Here> TopLink Grid JPA + Coherence
TopLink Grid, Coherence & WebLogic Server (architecture diagram: EclipseLink persistence services JPA, DBWS, SDO, EIS and MOXy, plus TopLink Grid, layered on the Application Grid)
EclipseLink Project Open source Eclipse project Project led by Oracle Founded by Oracle with the contribution of the full TopLink source code and tests Based upon a product with 12+ years of commercial usage Certified on WebLogic and redistributed by Oracle as part of the TopLink product
Scaling JPA Applications Historically, scaling a JPA application has meant Adding nodes to a cluster Tuning database performance to reduce query time Both of these approaches support scalability, but only to a point By leveraging Oracle Coherence, TopLink Grid offers a new way to scale JPA applications
EclipseLink in a Cluster (diagram: two application nodes, each with an EntityManagerFactory containing a shared L2 cache and EntityManagers with L1 caches) Need to keep Shared Caches Coherent
Traditional Approaches to Scaling JPA Prior to TopLink Grid, there were two strategies for scaling EclipseLink JPA applications into a cluster: Disable Shared Cache Each transaction retrieves all required data from the database. Increased database load limits overall scalability but ensures all nodes have the latest data. Cache Coordination When an Entity is modified in one node, the other cluster nodes are messaged to replicate or invalidate the Entities in their shared caches.
Disable Shared Cache (diagram: two application nodes, each with an EntityManagerFactory and per-EntityManager L1 caches only; no shared cache)
Disable Shared Cache Ensures all nodes have coherent view of data. Database is always right Each transaction queries all required data from database and constructs Entities No inter-node messaging Memory footprint of application increases as each transaction has a copy of each required Entity Every transaction pays object construction cost for queried Entities. Database becomes bottleneck
Cache Coordination (diagram: two application nodes, each with an EntityManagerFactory containing a shared cache and EntityManagers with L1 caches; cache coordination messaging links the shared caches)
Cache Coordination Ensures all nodes have a coherent view of data. Database is always right Fresh Entities retrieved from shared cache Stale Entities refreshed from database on access Creation and/or modification of an Entity results in a message to all other nodes Messaging latency means that nodes may have stale data for a short period. Cost of coordinating 1 simultaneous update per node is n² as all nodes must be informed; the cost of communication and processing may eventually exceed the value of caching Shared cache size limited by heap of each node Objects shared across transactions to reduce memory footprint
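The quadratic cost can be sketched with a toy calculation: if each of n nodes performs one update and every update must be sent to the n − 1 other nodes, the total message count is n(n − 1), which grows as n². (This model is illustrative only, not TopLink's actual accounting.)

```java
public class CoordinationCost {
    // Messages needed when each of `nodes` members performs `updatesPerNode`
    // updates and every update is broadcast to all other members.
    public static long messages(int updatesPerNode, int nodes) {
        return (long) updatesPerNode * nodes * (nodes - 1);
    }

    public static void main(String[] args) {
        // Doubling the cluster roughly quadruples coordination traffic.
        System.out.println(messages(1, 4));   // 12
        System.out.println(messages(1, 8));   // 56
        System.out.println(messages(1, 16));  // 240
    }
}
```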
TopLink Grid TopLink Grid is a component of Oracle TopLink TopLink Grid allows Java developers to transparently leverage the power of the Coherence data grid TopLink Grid combines: the simplicity of application development using the Java standard Java Persistence API (JPA) with the scalability and distributed processing power of Oracle’s Coherence Data Grid. Supports a 'JPA on the Grid' architecture: EclipseLink JPA applications use Coherence as a shared (L2) cache replacement, with configuration options for more advanced usage
Scaling JPA with TopLink Grid TopLink Grid integrates EclipseLink JPA and Coherence Base configuration uses the Coherence data grid as a distributed shared cache Updates to the Coherence cache are immediately available to all cluster nodes Advanced configurations use the data grid to process queries, avoiding database access and decreasing database load
TopLink Grid with Coherence Cache (diagram: two application nodes, each with an EntityManagerFactory and L1 caches, sharing a single Coherence cache in place of per-node shared caches)
TopLink Grid—Typical Configurations Grid Cache—Coherence as Shared (L2) Cache Configurable per Entity type Entities read by one grid member are put into Coherence and are immediately available across the entire grid Grid Read All supported read queries executed in the Coherence data grid All writes performed directly on the database by TopLink (synchronously) and Coherence updated Grid Entity All supported read queries and all writes are executed in the Coherence data grid
Grid Cache—Reading Objects Queries are performed using JPA em.find(..) or JPQL. A find() will result in a get() on the appropriate Coherence cache.  If found, Entity is returned.  If get() returns null or query is JPQL, the database is queried with SQL. The queried Entities are put() into Coherence and returned to the application.
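The read path above is plain JPA in application code; TopLink Grid's interceptor performs the Coherence get()/put() behind the scenes. A sketch, assuming a persistence unit named "employee-pu" and an Employee entity mapped as on the configuration slide (both names are assumptions):

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class ReadDemo {
    public static void main(String[] args) {
        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("employee-pu");
        EntityManager em = emf.createEntityManager();

        // find(): Coherence get() first; SQL only on a cache miss, after
        // which the queried Entity is put() into Coherence.
        Employee joe = em.find(Employee.class, 42L);

        // JPQL: goes to the database in Grid Cache mode, but the results
        // warm the Coherence cache for every member of the grid.
        List<Employee> sales = em.createQuery(
                "select e from Employee e where e.dept = :d", Employee.class)
            .setParameter("d", "Sales")
            .getResultList();

        em.close();
    }
}
```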
Grid Cache—Query Results Coherence also leveraged when processing database results EclipseLink constructs Entities from JDBC result set but first extracts primary keys from results and checks cache to avoid object construction cost Even if a SQL query is executed, Coherence can still improve application throughput by eliminating object construction costs for cached Entities
Grid Cache—Writing Objects Applications persist Entities using standard JPA and commit a transaction. The new and/or updated Entities are inserted/updated in the database and the database transaction committed. If the database transaction is successful the Entities are put() into Coherence which makes them available to all cluster members.
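The write path likewise uses nothing beyond standard JPA; the put() into Coherence happens inside TopLink Grid after a successful commit. A sketch under the same assumptions as the read example (persistence-unit and entity names are illustrative):

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class WriteDemo {
    public static void main(String[] args) {
        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("employee-pu");
        EntityManager em = emf.createEntityManager();

        em.getTransaction().begin();
        Employee e = new Employee();
        e.setName("Joe");
        em.persist(e);                 // SQL INSERT issued at commit
        em.getTransaction().commit();  // on success, the Entity is put()
                                       // into Coherence for all members
        em.close();
    }
}
```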
Grid Cache Configuration A CoherenceInterceptor intercepts all shared cache operations and directs them to Coherence instead of the default EclipseLink shared cache. Configure with annotations or via eclipselink-orm.xml @CacheInterceptor(CoherenceInterceptor.class) public class Employee implements Serializable {
Grid Read—Reading Objects Queries are performed using JPA em.find(..) or JPQL. JPQL will be translated to a Coherence Filter and used to query results from Coherence. A find() will result in a get() on the appropriate Coherence cache. The database is not queried by EclipseLink. If Coherence is configured with a CacheLoader then a find() may result in a SELECT, but JPQL will not.
Grid Read—Writing Objects An application commits a transaction with new Entities or modifications to existing Entities. EclipseLink issues the appropriate SQL to update the database and commits the database transaction. Upon successful commit, the new and updated Entities are put() into Coherence.
Grid Read Configuration An Entity can be configured as Grid Read through annotations or in eclipselink-orm.xml @Entity @Customizer(CoherenceReadCustomizer.class) public class Employee implements Serializable {
Limitations in TopLink 11gR1 JPQL translated to Filter and executed in Coherence: TopLink Grid 11gR1 Supports single Entity queries with constraints on attributes, e.g.: select e from Employee e where e.name = 'Joe' Complex queries are executed on the database: Multi-Entity queries or queries that traverse relationships ('joins'), e.g.: select e from Employee e  where e.address.city = 'Bonn' Projection (Report) queries, e.g.: select e.name, e.city from Employee e
Grid Entity Configuration An Entity can be configured as a Grid Entity through annotations or in eclipselink-orm.xml @Entity @Customizer(CoherenceReadWriteCustomizer.class) public class Employee implements Serializable {
Grid Entity—Reading Objects (Same as Grid Read) Queries are performed using JPA em.find(..) or JPQL. JPQL will be translated to a Coherence Filter and used to query results from Coherence. A find() will result in a get() on the appropriate Coherence cache. The database is not queried by EclipseLink. If Coherence is configured with a CacheLoader then a find() may result in a SELECT, but JPQL will not.
Grid Entity—Writing Objects An application commits a transaction with new Entities or modifications to existing Entities. EclipseLink put()s all new and updated Entities into Coherence. If   a CacheStore is configured, Coherence will synchronously or asynchronously write the changes to the database, depending on configuration.
How is TopLink Grid different from Hibernate with Coherence? Hibernate does not cache objects; it caches data rows Hibernate caches serialized data rows in Coherence Using Coherence as a cache for Hibernate Every cache hit incurs both object construction and serialization costs Worse, object construction cost is paid by every cluster member for every cache hit Hibernate only uses Coherence as a cache; TopLink Grid is unique in supporting execution of queries against Coherence, which can significantly offload the database and increase throughput
Summary TopLink supports a range of strategies for scaling JPA applications TopLink Grid integrates EclipseLink JPA with Oracle Coherence to provide: 'JPA on the Grid' functionality to support scaling JPA applications with Coherence Support for caching Entities with relationships in Coherence Both TopLink and Coherence are a part of WebLogic Application Grid
<Insert Picture Here> Oracle Parcel Service Example WebLogic Server and Coherence Integration
<Insert Picture Here> WebLogic Server and Coherence Integration
Coherence Server Lifecycle (diagram): WLS Console and WLST/JMX drive WLS MBeans on the WebLogic Admin Server; the domain directory holds the Coherence Cluster configuration (tangosol-coherence-override.xml) and Coherence Server definitions; the Admin Server's Node Manager client talks to a Node Manager on each machine, which handles lifecycle and HA for that machine's Coherence Server(s), with pack/unpack used to distribute the configuration.
OracleWebLogic YouTube Channel www.YouTube.com/OracleWebLogic

JPA and Coherence with TopLink Grid


Editor's Notes

  • #34 Initial Diagram: What we see in this slide is a high level architecture diagram for TopLink. EclipseLink is at the core of TopLink and EclipseLink provides the persistence services we saw on the previous slide. The MOXy (Mapping Objects to XML) component is EclipseLink's JAXB implementation. Animation: Add TopLink We bundle TopLink Grid with EclipseLink to compose the Oracle TopLink product. If you look at the TopLink product that you can download today what you'll see is an EclipseLink jar, a TopLink Grid jar, and a jar named toplink.jar which contains the backwards compatibility support for older applications. This diagram illustrates the contents of Oracle TopLink, but to use TopLink Grid you'd combine TopLink with Oracle Coherence. Animation: Add Coherence Both Coherence and TopLink are components of WebLogic Suite. Animation: Add WebLogic Suite If you're working with WebLogic Suite then you have all these products available to you. Animation: Developer Tools I mentioned a number of developer tools support TopLink and TopLink Grid, and those include JDeveloper, which has extensive support for developing with TopLink. In Eclipse we have support in the Web Tools Platform's Dali project for JPA development, and OEPE, the Oracle Enterprise Pack for Eclipse, which includes Dali and offers some additional JPA tooling.
  • #35 EclipseLink is a project at Eclipse (as the name suggests). It's a project led by Oracle and was founded with the full source code for Oracle TopLink and for its test suites. Oracle contributed all of TopLink and there are no secret "go fast" bits retained by Oracle. The entire product was open sourced, and the development team that previously was working on Oracle TopLink is now working in open source in the subversion repository at Eclipse: the same developers, same source, albeit moved from the oracle.toplink.* packages to org.eclipse.persistence.* packages. What's significant about EclipseLink is that although the latest release (as of this writing) is 1.2, this is not new code. This is code that has been evolved and used in many commercial applications in a wide variety of environments for well over a decade. There's a lot of experience, a lot of corner cases and real world customer requirements baked into this software, so it is a very mature and capable code base. As I mentioned, Oracle redistributes EclipseLink in TopLink, and so we certify it on WebLogic and provide support for it. TopLink customers can call Oracle support for EclipseLink issues.
  • #36 The topic of this presentation is scaling JPA applications, and historically there have been a couple of ways to do that. One of them is to add more nodes to your cluster. If you have a database tier and an application tier then you'd be adding machines to the application tier. The other thing you can do, of course, is tune your database by doing SQL analysis to improve query performance. But there are limits to the scalability achievable with these approaches. Clearly you can tune your database, and at some point you're going to hit the point at which no more tuning is possible and your database is running as fast as it can. And continually adding nodes to a cluster will increase the load on your database. You can keep adding clients to your database and eventually it won't be able to handle any more; you'll reach a limit. By leveraging Oracle Coherence, TopLink Grid offers a third way to scale JPA applications that doesn't suffer from the limitations we just discussed.
  • #37 So let's look at EclipseLink in a cluster. In the diagram we see a couple of application server nodes. At the bottom is the database, and in each node there is an EntityManagerFactory containing a shared cache. This is an L2 cache that exists in each application server. On top of an EntityManagerFactory there can be any number of EntityManagers, each of which has an L1 transaction-level cache for objects that have been modified in the local transaction context. The challenge this scenario raises is that the shared caches in the cluster nodes need to be kept consistent, so changes made in one node must somehow be reflected in the others. If we fail to do this, changes made in one node will be committed to the database and visible in that node's shared cache, but the shared caches of the other nodes will contain stale data. Queries performed in those nodes could return incorrect results.
  • #38 Traditionally, we would do one of two things to address this: turn off the shared caches completely, or use what we call "cache coordination", which is inter-node communication of changes.
  • #39 Note that we are not turning off caching altogether; we are just disabling the shared L2 cache. Each EntityManager still has a local L1 cache that exists for the life of the persistence context and is garbage collected afterwards.
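Disabling the shared L2 cache is typically done in persistence.xml. A minimal sketch, assuming the EclipseLink persistence provider; the JPA 2.0 `shared-cache-mode` element and the EclipseLink-specific property shown achieve the same effect (verify the property name against your release):

```xml
<persistence-unit name="example">
  <!-- JPA 2.0 way: no shared (L2) cache for this persistence unit -->
  <shared-cache-mode>NONE</shared-cache-mode>
  <properties>
    <!-- EclipseLink-specific equivalent -->
    <property name="eclipselink.cache.shared.default" value="false"/>
  </properties>
</persistence-unit>
```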
  • #40 In this configuration the database is always right. But with the shared caches disabled, every transaction on every node has to hit the database for all the data it needs; no data is cached between transactions. You can see that this increases the load on the database significantly. The upside is that every application transaction gets the current data values, so there are no data-consistency problems scaling this up: all nodes will have the right data. However, the database will get hammered. And there are costs beyond the database: every transaction has to build new objects out of relational query results. On the positive side, there is no inter-node messaging. The nodes are completely independent; you can keep adding nodes and they don't need to know about or communicate with each other, so the only network load in this configuration is the traffic from each node to the database. But the memory footprint of each application server increases. Each EntityManager running in your application server has its own copy of all the objects the application requires. With no shared cache, nothing is shared, multiple copies of the same object will likely exist, and the memory footprint grows. As mentioned earlier, though, the real downside of this configuration is that the database becomes the bottleneck. You can safely add any number of nodes/clients and tune the database to the maximum, but at some point you're going to max it out, and much sooner than with shared caching.
  • #41 In the cache coordination configuration, we have messaging between cluster nodes so we can communicate changes from one node to all the others, avoiding the need to hit the database in secondary nodes in order to see changes. This configuration is characterized by each node having a consistent view of the data. We say "consistent" rather than "synchronized" because synchronizing the shared caches may not be the most efficient way to maintain consistency. For example, suppose each node has object A in its shared cache and one node modifies it, producing A'. What we can do is inform the other cluster caches that A has been modified and invalidate their copies. We don't actually copy the changes to A or synchronize the caches; we simply invalidate A. When and if a node with an invalidated A queries for it, EclipseLink sees that A is invalid and queries the latest version from the database. There are also cache configurations in which the invalid A is garbage collected, in which case the database would also be queried. Either way, all applications in all nodes receive the latest version of A when it is queried. Cache coordination supports a number of messaging technologies out of the box, including RMI, JMS, and IIOP. It's also very easy to plug in a new technology, as the API required of a cache coordination provider is very small.
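Cache coordination is enabled with persistence-unit properties. A hedged sketch using the JMS transport; the property names are from the EclipseLink documentation, while the JNDI names for the topic and connection factory are purely illustrative:

```xml
<properties>
  <!-- choose the coordination transport: rmi, jms, ... -->
  <property name="eclipselink.cache.coordination.protocol" value="jms"/>
  <!-- illustrative JNDI names; point these at your own JMS resources -->
  <property name="eclipselink.cache.coordination.jms.topic"
            value="jms/EclipseLinkTopic"/>
  <property name="eclipselink.cache.coordination.jms.factory"
            value="jms/EclipseLinkTopicConnectionFactory"/>
</properties>
```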
  • #42 The downside to cache coordination is that the creation or modification of any Entity in any cache requires messaging to every other cluster node, which can be expensive. There is also some latency involved, so there is a window in which the shared caches are not consistent with each other. This just means you still need optimistic locking configured, as you would anyway, to deal with potential concurrent updates. The cost of cache coordination in a large cluster can be significant: for every node in the cluster to process a single concurrent change, each node must message every other node, a cost close to n² (specifically n(n-1)). When scaling up to tens or hundreds of nodes, inter-node messaging is going to be the bottleneck. One obvious characteristic of this configuration is that the shared cache size on each node is limited by the available heap. But because there is a shared cache, it's possible to share objects between transactions. There are mechanisms in EclipseLink to support sharing and avoid copy-on-read, which can help keep the memory footprint of each transaction to a minimum.
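The n(n-1) growth is easy to see with a little arithmetic. The sketch below is only an illustration of the message counts, not any TopLink API:

```java
// Sketch: messaging cost of cache coordination as the cluster grows.
// With n nodes, one change must be sent to the n-1 other nodes, so n
// concurrent changes (one per node) generate n*(n-1) messages in total.
public class CoordinationCost {
    static long messagesPerChange(int nodes) {
        return nodes - 1L;
    }
    static long messagesForOneChangePerNode(int nodes) {
        return (long) nodes * (nodes - 1);
    }
    public static void main(String[] args) {
        for (int n : new int[] {2, 10, 100}) {
            System.out.println(n + " nodes -> "
                + messagesForOneChangePerNode(n)
                + " messages when every node commits one change");
        }
    }
}
```

At 10 nodes that is 90 messages per round of updates; at 100 nodes it is 9,900, which is why this approach stops scaling well past a few tens of nodes.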
  • #43 So let's look at how we can work around these scaling issues and the shortcomings of the two strategies we've looked at. TopLink Grid is a new component of Oracle TopLink that provides a way for JPA developers to leverage the Coherence data grid to scale applications. What's nice about the TopLink Grid approach is that it combines the Java Persistence API with Coherence: the programming model is the Java-standard JPA programming model, but you are able to leverage Coherence. There's no need for a JPA developer to learn a new API to scale their applications; the integration is fairly transparent, as we'll see in a few slides. We call this JPA programming model backed by Coherence "JPA on the Grid".
  • #44 So TopLink Grid supports a "JPA on the Grid" architecture. In the base configuration, Coherence is a replacement for the shared L2 cache of EclipseLink JPA, and there are some more advanced configurations, which we'll see shortly, where we leverage even more of Coherence's power.
  • #45 The diagram illustrates how Coherence becomes a truly shared cache that spans the cluster. To each node the cache appears to be local, but it is in fact distributed across the cluster.
  • #46 There are three core TopLink Grid configurations. The first is "Grid Cache", where we use Coherence as a replacement for the shared cache implementation. We can configure this on an Entity-by-Entity basis: we can specify whether a particular Entity type is cached in Coherence or in the built-in shared cache. In this configuration, anything put into Coherence in one node is immediately available to every other node. The second configuration is "Grid Read", in which all read queries for a particular Entity are redirected to Coherence. And the third is "Grid Entity", in which all read and write operations are redirected to Coherence instead of the database. Let's take a closer look at each of these configurations and their characteristics.
  • #47 Let's step through what happens when we perform a read. A query is performed, either a JPQL query or an entity manager find by primary key. A primary-key query goes to Coherence and does a get() by primary key; if the object is found, it is simply returned. If the get returns null, or the query was a JPQL query, then the database is queried. So all JPQL queries do hit the database; we only use Coherence for primary-key queries. When we do query the database, we perform a select, build the objects, put them into Coherence, and return them to the application for use. In this way we populate the Coherence cache.
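The read path above can be sketched in plain Java. This is a simulation only: the two maps stand in for Coherence and the database, and every name here is illustrative rather than part of the TopLink Grid API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Grid Cache read path. "coherence" stands in for the
// Coherence cache (keyed by primary key) and "database" for the RDBMS.
public class GridCacheReadSketch {
    static Map<Long, String> coherence = new HashMap<>();
    static Map<Long, String> database =
        new HashMap<>(Map.of(1L, "Employee#1"));

    // find-by-primary-key: try Coherence first, fall back to the
    // database on a miss, and populate the cache with what was built.
    static String find(Long pk) {
        String cached = coherence.get(pk);       // Coherence get() by key
        if (cached != null) {
            return cached;                       // hit: return immediately
        }
        String built = database.get(pk);         // miss: SELECT + build object
        if (built != null) {
            coherence.put(pk, built);            // warm the cache
        }
        return built;
    }

    public static void main(String[] args) {
        System.out.println(find(1L)); // first call queries the "database"
        System.out.println(find(1L)); // second call is a cache hit
    }
}
```

A JPQL query would bypass the `coherence.get()` step entirely and go straight to the database, which is why only primary-key finds benefit here.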
  • #48 An optimization that is not necessarily apparent is that EclipseLink leverages the cache when processing query results. We extract the primary keys from the database query results and look for the corresponding objects in the cache. So even if we issue a SQL query, say "select e from Employee e where e.name like 'B%'", we get all the matching employees back but don't pay the cost of building objects we've previously built. We look in Coherence or in the local shared cache, depending on how the entity was configured, and use the cached object if its version number indicates it's current. We can avoid a huge application-tier processing cost by using the cache instead of building objects every time.
  • #49 Let's look at the process of writing objects in the Grid Cache configuration. To create or update an object, we either read and modify, persist, or merge an Entity and then commit a transaction. In the Grid Cache scenario, EclipseLink directly performs the database transaction (it does the necessary inserts and updates) and commits it. If the transaction commits successfully, Coherence is updated with the changed objects, so Coherence holds objects that reflect the committed database state.
  • #50 Configuring Grid Cache is very easy. We support both annotations and XML configuration; the annotation approach is shown here. We have an Employee entity and we attach a cache interceptor. The cache interceptor is an API in EclipseLink JPA that lets us plug in any cache implementation. In this case we plug in a Coherence cache interceptor, which redirects all cache interactions to Coherence rather than to the built-in shared cache. This configuration is very straightforward and can be applied to any entity individually.
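The annotation form usually looks like the sketch below. The interceptor class and package names here follow the TopLink Grid 11g documentation and should be verified against your release; this is configuration metadata and needs the TopLink Grid and Coherence libraries on the classpath:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.CacheInterceptor;
import oracle.eclipselink.coherence.integrated.cache.CoherenceInterceptor;

// Grid Cache: redirect all shared-cache interactions for this entity
// to Coherence via the EclipseLink cache interceptor hook.
@Entity
@CacheInterceptor(value = CoherenceInterceptor.class)
public class Employee {
    @Id
    private Long id;
    private String name;
    // getters/setters omitted
}
```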
  • #51 OK, let's look at how reads are performed in this configuration. You can issue either a find or a JPQL query against the EntityManager. If we do a find, we do a get() on Coherence. If we do a JPQL query, it is translated to a Coherence filter and that filter is passed to Coherence; the database is not queried by EclipseLink. If you have a CacheLoader, you may load an individual object as a result of a get() by primary key, but if you issue a JPQL query that is translated to a filter, the database will not be consulted. So you can see that in this configuration you're going to want to warm your cache before you start your application.
  • #52 The write path is very much like the previous configuration: EclipseLink does the writing, and upon successful commit the changes are placed into Coherence.
  • #53 Configuration is slightly different from Grid Cache. You can use either annotations or XML, but in this case you use a Customizer annotation; we aren't simply plugging in Coherence as the shared cache anymore. In the slide we customize the metadata for the Employee entity with an object provided by TopLink Grid called the CoherenceReadCustomizer, which makes the necessary changes to the entity's configuration to set up Grid Read.
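A sketch of the Grid Read annotation form; as with the Grid Cache example, the class and package names follow the TopLink Grid 11g documentation and should be checked against your release:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.eclipse.persistence.annotations.Customizer;
import oracle.eclipselink.coherence.integrated.config.CoherenceReadCustomizer;

// Grid Read: route all read queries for this entity to Coherence.
@Entity
@Customizer(CoherenceReadCustomizer.class)
public class Employee {
    @Id
    private Long id;
    private String name;
    // getters/setters omitted
}
```

Switching the customizer class to CoherenceReadWriteCustomizer is the one-line change that turns this into the Grid Entity configuration.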
  • #54 There are some limitations in the current TopLink Grid 11gR1 release. The first concerns which JPQL can be translated; we're currently limited by the features Coherence provides. For example, we can do simple selects, as on the slide; these are easily translated to filters. More complex queries, specifically queries involving joins, will not be translated into filters. Take "select e from Employee e where e.address.city = 'Bonn'", where both Employee and Address are Entities. In TopLink Grid with Coherence, the Employee and Address entities are stored in different caches, and we cannot currently process this query against Coherence. Instead we follow the normal query processing route, translate the query into SQL, and execute it against the database. We use the database to identify the results, but we then use Coherence to look for those entities in the cache to avoid paying object-build costs. We also don't currently support projection or report queries, so selecting data values of objects is not supported; such queries are also directed to the database.
  • #55 Configuration is almost identical to that of Grid Read except that we now use a CoherenceReadWriteCustomizer to configure the entity.
  • #56 Reading is the same as in the Grid Read configuration so there is nothing new here.
  • #57 On the writing side, things are a little different from Grid Read. Unlike in the two previous configurations, when you update objects and commit a transaction, EclipseLink executes puts into Coherence for all the new or modified entities in the transaction. If you have a CacheStore configured, these changes can be pushed out to the database either synchronously or asynchronously; if you don't, they aren't pushed to the database at all. One thing you must be aware of when using CacheStores is that the writes must be idempotent and the database commits must succeed: the EclipseLink object-level transaction can succeed, but if your asynchronous database writes later fail, you have a database out of sync with your cache. Again, this is nothing new for Coherence developers working with CacheStores.
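A minimal sketch of what an idempotent CacheStore might look like. The CacheStore interface is part of the Coherence API; the Employee type and the EmployeeDao JDBC helper are hypothetical stand-ins, and the key point is that store() performs an upsert so a replayed write-behind operation neither fails nor duplicates rows:

```java
import java.util.Collection;
import java.util.Map;
import com.tangosol.net.cache.CacheStore;

// Sketch only: pushes cache writes through to the database idempotently.
public class EmployeeCacheStore implements CacheStore {
    private final EmployeeDao dao = new EmployeeDao(); // hypothetical JDBC helper

    @Override public void store(Object key, Object value) {
        dao.upsert((Long) key, (Employee) value);      // MERGE: safe to replay
    }
    @Override public void storeAll(Map entries) {
        entries.forEach((k, v) -> store(k, v));
    }
    @Override public void erase(Object key) {
        dao.delete((Long) key);
    }
    @Override public void eraseAll(Collection keys) {
        keys.forEach(this::erase);
    }
    @Override public Object load(Object key) {
        return dao.find((Long) key);
    }
    @Override public Map loadAll(Collection keys) {
        return dao.findAll(keys);
    }
}
```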
  • #58 Now let's compare this with Hibernate's use of Coherence as a shared L2 cache. The first difference is that Hibernate's shared cache is a data cache: it caches data rows rather than objects and serializes these rows into Coherence. A Coherence cache hit in Hibernate therefore incurs both deserialization and object-construction costs every time, in every cluster member. For example, when an object is read from the database in node one, a data row object is built from the JDBC result row, the entity object is constructed, and the data row object is serialized into Coherence. On node two, querying the same object by primary key gets a Coherence cache hit, which returns the deserialized data row object; Hibernate then pays the object-build cost on node two to construct the object from the row. As you can see, an object-build cost is paid on every node for every cache hit, unlike in TopLink Grid, where this cost is paid only on the initial read. The other significant difference is that Hibernate only uses Coherence as a cache. There is no way to leverage Coherence's ability to perform parallel queries in the grid to offload the database. Hibernate uses Coherence in only the most basic way, whereas TopLink Grid is able to leverage the distributed compute power of the grid.
  • #59 In this presentation we've seen a number of ways to scale JPA applications with TopLink. TopLink Grid is a new feature in Oracle TopLink that offers a new way to scale by supporting "JPA on the Grid", which goes beyond simple caching and provides a way to leverage the power of the Oracle Coherence data grid. TopLink Grid adds unique support for caching complex object graphs in Coherence, along with support for both eager and lazy loading of related objects. TopLink Grid with Coherence provides the most scalable platform for building enterprise JPA applications. Oracle TopLink and Oracle Coherence are key components of Oracle WebLogic Suite.