JPA and Coherence with TopLink Grid
 

  • Initial Diagram: What we see in this slide is a high-level architecture diagram for TopLink. EclipseLink is at the core of TopLink and provides the persistence services we saw on the previous slide. The MOXy (Mapping Objects to XML) component is EclipseLink's JAXB implementation. Animation: Add TopLink. We bundle TopLink Grid with EclipseLink to compose the Oracle TopLink product. If you look at the TopLink product that you can download today, what you'll see is an EclipseLink jar, a TopLink Grid jar, and a jar named toplink.jar which contains the backwards compatibility support for older applications. This diagram illustrates the contents of Oracle TopLink, but to use TopLink Grid you'd combine TopLink with Oracle Coherence. Animation: Add Coherence. Both Coherence and TopLink are components of WebLogic Suite. Animation: Add WebLogic Suite. If you're working with WebLogic Suite then you have all these products available to you. Animation: Developer Tools. I mentioned that a number of developer tools support TopLink and TopLink Grid. Those include JDeveloper, which has extensive support for developing with TopLink. In Eclipse we have support in the Web Tools Platform's Dali project for JPA development, and OEPE, the Oracle Enterprise Pack for Eclipse, which includes Dali, offers some additional JPA tooling.
  • EclipseLink is a project at Eclipse (as the name suggests). It's a project led by Oracle and was founded with the full source code for Oracle TopLink and for its test suites. Oracle contributed all of TopLink and there are no secret "go fast" bits retained by Oracle. The entire product was open sourced and the development team that previously was working on Oracle TopLink is now working in open source in the Subversion repository at Eclipse: the same developers and the same source, albeit moved from the oracle.toplink.* packages to org.eclipse.persistence.* packages. What's significant about EclipseLink is that although the latest release (as of this writing) is 1.2, this is not new code. This is code that has evolved and been used in many commercial applications in a wide variety of environments for well over a decade. There's a lot of experience, a lot of corner cases and real-world customer requirements baked into this software, so it is a very mature and capable code base. As I mentioned, Oracle redistributes EclipseLink in TopLink, and so we certify it on WebLogic and provide support for it. TopLink customers can call Oracle support for EclipseLink issues.
  • The topic of this presentation is scaling JPA applications, and historically there have been a couple of ways to do that. One of them is to add more nodes to your cluster. If you have a database tier and an application tier then you'd be adding machines to the application tier. The other thing you can do, of course, is tune your database by doing SQL analysis to improve query performance. But there are limits to the scalability achievable with these approaches. Clearly you can tune your database, but at some point no more tuning is possible and your database is running as fast as it can. And continually adding nodes to a cluster will increase the load on your database. You can keep adding clients to your database and eventually it won't be able to handle any more; you'll reach a limit. By leveraging Oracle Coherence, TopLink Grid offers a third way to scale JPA applications that doesn't suffer from the limitations we just discussed.
  • So let's look at EclipseLink in a cluster. In the diagram we see a couple of application server nodes. At the bottom I have my database and in each node I have an EntityManagerFactory which contains a shared cache. This is an L2 cache that exists in each application server. On top of an EntityManagerFactory I'd have any number of EntityManagers, each of which has an L1 transaction-level cache for objects that have been modified in the local transaction context. The challenge this scenario raises is that the shared caches in each of the cluster nodes need to be kept consistent, so changes made in one node need to be somehow reflected in the other nodes. If we fail to do this then changes in one node will be committed to the database and will be visible in that node's shared cache; however, the shared caches of other nodes will contain stale data. Queries performed in the other nodes could return incorrect results.
  • Traditionally, what we would do to scale JPA applications is one of two things: We could turn off the shared caches completely. The other approach is what we call "cache coordination" which is inter-node communication of changes.
  • Note that we are not turning off caching altogether; we are just disabling the shared L2 cache. Each EntityManager would still have a local L1 cache that exists for the life of the persistence context and afterwards is garbage collected.
  • In this configuration the database is always right. But with the shared caches disabled, every transaction on every node has to hit the database for all the data it needs. No data is cached between transactions. You can see that this would increase the load on the database significantly. The upside is that every application transaction gets the current data values, so there are no problems scaling this up in terms of data consistency because all nodes will have the right data. However, the database will get hammered. And there are costs beyond the database costs: every transaction has to build new objects out of relational database query results. On the positive side there's no inter-node messaging. The nodes are completely independent. You can keep adding nodes and they don't have to know about or communicate with each other, so there's no additional network load added by this configuration other than the traffic from each node to the database. But the memory footprint of each application server will increase. Each EntityManager running in your application server will have its own copy of all the objects the application requires. There's no shared cache, so nothing is shared, multiple copies of the same object will likely exist, and therefore the memory footprint increases. But as mentioned earlier, the real downside to this configuration is that the database becomes the bottleneck. You can safely add any number of nodes/clients to your environment and you can tune the database to the maximum, but at some point you're going to max it out, and much sooner than if you had shared caching.
  • In the Cache Coordination configuration, we have messaging between cluster nodes so we can communicate changes from one node to all others, to avoid having to hit the database in secondary nodes in order to see changes. This configuration is characterized by each node having a consistent view of the data. We say "consistent" but not "synchronized" because we are not synchronizing the shared caches, as that may not be the most efficient way to maintain consistency. For example, suppose each node has object A in its shared cache and in one node it's modified to, say, A'. What we can do is inform the other cluster caches that A has been modified and invalidate their copies. So we don't actually copy the changes to A, and we don't synchronize the caches; we simply invalidate A. When and if a node with an invalidated A queries for it, EclipseLink will see that A is invalid and query the latest version from the database. Of course there are also a number of different cache configurations available in which the invalid A will be garbage collected, in which case the database would also be queried. Either way, all applications in all nodes will receive the latest version of A when it is queried. Cache coordination supports a number of messaging technologies out of the box, including RMI, JMS, and IIOP. It's also very easy to plug in a new technology, as the API required of a cache coordination provider is very small.
  • The downside to cache coordination is that the creation or modification of any Entity in any cache requires messaging to every other cluster node. This can be expensive. Also there is some latency involved, so there is a window in which the shared caches are not consistent with each other. This just means that you still need to have optimistic locking configured, as you would anyway to deal with potential concurrent updates. The cost of cache coordination in a large cluster can be significant. If every node in a cluster processes a single concurrent change, each of them has to message every other node, so the cost is close to n² (specifically n(n-1)). When scaling up to tens or hundreds of nodes, inter-node messaging is going to be the bottleneck. One of the obvious characteristics of this configuration is that the shared cache size on each node is limited by the available heap. But because we have a shared cache it's possible to share objects between transactions. There are mechanisms in EclipseLink to support sharing and avoid copy on read, which can help keep the memory footprint of each transaction to a minimum.
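As a rough illustration of switching off the shared cache for a whole persistence unit, here is a minimal sketch that assumes EclipseLink's `eclipselink.cache.shared.default` persistence unit property. The class and method names are hypothetical and not taken from the slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds persistence unit properties that turn off
// EclipseLink's shared (L2) cache by default. Each EntityManager still
// keeps its own L1 cache for the life of its persistence context.
final class SharedCacheOffProperties {
    static Map<String, String> build() {
        Map<String, String> props = new HashMap<>();
        // Default for whether entities are placed in the shared cache.
        props.put("eclipselink.cache.shared.default", "false");
        return props;
    }
}
```

A map like this would typically be handed to `Persistence.createEntityManagerFactory(unitName, props)`, or the same property could be declared in persistence.xml.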
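The invalidation idea can be sketched as a toy model. This is not EclipseLink's implementation: plain maps stand in for the database and the shared caches, direct method calls stand in for RMI/JMS messaging, and every name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of invalidation-based cache coordination between cluster nodes.
final class CoordinatedNode {
    private final Map<Long, String> database;                       // shared stand-in for the database
    private final Map<Long, String> sharedCache = new HashMap<>();  // this node's L2 cache
    private final List<CoordinatedNode> peers = new ArrayList<>();
    int databaseReads = 0;

    CoordinatedNode(Map<Long, String> database) {
        this.database = database;
    }

    static void connect(CoordinatedNode a, CoordinatedNode b) {
        a.peers.add(b);
        b.peers.add(a);
    }

    // Serve from the shared cache when possible, otherwise read the database.
    String read(Long id) {
        String cached = sharedCache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String value = database.get(id);
        if (value != null) {
            sharedCache.put(id, value);
        }
        return value;
    }

    // Commit a change, then tell peers to invalidate (not copy) their entry.
    void update(Long id, String value) {
        database.put(id, value);
        sharedCache.put(id, value);
        for (CoordinatedNode peer : peers) {
            peer.invalidate(id);
        }
    }

    private void invalidate(Long id) {
        sharedCache.remove(id);
    }
}
```

After `update` on one node, the next `read` on a peer misses its cache and goes back to the database, which is the behavior described for an invalidated A.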
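To make the n(n-1) figure concrete, a small sketch (the class name is hypothetical) counts the messages sent when each of n nodes broadcasts one change to every other node. With 10 nodes that is 90 messages, and with 100 nodes it is 9,900.

```java
// Counts coordination messages when every node broadcasts one change
// to every other node in the cluster.
final class CoordinationCost {
    static long messages(int nodes) {
        if (nodes < 0) {
            throw new IllegalArgumentException("nodes must be non-negative");
        }
        // Each of the n nodes sends to the other (n - 1) nodes.
        return (long) nodes * (nodes - 1);
    }
}
```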
  • So let's look at how we can work around these issues we have with scaling JPA and the shortcomings of the two strategies we've looked at. TopLink Grid is a new component of Oracle TopLink and it provides a way for JPA developers to leverage the Coherence data grid to scale applications. What's nice about the TopLink Grid approach is it combines the use of the Java Persistence API with Coherence. That is, the programming model is the Java standard JPA programming model but you are able to leverage Coherence. There's no need for a JPA developer to learn a new API to scale their applications. The integration is fairly transparent as we'll see in a few slides. We call this JPA programming model backed by Coherence "JPA on the Grid".
  • So TopLink Grid supports a "JPA on the Grid" architecture. In the base configuration, Coherence is a replacement for the shared L2 cache for EclipseLink JPA, and there are some more advanced configurations that we'll see where we leverage even more of Coherence's power.
  • The diagram illustrates how Coherence becomes a truly shared cache which spans the cluster. To each node the cache appears to be local but is in fact distributed across the cluster.
  • There are three core TopLink Grid configurations: The first is "Grid Cache" where we use Coherence as a replacement for the shared cache implementation. We can configure this on an Entity by Entity basis—we can specify whether a particular Entity type is cached in Coherence or in the built-in shared cache. In this configuration, anything put into Coherence in one node is immediately available to every other node. The second configuration is "Grid Read". In this configuration all read queries for a particular Entity are redirected to Coherence. And the third configuration is "Grid Entity". In this configuration all read and write operations are redirected to Coherence instead of the database. Let's take a closer look at each of these configurations and their characteristics.
  • Let's step through what happens when we perform a read. A query is performed, either a JPQL query or an entity manager find by primary key. A primary key query will go to Coherence and do a get() by primary key. If the object is found it is simply returned. If the get() returns null or the query was a JPQL query then the database is queried. So for all JPQL queries we do hit the database. We are just using Coherence for primary key queries. If we do query the database then we perform a select, we build the objects, we put them into Coherence, and we return them to the application for use. In this way we populate the Coherence cache.
  • The optimization that is not necessarily apparent is that EclipseLink leverages the cache when processing query results. What we do is extract the primary keys from database query results and look for objects in the cache. So even if we are issuing a SQL query, let's say for "select e from Employee e where e.name like 'B%'", we will get all the matching employees back but we won't pay the cost of building objects if we've previously built those objects. We will look in Coherence or in the local shared cache, depending upon how the entity was configured, and we will use the cached object if its version number indicates it's current. We can avoid a huge application tier processing cost by using the cache instead of building objects every time.
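The find-by-primary-key path just described can be sketched as a toy model. A plain map stands in for the Coherence cache and another for the database; the names are hypothetical and this is not TopLink Grid code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a Grid Cache find(): try the grid first, fall back to the
// database on a miss, then populate the grid with the built object.
final class GridCacheReader {
    private final Map<Long, String> grid = new HashMap<>();   // stand-in for Coherence
    private final Map<Long, String> databaseRows;             // stand-in for the database
    int databaseQueries = 0;

    GridCacheReader(Map<Long, String> databaseRows) {
        this.databaseRows = databaseRows;
    }

    String find(Long id) {
        String cached = grid.get(id);          // get() by primary key
        if (cached != null) {
            return cached;
        }
        databaseQueries++;                     // cache miss: query the database
        String row = databaseRows.get(id);
        if (row == null) {
            return null;
        }
        String entity = "Employee(" + row + ")";  // "build" the object from the row
        grid.put(id, entity);                     // make it available to later reads
        return entity;
    }
}
```

In this model a second `find` for the same key is answered from the grid without another database query.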
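A simplified sketch of that optimization follows. It assumes query results arrive as primary key to row data pairs, and it leaves out the version check mentioned above. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: reuse previously built objects when processing query results,
// keyed by the primary key extracted from each row.
final class QueryResultBuilder {
    private final Map<Long, String> cache = new HashMap<>();
    int objectsBuilt = 0;

    // rows maps primary key -> raw column data returned by the SQL query.
    List<String> materialize(Map<Long, String> rows) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> row : rows.entrySet()) {
            String entity = cache.get(row.getKey());
            if (entity == null) {
                // Pay the object build cost only on a cache miss.
                entity = "Employee(" + row.getValue() + ")";
                objectsBuilt++;
                cache.put(row.getKey(), entity);
            }
            result.add(entity);
        }
        return result;
    }
}
```

Running the same query twice still costs two database round trips in this model, but objects are built only the first time.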
  • Let's look at the process of writing objects using the Grid Cache configuration. To write an object or update an object we're going to either read and modify, persist, or merge an Entity and then commit a transaction. In the Grid Cache scenario EclipseLink will directly perform the database transaction – it'll do the necessary inserts and updates– and commit the transaction. If the transaction commits successfully Coherence will be updated with the changed objects. So Coherence will have objects that reflect the committed database state.
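The ordering described here, database commit first and grid update only on success, can be sketched with a toy model. The boolean flag simulates whether the database commit succeeds; the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a Grid Cache write: the grid is updated only after the
// database transaction has committed successfully.
final class GridCacheWriter {
    final Map<Long, String> database = new HashMap<>();  // stand-in for the database
    final Map<Long, String> grid = new HashMap<>();      // stand-in for Coherence

    boolean commit(Map<Long, String> changes, boolean databaseCommitSucceeds) {
        if (!databaseCommitSucceeds) {
            return false;            // failed commit: the grid is left untouched
        }
        database.putAll(changes);    // inserts and updates committed
        grid.putAll(changes);        // grid now reflects committed state
        return true;
    }
}
```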
  • Configuring Grid Cache is very easy. We have support for both annotations and XML configuration. The annotation approach is shown here. What we have here is an Employee entity and we're going to attach a cache interceptor. Cache interceptor is an API in EclipseLink JPA that lets us plug in any cache implementation. In this case we're going to plug in a Coherence cache interceptor which will redirect all cache interactions to Coherence rather than to the built-in shared cache. This configuration is very straightforward and can be applied to any entity individually.
  • OK, let's look at how reads are performed in the Grid Read configuration. You can issue either a find or a JPQL query against the EntityManager. If we do a find then we do a get() on Coherence. If we do a JPQL query then it will be translated to a Coherence filter and that filter passed to Coherence. The database is not queried by EclipseLink. If you have a CacheLoader you may load an individual object as a result of a get() by primary key. But if you issue a JPQL query which is translated to a filter then the database will not be consulted. So you can see that in this configuration you're going to want to warm your cache before you begin your application.
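The slide itself isn't reproduced in this transcript, so the following is only a sketch of what such an annotated entity might look like. It assumes EclipseLink's `@CacheInterceptor` annotation and a `CoherenceInterceptor` class in the `oracle.eclipselink.coherence.integrated.cache` package; check the package names against your TopLink Grid release. The entity fields are illustrative.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.eclipse.persistence.annotations.CacheInterceptor;
// Assumed TopLink Grid package name; verify against your release.
import oracle.eclipselink.coherence.integrated.cache.CoherenceInterceptor;

// Grid Cache: shared-cache interactions for Employee are redirected to
// Coherence instead of the built-in EclipseLink shared cache.
@Entity
@CacheInterceptor(value = CoherenceInterceptor.class)
public class Employee {
    @Id
    private long id;          // illustrative fields only
    private String name;
}
```

Entities without the annotation would keep using the built-in shared cache, which is what makes the per-entity choice possible.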
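A toy model of this read path shows why warming matters. A map stands in for the Coherence cache and a `Predicate` stands in for the Coherence filter that the JPQL would be translated to; the names are hypothetical and no CacheLoader is modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of Grid Read: both find() and queries are answered from the
// grid alone, so anything not loaded into the grid is invisible to queries.
final class GridReadQueries {
    private final Map<Long, String> grid = new HashMap<>();  // stand-in for Coherence

    // Pre-load the grid before the application starts issuing queries.
    void warm(Map<Long, String> entries) {
        grid.putAll(entries);
    }

    String find(Long id) {
        return grid.get(id);
    }

    // The predicate plays the role of the filter derived from a JPQL query.
    List<String> query(Predicate<String> filter) {
        List<String> matches = new ArrayList<>();
        for (String name : grid.values()) {
            if (filter.test(name)) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Before `warm` is called, every query in this model returns an empty list even if the database holds matching rows.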
  • The write is very much like the previous configuration with EclipseLink doing the writing and upon successful commit the changes are placed into Coherence.
  • Configuration is slightly different than for Grid Cache. You can use either annotations or XML, but in this case you use a Customizer annotation. We aren't simply plugging in Coherence as the shared cache anymore. In the slide we customize the metadata for the Employee entity with an object provided by TopLink Grid called the CoherenceReadCustomizer, which makes the necessary changes to the configuration of the entity to set up the Grid Read configuration.
  • There are some limitations in the current TopLink Grid 11gR1 release. The first is in the JPQL that can be translated. We're currently limited by the features that are provided by Coherence. So for example we can do simple selects as on the slide. These are easily translated to a filter. More complex queries, specifically queries involving joins, will not be translated into filters. So for example I have here "select e from Employee e where e.address.city = 'Bonn'" where both Employee and Address are Entities. In TopLink Grid with Coherence the Employee and Address entities are stored in different caches and we cannot currently process this query against Coherence. Instead we will follow the normal query processing route and translate the query into SQL and execute it against the database. We'll use the database to identify the results but we will then use Coherence to look for those entities in the cache to avoid having to pay object build costs. We also currently don't support projection or report queries, so selecting data values of objects is not supported and such queries will also be directed to the database.
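Again the slide is not part of this transcript, so this is only a sketch of the kind of configuration being described. It assumes EclipseLink's `@Customizer` annotation and a `CoherenceReadCustomizer` class in the `oracle.eclipselink.coherence.integrated.config` package, which should be verified against your TopLink Grid release. The entity fields are illustrative.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.eclipse.persistence.annotations.Customizer;
// Assumed TopLink Grid package name; verify against your release.
import oracle.eclipselink.coherence.integrated.config.CoherenceReadCustomizer;

// Grid Read: the customizer adjusts the entity's metadata so that reads
// for Employee are directed to Coherence.
@Entity
@Customizer(CoherenceReadCustomizer.class)
public class Employee {
    @Id
    private long id;          // illustrative fields only
    private String name;
}
```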
  • Configuration is almost identical to that of Grid Read except that we now use a CoherenceReadWriteCustomizer to configure the entity.
  • Reading is the same as in the Grid Read configuration so there is nothing new here.
  • On the writing side things are a little different from Grid Read. Unlike in the two previous configurations, when you update objects and commit a transaction EclipseLink executes puts into Coherence for all the new or modified entities in the transaction. If you have a CacheStore configured these changes can be pushed out to the database either synchronously or asynchronously. If you don't have a CacheStore configured they aren't pushed to the database. One thing you must be aware of when using CacheStores is that the writes must be idempotent and commits must succeed. The EclipseLink object-level transaction can succeed, but if your asynchronous database writes later fail you now have a database out of sync with your cache. Again this is nothing new for Coherence developers working with CacheStores.
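To illustrate why idempotent writes matter, here is a toy write-behind model. Maps stand in for Coherence and the database, and `store` plays the role of a CacheStore write implemented as an upsert by primary key. The names are hypothetical and this is not the Coherence API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of Grid Entity writes with an asynchronous (write-behind) store.
final class WriteBehindSketch {
    final Map<Long, String> grid = new HashMap<>();           // stand-in for Coherence
    final Map<Long, String> databaseTable = new HashMap<>();  // stand-in for the database
    private final List<Long> dirtyKeys = new ArrayList<>();

    // Commit path: the entity goes into the grid; the database write is deferred.
    void put(Long id, String value) {
        grid.put(id, value);
        dirtyKeys.add(id);
    }

    // Upsert keyed by primary key, so replaying the same write is harmless.
    void store(Long id, String value) {
        databaseTable.put(id, value);
    }

    // Later, push the current value of every dirty entry to the database.
    void flush() {
        for (Long id : dirtyKeys) {
            store(id, grid.get(id));
        }
        dirtyKeys.clear();
    }
}
```

Between `put` and `flush` the grid is ahead of the database, which is the window in which a failed write would leave the two out of sync.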
  • Now let's compare this with Hibernate's use of Coherence as a shared L2 cache. The first difference is that Hibernate's shared cache is a data cache. It caches data rows rather than objects and serializes these rows into Coherence. A Coherence cache hit in Hibernate incurs both deserialization and object construction costs every time and in every cluster member. For example, when an object is read from the database in Node One a data row object is constructed from the JDBC result row and an object is constructed. The data row object is serialized into Coherence. On Node Two, querying this same object by primary key would get a Coherence cache hit which would return the deserialized data row object. Hibernate will then pay the object build cost on Node Two to construct the object from the row. As you can see an object build cost will be paid on every node for every cache hit unlike in TopLink Grid where this cost is only paid on the initial read. The other significant difference is that Hibernate only uses Coherence as a cache. There is no way to leverage Coherence's ability to perform parallel queries in the grid to offload the database. Hibernate only uses Coherence in the most basic way whereas TopLink Grid is able to leverage the distributed compute power of the grid.
  • In this presentation we've seen a number of ways to scale JPA applications with TopLink. TopLink Grid is a new feature in Oracle TopLink that offers a new way to scale by supporting "JPA on the Grid", which goes beyond simple caching and provides a way to leverage the power of the Oracle Coherence data grid. TopLink Grid adds unique support for caching complex object graphs in Coherence along with support for both eager and lazy loading of related objects. TopLink Grid with Coherence provides the most scalable platform for building Enterprise JPA applications. Oracle TopLink and Oracle Coherence are key components in Oracle WebLogic Suite.
