An Engineer's Intro to Oracle Coherence



Building scalable, highly available applications that perform well is not an easy task. These features cannot simply be “bolted” onto an existing application – they have to be architected into it. Unfortunately, the things we need to do to achieve them are often in conflict with each other, and finding the right balance is crucial. In this session we will discuss why scaling web applications is difficult and will look at some of the solutions we have come up with in the past to deal with the issues involved. We will then look at how in-memory data grids can make our jobs easier by providing a solid architectural foundation on which to build our applications. If you are new to in-memory data grids, you are guaranteed to leave the presentation eager to learn more. However, even if you are already using one, you will likely walk out with a few ideas on how to improve the performance and scalability of your applications.

Published in: Technology, Education
  • (c) Copyright 2007. Oracle Corporation
  • So why can’t we use database technology to bring high-performance transaction processing to Java applications? The problem is the classic mismatch between the object and relational models, and the huge performance penalty of translating back and forth between those two representations of the data. First the object data must be loaded into mid-tier memory from several relational database tables. Then the transaction (object method) is performed. Finally the data is written back to the relational database to commit the transaction and save session state. If another transaction (method call) is performed with the same object, this same process is repeated from beginning to end. This performance problem is compounded in modern Event Driven Architectures, where one object method call can spawn a whole succession of others.
  • Coherence is a development library: JARs for Java, DLLs for .NET, and so on. It ships with additional JARs that support Spring and Groovy. HTTP session management can be used with WLS and OAS. A large online retailer has a unified shopping cart across multiple application servers (WAS, .NET). The WebInstaller replaces the default session replication.
  • Serialization Options. Because serialization is often the most expensive part of clustered data management, Coherence provides the following options for serializing/deserializing data:
      – The simplest, but slowest option.
      – The Portable Object Format (also referred to as POF) is a language-agnostic binary format. POF was designed to be efficient in both space and time and has become the recommended serialization option in Coherence.
      – This requires developers to implement serialization manually, but can provide significant performance benefits. Compared to the simplest option, this can cut serialized data size by a factor of two or more (especially helpful with Distributed caches, as they generally cache data in serialized form). Most importantly, CPU usage is dramatically reduced.
      – This is very similar to the manual option above, but offers better performance and less memory usage by using a more efficient I/O stream implementation.
      – A default implementation of ExternalizableLite.
  • Coherence provides several cache implementations:
      – Local Cache: Local on-heap caching for non-clustered caching.
      – Replicated Cache Service: Perfect for small, read-heavy caches.
      – Partitioned Cache Service: True linear scalability for both read and write access. Data is automatically, dynamically and transparently partitioned across nodes. The distribution algorithm minimizes network traffic and avoids service pauses by incrementally shifting data.
      – Near Cache: Provides the performance of local caching with the scalability of distributed caching. Several different near-cache strategies provide varying trade-offs between performance and synchronization guarantees.

    In-process caching provides the highest level of raw performance, since objects are managed within the local JVM. This benefit is most directly realized by the Local, Replicated, Optimistic and Near Cache implementations.

    Out-of-process (client/server) caching provides the option of using dedicated cache servers. This can be helpful when you want to partition workloads (to avoid stressing the application servers). This is accomplished by using the Partitioned cache implementation and simply disabling local storage on client nodes through a single command-line option or a one-line entry in the XML configuration.

    Tiered caching (using the Near Cache functionality) enables you to couple local caches on the application server with larger, partitioned caches on the cache servers, combining the raw performance of local caching with the scalability of partitioned caching. This is useful for both dedicated cache servers and co-located caching (cache partitions stored within the application server JVMs).

    Tech Details Appendix for Cache types/strategies

      – Distributed Cache: A distributed, or partitioned, cache is a clustered, fault-tolerant cache that has linear scalability. Data is partitioned among all the machines of the cluster. For fault tolerance, partitioned caches can be configured to keep each piece of data on one or more unique machines within a cluster. Distributed caches are the most commonly used caches in Coherence.
      – Replicated Cache: A replicated cache is a clustered, fault-tolerant cache where data is fully replicated to every member in the cluster. This cache offers the fastest read performance with linear performance scalability for reads but poor scalability for writes (as writes must be processed by every member in the cluster). Because data is replicated to all machines, adding servers does not increase aggregate cache capacity.
      – Optimistic Cache: An optimistic cache is a clustered cache implementation similar to the replicated cache implementation but without any concurrency control. This implementation offers higher write throughput than a replicated cache. It also allows an alternative underlying store for the cached data (for example, a MRU/MFU-based cache). However, if two cluster members are independently pruning or purging the underlying local stores, it is possible that a cluster member may have a different store content than that held by another cluster member.
      – Near Cache: A near cache is a hybrid cache; it typically fronts a distributed cache or a remote cache with a local cache. A near cache invalidates front cache entries using a configurable invalidation strategy, and provides excellent performance and synchronization. A near cache backed by a partitioned cache offers zero-millisecond local access for repeat data access, while enabling concurrency and ensuring coherency and fail-over, effectively combining the best attributes of replicated and partitioned caches.
      – Local Cache: A local cache is a cache that is local to (completely contained within) a particular cluster node. While it is not a clustered service, the Coherence local cache implementation is often used in combination with various clustered cache services.
      – Remote Cache: A remote cache describes any out-of-process cache accessed by a Coherence*Extend client. All cache requests are sent to a Coherence proxy where they are delegated to one of the other Coherence cache types (Replicated, Optimistic, Partitioned).
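    The near-cache idea in these notes (a small local front cache in front of a larger back cache, with front entries invalidated on change) can be sketched with the JDK alone. This is a single-process illustration, not the Coherence implementation: the class name `NearCacheSketch`, the `backReads` counter and the use of a plain `Map` as the "back" tier are all assumptions made for the example, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical single-process model of a near cache: LRU front map over a back map. */
final class NearCacheSketch<K, V> {
    private final Map<K, V> back;   // stands in for the partitioned (remote) cache
    private final Map<K, V> front;  // small local cache, least recently used entry evicted first
    int backReads = 0;              // counts how often the back tier had to be consulted

    NearCacheSketch(Map<K, V> back, final int frontLimit) {
        this.back = back;
        // accessOrder = true makes the LinkedHashMap behave as an LRU structure
        this.front = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > frontLimit;
            }
        };
    }

    /** Repeat reads are served from the front map; misses fall through to the back map. */
    V get(K key) {
        V value = front.get(key);
        if (value == null) {
            value = back.get(key);
            backReads++;
            if (value != null) {
                front.put(key, value);
            }
        }
        return value;
    }

    /** Writes go to the back map and invalidate the local front entry. */
    void put(K key, V value) {
        back.put(key, value);
        front.remove(key);
    }

    public static void main(String[] args) {
        Map<String, String> back = new HashMap<>();
        back.put("key", "hello world");
        NearCacheSketch<String, String> cache = new NearCacheSketch<>(back, 100);
        cache.get("key");
        cache.get("key");
        System.out.println("back reads: " + cache.backReads);
    }
}
```

    In a real cluster the invalidation would also have to reach the front caches of other members, which is what the configurable invalidation strategy mentioned above is for; the sketch only invalidates its own front map.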
  • Data Grids are used for different purposes. These are the four most common uses.
      – Caching: Coherence was the first technology to prove reliable distributed caching. It has helped many organizations alleviate data bottleneck issues and scale out the application tier.
      – Analytics: Enables applications to efficiently run queries across the entire data grid. Support for heavy query loads, while improving responsiveness of each query. Server failures do not impact correctness of “in flight” queries and analytics.
      – Transactions: The Data Grid provides an optimal platform for joining data and business logic. Greater business agility by moving database stored procedures into the Data Grid. Coherence reliability allows not only in-memory data processing, but provides the ability to commit transactions in-memory. Reliability is key to conducting in-memory transactions. Coherence provides absolute reliability – every transaction matters.
      – Events: Oracle Coherence Data Grid manages processing state, guaranteeing once-and-only-once event processing. The Data Grid provides scalable management of event processing.

    1. Oracle Coherence Integration with WebLogic Server & WebLogic Portal
        An engineer’s introduction to in-memory data grid development
    2. Agenda
        • What Is Coherence?
            • Distributed Data Grid
        • How Does It Work?
        • Use Cases
            • Customer Examples
        • Q&A
    3. Oracle Coherence
        • Development Toolkit
            • Pure Java 1.5+ Libraries
            • Pure .Net 1.1 and 2.0 (Client Libraries)
            • No Third-Party Dependencies
            • No Open Source Dependencies
        • Other Libraries for…
            • Database and File System Integration
            • TopLink and Hibernate
            • HTTP Session Management, Spring, …
    4. Oracle Coherence
        • Provides…
            • Container-less Clustering of Java Processes
            • Data Structures to manage Data across a Cluster / Grid
            • Real-Time Event Observation – Listener Pattern
            • Materialized Views of Data
            • Parallel Queries and Aggregation – Object-based Queries
            • Parallel Data Processing
            • Parallel Grid Processing
            • RemoteException Free Distributed Computing
            • Clustered JMX
            • MAN + WAN Connectivity
            • Client + Data Grid Deployment Models
    5. Distributed Data Grid
    6. “A Data Grid is a system composed of multiple servers that work together to manage information and related operations - such as computations - in a distributed environment.” Cameron Purdy, VP of Development, Oracle
    7. What is a Data Grid?
        • What
            • In-Memory
            • Objects
            • Shared
        • Benefits
            • Low response time
            • High throughput
            • Predictable scalability
            • Continuous availability
            • Information reliability
    8. Scalability Chasm
        [Diagram labels: Ever Expanding Universe of Users, Web Servers, Application Servers, Data Demand, Data Supply]
        • Data Demand outpacing Data Supply
        • Rate of growth outpacing ability to cost effectively scale applications
    9. Performance Problem
        [Diagram labels: A Performance Bottleneck; Application (Object, Java); Database Tables (Relational, SQL server)]
        • Volume
        • Complexity
        • Frequency of Data Access
    10. Oracle Coherence as Data Broker
        [Diagram labels: Ever Expanding Universe of Users, Web Servers, Application Servers, Data Demand, Data Supply, Objects, Data Sources]
        • Oracle Coherence brokers Data Supply with Data Demand
        • Scale out Data Grid in middle tier using commodity hardware
    11. Coherence Clustering
    12. Coherence Clustering: Tangosol Clustered Messaging Protocol (TCMP)
        • Completely asynchronous yet ordered messaging built on UDP multicast/unicast
        • Truly Peer-to-Peer: equal responsibility for both producing and consuming the services of the cluster
        • Self Healing - Quorum based diagnostics
        • Linearly scalable mesh architecture.
        • TCP-like features
        • Messaging throughput scales to the network infrastructure.
    13. Coherence Clustering: The Cluster Service
        • Transparent, dynamic and automatic cluster membership management
        • Clustered Consensus: All members in the cluster understand the topology of the entire grid at all times.
        • Crowdsourced member health diagnostics
    14. Coherence: Distributed data management for applications
        • Development Library
            • Pure Java 1.4.2+
            • Pure .Net 1.1 and 2.0 (client)
            • C++ client (3.4)
            • No Third-Party Dependencies
            • No Open Source Dependencies
            • Proprietary Network Stack (Peer-To-Peer model)
        • Other Libraries Support…
            • Database and File System Integration
            • TopLink and Hibernate
            • HTTP Session Management
            • WebLogic Portal Caches
            • Spring, Groovy
    15. The Portable Object Format: Advanced Serialization
        • Simple Serialization Comparison
            • In XML
                • <date format="java.util.Date">2008-07-03</date>
                • 47 characters (possibly 94 bytes depending on encoding)
            • In Java (as a raw long)
                • 64 bits = 8 bytes
            • In Java (java.util.Date using ObjectOutputStream)
                • 46 bytes
            • In ExternalizableLite (as a raw long)
                • 8 bytes
            • In POF
                • 4F 58 1F 70 6C = 5 bytes
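    The two plain-Java rows of this comparison can be reproduced with the JDK alone. The sketch below is only a measurement aid; the class and method names are made up for the example. It prints the default-serialization size rather than asserting it, since the exact byte count is taken from the slide and not re-verified here. The ExternalizableLite and POF rows need the Coherence libraries and are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Date;

/** Hypothetical helper that measures two ways of writing the same date. */
final class SerializedSizeSketch {

    /** Bytes produced by default Java serialization (ObjectOutputStream). */
    static int javaSerializedSize(Date date) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(date);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Bytes produced when only the underlying long timestamp is written. */
    static int rawLongSize(Date date) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(date.getTime());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Date date = new Date(0L);
        System.out.println("ObjectOutputStream: " + javaSerializedSize(date) + " bytes");
        System.out.println("raw long:           " + rawLongSize(date) + " bytes");
    }
}
```

    The gap between the two numbers is the class metadata and stream header that default serialization writes for every object graph, which is the overhead the more compact formats on this slide avoid.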
    16. Coherence Cache Types / Strategies

        | | Replicated Cache | Optimistic Cache | Partitioned Cache | Near Cache backed by partitioned cache | LocalCache (not clustered) |
        |---|---|---|---|---|---|
        | Topology | Replicated | Replicated | Partitioned Cache | Local Caches + Partitioned Cache | Local Cache |
        | Read Performance | Instant | Instant | Locally cached: instant; Remote: network speed | Locally cached: instant; Remote: network speed | Instant |
        | Fault Tolerance | Extremely High | Extremely High | Configurable, Zero to Extremely High | Configurable, Zero to Extremely High | Zero |
        | Write Performance | Fast | Fast | Extremely fast | Extremely fast | Instant |
        | Memory Usage (Per JVM) | DataSize | DataSize | DataSize / JVMs x Redundancy | LocalCache + [DataSize / JVMs] | DataSize |
        | Coherency | fully coherent | fully coherent | fully coherent | fully coherent | n/a |
        | Memory Usage (Total) | JVMs x DataSize | JVMs x DataSize | Redundancy x DataSize | [Redundancy x DataSize] + [JVMs x LocalCache] | n/a |
        | Locking | fully transactional | none | fully transactional | fully transactional | fully transactional |
        | Typical Uses | Metadata | n/a (see Near Cache) | Read-write caches | Read-heavy caches w/ access affinity | Local data |
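    The memory-usage formulas for the replicated and partitioned topologies can be turned into a small calculator. This is a sketch of the arithmetic only: the class and method names are hypothetical, and "Redundancy" is assumed here to mean the total number of copies kept of each entry (primary plus backups), which the slide does not spell out.

```java
/** Hypothetical calculator for the memory-usage formulas of two cache topologies. */
final class CacheMemorySketch {

    /** Replicated: every JVM holds the full data set. */
    static double replicatedPerJvm(double dataSize) {
        return dataSize;
    }

    /** Replicated: total = JVMs x DataSize. */
    static double replicatedTotal(double dataSize, int jvms) {
        return jvms * dataSize;
    }

    /** Partitioned: per JVM = DataSize / JVMs x Redundancy. */
    static double partitionedPerJvm(double dataSize, int jvms, int redundancy) {
        return dataSize / jvms * redundancy;
    }

    /** Partitioned: total = Redundancy x DataSize. */
    static double partitionedTotal(double dataSize, int redundancy) {
        return redundancy * dataSize;
    }

    public static void main(String[] args) {
        double dataSizeGb = 10.0; // illustrative figures, not from the slides
        int jvms = 5;
        int redundancy = 2;       // assumed: one primary copy plus one backup
        System.out.println("replicated per JVM (GB):  " + replicatedPerJvm(dataSizeGb));
        System.out.println("replicated total (GB):    " + replicatedTotal(dataSizeGb, jvms));
        System.out.println("partitioned per JVM (GB): " + partitionedPerJvm(dataSizeGb, jvms, redundancy));
        System.out.println("partitioned total (GB):   " + partitionedTotal(dataSizeGb, redundancy));
    }
}
```

    With these figures the replicated total grows with every JVM added, while the partitioned per-JVM share shrinks, which is the capacity argument the table makes.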
    17. Use Cases
    18. Data Grid Uses
        • Caching: Applications request data from the Data Grid rather than backend data sources
        • Analytics: Applications ask the Data Grid questions from simple queries to advanced scenario modeling
        • Transactions: Data Grid acts as a transactional System of Record, hosting data and business logic
        • Events: Automated processing based on event
    19. Code Examples
    20. Clustering Java Processes
        • Joins an existing cluster or forms a new cluster
            • Time “to join” configurable
        • cluster contains information about the Cluster
            • Cluster Name
            • Members
            • Locations
            • Processes
        • No “master” servers
        • No “server registries”

        Cluster cluster = CacheFactory.ensureCluster();
    21. Leaving a Cluster
        • Leaves the current cluster
        • shutdown blocks until “data” is safe
        • Failing to call shutdown results in Coherence having to detect process death/exit and recover information from another process.
        • Death detection and recovery is automatic

        CacheFactory.shutdown();
    22. Using a Cache: get, put, size & remove
        • CacheFactory resolves cache names (e.g. “mine”) to configured NamedCaches
        • NamedCache provides data topology agnostic access to information
        • The NamedCache interface extends several interfaces:
            • java.util.Map, JCache, ObservableMap*, ConcurrentMap*, QueryMap*, InvocableMap*
        • * Coherence Extensions

        NamedCache nc = CacheFactory.getCache("mine");
        Object previous = nc.put("key", "hello world");
        Object current = nc.get("key");
        int size = nc.size();
        Object value = nc.remove("key");
    23. Using a Cache: keySet, entrySet, containsKey
        • Using a NamedCache is like using a java.util.Map
        • What is the difference between a Map and a Cache data-structure?
            • Both use (key,value) pairs for entries
            • Map entries don’t expire
            • Cache entries may expire
            • Maps are typically limited by heap space
            • Caches are typically size limited (by number of entries or memory)
            • Map content is typically in-process (on heap)

        NamedCache nc = CacheFactory.getCache("mine");
        Set keys = nc.keySet();
        Set entries = nc.entrySet();
        boolean exists = nc.containsKey("key");
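        The two cache-specific properties listed on this slide, expiry and a size limit, can be illustrated on top of a plain java.util.Map. This is a JDK-only sketch under stated assumptions, not how Coherence implements either feature: the class name `ExpiringCacheSketch`, the injected clock and the evict-oldest-inserted policy are all choices made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical map-backed cache with a maximum entry count and a time-to-live. */
final class ExpiringCacheSketch<K, V> {

    /** A cached value together with the time at which it stops being valid. */
    private static final class Timed<T> {
        final T value;
        final long expiresAt;

        Timed(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Timed<V>> entries;
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be exercised without sleeping

    ExpiringCacheSketch(final int maxEntries, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Size limit: once the map grows past maxEntries, drop the oldest inserted entry.
        this.entries = new LinkedHashMap<K, Timed<V>>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    void put(K key, V value) {
        entries.put(key, new Timed<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns null for missing entries and for entries whose time-to-live has passed. */
    V get(K key) {
        Timed<V> timed = entries.get(key);
        if (timed == null) {
            return null;
        }
        if (clock.getAsLong() >= timed.expiresAt) {
            entries.remove(key);
            return null;
        }
        return timed.value;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        ExpiringCacheSketch<String, String> cache =
                new ExpiringCacheSketch<>(2, 1000L, System::currentTimeMillis);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // pushes out "a"
        System.out.println("size = " + cache.size() + ", a = " + cache.get("a"));
    }
}
```

        A plain Map would keep all three entries indefinitely; the cache keeps at most two and forgets them after the time-to-live.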
    24. Observing Cache Changes: ObservableMap
        • Observe changes in real-time as they occur in a NamedCache
        • Options exist to optimize events by using Filters (including pre and post condition checking) and reducing on-the-wire payload (Lite Events)
        • Several MapListeners are provided out-of-the-box.
            • Abstract, Multiplexing...

        NamedCache nc = CacheFactory.getCache("stocks");
        nc.addMapListener(new MapListener() {
            // MapListener callbacks are named entryInserted / entryUpdated / entryDeleted
            public void entryInserted(MapEvent mapEvent) { }
            public void entryUpdated(MapEvent mapEvent) { }
            public void entryDeleted(MapEvent mapEvent) { }
        });
    25. Querying Caches: QueryMap
        • Query NamedCache keys and entries across a cluster (Data Grid) in parallel* using Filters
        • Results may be ordered using natural ordering or custom comparators
        • Filters support almost all SQL constructs
        • Query using non-relational data representations and models
        • Create your own Filters
        • * Requires Enterprise Edition or above

        NamedCache nc = CacheFactory.getCache("people");
        Set keys = nc.keySet(
            new LikeFilter("getLastName", "%Stone%"));
        Set entries = nc.entrySet(
            new EqualsFilter("getAge", 35));
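        The filters on this slide name a getter ("getAge") rather than a column. One way such a filter could be evaluated is by calling the named method reflectively on each cached value. The sketch below shows that idea with the JDK only; it is an assumption about the general technique, not Coherence's implementation, and `GetterFilterSketch`, `Person`, `equalsFilter` and `keySet` are names invented for the example. It runs in one process over a plain Map, with none of the parallelism the slide describes.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical, single-process model of querying a map with getter-based filters. */
final class GetterFilterSketch {

    /** Hypothetical value type; any class with public getters would do. */
    public static final class Person {
        private final String lastName;
        private final int age;

        public Person(String lastName, int age) {
            this.lastName = lastName;
            this.age = age;
        }

        public String getLastName() { return lastName; }
        public int getAge() { return age; }
    }

    /** Builds a filter that calls the named no-argument method and compares the result. */
    static Predicate<Object> equalsFilter(final String methodName, final Object expected) {
        return target -> {
            try {
                Method getter = target.getClass().getMethod(methodName);
                return expected.equals(getter.invoke(target));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot call " + methodName, e);
            }
        };
    }

    /** Returns the keys whose values pass the filter, in iteration order. */
    static <K, V> Set<K> keySet(Map<K, V> cache, Predicate<Object> filter) {
        Set<K> keys = new LinkedHashSet<>();
        for (Map.Entry<K, V> entry : cache.entrySet()) {
            if (filter.test(entry.getValue())) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<Integer, Person> people = new LinkedHashMap<>();
        people.put(1, new Person("Stone", 35));
        people.put(2, new Person("Smith", 40));
        System.out.println(keySet(people, equalsFilter("getAge", 35)));
    }
}
```

        Because the filter is just an object carrying a method name and a value, it can be shipped to wherever the data lives and evaluated there, which is what makes the parallel, cluster-wide query on the slide possible.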
    26. Continuous Observation: Continuous Query Caches
        • ContinuousQueryCache provides a real-time, in-process copy of filtered cached data
        • Use standard or your own custom Filters to limit the view
        • Access to the “view” of cached information is instant
        • May be used with MapListeners to support rendering real-time local views (aka Thick Client) of Data Grid information.

        NamedCache nc = CacheFactory.getCache("stocks");
        NamedCache expensiveItems =
            new ContinuousQueryCache(nc,
                new GreaterFilter("getPrice", 1000));
    27. Aggregating Information: InvocableMap
        • Aggregate values in a NamedCache across a cluster (Data Grid) in parallel* using Filters
        • Aggregation constructs include: Distinct, Sum, Min, Max, Average, Having, Group By
        • Aggregate using non-relational data models
        • Create your own aggregators
        • * Requires Enterprise Edition or above

        NamedCache nc = CacheFactory.getCache("stocks");
        Double total = (Double) nc.aggregate(
            AlwaysFilter.INSTANCE,
            new DoubleSum("getQuantity"));
        Set symbols = (Set) nc.aggregate(
            new EqualsFilter("getOwner", "Larry"),
            new DistinctValues("getSymbol"));
    28. Mutating Information: InvocableMap
        • Invoke EntryProcessors on zero or more entries in a NamedCache across a cluster (Data Grid) in parallel* (using Filters) to perform operations
        • Execution occurs where the entries are managed in the cluster, not in the thread calling invoke
        • This permits Data + Processing Affinity
        • * Requires Enterprise Edition or above

        NamedCache nc = CacheFactory.getCache("stocks");
        nc.invokeAll(
            new EqualsFilter("getSymbol", "ORCL"),
            new StockSplitProcessor());
        ...
        class StockSplitProcessor extends AbstractProcessor {
            // process() implements a public interface method, so it must be declared public
            public Object process(InvocableMap.Entry entry) {
                Stock stock = (Stock) entry.getValue();
                stock.quantity *= 2;
                entry.setValue(stock); // write the modified value back to the cache
                return null;
            }
        }
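        The point that execution occurs where the entries are managed can be modelled without Coherence. In the sketch below each `Node` owns a slice of the data and runs the processor against its own entries, while the caller only hands over the filter and the processor. Everything here (`EntryProcessorSketch`, `Grid`, `Node`, hash-based ownership, the parallel stream) is an assumption made for illustration inside a single JVM; it is not the Coherence partitioning algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical in-JVM model of "send the processing to the data". */
final class EntryProcessorSketch {

    /** One storage node: owns a slice of the entries and processes them locally. */
    static final class Node<K, V> {
        final Map<K, V> entries = new HashMap<>();

        /** Applies the processor to every local entry that passes the filter. */
        int process(Predicate<Map.Entry<K, V>> filter, UnaryOperator<V> processor) {
            int touched = 0;
            for (Map.Entry<K, V> entry : entries.entrySet()) {
                if (filter.test(entry)) {
                    entry.setValue(processor.apply(entry.getValue()));
                    touched++;
                }
            }
            return touched;
        }
    }

    /** A fixed set of nodes; each key is owned by exactly one node, chosen by hash. */
    static final class Grid<K, V> {
        final List<Node<K, V>> nodes = new ArrayList<>();

        Grid(int nodeCount) {
            for (int i = 0; i < nodeCount; i++) {
                nodes.add(new Node<>());
            }
        }

        Node<K, V> owner(K key) {
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        }

        void put(K key, V value) {
            owner(key).entries.put(key, value);
        }

        V get(K key) {
            return owner(key).entries.get(key);
        }

        /** Runs the processor on all nodes in parallel; returns how many entries changed. */
        int invokeAll(Predicate<Map.Entry<K, V>> filter, UnaryOperator<V> processor) {
            return nodes.parallelStream()
                    .mapToInt(node -> node.process(filter, processor))
                    .sum();
        }
    }

    public static void main(String[] args) {
        Grid<String, Integer> quantities = new Grid<>(3);
        quantities.put("ORCL", 100);
        quantities.put("IBM", 50);
        // Stock split: double the quantity held for one symbol.
        quantities.invokeAll(e -> e.getKey().equals("ORCL"), q -> q * 2);
        System.out.println("ORCL=" + quantities.get("ORCL") + ", IBM=" + quantities.get("IBM"));
    }
}
```

        Each node's map is only ever touched by the task processing that node, so the values never travel to the caller and no cross-node locking is needed, which is the data and processing affinity the slide refers to.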
    29. Customer Examples
    30. Amir Razmara, Director, The Gap
        • Problem:
            • Universal user profile, preferences, shopping cart and single sign-on across 4 brands
            • Shared sessions across all 4 brands leading to a need for a global session (each brand has its own cluster of servers)
        • Possible Solutions:
            • State repository with DB
            • Data Grid backed session management with Coherence*Web
        • Coherence*Web Solution:
            • Create a “Global Cache Cloud” to maintain a brand agnostic session caching layer, enabling the global session
            • Any server from any brand is able to obtain a session from the global cache cloud, enabling SSO and shared bag
            • Session durability, immune from crashes in the application tier
            • Sessions maintained during the nightly cell switch for publishing content
        *GAP - OOW2008 S299392 - Beyond Performance: Pushing Transaction
    31. The Universal Experience: Sister Tabs
        *GAP - OOW2008 S299392 - Beyond Performance: Pushing Transaction…
    32. Customer Examples
        • Telecommunications
            • Major Communications Provider
                • Home Subscriber Server (HSS) part of IMS platform
                • “Enterprise Data Grid” – Unified Data Access layer across the enterprise
                • Active-Active data center replication across WAN
            • Major Communications Provider
                • “Click-to-Chat” application – web chat between customers and CSRs
        • Major Financial Services Provider
            • User Session data in Coherence, access from Java and C++
            • User Session data replicated across WAN to alternate data center
            • Mainframe MIPS cost mitigation
            • Mid-tier caching to aid migration off MS SQL Server
    33. Oracle Coherence Advantage
        • Protect the Database Investment
            • Ensure DB does what it does best, limit cost of rearchitecture
        • Scale as you Grow
            • Cost effective: Start small with 3-5 servers, scale to hundreds of servers as business grows
        • Enables business continuity
            • Providing continuous data availability
    34. Q & A: Questions and Answers