An engineer's introduction to in-memory data grid development
Presentation Transcript

  • An engineer’s introduction to in-memory data grid development – Andrei Cristian Niculae, Technology Presales Consultant
  • Agenda
    – What Is Coherence – Distributed Data Grid
    – How Does It Work?
    – Use Cases
    – Conclusion
    – Q&A
  • Distributed Data Grid
  • “A Distributed Data Grid is a system composed of multiple servers that work together to manage information and related operations – such as computations – in a distributed environment.” – Cameron Purdy, VP of Development, Oracle
  • Scalability Chasm
    – Data demand is outpacing data supply
    – The rate of growth is outpacing the ability to cost-effectively scale applications
    – (Diagram: an ever-expanding universe of users drives data demand through web and application servers against a fixed data supply)
  • Performance Problem
    – The object-relational mapping between Java application objects and database tables (SQL) is a performance bottleneck
    – Driven by the volume, complexity, and frequency of data access
  • Oracle Coherence as Data Broker
    – Oracle Coherence brokers data supply with data demand
    – Scale out the Data Grid in the middle tier using commodity hardware
    – (Diagram: the Data Grid of objects sits between web/application servers and the backing data sources)
  • Coherence In-Memory Data Grid – linear scaling, extreme performance, unparalleled reliability
    – Memory spans multiple machines
    – Add/remove nodes dynamically
    – Scale linearly to thousands
    – Reliability through redundancy
    – Performance through parallelization
    – Integration through a shared memory grid
  • How Does It Work?
  • Oracle Coherence: A Unique Approach
    – Dynamically scale out during operation
    – Data is automatically load-balanced to new servers in the cluster
    – No repartitioning required
    – No reconfiguration required
    – No interruption to service during scale-out
    – Scale capacity and processing on-the-fly
  • Oracle Coherence: How It Works
    – Data is automatically partitioned and load-balanced across the server cluster
    – Data is synchronously replicated for continuous availability
    – Servers monitor the health of each other; when in doubt, servers work together to diagnose status
    – Healthy servers assume responsibility for a failed server (in parallel)
    – Continuous operation: no interruption to service or data loss due to a server failure
  • Oracle Coherence: Summary
    – Peer-to-peer clustering and data management technology
    – No single points of failure, no single points of bottleneck
    – No masters / slaves / registries etc.
    – All members have responsibility to: manage cluster health & data, perform processing and queries, and work as a “team” in parallel
    – Communication is point-to-point (not TCP/IP) and/or one-to-many
    – Scale to the limit of the back-plane
    – Use with commodity infrastructure
    – Linearly scalable by design
  • Coherence Topology Example – Distributed Topology
    – Data is spread and backed up across members
    – Transparent to the developer
    – Members have access to all data
    – All data locations are known – no lookup & no registry!
  • Coherence Update Example – Distributed Topology
    – Synchronous update avoids potential data loss & corruption
    – Predictable performance
    – Backup partitions are partitioned away from primaries for resilience
    – No engineering requirement to set up primaries or backups – automatically and dynamically managed
  • Coherence Recovery Example – Distributed Topology
    – No in-flight operations are lost
    – Some latencies (due to the higher priority of recovery)
    – Reliability, availability, scalability, and performance are the priority
    – Performance of some requests may be degraded
  • Coherence – distributed data management for applications
    – Development library: pure Java 1.4.2+, pure .NET 1.1 and 2.0 (client), C++ client (3.4)
    – No third-party dependencies, no open-source dependencies
    – Proprietary network stack (peer-to-peer model)
    – Other libraries support: database and file system integration, TopLink and Hibernate, HTTP session management, WebLogic Portal caches, Spring, Groovy
  • The Portable Object Format – Advanced Serialization
    – Simple serialization comparison for a date value:
    – In XML: <date format=“java.util.Date”>2008-07-03</date> – 47 characters (possibly 94 bytes depending on encoding)
    – In Java (as a raw long): 64 bits = 8 bytes
    – In Java (java.util.Date using ObjectOutputStream): 46 bytes
    – In ExternalizableLite (as a raw long): 8 bytes
    – In POF (Portable Object Format): 4F 58 1F 70 6C = 5 bytes
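    As a rough illustration of how an application class opts into POF, the sketch below assumes a hypothetical Trade value class; PortableObject, PofReader and PofWriter come from the Coherence com.tangosol.io.pof package, and the class would additionally need to be registered with a type id in a POF configuration file.

        import com.tangosol.io.pof.PofReader;
        import com.tangosol.io.pof.PofWriter;
        import com.tangosol.io.pof.PortableObject;

        import java.io.IOException;

        // Hypothetical cache value serialized via POF rather than Java serialization.
        public class Trade implements PortableObject {
            private String symbol;
            private long quantity;

            public Trade() { }                        // POF needs a public no-arg constructor

            public void readExternal(PofReader in) throws IOException {
                symbol   = in.readString(0);          // read each field by its POF index
                quantity = in.readLong(1);
            }

            public void writeExternal(PofWriter out) throws IOException {
                out.writeString(0, symbol);           // write each field by its POF index
                out.writeLong(1, quantity);
            }
        }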
  • Coherence Cache Types / Strategies
    – Replicated Cache – Topology: replicated. Read: instant. Write: fast. Fault tolerance: extremely high. Memory per JVM: DataSize; total: JVMs x DataSize. Coherency: fully coherent. Locking: fully transactional. Typical use: metadata.
    – Optimistic Cache – Topology: replicated. Read: instant. Write: fast. Fault tolerance: extremely high. Memory per JVM: DataSize; total: JVMs x DataSize. Coherency: fully coherent. Locking: none. Typical use: n/a (see Near Cache).
    – Partitioned Cache – Topology: partitioned cache. Read: locally cached instant, remote at network speed. Write: extremely fast. Fault tolerance: configurable, zero to extremely high. Memory per JVM: DataSize/JVMs x Redundancy; total: Redundancy x DataSize. Coherency: fully coherent. Locking: fully transactional. Typical use: read-write caches.
    – Near Cache (backed by a partitioned cache) – Topology: local caches + partitioned cache. Read: locally cached instant, remote at network speed. Write: extremely fast. Fault tolerance: configurable, zero to extremely high. Memory per JVM: LocalCache + [DataSize / JVMs]; total: [Redundancy x DataSize] + [JVMs x LocalCache]. Coherency: fully coherent. Locking: fully transactional. Typical use: read-heavy caches with access affinity.
    – LocalCache (not clustered) – Topology: local cache. Read: instant. Write: instant. Fault tolerance: zero. Memory per JVM: DataSize; total: n/a. Coherency: n/a. Locking: fully transactional. Typical use: local data.
  • Data Management Options
  • Data Management – Partitioned Caching
    – Extreme scalability: automatically, dynamically and transparently partitions the data set across the members of the grid
    – Pros: linear scalability of data capacity; processing power scales with data capacity; fixed cost per data access
    – Cons: cost per access – high percentage chance that each data access will go across the wire
    – Primary use: large in-memory storage environments, parallel processing environments
  • Data Management – Partitioned Fault Tolerance
    – Automatically, dynamically and transparently manages the fault tolerance of your data
    – Backups are guaranteed to be on a separate physical machine from the primary
    – Backup responsibilities for one node’s data are shared amongst the other nodes in the grid
  • Data Management – Cache Client / Cache Server
    – Partitioning can be controlled on a member-by-member basis
    – A member is either responsible for an equal partition of the data or not (“storage enabled” vs. “storage disabled”)
    – Cache clients – typically the application instances
    – Cache servers – typically stand-alone JVMs responsible for storage and data processing only
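    A minimal sketch of the storage-enabled / storage-disabled split, assuming the Coherence 3.x tangosol.coherence.distributed.localstorage system property and the stock DefaultCacheServer launcher; the cache name is hypothetical.

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.DefaultCacheServer;
        import com.tangosol.net.NamedCache;

        public class Launchers {

            // Cache server: a storage-enabled JVM that only stores and processes data.
            public static void runCacheServer(String[] args) {
                DefaultCacheServer.main(args);   // blocks, serving cache partitions
            }

            // Cache client: an application JVM that joins the cluster but holds no partitions.
            public static void runCacheClient() {
                System.setProperty("tangosol.coherence.distributed.localstorage", "false");
                NamedCache trades = CacheFactory.getCache("myTrades");
                trades.put("T1", "example");     // data lands on the storage-enabled members
            }
        }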
  • Data Management – Near Caching
    – Extreme scalability and extreme performance: the best of both worlds between the replicated and partitioned topologies; the most recently/frequently used data is stored locally
    – Pros: all of the same pros as the partitioned topology, plus a high percentage chance the data is local to the request
    – Cons: cost per update – there is a cost associated with each update to a piece of data that is stored locally on other nodes
    – Primary use: large in-memory storage environments with a likelihood of repetitive data access
  • Caching Patterns and Terminology
  • Cache Aside
    – The Cache Aside pattern means the developer manages the contents of the cache manually: check the cache before reading from the data source, put data into the cache after reading from the data source, and evict or update the cache when updating the data source
    – Cache Aside can be written as a core part of the application logic, or it can be a cross-cutting concern (using AOP) if the data access pattern is generic (see the sketch below)
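    A minimal cache-aside sketch, assuming a hypothetical TradeDao data-access object and Trade value class (neither is part of Coherence); only the NamedCache API is taken from the slides above.

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        public class TradeRepository {
            private final NamedCache cache = CacheFactory.getCache("trades");
            private final TradeDao dao;                  // hypothetical data-access object

            public TradeRepository(TradeDao dao) { this.dao = dao; }

            // Check the cache before reading from the data source, then populate the cache.
            public Trade findTrade(String id) {
                Trade trade = (Trade) cache.get(id);
                if (trade == null) {
                    trade = dao.load(id);
                    if (trade != null) {
                        cache.put(id, trade);
                    }
                }
                return trade;
            }

            // Update the data source, then evict the (now stale) cached copy.
            public void updateTrade(Trade trade) {
                dao.save(trade);
                cache.remove(trade.getId());
            }
        }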
  • Read Through / Write Through
    – Read Through: all data reads occur through the cache; if there is a cache miss, the cache will load the data from the data source automatically
    – Write Through: all data writes occur through the cache; updates to the cache are written synchronously to the data source
  • Write Behind
    – All data writes occur through the cache, just like Write Through
    – Updates to the cache are written asynchronously to the data source
  • How It Works
  • CacheStore
    – CacheLoader and CacheStore (normally referred to generically as a CacheStore) are generic interfaces
    – There are no restrictions on the type of data store that can be used: databases, web services, mainframes, file systems, remote clusters – even another Coherence cluster!
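    A sketch of what such an implementation might look like for the trades example, assuming a hypothetical TradeDao and Trade class; the CacheStore interface (load/loadAll/store/storeAll/erase/eraseAll) is com.tangosol.net.cache.CacheStore, and the class is wired to a cache through the cache configuration rather than application code.

        import com.tangosol.net.cache.CacheStore;

        import java.util.Collection;
        import java.util.HashMap;
        import java.util.Iterator;
        import java.util.Map;

        // Hypothetical database-backed store for Trade objects.
        public class TradeCacheStore implements CacheStore {
            private final TradeDao dao = new TradeDao();          // hypothetical DAO

            // CacheLoader methods – invoked on read-through misses
            public Object load(Object key) {
                return dao.load((String) key);
            }

            public Map loadAll(Collection keys) {
                Map result = new HashMap();
                for (Iterator it = keys.iterator(); it.hasNext(); ) {
                    Object key = it.next();
                    result.put(key, load(key));
                }
                return result;
            }

            // CacheStore methods – invoked on write-through / write-behind
            public void store(Object key, Object value) {
                dao.save((Trade) value);
            }

            public void storeAll(Map entries) {
                for (Iterator it = entries.entrySet().iterator(); it.hasNext(); ) {
                    Map.Entry entry = (Map.Entry) it.next();
                    store(entry.getKey(), entry.getValue());
                }
            }

            public void erase(Object key) {
                dao.delete((String) key);
            }

            public void eraseAll(Collection keys) {
                for (Iterator it = keys.iterator(); it.hasNext(); ) {
                    erase(it.next());
                }
            }
        }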
  • Read Through
    – Every key in a Coherence cache is assigned (hashed) to a specific cache server
    – If a get results in a cache miss, the cache entry will be loaded from the data store via the cache server, through the provided CacheLoader implementation
    – Multiple simultaneous requests for the same key result in a single load request to the data store – this is a key advantage over Cache Aside
  • Refresh Ahead
    – Coherence can use a CacheLoader to fetch data asynchronously from the data store before expiry
    – This means that no client threads are blocked while the data is refreshed from the data store
    – This mechanism is triggered by a cache get – items that are not accessed are allowed to expire, thus requiring a read-through on the next access
  • Write Through
    – Updates to the cache are synchronously written to the data store, via the cache server responsible for the given key, through the provided CacheStore
    – Advantages: a write to the cache can be rolled back upon a data store error
    – Disadvantages: write performance can only scale as far as the database will allow
  • Write Behind
    – Updates to the cache are asynchronously written to the data store, via the cache server responsible for the given key, through the provided CacheStore
  • Write Behind Factors
    – Advantages: application response time and scalability are decoupled from the data store (ideal for write-heavy applications); multiple updates to the same cache entry are coalesced, reducing the load on the data store; the application can continue to function upon data store failure
    – Disadvantages: in the event of multiple simultaneous server failures, some data in the write-behind queue may be lost (loss of a single physical server results in no data loss)
  • Data Processing Options
  • Data Processing – Parallel Query
    – Programmatic query mechanism
    – Queries are performed in parallel across the grid
    – Standard indexes are provided out-of-the-box, and implementing your own custom indexes is supported
  • Data Processing – Parallel Query

        // get the “myTrades” cache
        NamedCache cacheTrades = CacheFactory.getCache("myTrades");

        // create the “query”
        Filter filter = new AndFilter(
            new EqualsFilter("getTrader", traderid),
            new EqualsFilter("getStatus", Status.OPEN));

        // perform the parallel query
        Set setOpenTrades = cacheTrades.entrySet(filter);
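    The slide above mentions out-of-the-box indexes; as a hedged addition to the same example, a QueryMap index on the getTrader property (using com.tangosol.util.extractor.ReflectionExtractor) lets the parallel query evaluate the filter against indexed values instead of deserializing every entry.

        import com.tangosol.util.extractor.ReflectionExtractor;

        // index the getTrader property used by the EqualsFilter above
        cacheTrades.addIndex(new ReflectionExtractor("getTrader"),
                             /* fOrdered */ false, /* comparator */ null);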
  • Data Processing – Continuous Query Cache
    – Automatically, transparently and dynamically maintains a local view based on specific criteria (i.e. a Filter)
    – Same API as all other Coherence caches
    – Supports local listeners
    – Supports layered views
  • Data Processing – Continuous Query Cache

        // get the “myTrades” cache
        NamedCache cacheTrades = CacheFactory.getCache("myTrades");

        // create the “query”
        Filter filter = new AndFilter(
            new EqualsFilter("getTrader", traderid),
            new EqualsFilter("getStatus", Status.OPEN));

        // create the continuous query cache
        NamedCache cqcOpenTrades = new ContinuousQueryCache(cacheTrades, filter);
  • Data Processing – Invocable Map
    – The inverse of caching: sends the processing (e.g. EntryProcessors) to where the data is in the grid
    – Standard EntryProcessors are provided out-of-the-box
    – Once-and-only-once guarantees; processing is automatically fault-tolerant
    – Processing can be targeted to a specific key, to a collection of keys, or to any object that matches specific criteria (i.e. a Filter)
  • Data Processing – Invocable Map

        // get the “myTrades” cache
        NamedCache cacheTrades = CacheFactory.getCache("myTrades");

        // create the “query”
        Filter filter = new AndFilter(
            new EqualsFilter("getTrader", traderid),
            new EqualsFilter("getStatus", Status.OPEN));

        // perform the parallel processing
        cacheTrades.invokeAll(filter, new CloseTradeProcessor());
  • Code Examples
  • Clustering Java Processes

        Cluster cluster = CacheFactory.ensureCluster();

    – Joins an existing cluster or forms a new cluster; the time “to join” is configurable
    – cluster contains information about the Cluster: name, members, locations, processes
    – No “master” servers
    – No “server registries”
  • Leaving a Cluster

        CacheFactory.shutdown();

    – Leaves the current cluster; shutdown blocks until “data” is safe
    – Failing to call shutdown results in Coherence having to detect process death/exit and recover information from another process
    – Death detection and recovery are automatic
  • Using a Cache – get, put, size & remove

        NamedCache nc = CacheFactory.getCache("mine");

        Object previous = nc.put("key", "hello world");
        Object current = nc.get("key");
        int size = nc.size();
        Object value = nc.remove("key");

    – CacheFactory resolves cache names (ie: “mine”) to configured NamedCaches
    – NamedCache provides data-topology-agnostic access to information
    – NamedCache implements several interfaces: java.util.Map, JCache, ObservableMap*, ConcurrentMap*, QueryMap*, InvocableMap* (* Coherence extensions)
  • Using a Cache – keySet, entrySet, containsKey

        NamedCache nc = CacheFactory.getCache("mine");

        Set keys = nc.keySet();
        Set entries = nc.entrySet();
        boolean exists = nc.containsKey("key");

    – Using a NamedCache is like using a java.util.Map
    – What is the difference between a Map and a Cache data structure?
      – Both use (key, value) pairs for entries
      – Map entries don’t expire; Cache entries may expire
      – Maps are typically limited by heap space; Caches are typically size-limited (by number of entries or memory)
      – Map content is typically in-process (on heap)
  • Observing Cache Changes – ObservableMap

        NamedCache nc = CacheFactory.getCache("stocks");

        nc.addMapListener(new MapListener() {
            public void entryInserted(MapEvent mapEvent) { }
            public void entryUpdated(MapEvent mapEvent) { }
            public void entryDeleted(MapEvent mapEvent) { }
        });

    – Observe changes in real time as they occur in a NamedCache
    – Options exist to optimize events by using Filters (including pre- and post-condition checking) and by reducing on-the-wire payload (Lite Events)
    – Several MapListeners are provided out-of-the-box: Abstract, Multiplexing, ...
  • Querying Caches – QueryMap

        NamedCache nc = CacheFactory.getCache("people");

        Set keys = nc.keySet(
            new LikeFilter("getLastName", "%Stone%"));

        Set entries = nc.entrySet(
            new EqualsFilter("getAge", 35));

    – Query NamedCache keys and entries across a cluster (Data Grid) in parallel* using Filters
    – Results may be ordered using natural ordering or custom comparators
    – Filters provide support for almost all SQL constructs
    – Query using non-relational data representations and models
    – Create your own Filters
    * Requires Enterprise Edition or above
  • Continuous Observation – Continuous Query Caches

        NamedCache nc = CacheFactory.getCache("stocks");

        NamedCache expensiveItems =
            new ContinuousQueryCache(nc,
                new GreaterFilter("getPrice", 1000));

    – ContinuousQueryCache provides a real-time, in-process copy of filtered cached data
    – Use standard or your own custom Filters to limit the view
    – Access to the “view” of cached information is instant
    – May be used with MapListeners to support rendering real-time local views (aka “thick client”) of Data Grid information
  • Aggregating Information – InvocableMap

        NamedCache nc = CacheFactory.getCache("stocks");

        Double total = (Double) nc.aggregate(
            AlwaysFilter.INSTANCE,
            new DoubleSum("getQuantity"));

        Set symbols = (Set) nc.aggregate(
            new EqualsFilter("getOwner", "Larry"),
            new DistinctValues("getSymbol"));

    – Aggregate values in a NamedCache across a cluster (Data Grid) in parallel* using Filters
    – Aggregation constructs include: Distinct, Sum, Min, Max, Average, Having, Group By
    – Aggregate using non-relational data models
    – Create your own aggregators
    * Requires Enterprise Edition or above
  • Mutating Information – InvocableMap

        NamedCache nc = CacheFactory.getCache("stocks");

        nc.invokeAll(
            new EqualsFilter("getSymbol", "ORCL"),
            new StockSplitProcessor());

        ...

        class StockSplitProcessor extends AbstractProcessor {
            Object process(Entry entry) {
                Stock stock = (Stock) entry.getValue();
                stock.quantity *= 2;
                entry.setValue(stock);
                return null;
            }
        }

    – Invoke EntryProcessors on zero or more entries in a NamedCache across a cluster (Data Grid) in parallel* (using Filters) to perform operations
    – Execution occurs where the entries are managed in the cluster, not in the thread calling invoke
    – This permits Data + Processing Affinity
    * Requires Enterprise Edition or above
  • Conclusion
  • Data Grid Uses
    – Caching: applications request data from the Data Grid rather than from backend data sources
    – Analytics: applications ask the Data Grid questions, from simple queries to advanced scenario modeling
    – Transactions: the Data Grid acts as a transactional System of Record, hosting data and business logic
    – Events: automated processing based on events
  • Oracle Coherence – Conclusion
    – Protect the database investment: ensure the DB does what it does best; limit the cost of re-architecture
    – Scale as you grow – cost effective: start small with 3–5 servers, scale to hundreds of servers as the business grows
    – Enables business continuity by providing continuous data availability
  • Questions & Answers