Overview of Ehcache

       2011.12.02
        chois79
Contents
•   About Caches
•   Why caching works
•   Will an Application Benefit from Caching?
•   How much will an application speed up?
•   About Ehcache
•   Features of Ehcache
•   Key Concepts of Ehcache
•   Using Ehcache
•   Distributed Ehcache Architecture
•   References
About Caches
• In Wiktionary
   – A store of things that will be required in future and can be
     retrieved rapidly

• In computer science
   – A collection of temporary data which either duplicates data
     located elsewhere or is the result of a computation

   – The data can be repeatedly accessed inexpensively
Why caching works
•   Locality of Reference
     – Data that is near other data or has just been used is more likely to be used
        again

•   The Long Tail
     – "A small number of items may make up the bulk of sales." – Chris Anderson
     – One form of a Power Law distribution is the Pareto distribution (80:20 rule)
     – If 20% of objects are used 80% of the time and a way can be found to
        reduce the cost of obtaining that 20%, then system performance will improve
Will an Application Benefit from Caching?
          CPU bound Application
• The time taken principally depends on the speed of the CPU
  and main memory
• Speeding up
   – Improving algorithm performance
   – Parallelizing the computations across multiple CPUs or multiple
     machines
   – Upgrading the CPU speed

• The role of caching
   – Temporarily store the results of computations that may be reused
       • e.g. a DB cache, or large web pages that have a high rendering cost
Will an Application Benefit from Caching?
          I/O bound Application
• The time taken to complete a computation depends principally
  on the rate at which data can be obtained
• Speeding up
   – Hard disks are sped up by caching blocks in their own
     memory
       • There is no Moore’s law for hard disks.

   – Increase the network bandwidth

• The role of caching
   – Web page caching, for pages generated from databases
   – Data Access Object (DAO) caching
Will an Application Benefit from Caching?
      Increased Application Scalability

• A database might handle only around 100 expensive queries per second
  – Caching may be able to reduce the workload placed on it
How much will an application speed up?
           (Amdahl’s Law)

• Depends on a multitude of factors
  – How many times a cached piece of data can be, and is,
    reused by the application

  – The proportion of the response time that is alleviated by
    caching

• Amdahl’s Law
                 Speedup = 1 / ((1 – P) + P / S)
                 P: Proportion of the time that is sped up
                 S: Speedup of that proportion
Amdahl’s Law Example
(Speed up from a Database Level Cache)
Un-cached page time: 2 seconds
Database time: 1.5 seconds
Cache retrieval time: 2ms
Proportion: 75% (1.5/2)
The expected system speedup is thus:
     1 / (( 1 – 0.75) + 0.75 / (1500/2))
    = 1 / (0.25 + 0.75/750)
    = 3.98 times system speedup
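
The arithmetic above can be checked with a few lines of Java. This is only a sketch: the class and method names are illustrative, and the inputs are the figures from this slide (P = 1.5 s / 2 s = 0.75, S = 1500 ms / 2 ms = 750).

```java
import java.util.Locale;

// Sketch of the Amdahl's Law calculation from this slide; names are illustrative.
final class AmdahlSketch {
    /**
     * @param p proportion of the total time that is sped up (0..1)
     * @param s speedup factor applied to that proportion
     */
    static double speedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        double pageMs = 2000.0;    // un-cached page time
        double dbMs = 1500.0;      // database time
        double cacheMs = 2.0;      // cache retrieval time
        double p = dbMs / pageMs;  // share of the page time spent in the database
        double s = dbMs / cacheMs; // how much faster the cached lookup is
        System.out.println(String.format(Locale.ROOT, "%.2f", speedup(p, s)));
    }
}
```

Note that even an effectively instant cache could not push the speedup past 1 / (1 – P) = 4 here, because the remaining 25% of the page time is untouched.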
About Ehcache
•   Open source, standards-based cache used to boost performance

•   Primarily an in-process cache
•   Scales from in-process with one or more nodes through to a mixed
    in-process/out-of-process configuration with terabyte-sized caches
•   For applications needing a coherent distributed cache, Ehcache uses
    the open source Terracotta Server Array

•   Java-based cache, available under the Apache 2 license

•   The Wikimedia Foundation uses Ehcache to improve the performance
    of its wiki projects
Features of Ehcache(1/2)
•   Fast and Lightweight
     – Fast, simple API
     – Small footprint: Ehcache 2.2.3 is 668 KB, making it convenient to package
     – Minimal dependencies: the only dependency is SLF4J

•   Scalable
     – Provides Memory and Disk store for scalability into gigabytes
     – Scalable to hundreds of nodes with the Terracotta Server Array

•   Flexible
     – Supports Object or Serializable caching
     – Provides LRU, LFU and FIFO cache eviction policies
     – Provides Memory and Disk stores
Features of Ehcache(2/2)
•   Standards Based
     –   Full implementation of JSR107 JCACHE API

•   Application Persistence
     –   Persistent disk store which stores data between VM restarts

•   JMX Enabled
•   Distributed Caching
     –   Clustered caching via Terracotta
     –   Replicated caching via RMI, JGroups, or JMS

•   Cache Server
     –   RESTful and SOAP cache server

•   Search
     –   Standalone and distributed search using a fluent query language
Key Concepts of Ehcache
                     Key Classes
•   CacheManager
    – Manages caches

•   Ehcache
    – All caches implement the Ehcache interface
    – A cache has a name and attributes
    – Cache elements are stored in the memory store and can optionally overflow
       to a disk store

•   Element
    – An atomic entry in a cache
    – Has key and value
    – Put into and removed from caches
Key Concepts of Ehcache
            Usage patterns: Cache-aside
•   Application code uses the cache directly
•   Order
     – Application code consults the cache first
     – If the cache contains the data, then return the data directly
     – Otherwise, the application code must fetch the data from the system-of-record,
        store the data in the cache, then return.
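
The order above can be sketched in plain Java. This is a minimal illustration, not Ehcache code: a ConcurrentHashMap stands in for the cache, and the `systemOfRecord` loader is a hypothetical placeholder for whatever DAO or query the application would call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: the application code, not the cache, talks to the system-of-record.
final class CacheAsideSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for a real cache
    private final Function<K, V> systemOfRecord;               // hypothetical loader, e.g. a DAO call

    CacheAsideSketch(Function<K, V> systemOfRecord) {
        this.systemOfRecord = systemOfRecord;
    }

    V read(K key) {
        V value = cache.get(key);          // 1. consult the cache first
        if (value != null) {
            return value;                  // 2. hit: return the data directly
        }
        value = systemOfRecord.apply(key); // 3. miss: fetch from the system-of-record
        if (value != null) {
            cache.put(key, value);         // 4. store the data in the cache
        }
        return value;                      // 5. then return
    }
}
```

In this sketch two threads that miss on the same key at the same moment will both hit the system-of-record, which is one reason the read-through pattern exists.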




Key Concepts of Ehcache
           Usage patterns: Read-through
•   Mimics the structure of the cache-aside pattern when reading data
•   The difference
     – Must implement the CacheEntryFactory interface to instruct the cache how to
       read objects on a cache miss
     – Must wrap the Ehcache instance with an instance of SelfPopulatingCache
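
The idea can be sketched without the Ehcache library. In the sketch below, `EntryFactory` is a hypothetical interface that plays the role the slide assigns to CacheEntryFactory, and `ReadThroughSketch` plays the role of the wrapping cache; neither is the real Ehcache API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a factory that tells the cache how to load a value on a miss.
interface EntryFactory<K, V> {
    V createEntry(K key);
}

// Read-through sketch: the cache itself invokes the factory, so callers only call get().
final class ReadThroughSketch<K, V> {
    private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();
    private final EntryFactory<K, V> factory;

    ReadThroughSketch(EntryFactory<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        // computeIfAbsent runs the factory at most once per missing key,
        // even when several threads ask for the same key concurrently.
        return store.computeIfAbsent(key, factory::createEntry);
    }
}
```

Compared with cache-aside, the calling code shrinks to a single `get()` and the loading logic lives in one place.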




Key Concepts of Ehcache
    Usage patterns: Write-through and behind
•   Mimics the structure of the cache-aside pattern when writing data
•   The difference
     –   Must implement the CacheWriter interface and configure the cache for write-through or
         write-behind
     –   A write-through cache writes data to the system-of-record in the same thread of execution
     –   A write-behind cache queues the data to be written at a later time
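
The difference between the two modes can be sketched in plain Java. `RecordWriter` is a hypothetical interface standing in for the writer role described above, and the explicit `flush()` is an assumption made to keep the sketch deterministic; a real write-behind cache would drain its queue on a background thread.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a writer that persists one entry to the system-of-record.
interface RecordWriter<K, V> {
    void write(K key, V value);
}

final class WritingCacheSketch<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Queue<Map.Entry<K, V>> pending = new ConcurrentLinkedQueue<>();
    private final RecordWriter<K, V> writer;
    private final boolean writeBehind;

    WritingCacheSketch(RecordWriter<K, V> writer, boolean writeBehind) {
        this.writer = writer;
        this.writeBehind = writeBehind;
    }

    void put(K key, V value) {
        store.put(key, value);
        if (writeBehind) {
            // Write-behind: only queue the write; it reaches the system-of-record later.
            pending.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
        } else {
            // Write-through: written in the caller's thread before put() returns.
            writer.write(key, value);
        }
    }

    // Drains the queue; returns how many queued writes were applied.
    int flush() {
        int written = 0;
        Map.Entry<K, V> entry;
        while ((entry = pending.poll()) != null) {
            writer.write(entry.getKey(), entry.getValue());
            written++;
        }
        return written;
    }

    V get(K key) {
        return store.get(key);
    }
}
```

Write-through keeps the system-of-record current at the cost of slower writes; write-behind makes writes fast but leaves a window in which queued data exists only in the cache.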




Key Concepts of Ehcache
         Usage patterns: Cache-as-SoR
• Delegate SoR (system-of-record) reading and writing activities to the cache
• To implement, use a combination of the following patterns
   – Read-through
   – Write-through or write-behind

• Advantages
   – Less cluttered application code
   – Easily choose between write-through or write-behind strategies
   – Allow the cache to solve the “thundering-herd” problem

• Disadvantages
   – Less directly visible code-path
Key Concepts of Ehcache
         Storage Options: Memory Store
•   Suitable Element Types
     – All Elements are suitable for placement in the Memory Store

•   Characteristics
     – Thread safe for use by multiple concurrent threads
     – Backed by LinkedHashMap (JDK 1.4 and later)
          •   LinkedHashMap: Hash table and linked list implementation of the Map interface

     – Fast

•   Memory Use, Spooling and Expiry Strategy
     – Least Recently Used (LRU): default
     – Least Frequently Used (LFU)
     – First In First Out (FIFO)
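
To show why LinkedHashMap suits LRU eviction, here is a minimal sketch using only the JDK. It is not the Memory Store implementation: the class name and capacity parameter are illustrative, and unlike the Memory Store it is not thread safe on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU sketch on top of LinkedHashMap; callers must add their own synchronization.
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", then putting "c" evicts "b", because "a" was touched more recently.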
Key Concepts of Ehcache
    Storage Options: Big-Memory Store
•   Pure Java product from Terracotta that permits caches to use an additional type of
    memory store outside the object heap. (Packaged for use in Enterprise Ehcache)
     –   Not subject to Java GC
     –   100 times faster than the Disk Store
     –   Allows very large caches to be created (tested up to 350GB)

•   Two implementations
     –   Only Serializable cache keys and values can be placed in it, similar to the Disk Store
     –   Serialization and deserialization take place when putting into and getting from the store
           •   Around 10 times slower than the Memory Store
           •   The memory store holds the hottest subset of data from the off-heap store, already in deserialized form

•   Suitable Element Types
     –   Only Elements which are serializable can be placed in the off-heap store
     –   Any non-serializable Elements will be removed and a WARNING level log message emitted
Key Concepts of Ehcache
               Storage Options: Disk Store
• Disk Stores are optional
• Suitable Element Types
     – Only Elements which are serializable can be placed in the Disk Store
     – Any non-serializable Elements will be removed and a WARNING level
         log message emitted

• Eviction
     –   The LFU algorithm is used and it is not configurable or changeable

•   Persistence
     –   Controlled by the diskPersistent configuration attribute
     –   If false or omitted, the disk store will not persist between CacheManager restarts
Key Concepts of Ehcache
                      Replicated Caching
•   Ehcache has a pluggable cache replication scheme
     –   RMI, JGroups, JMS

•   Using a Cache Server
     –   To achieve shared data, all JVMs read from and write to a Cache Server

•   Notification Strategies
     –   If the Element is not available anywhere else, then the element itself should form the payload
         of the notification




Key Concepts of Ehcache
                       Search APIs
•   Allows you to execute arbitrarily complex queries against either a standalone
    cache or a Terracotta clustered cache with pre-built indexes
•   Searchable attributes may be extracted from both keys and values
•   Attribute Extractors
     – Attributes are extracted from keys or values
     – This is done during search or, if using Distributed Ehcache, on put() into the
        cache using AttributeExtractors
     – Supported types
         •   Boolean, Byte, Character, Double, Float, Integer, Long, Short, String, Enum, java.util.Date,
             java.sql.Date
Using Ehcache
           General-Purpose Caching
• Local Cache
• Configuration
   – Place the Ehcache jar into your class-path
   – Configure ehcache.xml and place it in your class-path
   – Optionally, configure an appropriate logging level

   [Diagram labels: Application, Local Ehcache, DB, Web Server, Web Server]
Using Ehcache
                     Cache Server
• Support for RESTful and SOAP APIs
• Redundant, Scalable with client hash-based routing
   – The client can be implemented in any language
   – The client must work out a partitioning scheme
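
One simple partitioning scheme a client could use is hash-modulo routing, sketched below in Java. The server URLs are hypothetical placeholders and the scheme is an assumption for illustration; the slide does not prescribe a particular one.

```java
import java.util.ArrayList;
import java.util.List;

// Client-side partitioning sketch: pick a cache server from the key's hash.
final class HashRoutingSketch {
    private final List<String> servers; // hypothetical base URLs of the cache servers

    HashRoutingSketch(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    String serverFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(index);
    }
}
```

Every client that uses the same server list in the same order routes a given key to the same server. The drawback is that adding or removing a server remaps most keys.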




Using Ehcache
    Integrate with other solutions
• Hibernate

• Java EE Servlet Caching

• JCache style caching

• Spring, Cocoon, Acegi and other frameworks
Distributed Ehcache Architecture
                   (Logical View)
•   Distributed Ehcache combines an in-process Ehcache with the Terracotta Server Array




•   The data is split between an Ehcache node(L1) and the Terracotta Server Array(L2)
     –   The L1 can hold as much data as is comfortable

     –   The L2 always holds a complete copy of all cache data

     –   The L1 acts as a hot-set of recently used data
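
The L1/L2 split can be pictured with a small two-tier lookup in plain Java. This is only a conceptual sketch under simplifying assumptions: both tiers are in-process maps, L1 is a bounded LRU map, and nothing here reflects how Terracotta actually stores or transfers data.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Two-tier sketch: a bounded L1 hot set in front of an L2 that holds everything.
// Not thread safe; a single-threaded illustration only.
final class TwoTierSketch<K, V> {
    private final Map<K, V> l2 = new HashMap<>(); // stands in for the server array: complete copy
    private final Map<K, V> l1;                   // bounded hot set of recently used data

    TwoTierSketch(final int l1Capacity) {
        this.l1 = new LinkedHashMap<K, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > l1Capacity; // drop the least recently used entry from L1 only
            }
        };
    }

    void put(K key, V value) {
        l2.put(key, value); // L2 always gets a copy
        l1.put(key, value); // L1 keeps it only while it stays hot
    }

    V get(K key) {
        V value = l1.get(key);
        if (value == null) {
            value = l2.get(key);      // L1 miss: fall back to the complete copy
            if (value != null) {
                l1.put(key, value);   // promote into the hot set
            }
        }
        return value;
    }

    boolean inL1(K key) {
        return l1.containsKey(key);
    }
}
```

An entry evicted from L1 is never lost, because L2 still holds it; the next read simply pays the slower L2 lookup and promotes it again.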
Distributed Ehcache Architecture
             (Ehcache topologies)
•   Standalone
     – The cache data set is held in the application node
     – Any other application nodes are independent with no communication
        between them

•   Distributed Ehcache
     – The data is held in a Terracotta Server Array with a subset of recently used
        data held in each application cache node

•   Replicated
     – The cached data set is held in each application node and data is copied or
        invalidated across the cluster without locking
     – Replication can be either asynchronous or synchronous
     – The only consistency mode available is weak consistency
Distributed Ehcache Architecture
                (Network View)
•   From a network topology point of view, Distributed Ehcache consists of
     – Ehcache node(L1)
          •   The Ehcache library is present in each app
          •   An Ehcache instance, running in-process sits in each JVM

     – Terracotta Server Array(L2)
          •   Each Ehcache instance maintains a connection with one or more Terracotta Servers
          •   Consistent hashing is used by the Ehcache nodes to store and retrieve cache data
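
For readers unfamiliar with consistent hashing, here is a generic ring sketch in Java. It illustrates the general technique only; the slide does not describe how Ehcache and Terracotta implement it, and the server names, the virtual-point count and the hash mixer are all assumptions.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

// Generic consistent-hashing ring; hash collisions between ring points are not handled.
final class HashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int replicas; // virtual points per server, to smooth the distribution

    HashRingSketch(Collection<String> servers, int replicas) {
        this.replicas = replicas;
        for (String server : servers) {
            add(server);
        }
    }

    void add(String server) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void remove(String server) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers");
        }
        // First ring point at or after the key's hash, wrapping around to the start.
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Bit mixer applied to String.hashCode() so that similar strings spread over the ring.
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

The property that matters is that removing one server only moves the keys that server owned; every other key keeps its placement, unlike simple hash-modulo routing.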




Distributed Ehcache Architecture
          (Memory Hierarchy View)
•   Each in-process Ehcache instance
     – Heap memory
     – Off-heap memory(Big Memory)

•   The Terracotta Server Array
     – Heap memory
     – Off-heap memory
     – Disk storage (optional, for persistence)




Ehcache in-process compared with
           Memcached
Reference
• Ehcache User Guide
   – http://ehcache.org/documentation

• Ehcache Architecture, Features And Usage patterns
   – Greg Luck, 2009 JavaOne Session 2007
