Choosing the Right Cache Framework
1. Selecting The Right Cache Framework
BEST CACHE FRAMEWORK FOR YOUR APPLICATION
@MOHAMMED FAZULUDDIN
2. Topics
• Overview
• Types of Caches
• Caching Algorithms
• Cache Time-Based Expiration Models
• Cache Frameworks
• Cache Drawbacks
3. Overview
• A cache is an amount of faster memory used to improve data access by
storing portions of a data set, the whole of which is slower to access.
• On most computers, disk access is very slow relative to the speed of main
memory; to speed up repeated accesses to files or disk blocks, most
computers cache recently accessed disk data in main memory or some other
form of fast memory.
• Using caching across the multi-tier model can help reduce the number of
back-and-forth communications.
• Caching avoids the expensive re-acquisition of objects: instead of releasing
objects immediately after use, they are kept in memory and reused for
subsequent client requests.
• A cache also allows higher throughput from the underlying resources.
5. Types of Caches
• Web Caching (Browser/Proxy/Gateway):
• Browser, proxy, and gateway caching work differently but share the same
goal: reducing overall network traffic and latency.
• Browser caching is controlled at the individual user level, whereas proxy and
gateway caching operate on a much larger scale.
• Commonly cached data includes DNS (Domain Name System) records, used
to resolve domain names to IP addresses, and mail server records.
• Data that changes infrequently is best cached for longer periods of time by
proxy and/or gateway servers.
• Browser caching helps users quickly navigate pages they have recently
visited. This feature is free to take advantage of, yet it is often overlooked
by hosting companies and developers.
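As a concrete illustration, a server can opt its responses into browser and proxy caching via HTTP headers. Below is a minimal sketch using the classic javax.servlet API; the servlet, its content, and the one-hour lifetime are illustrative assumptions, not part of the original slides.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative servlet: opts its response into browser/proxy caching.
public class CachedAssetServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Let browsers and shared proxies reuse this response for up to
        // one hour without re-contacting the server (value is an example).
        resp.setHeader("Cache-Control", "public, max-age=3600");
        resp.setContentType("text/css");
        resp.getWriter().write("body { margin: 0; }");
    }
}
```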
7. Types of Caches
• Data Caching:
• Data caching is a very important tool when you have database-driven
applications or CMS solutions. It is best used for frequent calls to data that
does not change rapidly.
• Data caching helps your website or application load faster and gives users
a better experience.
• It avoids extra trips to the database to retrieve data sets that have not
changed. It stores the data in local memory on the server, which is the
fastest way to retrieve information on a web server.
• The database is the bottleneck for almost every web application, so the
fewer DB calls, the better. Most DB solutions also attempt to cache
frequently used queries to reduce turnaround time; for example, MS SQL
caches execution plans for stored procedures and queries to speed up
processing.
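The pattern this slide describes is commonly called cache-aside: check local memory first and hit the database only on a miss. Here is a minimal Java sketch; ProductCache and loadFromDb are hypothetical names standing in for a real DAO or JDBC call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: serve from memory, fall back to the DB on a miss.
public class ProductCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Function<Long, String> loadFromDb; // stand-in for a DAO call

    public ProductCache(Function<Long, String> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    public String getProductName(long id) {
        // computeIfAbsent runs the DB lookup only when the entry is missing.
        return cache.computeIfAbsent(id, loadFromDb);
    }
}
```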
9. Types of Caches
• Application/Output Caching:
• Most CMSs have these caching mechanisms built in; however, many users
don't understand them and simply ignore them.
• It is best to understand what caching options you have and to implement
them whenever possible.
• Application/output caching can drastically reduce your website's load time
and reduce server overhead.
• Unlike data caching, which stores raw data sets, application/output caching
often utilizes server-level techniques that cache raw HTML.
• The cached unit can be a whole page, part of a page (headers/footers), or
module data, but it is usually HTML markup.
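A minimal sketch of the idea in Java: rendered HTML fragments are cached by key so the markup is not rebuilt on every request. HtmlFragmentCache and its renderer parameter are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Output-cache sketch: cache rendered HTML by key instead of raw data.
public class HtmlFragmentCache {
    private final Map<String, String> fragments = new ConcurrentHashMap<>();

    // renderer (e.g. a template-engine call) runs only on a cache miss.
    public String render(String key, Supplier<String> renderer) {
        return fragments.computeIfAbsent(key, k -> renderer.get());
    }

    // Call when the underlying content changes, e.g. after an edit.
    public void invalidate(String key) {
        fragments.remove(key);
    }
}
```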
10. Types of Caches
• Distributed Caching:
• Distributed caching is for big applications.
• Most high-volume systems, such as Google, YouTube, and Amazon, use this
technique.
• This approach lets web servers pull from and store to the memory of
dedicated cache servers. Once implemented, a web server can simply serve
pages without worrying about running out of memory.
• The distributed cache can therefore be made up of a cluster of cheaper
machines that do nothing but serve up memory. Once the cluster is set up,
you can add a new machine's worth of memory at any time without
disrupting your users.
• Ever notice how large companies like Google can return results so quickly
even with hundreds of thousands of simultaneous users? They use clustered
distributed caching, along with other techniques, to keep vast amounts of
data in memory, because memory retrieval is faster than file or DB retrieval.
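The contract such a cache tier exposes might look like the hypothetical interface below; it is a sketch of the architecture described above, not the API of any particular product.

```java
// Hypothetical client-side contract for a distributed cache tier; real
// products (memcached clients, Terracotta, etc.) differ in detail.
public interface DistributedCache {
    // The client hashes the key to one node of the cache cluster and
    // reads from that node's memory; the web server holds no cache state.
    byte[] get(String key);

    // Store a value with a time-to-live on whichever node owns the key.
    void put(String key, byte[] value, int ttlSeconds);

    // Grow total cache memory by adding nodes, without disrupting users.
    void addNode(String host, int port);
}
```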
12. Caching Algorithms
• Some of the most popular and theoretically important algorithms are
FIFO, LRU, LFU, LRU2, and 2Q.
• FIFO (First In First Out):
• Items are added to the cache as they are accessed, putting them in a queue
or buffer and not changing their location in the buffer; when the cache is full,
items are ejected in the order they were added.
• Cache access overhead is constant time regardless of the size of the cache.
• The advantage of this algorithm is that it's simple and fast; it can be
implemented using just an array and an index.
• The disadvantage is that it's not very smart; it doesn't make any effort to keep
more commonly used items in cache.
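A minimal FIFO sketch in Java. The slide's array-and-index structure is the eviction queue; a map is added here for constant-time key lookup. All names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Minimal FIFO cache: eviction order is insertion order, never updated.
public class FifoCache<K, V> {
    private final int capacity;
    private final Map<K, V> map = new HashMap<>();
    private final Queue<K> order = new ArrayDeque<>(); // oldest key at the head

    public FifoCache(int capacity) {
        this.capacity = capacity;
    }

    public V get(K key) {
        return map.get(key); // accesses never change an item's position
    }

    public void put(K key, V value) {
        if (!map.containsKey(key)) {
            if (map.size() >= capacity) {
                map.remove(order.poll()); // eject in the order items arrived
            }
            order.add(key);
        }
        map.put(key, value);
    }
}
```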
13. Caching Algorithms
• LRU - (Least Recently Used):
• Items are added to the cache as they are accessed; when the cache is full, the
least recently used item is ejected.
• This type of cache is typically implemented as a linked list, so that an item in
cache, when it is accessed again, can be moved back up to the head of the
queue; items are ejected from the tail of the queue. Cache access overhead is
again constant time.
• This algorithm is simple and fast, and it has a significant advantage over FIFO
in being able to adapt somewhat to the data access pattern; frequently used
items are less likely to be ejected from the cache.
• The main disadvantage is that it can still get filled up with items that are
unlikely to be re-accessed soon; in particular, it can become useless in the
face of scans over a larger number of items than fit in the cache. Nonetheless,
this is by far the most frequently used caching algorithm.
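In Java, LRU falls out almost for free from LinkedHashMap's access-order mode; a minimal sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache: LinkedHashMap's access-order mode moves each entry to the
// tail on access, so the head is always the least recently used entry.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // eject the LRU entry once over capacity
    }
}
```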
14. Caching Algorithms
• LRU2 - (Least Recently Used Twice):
• Items are added to the main cache the second time they are accessed;
when the cache is full, the item whose second-most-recent access is oldest
is ejected.
• Because the two most recent accesses must be tracked, access overhead
increases logarithmically with cache size, which can be a disadvantage. In
addition, accesses have to be tracked for some items not yet in the cache.
• There may also be a second, smaller, time-limited cache to capture
temporally clustered accesses, but the optimal size of this cache relative to
the main cache depends strongly on the data access pattern, so there is
some tuning effort involved.
• The advantage is that it adapts to changing data patterns, like LRU, and in
addition won't fill up from scanning accesses, since items aren't retained in the
main cache unless they've been accessed more than once.
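A sketch of the bookkeeping LRU2 requires, under the simplifying assumptions that only get counts as an access and eviction scans linearly (a real implementation would use a priority queue, which is where the logarithmic overhead mentioned above comes from). All names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// LRU2 sketch: cache an item only once it has been accessed twice, and
// evict the item whose second-most-recent access is oldest.
public class Lru2Cache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    // history[0] = second-most-recent access, history[1] = most recent;
    // note it is kept even for keys not (yet) in the cache.
    private final Map<K, long[]> history = new HashMap<>();
    private long clock = 0;

    public Lru2Cache(int capacity) {
        this.capacity = capacity;
    }

    public V get(K key) {
        long[] h = history.computeIfAbsent(key, k -> new long[2]);
        h[0] = h[1];
        h[1] = ++clock; // record this access
        return values.get(key);
    }

    public void put(K key, V value) {
        long[] h = history.get(key);
        if (h == null || h[0] == 0) {
            return; // fewer than two accesses so far: stay out of the cache
        }
        values.put(key, value);
        if (values.size() > capacity) {
            evictOldestSecondAccess();
        }
    }

    private void evictOldestSecondAccess() {
        // O(n) scan for clarity; a real implementation keeps a priority
        // queue ordered by the second-most-recent access time.
        K victim = null;
        long oldest = Long.MAX_VALUE;
        for (K k : values.keySet()) {
            long second = history.get(k)[0];
            if (second < oldest) {
                oldest = second;
                victim = k;
            }
        }
        values.remove(victim);
    }
}
```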
15. Caching Algorithms
• 2Q - (Two Queues):
• Items are added to an LRU cache as they are accessed.
• If accessed again, they are moved to a second, larger, LRU cache. Items are
typically ejected so as to keep the first cache at about 1/3 the size of the
second.
• This algorithm attempts to provide the advantages of LRU2 while keeping
cache access overhead constant, rather than having it increase with cache
size. Published data seems to indicate that it largely succeeds; a simplified
sketch follows this slide.
• LFU - (Least Frequently Used):
• Frequency-of-use data is kept for all items.
• The most frequently used items are kept in the cache. Because of the
bookkeeping requirements, cache access overhead increases logarithmically
with cache size; in addition, data needs to be kept for all items, whether or
not they are in the cache.
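A simplified 2Q sketch using two LinkedHashMaps: a FIFO queue for first-time accesses and an LRU main cache for repeat accesses, with the queue held at roughly 1/3 of the main cache's capacity as the slide suggests. Names and sizing details are illustrative assumptions.

```java
import java.util.LinkedHashMap;

// Simplified 2Q: first-time accesses go to a FIFO queue; items touched a
// second time are promoted to the main LRU cache.
public class TwoQueueCache<K, V> {
    private final int mainCapacity;
    private final int queueCapacity; // kept at about 1/3 of the main cache

    private final LinkedHashMap<K, V> a1; // FIFO queue (insertion order)
    private final LinkedHashMap<K, V> am; // main cache (access order = LRU)

    public TwoQueueCache(int mainCapacity) {
        this.mainCapacity = mainCapacity;
        this.queueCapacity = Math.max(1, mainCapacity / 3);
        this.a1 = new LinkedHashMap<>(16, 0.75f, false);
        this.am = new LinkedHashMap<>(16, 0.75f, true);
    }

    public synchronized V get(K key) {
        V v = am.get(key); // a main-cache hit refreshes its LRU position
        if (v != null) {
            return v;
        }
        v = a1.remove(key); // second access: promote into the main cache
        if (v != null) {
            am.put(key, v);
            trim(am, mainCapacity);
        }
        return v; // null means a miss in both queues
    }

    public synchronized void put(K key, V value) {
        if (am.containsKey(key)) {
            am.put(key, value); // already promoted: just update in place
            return;
        }
        a1.put(key, value); // first sighting goes to the FIFO queue
        trim(a1, queueCapacity);
    }

    private void trim(LinkedHashMap<K, V> map, int cap) {
        while (map.size() > cap) {
            map.remove(map.keySet().iterator().next()); // eject eldest
        }
    }
}
```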
16. Cache Time-Based Expiration Models
• Simple time-based expiration: Data in the cache is invalidated based on
absolute time periods. Items are added to the cache and remain in the
cache for a specific amount of time.
Summary: fast, not adaptive, not scan resistant.
• Extended time-based expiration: Data in the cache is invalidated based on
relative time periods. Items are added to the cache and remain in the
cache until they are invalidated at certain points in time, such as every five
minutes or each day at 12:00.
Summary: fast, not adaptive, not scan resistant.
• Sliding time-based expiration: Data in the cache is invalidated by specifying
the amount of time an item is allowed to be idle in the cache after its last
access.
Summary: fast, adaptive, not scan resistant.
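A minimal sliding-expiration sketch in Java; the class and field names are illustrative, and eviction here is lazy (checked on read) rather than driven by a background thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sliding expiration: an entry expires once it has been idle longer than
// maxIdleMillis since its last access; each access slides the window.
public class SlidingExpiryCache<K, V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccess;

        Entry(V value, long now) {
            this.value = value;
            this.lastAccess = now;
        }
    }

    private final long maxIdleMillis;
    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

    public SlidingExpiryCache(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        long now = System.currentTimeMillis();
        if (now - e.lastAccess > maxIdleMillis) {
            map.remove(key); // idle too long: evict lazily on read
            return null;
        }
        e.lastAccess = now; // slide the expiry window forward
        return e.value;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }
}
```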
17. Cache Frameworks
• JBoss Cache:
• It can be used in a standalone, non-clustered environment, to cache
frequently accessed data in memory thereby removing data retrieval or
calculation bottlenecks while providing “enterprise” features such as JTA
compatibility, eviction and persistence.
• JBoss Cache is also a clustered cache, and can be used in a cluster to
replicate state providing a high degree of failover.
• JBoss Cache can be – and often is – used outside of JBoss AS, in other
Java EE environments such as Spring, Tomcat, GlassFish, BEA WebLogic, and
IBM WebSphere, and even in standalone Java programs.
• JBoss Cache works out of the box with most popular transaction
managers, and even provides an API where custom transaction manager
lookups can be written.
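A minimal standalone usage sketch, assuming the JBoss Cache 2.x-era API (DefaultCacheFactory and Fqn-addressed tree nodes); treat the details as assumptions rather than a reference.

```java
import org.jboss.cache.Cache;
import org.jboss.cache.CacheFactory;
import org.jboss.cache.DefaultCacheFactory;
import org.jboss.cache.Fqn;

// Standalone, non-clustered JBoss Cache sketch (2.x-era API assumed).
public class JBossCacheExample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        CacheFactory factory = new DefaultCacheFactory();
        Cache cache = factory.createCache(); // local, in-memory by default

        // Data lives in a tree of nodes addressed by Fqn paths.
        Fqn node = Fqn.fromString("/customers/42");
        cache.put(node, "name", "Alice");
        System.out.println(cache.get(node, "name")); // -> Alice

        cache.stop();
    }
}
```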
18. Cache Frameworks
• OSCache:
• It can be used to cache both static and dynamic web pages.
• OSCache is also used by many projects, such as Jofti, Spring, and
Hibernate.
• OSCache is also used by many sites, such as TheServerSide, JRoller, and
JavaLobby.
• JCS (Java Caching System):
• It is a caching system written in Java for server-side Java applications.
• It is intended to speed up dynamic web applications by providing a
means to manage cached data of various dynamic natures.
• Like any caching system, JCS is most useful for high-read, low-put
applications.
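A minimal JCS usage sketch, assuming the classic org.apache.jcs API; the region name "products" would normally be defined in the cache.ccf configuration file and is an assumption here.

```java
import org.apache.jcs.JCS;
import org.apache.jcs.access.exception.CacheException;

// Classic org.apache.jcs usage sketch; the "products" region is assumed
// to be configured in cache.ccf.
public class JcsExample {
    public static void main(String[] args) throws CacheException {
        JCS cache = JCS.getInstance("products");
        cache.put("42", "Widget");              // low put...
        String name = (String) cache.get("42"); // ...high read
        System.out.println(name);
    }
}
```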
19. Cache Frameworks
• EhCache:
• It is used for general-purpose caching in J2EE and lightweight containers,
and is tuned for large caches.
• EhCache acts as a pluggable cache for Hibernate 2.1, with a small
footprint, minimal dependencies, full documentation, and production testing.
• It is used in many Java frameworks, such as Alfresco, Cocoon, Hibernate,
Spring, JPOX, Jofti, Acegi, Kosmos, Tudu Lists, and Lutece.
• EhCache is the default cache for Hibernate. With EhCache you can cache
both Serializable and non-Serializable objects.
• Non-Serializable objects can use all parts of EhCache except the disk store
and replication.
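A minimal EhCache usage sketch, assuming the classic net.sf.ehcache API; the cache name and programmatic setup are illustrative.

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Classic net.sf.ehcache usage sketch; cache name and setup are examples.
public class EhCacheExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create(); // picks up ehcache.xml defaults
        manager.addCache("products");
        Cache cache = manager.getCache("products");

        cache.put(new Element("42", "Widget"));
        Element hit = cache.get("42");
        if (hit != null) {
            // getObjectValue() also works for non-Serializable values.
            System.out.println(hit.getObjectValue());
        }
        manager.shutdown();
    }
}
```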
20. Cache Frameworks
• JCache:
• JCache Open Source is an effort to make an Open Source version of JSR-107
JCache.
• ShiftOne:
• ShiftOne Java Object Cache is a Java library that implements several strict
object caching policies, as well as a light framework for configuring cache
behavior.
• SwarmCache:
• SwarmCache is a simple but effective distributed cache. It uses IP multicast to
efficiently communicate with any number of hosts on a LAN.
• It is specifically designed for use by clustered, database-driven web
applications.
• SwarmCache uses JavaGroups internally to manage the membership and
communications of its distributed cache.
21. Cache Frameworks
• WhirlyCache:
• Whirlycache is a fast, configurable in-memory object cache for Java.
• It can be used, for example, to speed up a website or an application by
caching objects that would otherwise have to be created by querying a
database or by another expensive procedure.
• Jofti:
• Jofti is a simple-to-use, high-performance object indexing and searching
solution for objects in a caching layer or storage structure that supports the
Map interface.
• The framework supports EhCache, JBossCache, and OSCache, and provides
transparent addition, removal, and updating of objects in its index, as well
as simple-to-use query capabilities for searching.
• Features include type-aware searching, configurable object property
indexing, and indexing/searching by interface, as well as support for
dynamic proxies, primitive attributes, Collections, and arrays.
22. Cache Frameworks
• cache4j:
• cache4j is a cache for Java objects with a simple API and fast implementation.
• It features in-memory caching, a design for multi-threaded environments,
both synchronized and blocking implementations, a choice of eviction
algorithms (LFU, LRU, FIFO), and a choice of either hard or soft references
for object storage.
• Open Terracotta:
• Open Terracotta is open-source clustering for Java.
• It supports HTTP session replication, distributed caching, POJO clustering,
and application coordination across a cluster's JVMs (implemented using
code injection, so you don't need to modify your code).
23. Cache Drawbacks
• Stale data:
• When you use cached content/data, you risk presenting old data that is no
longer relevant to the current situation.
• If you've cached a query of products, but in the meantime the product
manager has deleted four products, users will get listings for products that
don't exist. A sketch of explicit invalidation follows this list.
• Overhead:
• The business logic needed to keep your data somewhere between fast and
stale adds complexity, and complexity leads to more code that you need to
maintain and understand.
• It is easy to lose track of where data lives in the caching hierarchy, at what
level, and how to fix stale data when it appears.
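One common mitigation for stale data is to make every write path invalidate the affected cache entries, as in this sketch; ProductService and deleteFromDb are hypothetical stand-ins for real application code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: every write path that changes a product also evicts its cache
// entry, so readers cannot keep seeing a deleted product.
public class ProductService {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    public void deleteProduct(long id) {
        deleteFromDb(id); // hypothetical DAO call
        cache.remove(id); // forget the cached copy along with the row
    }

    private void deleteFromDb(long id) {
        /* JDBC delete elided in this sketch */
    }
}
```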