SlideShare a Scribd company logo
1 of 41
Download to read offline
1
Java In-Process Caching
Performance, Progress and Pitfalls
Tuesday, May 21, 2019
19th Software Performance Meetup, Munich
Jens Wilke
2
Links - Disclaimer - Copyright
talk slides and diagram source data
https://github.com/cruftex/talk-java-in-process-caching-performance-progress-pitfalls
used benchmarks
https://github.com/cache2k/cache2k-benchmark
disclaimer
No guarantees at all. Do not sue me for any mistake,
instead, send a pull request and correct it!
Copyright: Creative Commons Attribution
CC BY 4.0
3
About Me
● Performance Fan(atic)
● Java Hacker since 1998
● Author of cache2k
● 70+ answered questions on
StackOverflow about Caching
● JCache / JSR107 Contributor
Jens Wilke
@cruftex
cruftex.net
4
Up Next
Java In-Process Caching What and Why
5
Example 1: Geolocation Lookup
48° 4' 28" N 11° 40' 17" E
6
Example 2: Date Formatting
7
Expensive Operations per Web Request
0 1x 10x
How often is an operation executed per web
request or user interaction?
●
Less than once:
e.g. initialization on startup
●
Exactly once:
e.g. fetch data and resolve geolocation
●
More than once:
e.g. render a time or date
X X X
8
Reduce Expensive Operations
0 1 10x
Cache:
Less executions per
web request or user
interaction
X X X
9
(Java) Caching
● temporary data storage to serve
requests faster
● reduce expensive operations at the cost
of storage
● A tool to tune the space time tradeoff
problem
● Lower latency and improve UX
● If not because of great UX, let‘s save
computing costs!
technical benefits
10
Java In Process Caching
● temporary data storage to serve
requests faster
● reduce expensive operations at the cost
of storage heap memory
● keep data as close to the CPU as possible
● A tool to tune the space time tradeoff
problem
● Lower latency much more and improve
UX
● If not because of great UX, let‘s save
more computing costs!
technical benefits
11
Constructing an Java In Process Cache
The interface of a cache is similar
(sometimes identical) to a Java Map:
cache.put(key, value);
value = cache.get(key);
● A hash table
● An eviction strategy
– to limit the used memory
– but keep data that is „hot“
interface implementation
12
Up Next
Benchmark a Simple Cache
13
@Param({"100000"})
public int entryCount = 100 * 1000;
BenchmarkCache<Integer, Integer> cache;
Integer[] ints;
@Setup
public void setup() throws Exception {
cache = getFactory().create(entryCount);
ints = new Integer[PATTERN_COUNT];
RandomGenerator generator =
new XorShift1024StarRandomGenerator(1802);
for (int i = 0; i < PATTERN_COUNT; i++) {
ints[i] = generator.nextInt(entryCount);
}
for (int i = 0; i < entryCount; i++) {
cache.put(i, i);
}
}
@Benchmark @BenchmarkMode(Mode.Throughput)
public long read(ThreadState threadState) {
int idx = (int) (threadState.index++ % PATTERN_COUNT);
return cache.get(ints[idx]);
}
1
2
3
4
Read Only JMH Benchmark
1) create a cache, via a wrapper to
adapt to different implementations
2) create an array with random
integer objects. Value range does
not exceed entry count
3) fill cache once, not part of the
benchmark
4) benchmark operation does one
cache read with random key
Benchmark that does only read a
cache that is filled with data intially.
No eviction takes place, so we can
compare the read throughput with a
(concurrent) hash table.
14
Benchmark Parameters
● CPU: Intel(R) Xeon(R) CPU E3-1240
v5 @ 3.50GHz 4 physical cores
● Benchmarks are done with different
number of cores by Linux CPU
hotplugging
● Oracle JVM 1.8.0-131, JMH 1.18
● Ubuntu 14.04, 4.4.0-137-generic
● Google Guava Cache, Version 26
● Caffeine, Version 2.6.2
● cache2k, Version 1.2.0.Final
● EHCache, Version 3.6.1
hardware cache versions
● 2 forks, 2 warmup iterations, 3
measurement iterations, 15
second iterations times
● => 6 measurement iterations
JMH parameters
15
Results
 ConcurrentHashMap
 Simple Java Implementation with LRU
via LinkedHashMap
 Google Guava Cache
 Simple Java Implementation with LRU
via LinkedHashMap and Segmentation
Y axis: operations/s
X axis: Number of threads
0.0
20.0M
40.0M
60.0M
80.0M
100.0M
120.0M
140.0M
160.0M
180.0M
ops/s
threads-size-hitRate
CHM
SLHM
Guava
PLHM
Especially when multi-threaded the
ConcurrentHashMap is much faster
then a cache.
16
Up Next
LRU = Least Recently Used
17
LRU List Operations
head tail
11. put (1, x) insert new head
2 12. put (2, x) insert new head
1 23. get (1) move to front
3 1 24. put (3, x) insert new head
5. put (4,x) remove tail (key 2)
insert new head
remove tail
cache operation list operation
double linked list with three entries
4 3 1
18
LRU Properties
● Simple and smart algorithm for eviction
(or replacement)
● Everybody knows it from CS, „eviction
= LRU“
● List operations need synchronization
● A cache read means rewriting
references in 4 objects, most likely
touching 4 different CPU cache lines
● A read operation (happens often!) is
more expensive than an eviction
(happens not so often!)
● LRU is not scan resistent; scans wipe
out the working set in the cache
● Non frequently accessed objects need
a long time until evicted
cool... ...but:
19
LRU Alternatives?
● Reduce CPU cycles for the read operation
● Do more costly operations later when we
need to evict
● Also take frequency into account, keeping
more frequently accessed objects longer
Overview at:
Wikipedia:Page_replacement_algorithm
we look for lots of research
20
Up Next
Clock / Clock-Pro Eviction
21
Clock
10
0
1 ● Each cache entry has a
reference bit which
indicates whether the
entry was accessed
● Access: Sets reference bit
● Eviction, scan at clock
hand:
– Not-Referenced? Evict!
– Referenced? Clear reference
and move to the next
22
Clock-Pro
● Extra clock for hot data
● History of recently evicted keys
● cache2k: Use reference counter instead of
reference bit
10
0
1
4
1
0
1
0
3
5
0
hot
cold
history
Faster:
cache access is tracked by setting a
bit or incrementing a counter
23
Up Next
Will it Blend?
(more Benchmarks...)
24
Results
● Google Guava Cache and
EHCache3 are slow in
comparison to the
ConcurrentHashMap
● Caffeine is faster, if there
are sufficient CPU
cores/threads
● Cache2k is fastest, at about
half the speed of the
ConcurrentHashMap
0.0
20.0M
40.0M
60.0M
80.0M
100.0M
120.0M
140.0M
160.0M
180.0M
1-100K-100
2-100K-100
3-100K-100
4-100K-100
ops/s
threads-size-hitRate
cache2k
Caffeine
Guava
EhCache3
CHM
25
Up Next
What about eviction efficiency?
26
Benchmarking Eviction Quality
● Collect access sequences (traces)
● Replay the access sequence on a cache and count
hits and misses
More information about the traces in the blog article:
https://cruftex.net/2016/05/09/Java-Caching-
Benchmarks-2016-Part-2.html
27
Zipf10k
0
10
20
30
40
50
60
70
80
90
100
500
2000
8000
OPT
LRU
CLOCK
EHCache3
Guava
Caffeine
cache2k
RAND
cache size
hitrate
28cache size
0
10
20
30
40
50
60
70
80
90
100
1250
2500
5000
OPT
LRU
CLOCK
EHCache3
Guava
Caffeine
cache2k
RAND
OrmAccessBusytime
hitrate
29cache size
0
10
20
30
40
50
60
100000
200000
300000
OPT
LRU
CLOCK
EHCache3
Guava
Caffeine
cache2k
RAND
UMassWebsearch1
hitrate
30
Results
● Eviction Improvements Caffeine and cache2k
● Varying results depending on access
sequences / workloads
31
But… Isn‘t Clock O(n)?!
10
0
1 Accademic Objection:
Time for eviction grows linear with
the cache size
Yes, in theory….
32
Cache2k and Clock-Pro
● Uses counter instead of reference bit
● Heuristics to reduce intensive scanning,
if not much is gained
● test with a random sequence at 80%
hitrate, results in the following average
entry scan counts:
– 100K entries: 6.00398
– 1M entries: 6.00463
– 10M entries: 6.00482
=> Little increase, but practically irrelevant.
Improved algorithm battle tested
33
Technical Overview I
Guava Caffeine EHCache3 cache2k
Latest Version 26 2.6.2 3.6.1 1.2.0.Final
JDK compatibility 8+ 8+ 8+ 6+
sun.misc.Unsafe - X X -
Hash implementation own JVM CHM old CHM own
Single object per entry - - - X
Treeifications of
collisions
- X - -
Metrics for hash
collisions
- - - X
Key mutation detection - - - X
34
Technical Overview II
Guava Caffeine EHCache3 cache2k
Eviction Algorithm Q + LRU Q + W-TinyLFU „Scan8“ Clock-Pro+
Lock Free Cache Hit Lock free Lock + Wait free Lock free Lock + Wait free
Limit by count X X X X
Limit by memory size - - X -
Weigher X X - X
JCache / JSR107 - X X X
Separete API jar - - - X
35
Try cache2k?
● Open Source, Apache 2 Licence
● On Maven Central
● Info and User Guide at: https://cache2.org
● JCache support
● Compatible with Android or pure Java
● Runs with hibernate, Spring, datanucleus
● Compatible with Java 8, 9, 10, 11, 12, …. because of no
sun.misc.Unsafe magic
36
Summary
● LRU is simple but outdated
● Caffeine and cache2k use modern eviction algorithms and have
(mostly) better eviction efficiency than LRU
● Caffeine likes to have cores
● EHCache3 likes to have memory
● cache2k optimizes on a fast/fastest access path for a „cache hit“ while
having reasonable eviction efficiency
● Modern hardware needs modern algorithms
● Faster caches allows more fine grained caching
37
Keep Tuning! Questions?
Jens Wilke
@cruftex
cruftex.net
38
Up Next
Appendix / Backup Slides
39
Simple Cache – Part I
public class LinkedHashMapCache<K,V>
extends LinkedHashMap<K,V> {
private final int cacheSize;
public LinkedHashMapCache(int cacheSize) {
super(16, 0.75F, true);
this.cacheSize = cacheSize;
}
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() >= cacheSize;
}
}
40
Simple Cache – Part II – Thread Safety
public class SynchronizedLinkedHashMapCache<K,V> {
final private LinkedHashMapCache<K,V> backingMap;
public void put(K key, V value) {
synchronized (backingMap) {
backingMap.put(key, value);
}
}
public V get(K key) {
synchronized (backingMap) {
return backingMap.get(key);
}
}
}
41
Simple Cache – Part III –
Partitioning/Segmentation
public class PartitionedLinkedHashMapCache<K,V> {
final private int PARTS = 4;
final private int MASK = 3;
final private LinkedHashMapCache<K, V>[] backingMaps =
new LinkedHashMapCache[PARTS];
public void put(K key, V value) {
LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK];
synchronized (backingMap) {
backingMap.put(key, value);
}
}
public V get(K key) {
LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK];
synchronized (backingMap) {
return backingMap.get(key);
}
}
}

More Related Content

What's hot

Mastering java in containers - MadridJUG
Mastering java in containers - MadridJUGMastering java in containers - MadridJUG
Mastering java in containers - MadridJUGJorge Morales
 
GitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons LearnedGitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons LearnedAlexey Lesovsky
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with PrometheusShiao-An Yuan
 
Become a GC Hero
Become a GC HeroBecome a GC Hero
Become a GC HeroTier1app
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection HeroTier1app
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationAlexey Lesovsky
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applicationsTier1 app
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres MonitoringDenish Patel
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applicationsTier1 app
 
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...Ontico
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
16 artifacts to capture when there is a production problem
16 artifacts to capture when there is a production problem16 artifacts to capture when there is a production problem
16 artifacts to capture when there is a production problemTier1 app
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQLMark Wong
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Go Programming Patterns
Go Programming PatternsGo Programming Patterns
Go Programming PatternsHao Chen
 
PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetAlexey Lesovsky
 

What's hot (20)

Mastering java in containers - MadridJUG
Mastering java in containers - MadridJUGMastering java in containers - MadridJUG
Mastering java in containers - MadridJUG
 
GitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons LearnedGitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons Learned
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 
Pgcenter overview
Pgcenter overviewPgcenter overview
Pgcenter overview
 
Become a GC Hero
Become a GC HeroBecome a GC Hero
Become a GC Hero
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection Hero
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applications
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres Monitoring
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applications
 
PostgreSQL Replication Tutorial
PostgreSQL Replication TutorialPostgreSQL Replication Tutorial
PostgreSQL Replication Tutorial
 
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
16 artifacts to capture when there is a production problem
16 artifacts to capture when there is a production problem16 artifacts to capture when there is a production problem
16 artifacts to capture when there is a production problem
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQL
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Go Programming Patterns
Go Programming PatternsGo Programming Patterns
Go Programming Patterns
 
Tools for Metaspace
Tools for MetaspaceTools for Metaspace
Tools for Metaspace
 
PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication Cheatsheet
 

Similar to Java In-Process Caching - Performance, Progress and Pitfalls

cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015cruftex
 
Distributed caching and computing v3.7
Distributed caching and computing v3.7Distributed caching and computing v3.7
Distributed caching and computing v3.7Rahul Gupta
 
Think Distributed: The Hazelcast Way
Think Distributed: The Hazelcast WayThink Distributed: The Hazelcast Way
Think Distributed: The Hazelcast WayRahul Gupta
 
Distributed caching-computing v3.8
Distributed caching-computing v3.8Distributed caching-computing v3.8
Distributed caching-computing v3.8Rahul Gupta
 
Less and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developersLess and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developersSeravo
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Ontico
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneEnkitec
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionTanel Poder
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and opsaragozin
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
 
cache concepts and varnish-cache
cache concepts and varnish-cachecache concepts and varnish-cache
cache concepts and varnish-cacheMarc Cortinas Val
 
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...Universitat Politècnica de Catalunya
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisationgrooverdan
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 

Similar to Java In-Process Caching - Performance, Progress and Pitfalls (20)

cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015
 
Distributed caching and computing v3.7
Distributed caching and computing v3.7Distributed caching and computing v3.7
Distributed caching and computing v3.7
 
Think Distributed: The Hazelcast Way
Think Distributed: The Hazelcast WayThink Distributed: The Hazelcast Way
Think Distributed: The Hazelcast Way
 
Distributed caching-computing v3.8
Distributed caching-computing v3.8Distributed caching-computing v3.8
Distributed caching-computing v3.8
 
JCache Using JCache
JCache Using JCacheJCache Using JCache
JCache Using JCache
 
Cache is King
Cache is KingCache is King
Cache is King
 
Less and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developersLess and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developers
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and ops
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
cache concepts and varnish-cache
cache concepts and varnish-cachecache concepts and varnish-cache
cache concepts and varnish-cache
 
Java Memory Model
Java Memory ModelJava Memory Model
Java Memory Model
 
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisation
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
Modern processors
Modern processorsModern processors
Modern processors
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Java In-Process Caching - Performance, Progress and Pitfalls

  • 1. 1 Java In-Process Caching Performance, Progress and Pitfalls Tuesday, May 21, 2019 19th Software Performance Meetup, Munich Jens Wilke
  • 2. 2 Links - Disclaimer - Copyright talk slides and diagram source data https://github.com/cruftex/talk-java-in-process-caching-performance-progress-pitfalls used benchmarks https://github.com/cache2k/cache2k-benchmark disclaimer No guarantees at all. Do not sue me for any mistake, instead, send a pull request and correct it! Copyright: Creative Commons Attribution CC BY 4.0
  • 3. 3 About Me ● Performance Fan(atic) ● Java Hacker since 1998 ● Author of cache2k ● 70+ answered questions on StackOverflow about Caching ● JCache / JSR107 Contributor Jens Wilke @cruftex cruftex.net
  • 4. 4 Up Next Java In-Process Caching What and Why
  • 5. 5 Example 1: Geolocation Lookup 48° 4' 28" N 11° 40' 17" E
  • 6. 6 Example 2: Date Formatting
  • 7. 7 Expensive Operations per Web Request 0 1x 10x How often is an operation executed per web request or user interaction? ● Less than once: e.g. initialization on startup ● Exactly once: e.g. fetch data and resolve geolocation ● More than once: e.g. render a time or date X X X
  • 8. 8 Reduce Expensive Operations 0 1 10x Cache: Less executions per web request or user interaction X X X
  • 9. 9 (Java) Caching ● temporary data storage to serve requests faster ● reduce expensive operations at the cost of storage ● A tool to tune the space time tradeoff problem ● Lower latency and improve UX ● If not because of great UX, let‘s save computing costs! technical benefits
  • 10. 10 Java In Process Caching ● temporary data storage to serve requests faster ● reduce expensive operations at the cost of storage heap memory ● keep data as close to the CPU as possible ● A tool to tune the space time tradeoff problem ● Lower latency much more and improve UX ● If not because of great UX, let‘s save more computing costs! technical benefits
  • 11. 11 Constructing an Java In Process Cache The interface of a cache is similar (sometimes identical) to a Java Map: cache.put(key, value); value = cache.get(key); ● A hash table ● An eviction strategy – to limit the used memory – but keep data that is „hot“ interface implementation
  • 12. 12 Up Next Benchmark a Simple Cache
  • 13. 13 @Param({"100000"}) public int entryCount = 100 * 1000; BenchmarkCache<Integer, Integer> cache; Integer[] ints; @Setup public void setup() throws Exception { cache = getFactory().create(entryCount); ints = new Integer[PATTERN_COUNT]; RandomGenerator generator = new XorShift1024StarRandomGenerator(1802); for (int i = 0; i < PATTERN_COUNT; i++) { ints[i] = generator.nextInt(entryCount); } for (int i = 0; i < entryCount; i++) { cache.put(i, i); } } @Benchmark @BenchmarkMode(Mode.Throughput) public long read(ThreadState threadState) { int idx = (int) (threadState.index++ % PATTERN_COUNT); return cache.get(ints[idx]); } 1 2 3 4 Read Only JMH Benchmark 1) create a cache, via a wrapper to adapt to different implementations 2) create an array with random integer objects. Value range does not exceed entry count 3) fill cache once, not part of the benchmark 4) benchmark operation does one cache read with random key Benchmark that does only read a cache that is filled with data intially. No eviction takes place, so we can compare the read throughput with a (concurrent) hash table.
  • 14. 14 Benchmark Parameters ● CPU: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz 4 physical cores ● Benchmarks are done with different number of cores by Linux CPU hotplugging ● Oracle JVM 1.8.0-131, JMH 1.18 ● Ubuntu 14.04, 4.4.0-137-generic ● Google Guava Cache, Version 26 ● Caffeine, Version 2.6.2 ● cache2k, Version 1.2.0.Final ● EHCache, Version 3.6.1 hardware cache versions ● 2 forks, 2 warmup iterations, 3 measurement iterations, 15 second iterations times ● => 6 measurement iterations JMH parameters
  • 15. 15 Results  ConcurrentHashMap  Simple Java Implementation with LRU via LinkedHashMap  Google Guava Cache  Simple Java Implementation with LRU via LinkedHashMap and Segmentation Y axis: operations/s X axis: Number of threads 0.0 20.0M 40.0M 60.0M 80.0M 100.0M 120.0M 140.0M 160.0M 180.0M ops/s threads-size-hitRate CHM SLHM Guava PLHM Especially when multi-threaded the ConcurrentHashMap is much faster then a cache.
  • 16. 16 Up Next LRU = Least Recently Used
  • 17. 17 LRU List Operations head tail 11. put (1, x) insert new head 2 12. put (2, x) insert new head 1 23. get (1) move to front 3 1 24. put (3, x) insert new head 5. put (4,x) remove tail (key 2) insert new head remove tail cache operation list operation double linked list with three entries 4 3 1
  • 18. 18 LRU Properties ● Simple and smart algorithm for eviction (or replacement) ● Everybody knows it from CS, „eviction = LRU“ ● List operations need synchronization ● A cache read means rewriting references in 4 objects, most likely touching 4 different CPU cache lines ● A read operation (happens often!) is more expensive than an eviction (happens not so often!) ● LRU is not scan resistent; scans wipe out the working set in the cache ● Non frequently accessed objects need a long time until evicted cool... ...but:
  • 19. 19 LRU Alternatives? ● Reduce CPU cycles for the read operation ● Do more costly operations later when we need to evict ● Also take frequency into account, keeping more frequently accessed objects longer Overview at: Wikipedia:Page_replacement_algorithm we look for lots of research
  • 20. 20 Up Next Clock / Clock-Pro Eviction
  • 21. 21 Clock 10 0 1 ● Each cache entry has a reference bit which indicates whether the entry was accessed ● Access: Sets reference bit ● Eviction, scan at clock hand: – Not-Referenced? Evict! – Referenced? Clear reference and move to the next
  • 22. 22 Clock-Pro ● Extra clock for hot data ● History of recently evicted keys ● cache2k: Use reference counter instead of reference bit 10 0 1 4 1 0 1 0 3 5 0 hot cold history Faster: cache access is tracked by setting a bit or incrementing a counter
  • 23. 23 Up Next Will it Blend? (more Benchmarks...)
  • 24. 24 Results ● Google Guava Cache and EHCache3 are slow in comparison to the ConcurrentHashMap ● Caffeine is faster, if there are sufficient CPU cores/threads ● Cache2k is fastest, at about half the speed of the ConcurrentHashMap 0.0 20.0M 40.0M 60.0M 80.0M 100.0M 120.0M 140.0M 160.0M 180.0M 1-100K-100 2-100K-100 3-100K-100 4-100K-100 ops/s threads-size-hitRate cache2k Caffeine Guava EhCache3 CHM
  • 25. 25 Up Next What about eviction efficiency?
  • 26. 26 Benchmarking Eviction Quality ● Collect access sequences (traces) ● Replay the access sequence on a cache and count hits and misses More information about the traces in the blog article: https://cruftex.net/2016/05/09/Java-Caching- Benchmarks-2016-Part-2.html
  • 30. 30 Results ● Eviction Improvements Caffeine and cache2k ● Varying results depending on access sequences / workloads
  • 31. 31 But… Isn‘t Clock O(n)?! 10 0 1 Accademic Objection: Time for eviction grows linear with the cache size Yes, in theory….
  • 32. 32 Cache2k and Clock-Pro ● Uses counter instead of reference bit ● Heuristics to reduce intensive scanning, if not much is gained ● test with a random sequence at 80% hitrate, results in the following average entry scan counts: – 100K entries: 6.00398 – 1M entries: 6.00463 – 10M entries: 6.00482 => Little increase, but practically irrelevant. Improved algorithm battle tested
  • 33. 33 Technical Overview I Guava Caffeine EHCache3 cache2k Latest Version 26 2.6.2 3.6.1 1.2.0.Final JDK compatibility 8+ 8+ 8+ 6+ sun.misc.Unsafe - X X - Hash implementation own JVM CHM old CHM own Single object per entry - - - X Treeifications of collisions - X - - Metrics for hash collisions - - - X Key mutation detection - - - X
  • 34. 34 Technical Overview II Guava Caffeine EHCache3 cache2k Eviction Algorithm Q + LRU Q + W-TinyLFU „Scan8“ Clock-Pro+ Lock Free Cache Hit Lock free Lock + Wait free Lock free Lock + Wait free Limit by count X X X X Limit by memory size - - X - Weigher X X - X JCache / JSR107 - X X X Separete API jar - - - X
  • 35. 35 Try cache2k? ● Open Source, Apache 2 Licence ● On Maven Central ● Info and User Guide at: https://cache2.org ● JCache support ● Compatible with Android or pure Java ● Runs with hibernate, Spring, datanucleus ● Compatible with Java 8, 9, 10, 11, 12, …. because of no sun.misc.Unsafe magic
  • 36. 36 Summary ● LRU is simple but outdated ● Caffeine and cache2k use modern eviction algorithms and have (mostly) better eviction efficiency than LRU ● Caffeine likes to have cores ● EHCache3 likes to have memory ● cache2k optimizes on a fast/fastest access path for a „cache hit“ while having reasonable eviction efficiency ● Modern hardware needs modern algorithms ● Faster caches allows more fine grained caching
  • 37. 37 Keep Tuning! Questions? Jens Wilke @cruftex cruftex.net
  • 38. 38 Up Next Appendix / Backup Slides
  • 39. 39 Simple Cache – Part I public class LinkedHashMapCache<K,V> extends LinkedHashMap<K,V> { private final int cacheSize; public LinkedHashMapCache(int cacheSize) { super(16, 0.75F, true); this.cacheSize = cacheSize; } protected boolean removeEldestEntry(Map.Entry<K, V> eldest) { return size() >= cacheSize; } }
  • 40. 40 Simple Cache – Part II – Thread Safety public class SynchronizedLinkedHashMapCache<K,V> { final private LinkedHashMapCache<K,V> backingMap; public void put(K key, V value) { synchronized (backingMap) { backingMap.put(key, value); } } public V get(K key) { synchronized (backingMap) { return backingMap.get(key); } } }
  • 41. 41 Simple Cache – Part III – Partitioning/Segmentation public class PartitionedLinkedHashMapCache<K,V> { final private int PARTS = 4; final private int MASK = 3; final private LinkedHashMapCache<K, V>[] backingMaps = new LinkedHashMapCache[PARTS]; public void put(K key, V value) { LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK]; synchronized (backingMap) { backingMap.put(key, value); } } public V get(K key) { LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK]; synchronized (backingMap) { return backingMap.get(key); } } }