SlideShare a Scribd company logo
1 of 41
Download to read offline
1
Java In-Process Caching
Performance, Progress and Pitfalls
Tuesday, May 21, 2019
19th Software Performance Meetup, Munich
Jens Wilke
2
Links - Disclaimer - Copyright
talk slides and diagram source data
https://github.com/cruftex/talk-java-in-process-caching-performance-progress-pitfalls
used benchmarks
https://github.com/cache2k/cache2k-benchmark
disclaimer
No guarantees at all. Do not sue me for any mistake,
instead, send a pull request and correct it!
Copyright: Creative Commons Attribution
CC BY 4.0
3
About Me
● Performance Fan(atic)
● Java Hacker since 1998
● Author of cache2k
● 70+ answered questions on
StackOverflow about Caching
● JCache / JSR107 Contributor
Jens Wilke
@cruftex
cruftex.net
4
Up Next
Java In-Process Caching What and Why
5
Example 1: Geolocation Lookup
48° 4' 28" N 11° 40' 17" E
6
Example 2: Date Formatting
7
Expensive Operations per Web Request
0 1x 10x
How often is an operation executed per web
request or user interaction?
●
Less than once:
e.g. initialization on startup
●
Exactly once:
e.g. fetch data and resolve geolocation
●
More than once:
e.g. render a time or date
X X X
8
Reduce Expensive Operations
0 1 10x
Cache:
Less executions per
web request or user
interaction
X X X
9
(Java) Caching
● temporary data storage to serve
requests faster
● reduce expensive operations at the cost
of storage
● A tool to tune the space time tradeoff
problem
● Lower latency and improve UX
● If not because of great UX, let‘s save
computing costs!
technical benefits
10
Java In Process Caching
● temporary data storage to serve
requests faster
● reduce expensive operations at the cost
of storage heap memory
● keep data as close to the CPU as possible
● A tool to tune the space time tradeoff
problem
● Lower latency much more and improve
UX
● If not because of great UX, let‘s save
more computing costs!
technical benefits
11
Constructing an Java In Process Cache
The interface of a cache is similar
(sometimes identical) to a Java Map:
cache.put(key, value);
value = cache.get(key);
● A hash table
● An eviction strategy
– to limit the used memory
– but keep data that is „hot“
interface implementation
12
Up Next
Benchmark a Simple Cache
13
@Param({"100000"})
public int entryCount = 100 * 1000;
BenchmarkCache<Integer, Integer> cache;
Integer[] ints;
@Setup
public void setup() throws Exception {
cache = getFactory().create(entryCount);
ints = new Integer[PATTERN_COUNT];
RandomGenerator generator =
new XorShift1024StarRandomGenerator(1802);
for (int i = 0; i < PATTERN_COUNT; i++) {
ints[i] = generator.nextInt(entryCount);
}
for (int i = 0; i < entryCount; i++) {
cache.put(i, i);
}
}
@Benchmark @BenchmarkMode(Mode.Throughput)
public long read(ThreadState threadState) {
int idx = (int) (threadState.index++ % PATTERN_COUNT);
return cache.get(ints[idx]);
}
1
2
3
4
Read Only JMH Benchmark
1) create a cache, via a wrapper to
adapt to different implementations
2) create an array with random
integer objects. Value range does
not exceed entry count
3) fill cache once, not part of the
benchmark
4) benchmark operation does one
cache read with random key
Benchmark that does only read a
cache that is filled with data intially.
No eviction takes place, so we can
compare the read throughput with a
(concurrent) hash table.
14
Benchmark Parameters
● CPU: Intel(R) Xeon(R) CPU E3-1240
v5 @ 3.50GHz 4 physical cores
● Benchmarks are done with different
number of cores by Linux CPU
hotplugging
● Oracle JVM 1.8.0-131, JMH 1.18
● Ubuntu 14.04, 4.4.0-137-generic
● Google Guava Cache, Version 26
● Caffeine, Version 2.6.2
● cache2k, Version 1.2.0.Final
● EHCache, Version 3.6.1
hardware cache versions
● 2 forks, 2 warmup iterations, 3
measurement iterations, 15
second iterations times
● => 6 measurement iterations
JMH parameters
15
Results
 ConcurrentHashMap
 Simple Java Implementation with LRU
via LinkedHashMap
 Google Guava Cache
 Simple Java Implementation with LRU
via LinkedHashMap and Segmentation
Y axis: operations/s
X axis: Number of threads
0.0
20.0M
40.0M
60.0M
80.0M
100.0M
120.0M
140.0M
160.0M
180.0M
ops/s
threads-size-hitRate
CHM
SLHM
Guava
PLHM
Especially when multi-threaded the
ConcurrentHashMap is much faster
then a cache.
16
Up Next
LRU = Least Recently Used
17
LRU List Operations
head tail
11. put (1, x) insert new head
2 12. put (2, x) insert new head
1 23. get (1) move to front
3 1 24. put (3, x) insert new head
5. put (4,x) remove tail (key 2)
insert new head
remove tail
cache operation list operation
double linked list with three entries
4 3 1
18
LRU Properties
● Simple and smart algorithm for eviction
(or replacement)
● Everybody knows it from CS, „eviction
= LRU“
● List operations need synchronization
● A cache read means rewriting
references in 4 objects, most likely
touching 4 different CPU cache lines
● A read operation (happens often!) is
more expensive than an eviction
(happens not so often!)
● LRU is not scan resistent; scans wipe
out the working set in the cache
● Non frequently accessed objects need
a long time until evicted
cool... ...but:
19
LRU Alternatives?
● Reduce CPU cycles for the read operation
● Do more costly operations later when we
need to evict
● Also take frequency into account, keeping
more frequently accessed objects longer
Overview at:
Wikipedia:Page_replacement_algorithm
we look for lots of research
20
Up Next
Clock / Clock-Pro Eviction
21
Clock
10
0
1 ● Each cache entry has a
reference bit which
indicates whether the
entry was accessed
● Access: Sets reference bit
● Eviction, scan at clock
hand:
– Not-Referenced? Evict!
– Referenced? Clear reference
and move to the next
22
Clock-Pro
● Extra clock for hot data
● History of recently evicted keys
● cache2k: Use reference counter instead of
reference bit
10
0
1
4
1
0
1
0
3
5
0
hot
cold
history
Faster:
cache access is tracked by setting a
bit or incrementing a counter
23
Up Next
Will it Blend?
(more Benchmarks...)
24
Results
● Google Guava Cache and
EHCache3 are slow in
comparison to the
ConcurrentHashMap
● Caffeine is faster, if there
are sufficient CPU
cores/threads
● Cache2k is fastest, at about
half the speed of the
ConcurrentHashMap
0.0
20.0M
40.0M
60.0M
80.0M
100.0M
120.0M
140.0M
160.0M
180.0M
1-100K-100
2-100K-100
3-100K-100
4-100K-100
ops/s
threads-size-hitRate
cache2k
Caffeine
Guava
EhCache3
CHM
25
Up Next
What about eviction efficiency?
26
Benchmarking Eviction Quality
● Collect access sequences (traces)
● Replay the access sequence on a cache and count
hits and misses
More information about the traces in the blog article:
https://cruftex.net/2016/05/09/Java-Caching-
Benchmarks-2016-Part-2.html
27
Zipf10k
0
10
20
30
40
50
60
70
80
90
100
500
2000
8000
OPT
LRU
CLOCK
EHCache3
Guava
Caffeine
cache2k
RAND
cache size
hitrate
28cache size
0
10
20
30
40
50
60
70
80
90
100
1250
2500
5000
OPT
LRU
CLOCK
EHCache3
Guava
Caffeine
cache2k
RAND
OrmAccessBusytime
hitrate
29cache size
0
10
20
30
40
50
60
100000
200000
300000
OPT
LRU
CLOCK
EHCache3
Guava
Caffeine
cache2k
RAND
UMassWebsearch1
hitrate
30
Results
● Eviction Improvements Caffeine and cache2k
● Varying results depending on access
sequences / workloads
31
But… Isn‘t Clock O(n)?!
10
0
1 Accademic Objection:
Time for eviction grows linear with
the cache size
Yes, in theory….
32
Cache2k and Clock-Pro
● Uses counter instead of reference bit
● Heuristics to reduce intensive scanning,
if not much is gained
● test with a random sequence at 80%
hitrate, results in the following average
entry scan counts:
– 100K entries: 6.00398
– 1M entries: 6.00463
– 10M entries: 6.00482
=> Little increase, but practically irrelevant.
Improved algorithm battle tested
33
Technical Overview I
Guava Caffeine EHCache3 cache2k
Latest Version 26 2.6.2 3.6.1 1.2.0.Final
JDK compatibility 8+ 8+ 8+ 6+
sun.misc.Unsafe - X X -
Hash implementation own JVM CHM old CHM own
Single object per entry - - - X
Treeifications of
collisions
- X - -
Metrics for hash
collisions
- - - X
Key mutation detection - - - X
34
Technical Overview II
Guava Caffeine EHCache3 cache2k
Eviction Algorithm Q + LRU Q + W-TinyLFU „Scan8“ Clock-Pro+
Lock Free Cache Hit Lock free Lock + Wait free Lock free Lock + Wait free
Limit by count X X X X
Limit by memory size - - X -
Weigher X X - X
JCache / JSR107 - X X X
Separete API jar - - - X
35
Try cache2k?
● Open Source, Apache 2 Licence
● On Maven Central
● Info and User Guide at: https://cache2.org
● JCache support
● Compatible with Android or pure Java
● Runs with hibernate, Spring, datanucleus
● Compatible with Java 8, 9, 10, 11, 12, …. because of no
sun.misc.Unsafe magic
36
Summary
● LRU is simple but outdated
● Caffeine and cache2k use modern eviction algorithms and have
(mostly) better eviction efficiency than LRU
● Caffeine likes to have cores
● EHCache3 likes to have memory
● cache2k optimizes on a fast/fastest access path for a „cache hit“ while
having reasonable eviction efficiency
● Modern hardware needs modern algorithms
● Faster caches allows more fine grained caching
37
Keep Tuning! Questions?
Jens Wilke
@cruftex
cruftex.net
38
Up Next
Appendix / Backup Slides
39
Simple Cache – Part I
public class LinkedHashMapCache<K,V>
extends LinkedHashMap<K,V> {
private final int cacheSize;
public LinkedHashMapCache(int cacheSize) {
super(16, 0.75F, true);
this.cacheSize = cacheSize;
}
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() >= cacheSize;
}
}
40
Simple Cache – Part II – Thread Safety
public class SynchronizedLinkedHashMapCache<K,V> {
final private LinkedHashMapCache<K,V> backingMap;
public void put(K key, V value) {
synchronized (backingMap) {
backingMap.put(key, value);
}
}
public V get(K key) {
synchronized (backingMap) {
return backingMap.get(key);
}
}
}
41
Simple Cache – Part III –
Partitioning/Segmentation
public class PartitionedLinkedHashMapCache<K,V> {
final private int PARTS = 4;
final private int MASK = 3;
final private LinkedHashMapCache<K, V>[] backingMaps =
new LinkedHashMapCache[PARTS];
public void put(K key, V value) {
LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK];
synchronized (backingMap) {
backingMap.put(key, value);
}
}
public V get(K key) {
LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK];
synchronized (backingMap) {
return backingMap.get(key);
}
}
}

More Related Content

What's hot

How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres Monitoring
Denish Patel
 

What's hot (20)

Mastering java in containers - MadridJUG
Mastering java in containers - MadridJUGMastering java in containers - MadridJUG
Mastering java in containers - MadridJUG
 
GitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons LearnedGitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons Learned
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 
Pgcenter overview
Pgcenter overviewPgcenter overview
Pgcenter overview
 
Become a GC Hero
Become a GC HeroBecome a GC Hero
Become a GC Hero
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection Hero
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applications
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres Monitoring
 
Lets crash-applications
Lets crash-applicationsLets crash-applications
Lets crash-applications
 
PostgreSQL Replication Tutorial
PostgreSQL Replication TutorialPostgreSQL Replication Tutorial
PostgreSQL Replication Tutorial
 
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
16 artifacts to capture when there is a production problem
16 artifacts to capture when there is a production problem16 artifacts to capture when there is a production problem
16 artifacts to capture when there is a production problem
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQL
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Go Programming Patterns
Go Programming PatternsGo Programming Patterns
Go Programming Patterns
 
Tools for Metaspace
Tools for MetaspaceTools for Metaspace
Tools for Metaspace
 
PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication Cheatsheet
 

Similar to Java In-Process Caching - Performance, Progress and Pitfalls

Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Lucidworks
 

Similar to Java In-Process Caching - Performance, Progress and Pitfalls (20)

cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015cache2k, Java Caching, Turbo Charged, FOSDEM 2015
cache2k, Java Caching, Turbo Charged, FOSDEM 2015
 
Distributed caching and computing v3.7
Distributed caching and computing v3.7Distributed caching and computing v3.7
Distributed caching and computing v3.7
 
Think Distributed: The Hazelcast Way
Think Distributed: The Hazelcast WayThink Distributed: The Hazelcast Way
Think Distributed: The Hazelcast Way
 
Distributed caching-computing v3.8
Distributed caching-computing v3.8Distributed caching-computing v3.8
Distributed caching-computing v3.8
 
JCache Using JCache
JCache Using JCacheJCache Using JCache
JCache Using JCache
 
Cache is King
Cache is KingCache is King
Cache is King
 
Less and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developersLess and faster – Cache tips for WordPress developers
Less and faster – Cache tips for WordPress developers
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
Java on Linux for devs and ops
Java on Linux for devs and opsJava on Linux for devs and ops
Java on Linux for devs and ops
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
cache concepts and varnish-cache
cache concepts and varnish-cachecache concepts and varnish-cache
cache concepts and varnish-cache
 
Java Memory Model
Java Memory ModelJava Memory Model
Java Memory Model
 
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
Software Frameworks for Deep Learning (D1L7 2017 UPC Deep Learning for Comput...
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisation
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
Modern processors
Modern processorsModern processors
Modern processors
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Java In-Process Caching - Performance, Progress and Pitfalls

  • 1. 1 Java In-Process Caching Performance, Progress and Pitfalls Tuesday, May 21, 2019 19th Software Performance Meetup, Munich Jens Wilke
  • 2. 2 Links - Disclaimer - Copyright talk slides and diagram source data https://github.com/cruftex/talk-java-in-process-caching-performance-progress-pitfalls used benchmarks https://github.com/cache2k/cache2k-benchmark disclaimer No guarantees at all. Do not sue me for any mistake, instead, send a pull request and correct it! Copyright: Creative Commons Attribution CC BY 4.0
  • 3. 3 About Me ● Performance Fan(atic) ● Java Hacker since 1998 ● Author of cache2k ● 70+ answered questions on StackOverflow about Caching ● JCache / JSR107 Contributor Jens Wilke @cruftex cruftex.net
  • 4. 4 Up Next Java In-Process Caching What and Why
  • 5. 5 Example 1: Geolocation Lookup 48° 4' 28" N 11° 40' 17" E
  • 6. 6 Example 2: Date Formatting
  • 7. 7 Expensive Operations per Web Request 0 1x 10x How often is an operation executed per web request or user interaction? ● Less than once: e.g. initialization on startup ● Exactly once: e.g. fetch data and resolve geolocation ● More than once: e.g. render a time or date X X X
  • 8. 8 Reduce Expensive Operations 0 1 10x Cache: Less executions per web request or user interaction X X X
  • 9. 9 (Java) Caching ● temporary data storage to serve requests faster ● reduce expensive operations at the cost of storage ● A tool to tune the space time tradeoff problem ● Lower latency and improve UX ● If not because of great UX, let‘s save computing costs! technical benefits
  • 10. 10 Java In Process Caching ● temporary data storage to serve requests faster ● reduce expensive operations at the cost of storage heap memory ● keep data as close to the CPU as possible ● A tool to tune the space time tradeoff problem ● Lower latency much more and improve UX ● If not because of great UX, let‘s save more computing costs! technical benefits
  • 11. 11 Constructing an Java In Process Cache The interface of a cache is similar (sometimes identical) to a Java Map: cache.put(key, value); value = cache.get(key); ● A hash table ● An eviction strategy – to limit the used memory – but keep data that is „hot“ interface implementation
  • 12. 12 Up Next Benchmark a Simple Cache
  • 13. 13 @Param({"100000"}) public int entryCount = 100 * 1000; BenchmarkCache<Integer, Integer> cache; Integer[] ints; @Setup public void setup() throws Exception { cache = getFactory().create(entryCount); ints = new Integer[PATTERN_COUNT]; RandomGenerator generator = new XorShift1024StarRandomGenerator(1802); for (int i = 0; i < PATTERN_COUNT; i++) { ints[i] = generator.nextInt(entryCount); } for (int i = 0; i < entryCount; i++) { cache.put(i, i); } } @Benchmark @BenchmarkMode(Mode.Throughput) public long read(ThreadState threadState) { int idx = (int) (threadState.index++ % PATTERN_COUNT); return cache.get(ints[idx]); } 1 2 3 4 Read Only JMH Benchmark 1) create a cache, via a wrapper to adapt to different implementations 2) create an array with random integer objects. Value range does not exceed entry count 3) fill cache once, not part of the benchmark 4) benchmark operation does one cache read with random key Benchmark that does only read a cache that is filled with data intially. No eviction takes place, so we can compare the read throughput with a (concurrent) hash table.
  • 14. 14 Benchmark Parameters ● CPU: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz 4 physical cores ● Benchmarks are done with different number of cores by Linux CPU hotplugging ● Oracle JVM 1.8.0-131, JMH 1.18 ● Ubuntu 14.04, 4.4.0-137-generic ● Google Guava Cache, Version 26 ● Caffeine, Version 2.6.2 ● cache2k, Version 1.2.0.Final ● EHCache, Version 3.6.1 hardware cache versions ● 2 forks, 2 warmup iterations, 3 measurement iterations, 15 second iterations times ● => 6 measurement iterations JMH parameters
  • 15. 15 Results  ConcurrentHashMap  Simple Java Implementation with LRU via LinkedHashMap  Google Guava Cache  Simple Java Implementation with LRU via LinkedHashMap and Segmentation Y axis: operations/s X axis: Number of threads 0.0 20.0M 40.0M 60.0M 80.0M 100.0M 120.0M 140.0M 160.0M 180.0M ops/s threads-size-hitRate CHM SLHM Guava PLHM Especially when multi-threaded the ConcurrentHashMap is much faster then a cache.
  • 16. 16 Up Next LRU = Least Recently Used
  • 17. 17 LRU List Operations head tail 11. put (1, x) insert new head 2 12. put (2, x) insert new head 1 23. get (1) move to front 3 1 24. put (3, x) insert new head 5. put (4,x) remove tail (key 2) insert new head remove tail cache operation list operation double linked list with three entries 4 3 1
  • 18. 18 LRU Properties ● Simple and smart algorithm for eviction (or replacement) ● Everybody knows it from CS, „eviction = LRU“ ● List operations need synchronization ● A cache read means rewriting references in 4 objects, most likely touching 4 different CPU cache lines ● A read operation (happens often!) is more expensive than an eviction (happens not so often!) ● LRU is not scan resistent; scans wipe out the working set in the cache ● Non frequently accessed objects need a long time until evicted cool... ...but:
  • 19. 19 LRU Alternatives? ● Reduce CPU cycles for the read operation ● Do more costly operations later when we need to evict ● Also take frequency into account, keeping more frequently accessed objects longer Overview at: Wikipedia:Page_replacement_algorithm we look for lots of research
  • 20. 20 Up Next Clock / Clock-Pro Eviction
  • 21. 21 Clock 10 0 1 ● Each cache entry has a reference bit which indicates whether the entry was accessed ● Access: Sets reference bit ● Eviction, scan at clock hand: – Not-Referenced? Evict! – Referenced? Clear reference and move to the next
  • 22. 22 Clock-Pro ● Extra clock for hot data ● History of recently evicted keys ● cache2k: Use reference counter instead of reference bit 10 0 1 4 1 0 1 0 3 5 0 hot cold history Faster: cache access is tracked by setting a bit or incrementing a counter
  • 23. 23 Up Next Will it Blend? (more Benchmarks...)
  • 24. 24 Results ● Google Guava Cache and EHCache3 are slow in comparison to the ConcurrentHashMap ● Caffeine is faster, if there are sufficient CPU cores/threads ● Cache2k is fastest, at about half the speed of the ConcurrentHashMap 0.0 20.0M 40.0M 60.0M 80.0M 100.0M 120.0M 140.0M 160.0M 180.0M 1-100K-100 2-100K-100 3-100K-100 4-100K-100 ops/s threads-size-hitRate cache2k Caffeine Guava EhCache3 CHM
  • 25. 25 Up Next What about eviction efficiency?
  • 26. 26 Benchmarking Eviction Quality ● Collect access sequences (traces) ● Replay the access sequence on a cache and count hits and misses More information about the traces in the blog article: https://cruftex.net/2016/05/09/Java-Caching- Benchmarks-2016-Part-2.html
  • 30. 30 Results ● Eviction Improvements Caffeine and cache2k ● Varying results depending on access sequences / workloads
  • 31. 31 But… Isn‘t Clock O(n)?! 10 0 1 Accademic Objection: Time for eviction grows linear with the cache size Yes, in theory….
  • 32. 32 Cache2k and Clock-Pro ● Uses counter instead of reference bit ● Heuristics to reduce intensive scanning, if not much is gained ● test with a random sequence at 80% hitrate, results in the following average entry scan counts: – 100K entries: 6.00398 – 1M entries: 6.00463 – 10M entries: 6.00482 => Little increase, but practically irrelevant. Improved algorithm battle tested
  • 33. 33 Technical Overview I Guava Caffeine EHCache3 cache2k Latest Version 26 2.6.2 3.6.1 1.2.0.Final JDK compatibility 8+ 8+ 8+ 6+ sun.misc.Unsafe - X X - Hash implementation own JVM CHM old CHM own Single object per entry - - - X Treeifications of collisions - X - - Metrics for hash collisions - - - X Key mutation detection - - - X
  • 34. 34 Technical Overview II Guava Caffeine EHCache3 cache2k Eviction Algorithm Q + LRU Q + W-TinyLFU „Scan8“ Clock-Pro+ Lock Free Cache Hit Lock free Lock + Wait free Lock free Lock + Wait free Limit by count X X X X Limit by memory size - - X - Weigher X X - X JCache / JSR107 - X X X Separete API jar - - - X
  • 35. 35 Try cache2k? ● Open Source, Apache 2 Licence ● On Maven Central ● Info and User Guide at: https://cache2.org ● JCache support ● Compatible with Android or pure Java ● Runs with hibernate, Spring, datanucleus ● Compatible with Java 8, 9, 10, 11, 12, …. because of no sun.misc.Unsafe magic
  • 36. 36 Summary ● LRU is simple but outdated ● Caffeine and cache2k use modern eviction algorithms and have (mostly) better eviction efficiency than LRU ● Caffeine likes to have cores ● EHCache3 likes to have memory ● cache2k optimizes on a fast/fastest access path for a „cache hit“ while having reasonable eviction efficiency ● Modern hardware needs modern algorithms ● Faster caches allows more fine grained caching
  • 37. 37 Keep Tuning! Questions? Jens Wilke @cruftex cruftex.net
  • 38. 38 Up Next Appendix / Backup Slides
  • 39. 39 Simple Cache – Part I public class LinkedHashMapCache<K,V> extends LinkedHashMap<K,V> { private final int cacheSize; public LinkedHashMapCache(int cacheSize) { super(16, 0.75F, true); this.cacheSize = cacheSize; } protected boolean removeEldestEntry(Map.Entry<K, V> eldest) { return size() >= cacheSize; } }
  • 40. 40 Simple Cache – Part II – Thread Safety public class SynchronizedLinkedHashMapCache<K,V> { final private LinkedHashMapCache<K,V> backingMap; public void put(K key, V value) { synchronized (backingMap) { backingMap.put(key, value); } } public V get(K key) { synchronized (backingMap) { return backingMap.get(key); } } }
  • 41. 41 Simple Cache – Part III – Partitioning/Segmentation public class PartitionedLinkedHashMapCache<K,V> { final private int PARTS = 4; final private int MASK = 3; final private LinkedHashMapCache<K, V>[] backingMaps = new LinkedHashMapCache[PARTS]; public void put(K key, V value) { LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK]; synchronized (backingMap) { backingMap.put(key, value); } } public V get(K key) { LinkedHashMapCache<K, V> backingMap = backingMaps[key.hashCode() & MASK]; synchronized (backingMap) { return backingMap.get(key); } } }