How to Stop Worrying and
Start Caching in Java
SriSatish Ambati , Azul Systems
sris@azulsystems.com
Manik Surtani, RedHat ...
The Trail
Examples
Elements of Cache Performance
Theory
Metrics
200GB Cache Design
Whodunit
Overheads in Java – Objects, G...
Wie die wahl hat, hat die Qual!
He who has the choice has the agony!
Some example caches
Homegrown caches – Surprisingly work well.
(do it yourself! It’s a giant hash)
Infinispan, Coherence, ...
Visualize Cache
Simple example
Visualize Cache
Replicated Cache Distributed Cache
Elements of Cache Performance
Hot or Not: The 80/20 rule.
A small set of objects are very popular!
Hit or Miss: Hit Ratio
...
Elements of Cache Performance : Metrics
Inserts: Puts/sec, Latencies
Reads: Gets/sec, Latencies, Indexing
Updates: mods/se...
Partitioning & Distributed Caches
Near Cache/L1 Cache
Bring data close to the Logic that is using it.
Birds of feather flo...
I/O considerations
Asynchronous
Sockets
Queues & Threads serving the sockets
Bandwidth
Persistence –
File, DB (CacheLoader...
Buckets–Partitions, Hashing function
Birthdays, Hashmaps & Prime Numbers
Collisions, Chaining
Unbalanced HashMap
- behaves...
Imagine
– John Lennon
How many nodes to get a 200G cache?
Who needs a 200G cache?
Disk is the new Tape!
200 nodes @ 1GB heap each
2 nodes @ 100G...
SIDE ONE
Join together with the band
I don’t even know myself
SIDE TWO
Let’s see action
Relay
Don’t happen that way at all...
Java Limits: Objects are not cheap!
How many bytes for a 8 char String ?
(assume 32-bit)
How many objects in a Tomcat idle...
Picking the right collection: Mozart or Bach?
100 elements of:
Treemap <Double, Double>
82% overhead, 88 bytes constant co...
JEE is not cheap either!
Class name Size (B) Count Avg (B)
Total 21,580,592 228,805 94.3
char[] 4,215,784 48,574 86.8
byte...
Java Limits: Garbage Collection
GC defines cache configuration
Pause Times: If stop_the_world_pause > time_to_live
⇒
node ...
Fragmentation, Generations
Fragmentation – compact often, uniform sized objects
[Finding seats for a gang-of-four
is easie...
Locks: Why Amdahl’s law trumps Moore’s!
Schemes
Optimistic, Pessimistic
Consistency
Eventually vs. ACID
Contention, Waits
...
Inter-node communication
• TCP for mgmt & data
– Infinispan
• TCP for mgmt, UDP for data
– Coherence, Infinispan
• UDP for...
Sockets, Queues, Threads, Stages
How many sockets?
gemstone (VMWare) : Multi-socket implementation,
infinispan
alt: increa...
Marshal Arts:
Serialization
/Deserialization
java.io.Serializable is S.L..O.…W
+ Use “transient”
+ jserial, avro, etc
+ Go...
Serialization + Deserialization uBench
http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
Count what is countable, measure what is measurable, and
what is not measurable, make measurable
-Galileo
Latency:
Where have all the millis gone?
Measure. 90th
percentile. Look for consistency.
=> JMX is great! JMX is also very...
Q&A
References:
Making Sense of Large Heaps, Nick Mitchell, IBM
Oracle Coherence 3.5, Aleksandar Seovic
Large Pages in Jav...
Optimization hinders evolution
– Alan Perlis
How to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in Java
Upcoming SlideShare
Loading in …5
×

How to Stop Worrying and Start Caching in Java

2,434 views

Published on

Caching in Java - A review of different caching vendors (Oracle Coherence, Apache Cassandra, Infinispan, Ehcache/Terracotta, etc) and limitations presented by the underlying Java Platform.

Presented at RedHat Summit 2010, Boston
Speakers: SriSatish Ambati, Performance Engg
Manik Surtani, InfiniSpan Lead


Presentation details from RH Summit:

How to Stop Worrying & Start Caching in Java
SriSatish Ambati — Performance & Partner Engineer, Azul Systems, Inc.
Manik Surtani — Principal Software Engineer, Red Hat

Application data caching has come of age as distributed and large cache clusters are now common. The next generation of applications that depend on efficient caching has come into being and data and cache size explosion has set in.

In this session, Azul Systems’ SriSatish Ambati and Red Hat’s Manik Surtani will survey performance characteristics of different cache algorithms, their implementations (e.g., implementing a 200Gb data cache size), and how well they work in practical JVM deployments. In each scenario, they will present patterns of architecture that scale, and demonstrate where read and write performance stands in the context of increasing cache sizes and concurrency.

Throughout this discussion, they will recognize several villains, including heap fragmentation, long-lived objects, multi-VM communication, socket handlers, and queue managers. SriSatish and Manik will take a fun-filled “whodunit” approach to portray the roles played by each villain in killing cache performance.

http://www.redhat.com/promo/summit/2010/sessions/jboss.html

Published in: Technology
1 Comment
3 Likes
Statistics
Notes
  • I love the title!
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
2,434
On SlideShare
0
From Embeds
0
Number of Embeds
24
Actions
Shares
0
Downloads
83
Comments
1
Likes
3
Embeds 0
No embeds

No notes for slide
  • Description of Graph
    Shows the average number of cache misses expected when inserting into a hash table with various collision resolution mechanisms; on modern machines, this is a good estimate of actual clock time required. This seems to confirm the common heuristic that performance begins to degrade at about 80% table density.
    It is based on a simulated model of a hash table where the hash function chooses indexes for each insertion uniformly at random. The parameters of the model were:
    You may be curious what happens in the case where no cache exists. In other words, how does the number of probes (number of reads, number of comparisons) rise as the table fills? The curve is similar in shape to the one above, but shifted left: it requires an average of 24 probes for an 80% full table, and you have to go down to a 50% full table for only 3 probes to be required on average. This suggests that in the absence of a cache, ideally your hash table should be about twice as large for probing as for chaining.
  • This is the zero of your Application Platform
  • How to Stop Worrying and Start Caching in Java

    1. 1. How to Stop Worrying and Start Caching in Java SriSatish Ambati , Azul Systems sris@azulsystems.com Manik Surtani, RedHat Inc msurtani@redhat.com
    2. 2. The Trail Examples Elements of Cache Performance Theory Metrics 200GB Cache Design Whodunit Overheads in Java – Objects, GC Locks, Communication, Sockets, Queues (SEDA) Serialization Measure
    3. 3. Wie die wahl hat, hat die Qual! He who has the choice has the agony!
    4. 4. Some example caches Homegrown caches – Surprisingly work well. (do it yourself! It’s a giant hash) Infinispan, Coherence, Gemstone, GigaSpaces, EhCache, etc NoSQL stores (Apache Cassandra) Non java alternatives: MemCached & clones.
    5. 5. Visualize Cache Simple example Visualize Cache Replicated Cache Distributed Cache
    6. 6. Elements of Cache Performance Hot or Not: The 80/20 rule. A small set of objects are very popular! Hit or Miss: Hit Ratio How effective is your cache? LRU, LFU, FIFO, LIRS.. Expiration Long-lived objects, better locality Spikes happen Cascading events: Node load, Node(s) dead. Cache Thrash: Full Table scan.
    7. 7. Elements of Cache Performance : Metrics Inserts: Puts/sec, Latencies Reads: Gets/sec, Latencies, Indexing Updates: mods/sec, latencies (Locate, Modify & Notify) Replication, Consistency, Persistence Size of Objects Number of Objects Size of Cache # of cacheserver Nodes (read only, read write) # of clients
    8. 8. Partitioning & Distributed Caches Near Cache/L1 Cache Bring data close to the Logic that is using it. Birds of feather flock together - related data live closer Read-only nodes, Read-Write nodes Management nodes Communication Costs Balancing (buckets) Serialization (more later)
    9. 9. I/O considerations Asynchronous Sockets Queues & Threads serving the sockets Bandwidth Persistence – File, DB (CacheLoaders) Write Behind Data Access Patterns of Doom, ex: “Death by a million cuts” – Batch your reads.
    10. 10. Buckets–Partitions, Hashing function Birthdays, Hashmaps & Prime Numbers Collisions, Chaining Unbalanced HashMap - behaves like a list O(n) retrieval Partition-aware Hashmaps Non-blocking Hashmaps (see: locking) Performance Degrades with 80% table density
    11. 11. Imagine – John Lennon
    12. 12. How many nodes to get a 200G cache? Who needs a 200G cache? Disk is the new Tape! 200 nodes @ 1GB heap each 2 nodes @ 100GB heap each (plus overhead)
    13. 13. SIDE ONE Join together with the band I don’t even know myself SIDE TWO Let’s see action Relay Don’t happen that way at all The seeker
    14. 14. Java Limits: Objects are not cheap! How many bytes for a 8 char String ? (assume 32-bit) How many objects in a Tomcat idle instance? char[] String book keeping fields 12 bytes JVM Overhead 16 bytes Pointer 4 bytes data 16 bytes JVM Overhead 16 bytes A. 64bytes 31% overhead Size of String Varies with JVM
    15. 15. Picking the right collection: Mozart or Bach? 100 elements of: Treemap <Double, Double> 82% overhead, 88 bytes constant cost per element [pro: enables updates while maintaining order] double[], double[] – 2% overhead, amortized [con: load-then-use] Sparse collections, empty collections. wrong collections. TreeMap Fixed Overhead: 48 bytes TreeMap$Entry data Per-entry Overhead: 40 bytes Double doubl e *From one 32-bit JVM. Varies with JVM Architecture Double JVM Overhead 16 bytes data 8 bytes doubl e
    16. 16. JEE is not cheap either! Class name Size (B) Count Avg (B) Total 21,580,592 228,805 94.3 char[] 4,215,784 48,574 86.8 byte[] 3,683,984 5,024 733.3 Built-in VM methodKlass 2,493,064 16,355 152.4 Built-in VM constMethodKlass 1,955,696 16,355 119.6 Built-in VM constantPoolKlass 1,437,240 1,284 1,119.30 Built-in VM instanceKlass 1,078,664 1,284 840.1 java.lang.Class[] 922,808 45,354 20.3 Built-in VM constantPoolCacheKlass 903,360 1,132 798 java.lang.String 753,936 31,414 24 java.lang.Object[] 702,264 8,118 86.5 java.lang.reflect.Method 310,752 2,158 144 short[] 261,112 3,507 74.5 java.lang.Class 255,904 1,454 176 int[][] 184,680 2,032 90.9 java.lang.String[] 173,176 1,746 99.2 java.util.zip.ZipEntry 172,080 2,390 72 Apache Tomcat 6.0 Allocated Class name Size (B) Count Avg (B) Total 1,410,764,512 19,830,135 71.1 char[] 423,372,528 4,770,424 88.7 byte[] 347,332,152 1,971,692 176.2 int[] 85,509,280 1,380,642 61.9 java.lang.String 73,623,024 3,067,626 24 java.lang.Object[] 64,788,840 565,693 114.5 java.util.regex.Matcher 51,448,320 643,104 80 java.lang.reflect.Method 43,374,528 301,212 144 java.util.HashMap$Entry[] 27,876,848 140,898 197.9 java.util.TreeMap$Entry 22,116,136 394,931 56 java.util.HashMap$Entry 19,806,440 495,161 40 java.nio.HeapByteBuffer 17,582,928 366,311 48 java.nio.HeapCharBuffer 17,575,296 366,152 48 java.lang.StringBuilder 15,322,128 638,422 24 java.util.TreeMap$EntryIterator 15,056,784 313,683 48 java.util.ArrayList 11,577,480 289,437 40 java.util.HashMap 7,829,056 122,329 64 java.util.TreeMap 7,754,688 107,704 72 Million Objects allocated live JBoss 5.1 20 4 Apache Tomcat 6.0 0.25 0.1 Live JBoss 5.1 Allocated
    17. 17. Java Limits: Garbage Collection GC defines cache configuration Pause Times: If stop_the_world_pause > time_to_live ⇒ node is declared dead Allocation Rate: Write, Insertion Speed. Live Objects (residency) if residency > 50%. GC overheads dominate. Increasing Heap Size only increases pause times. 64-bit is not going to rescue us either: Increases object header, alignment & pointer overhead 40-50% increase in heap sizes for same workloads. Overheads – cycles spent GC (vs. real work); space
    18. 18. Fragmentation, Generations Fragmentation – compact often, uniform sized objects [Finding seats for a gang-of-four is easier in an empty theater!] Face your fears, Face them Often! Generational Hypothesis Long-lived objects promote often, inter-generational pointers, more old-gen collections. Entropy: How many flags does it take to tune your GC ? ⇒ Avoid OOM, configure node death if OOM ⇒ Shameless plug: Azul’s Pauseless GC (now software edition) , Cooperative-Memory (swap space for your jvm under spike: No more OOM!)
    19. 19. Locks: Why Amdahl’s law trumps Moore’s! Schemes Optimistic, Pessimistic Consistency Eventually vs. ACID Contention, Waits java.util.concurrent, critical sections: Use Lock Striping MVCC, Lock-free, wait-free DataStructures. (NBHM) Transactions are expensive ⇒ Reduce JTA abuse, Set the right isolation levels.
    20. 20. Inter-node communication • TCP for mgmt & data – Infinispan • TCP for mgmt, UDP for data – Coherence, Infinispan • UDP for mgmt, TCP for data – Cassandra, Infinispan • Instrumentation – EHCache/Terracotta • Bandwidth & Latency considerations ⇒Ensure proper network configuration in the kernel ⇒Run Datagram tests ⇒Limit number of management nodes & nodes
    21. 21. Sockets, Queues, Threads, Stages How many sockets? gemstone (VMWare) : Multi-socket implementation, infinispan alt: increase ports, nodes, clients How many threads per socket? Mux Asynchronous IO/Events (apache mina, jgroups) Curious Case of a single threaded Queue Manager  Reduce context switching  SEDA
    22. 22. Marshal Arts: Serialization /Deserialization java.io.Serializable is S.L..O.…W + Use “transient” + jserial, avro, etc + Google Protocol Buffers, PortableObjectFormat (Coherence) + JBossMarshalling + Externalizable + byte[] + Roll your own
    23. 23. Serialization + Deserialization uBench http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
    24. 24. Count what is countable, measure what is measurable, and what is not measurable, make measurable -Galileo
    25. 25. Latency: Where have all the millis gone? Measure. 90th percentile. Look for consistency. => JMX is great! JMX is also very slow. Reduced number of nodes means less MBeans! Monitor (network, memory, cpu), ganglia, Know thyself: Application Footprint, Trend data.
    26. 26. Q&A References: Making Sense of Large Heaps, Nick Mitchell, IBM Oracle Coherence 3.5, Aleksandar Seovic Large Pages in Java http://andrigoss.blogspot.com/2008/02/jvm-performance-tuning.html Patterns of Doom http://3.latest.googtst23.appspot.com/ Infinispan Demos http://community.jboss.org/wiki/5minutetutorialonInfinispan RTView, Tom Lubinski, http://www.sl.com/pdfs/SL-BACSIG-100429-final.pdf Google Protocol Buffers, http://code.google.com/p/protobuf/ Azul’s Pauseless GC http://www.azulsystems.com/technology/zing-virtual-machine Cliff Click’s Non-Blocking Hash Map http://sourceforge.net/projects/high-scale-lib/ JVM Serialization Benchmarks: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
    27. 27. Optimization hinders evolution – Alan Perlis

    ×