
NUMA and Java Databases

2,356 views

A talk on the state of NUMA support in Java databases such as Cassandra, how it can be improved, and how it compares with traditional storage engines.

Published in: Engineering

  1. NUMA & Java Databases
     Should we worry
     Raghavendra Prabhu
     me@rdprabhu.com
     @randomsurfer
  2. NUMA Reference architecture
  3. What is NUMA
     ● Stands for Non Uniform Memory Access
       ○ Non Uniform to whom.
       ○ Von Neumann bottleneck.
       ○ Cache coherent NUMA
     ● How does it work
       ○ Memory is placed local to the processes.
       ○ Balancing access to data over the available processors on multiple nodes.
     ● Large memory installations are becoming the norm
       ○ The i2 series on AWS.
       ○ Databases are the main consumers.
     ● Constraints
       ○ Speed of light
       ○ Interconnect saturation
  4. What is NUMA
     ● Constraints
       ○ Speed of light
         ■ Higher latency of accessing remote memory.
       ○ Interconnect saturation
         ■ Performance counters.
     ● Slow abundant memory
       ○ Fast limited memory
     ● Cache coherence
       ○ Processor threads and cores share resources
         ■ Execution units (between HT threads)
         ■ Cache (between threads and cores)
  5. Exotic cases
     ● Network cards
     ● PCIe storage
     ● NVRAM
     ● Nodes without memory
     ● Nodes without processors
     ● Unbalanced
     ● Central/Large memory
     ● Big Little architecture
     ● GPU
  6. Numa statistics
  7. Tools/libraries for NUMA
     ● Supported by Linux since 2.5
       ○ Symmetric and CPU/Memory
     ● Numactl
     ● Hwloc / lstopo
     ● Numad
     ● Numatop
     ● Libnuma
     ● Numastat
     ● Taskset
     ● KVM for simulation and testing
     ● Perf
  8. Tools/libraries for NUMA
     ● KVM for simulation and testing
     ● Useful for testing databases.

       qemu-system-x86_64 -enable-kvm \
         -drive file=./debian-8.1-lxc-puppet.qcow2 \
         -net nic,macaddr=52:54:00:00:EE:03 -net vde \
         -smp sockets=2,cores=2,threads=2,maxcpus=16 \
         -numa node,nodeid=0,cpus=0-3 \
         -numa node,nodeid=1,cpus=4-7 \
         -numa node,nodeid=2,cpus=8-15 \
         -m 2G
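Once a guest is booted with a simulated topology like the one on slide 8, it helps to confirm what the guest kernel actually sees. The following is a minimal sketch, not part of the talk: it assumes a Linux guest that exposes `/sys/devices/system/node/node*/cpulist`, and the class and method names (`NumaTopology`, `parseCpuList`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the NUMA nodes the kernel reports and their CPUs.
final class NumaTopology {

    /** Expands a sysfs cpulist such as "0-3,8-11" into individual CPU ids. */
    static List<Integer> parseCpuList(String cpuList) {
        List<Integer> cpus = new ArrayList<>();
        String trimmed = cpuList.trim();
        if (trimmed.isEmpty()) {
            return cpus; // a node without processors has an empty cpulist
        }
        for (String part : trimmed.split(",")) {
            String[] bounds = part.split("-");
            int first = Integer.parseInt(bounds[0].trim());
            int last = Integer.parseInt(bounds[bounds.length - 1].trim());
            for (int cpu = first; cpu <= last; cpu++) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    public static void main(String[] args) throws IOException {
        Path nodeRoot = Paths.get("/sys/devices/system/node");
        if (!Files.isDirectory(nodeRoot)) {
            System.out.println("No NUMA information exposed under " + nodeRoot);
            return;
        }
        // Each nodeN directory carries a cpulist file for that node.
        try (DirectoryStream<Path> nodes = Files.newDirectoryStream(nodeRoot, "node[0-9]*")) {
            for (Path node : nodes) {
                String cpuList = new String(Files.readAllBytes(node.resolve("cpulist")));
                System.out.println(node.getFileName() + " -> CPUs " + parseCpuList(cpuList));
            }
        }
    }
}
```

Run inside the guest, this should print one line per node; comparing it against the `-numa node,...` arguments is a quick sanity check before benchmarking a database there.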
  9. NUMA Policies
     ● MPOL_DEFAULT
     ● MPOL_BIND
     ● MPOL_INTERLEAVE
       ○ Memory striping in hardware
     ● MPOL_PREFERRED
     ● MPOL_MF_MOVE | MPOL_MF_MOVE_ALL
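The difference between the policies on slide 9 is easiest to see as page placement. The sketch below is a deliberately simplified toy model, not the kernel's implementation: it assumes interleaving is plain round-robin over the allowed nodes, that a bound allocation fails (shown as -1) once the node has no free pages, and that a preferred allocation spills to a single fallback node. All names (`PlacementModel`, `interleave`, `bind`, `preferred`) are hypothetical.

```java
import java.util.Arrays;

// Toy model of where pages land under different NUMA memory policies.
final class PlacementModel {

    /** MPOL_INTERLEAVE, simplified: page i goes to nodes[i % nodes.length]. */
    static int[] interleave(int pageCount, int[] nodes) {
        int[] placement = new int[pageCount];
        for (int page = 0; page < pageCount; page++) {
            placement[page] = nodes[page % nodes.length];
        }
        return placement;
    }

    /** MPOL_BIND, simplified: only `node` may be used; -1 marks a failed allocation. */
    static int[] bind(int pageCount, int node, int freePagesOnNode) {
        return fill(pageCount, node, freePagesOnNode, -1);
    }

    /** MPOL_PREFERRED, simplified: use `node` first, then spill to `fallbackNode`. */
    static int[] preferred(int pageCount, int node, int freePagesOnNode, int fallbackNode) {
        return fill(pageCount, node, freePagesOnNode, fallbackNode);
    }

    private static int[] fill(int pageCount, int node, int freePagesOnNode, int otherwise) {
        int[] placement = new int[pageCount];
        for (int page = 0; page < pageCount; page++) {
            placement[page] = page < freePagesOnNode ? node : otherwise;
        }
        return placement;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(interleave(6, new int[] {0, 1, 2}))); // [0, 1, 2, 0, 1, 2]
        System.out.println(Arrays.toString(bind(3, 0, 2)));                      // [0, 0, -1]
        System.out.println(Arrays.toString(preferred(3, 0, 2, 1)));              // [0, 0, 1]
    }
}
```

The model ignores page migration (the MPOL_MF_MOVE flags) and real free-memory accounting; it is only meant to show why interleaving spreads load across the interconnect while binding trades flexibility for locality.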
  10. JVM GC spaces
      ● Concepts
        ○ Weak Generational Hypothesis:
          ■ Most objects soon become unreachable.
          ■ References from old objects to young objects only exist in small numbers.
          ■ The objects that do not die young usually survive for a (very) long time.
        ○ Garbage Collection Roots
        ○ Mark &
          ■ Copy
          ■ Compact
          ■ Sweep
        ○ Minor and Major GC
        ○ Stop-the-World
  11. GC graphs
  12. JVM GC spaces
      ● Generations:
        ○ Young Generation
          ■ Eden space
            ● Mutable Space.
            ● Thread Local Allocation Buffer.
            ● Mark and Copy.
          ■ Survivor spaces (S0 and S1).
        ○ Old/Tenured Generation
        ○ Permanent Generation
          ■ => native Metaspace in Java 8
      ● Cross-generation links.
      ● Card-marking
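The generations on slide 12 can be observed from inside a running JVM through the standard `java.lang.management` API. This is a small sketch for orientation only: the class name `GcSpaces` is made up, and the pool names it prints depend on the JVM and the collector in use (for example, pools named after Eden, Survivor, Old Gen, and Metaspace on HotSpot).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the memory pools (GC spaces) of the current JVM.
final class GcSpaces {

    /** Returns one human-readable line per memory pool. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKiB = usage == null ? -1 : usage.getUsed() / 1024;
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            lines.add(String.format("%-8s %-32s used=%d KiB", kind, pool.getName(), usedKiB));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Running it with different collectors (for instance `-XX:+UseParallelGC` versus `-XX:+UseG1GC`) shows how the same logical generations map to differently named pools.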
  13. Garbage collectors
      Located in hotspot/src/share/vm/gc_implementation
      ● Serial
      ● Parallel
        ○ Only GC which is fully NUMA aware.
      ● ParNew
      ● Concurrent Mark and Sweep (CMS)
      ● Garbage First (G1)
      ● Official Oracle documentation is notoriously bad!
        ○ Code and comments are the (only) documentation (sadly).
          ■ Try searching for ‘NUMAPageScanRate’ - find a page from 2008 with links to sun.com and Solaris examples.
  14. GC Options
  15. NUMA options
      ● UseNUMA
      ● UseNUMAInterleaving
      ● ForceNUMA
      ● NUMAStats
      ● ParallelGC only
        ○ NUMAChunkResizeWeight
        ○ NUMASpaceResizeRate
        ○ UseAdaptiveNUMAChunkSizing
        ○ NUMAPageScanRate
      Defined in hotspot/src/share/vm/runtime/globals.hpp and used in hotspot/src/os/linux/vm
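Since the documentation for these flags is thin, one practical way to see which of them a given JVM knows about, and what value each currently has, is to ask the JVM itself. The sketch below uses `com.sun.management.HotSpotDiagnosticMXBean`, which is HotSpot-specific; the class name `NumaFlags` is made up, and which flags are present varies by JDK version, so a missing flag is reported rather than treated as an error.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports the NUMA-related HotSpot flags of the running JVM.
final class NumaFlags {

    // Flag names taken from the slide; availability depends on the JDK build.
    static final String[] FLAGS = {
        "UseNUMA", "UseNUMAInterleaving", "ForceNUMA", "NUMAStats",
        "NUMAChunkResizeWeight", "NUMASpaceResizeRate",
        "UseAdaptiveNUMAChunkSizing", "NUMAPageScanRate"
    };

    /** Returns "name = value (origin)" or a note that the flag is unavailable. */
    static String describe(String flag) {
        String unavailable = flag + ": not available in this JVM";
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return unavailable; // not a HotSpot-compatible JVM
        }
        try {
            VMOption option = bean.getVMOption(flag);
            return flag + " = " + option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return unavailable; // getVMOption rejects unknown flag names
        }
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.println(describe(flag));
        }
    }
}
```

Starting the JVM with, say, `-XX:+UseParallelGC -XX:+UseNUMA` and running this class shows whether the setting took effect and whether it came from the command line or from a default.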
  16. NUMA and Collectors
      ● -XX:+UseNUMA -XX:+UseNUMAInterleaving: All GC spaces.
        ○ Independent of GC choices.
        ○ NUMA interleaved allocation. (numactl --interleave)
      ● ParallelGC (in addition to above)
        ○ Supports all exotic NUMA options.
        ○ Eden mutableSpace (even without NUMA)
          ■ Pretouching the pages.
        ○ Eden mutableNUMASpace (with above NUMA options)
          ■ Space split into LG chunks.
            ● Adaptive Resizing.
          ■ Does thread-local NUMA allocation.
            ● allocations performed in chunk corresponding to the home locality.
  17. Cassandra
      ● JVM options are supported through environment variable.
      ● Cassandra’s ‘supported’ NUMA is through numactl in shell wrapper.
        ○ This interleaves ‘everything’.
        ○ When you have numactl (hammer), everything looks like a (binary?) nail.
      ● Cassandra memory model
        ○ JVM GC spaces.
        ○ OHC - off heap cache: https://github.com/snazy/ohc
          ■ Written specifically for Cassandra 2.x
        ○ MemoryUtil.java
          ■ com.sun.jna.Native - Native.malloc
          ■ sun.nio.ch.DirectBuffer
          ■ sun.misc.Unsafe - unsafe.allocateMemory
          ■ java.nio.ByteBuffer - ByteBuffer.allocateDirect
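Of the native allocation paths listed on slide 17, `ByteBuffer.allocateDirect` is the only one in the public Java API, and it illustrates why these allocations matter for NUMA. The buffer is zero-filled on creation, so the allocating thread is the first to touch its pages; under the kernel's default local-allocation policy those pages tend to land on the node where that thread is running. The sketch below is an illustration under that assumption, with hypothetical names (`FirstTouch`, `allocateOnWorker`); it does not pin the worker thread to a node, so the scheduler can still move it.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: allocate off-heap memory on the thread that will use it.
final class FirstTouch {

    /** Runs allocateDirect on the worker thread so that thread touches the pages first. */
    static ByteBuffer allocateOnWorker(ExecutorService worker, int bytes) {
        Callable<ByteBuffer> task = () -> ByteBuffer.allocateDirect(bytes);
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while allocating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("allocation failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            ByteBuffer cache = allocateOnWorker(worker, 64 * 1024 * 1024);
            System.out.println("direct=" + cache.isDirect() + " capacity=" + cache.capacity());
        } finally {
            worker.shutdown();
        }
    }
}
```

When the whole process runs under `numactl --interleave`, this locality is lost: the pages are striped across nodes regardless of which thread touched them, which is the concern raised for thread-local native allocations.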
  18. Cassandra off-heap
      ● Why off-heap
        ○ Reduce GC pressure
        ○ Access patterns
        ○ Lack of support for primitives such as O_DIRECT. (https://bugs.openjdk.java.net/browse/JDK-8164900)
        ○ Lack of NUMA support in newer GCs.
          ■ (JEP 157: G1 GC: NUMA-Aware Allocation http://openjdk.java.net/jeps/157)
      ● Off-heap caches are used for:
        ○ Row cache
        ○ Key cache
        ○ Counter cache
      ● 2.x onwards, actually better with 2.2.
  19. Cassandra off-heap
      ● Cache Providers:
        ○ SerializingCache
          ■ Issues with serialization and CPU usage.
        ○ OHCP - org.caffinitas.ohc.OHCacheBuilder - 2.2 onwards
          ■ “OHC shall provide a good performance on both commodity hardware and big systems using non-uniform-memory-architectures.”
          ■ sun.misc.Unsafe: unsafe.allocateMemory
          ■ Linked: For Larger entries
            ● Malloc and fragmentation
          ■ Chunked: For smaller entries
  20. Numa issues
      ● Numactl --interleave:
        ○ Thread-local native allocations - Bad [X]
          ■ Tons of them throughout code which bypass JVM.
        ○ JVM’s Eden space will also be interleaved - Bad [X]
      ● JVM’s options only:
        ○ Native allocations will be local.
        ○ Large off-heap allocations can suffer.
      ● Numactl + JVM
          ■ JVM-aware GC (Parallel)
            ● Best possible combination (without invasive code changes in cassandra).
            ● JVM’s memory options will override numactl.
            ● But, ParallelGC is not comparable to new ones (G1).
  21. Interpretation
      ● Low off-heap usage
        ○ Use the JVM NUMA options. Don’t interleave with numactl, it is a hammer.
      ● High off-heap usage (like cassandra)
        ○ Just go with the flow, and do numactl.
          ■ -XX:+AlwaysPreTouch? (MAP_POPULATE)
        ○ Cost-benefit analysis.
      ● ParallelGC is too old (and bad for latency) - don’t use it just for NUMA.
        ○ Well-implemented NUMA can easily pique anyone’s geeky senses. :)
        ○ Ask Cassandra or Oracle to add NUMA support to G1 ;)
      ● With newer kernels (e.g. the one shipped with Ubuntu Xenial), one can try AutoNUMA.
        ○ Completely managed by kernel based on access patterns.
        ○ Has caveats but one can always benchmark and see. :)
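The rule of thumb on slide 21 can be written down as a tiny launcher decision. The sketch below is one possible reading of it, not a recommendation from the talk: the class `LaunchCommand` and its `heavyOffHeap` switch are hypothetical, `-XX:+AlwaysPreTouch` is included for the numactl case even though the slide only raises it as a question, and the resulting command is meant to be benchmarked rather than trusted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a JVM launch command following the slide's rule of thumb.
final class LaunchCommand {

    /**
     * heavyOffHeap = true  -> interleave the whole process with numactl.
     * heavyOffHeap = false -> leave native allocations local and let the JVM handle NUMA.
     */
    static List<String> build(boolean heavyOffHeap, String mainClass) {
        List<String> command = new ArrayList<>();
        if (heavyOffHeap) {
            command.add("numactl");
            command.add("--interleave=all");
        }
        command.add("java");
        if (heavyOffHeap) {
            command.add("-XX:+AlwaysPreTouch"); // assumption: the slide marks this with a "?"
        } else {
            command.add("-XX:+UseNUMA");
        }
        command.add(mainClass);
        return command;
    }

    public static void main(String[] args) {
        // "org.example.Main" is a placeholder main class.
        System.out.println(String.join(" ", build(true, "org.example.Main")));
        System.out.println(String.join(" ", build(false, "org.example.Main")));
    }
}
```

The returned list can be handed to `ProcessBuilder` as-is; keeping the decision in one place makes it easier to run the cost-benefit comparison the slide asks for.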
  22. Interpretation
      ● JVM is (still) not good with native primitives such as O_DIRECT or NUMA (there is a jnuma which is not that well maintained).
        ○ Many database authors write their own off-JVM implementations for these. (there are so many Java databases these days)
        ○ Some also do things like this.
        ○ MySQL (InnoDB) can (and does) take advantage of these for good performance.
          ■ InnoDB was in Cassandra’s place about two years ago, till fixes landed.
            ● How InnoDB does it.
        ○ Maybe ScyllaDB in future. ;)
  23. Wishlist for cassandra
      ● Use whatever GC fits best. (G1?)
          ■ Ask for NUMA support in this.
      ● Use the JVM NUMA options when supported.
          ■ Having NUMA support for Eden spaces will help a lot.
      ● Don’t use numactl.
        ○ Let all native allocations be local (OS default).
        ○ Use jnuma (or equivalent, it is just a JNI wrapper) for OHCP and other large non-local caches.
          ■ Use numa interleaving here.
          ■ This requires cassandra or OHCP code to be changed.
            ● Changing OHCP code is easier.
      ● Benchmark
        ○ ??
        ○ Profit!
  24. AutoNUMA
      ● Automatic NUMA balancing was merged in the 3.x kernel series (initial support in 3.8, major improvements in 3.13)
      ● CPU follows memory
        ○ Reschedule tasks on same nodes as memory
      ● Memory follows CPU
        ○ Copy memory pages to same nodes as tasks/threads
      ● Heuristics
        ○ Fault statistics
        ○ Task grouping
        ○ Multi-resource optimization - cache, cpu, memory, starvation
          ■ Avoid thrashing
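Before benchmarking with AutoNUMA it is worth checking whether the kernel has it switched on. On Linux the setting is exposed as the `kernel.numa_balancing` sysctl, readable at `/proc/sys/kernel/numa_balancing`, where 0 means disabled and a non-zero value means enabled. A minimal sketch, with the hypothetical class name `AutoNumaStatus`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports whether automatic NUMA balancing is enabled.
final class AutoNumaStatus {

    /** Interprets the raw sysctl content: "0" is disabled, anything else is enabled. */
    static String interpret(String raw) {
        int mode = Integer.parseInt(raw.trim());
        return mode == 0 ? "disabled" : "enabled (mode " + mode + ")";
    }

    public static void main(String[] args) throws IOException {
        Path knob = Paths.get("/proc/sys/kernel/numa_balancing");
        if (!Files.isReadable(knob)) {
            System.out.println("numa_balancing is not exposed on this system");
            return;
        }
        String raw = new String(Files.readAllBytes(knob));
        System.out.println("Automatic NUMA balancing: " + interpret(raw));
    }
}
```

Recording this value next to each benchmark run avoids comparing results taken under different balancing settings.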
  25. Tunings and observables
      ● /proc/zoneinfo
        ○ Sysctl vm.zone_reclaim_mode OR /proc/sys/vm/zone_reclaim_mode
        ○ /proc/sys/vm/min_unmapped_ratio
      ● /proc/meminfo
      ● /proc/vmstat
      ● Ftrace / Perf
      ● Cgroup hierarchy
        ○ Memory
      ● Per process:
        ○ /proc/<pid>/numa_maps
        ○ /proc/<pid>/sched
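Among the per-process observables, `/proc/<pid>/numa_maps` is the most direct way to see where a JVM's pages ended up: each mapping line carries `N<node>=<pages>` fields. The sketch below sums those fields per node. It assumes a Linux kernel built with NUMA support (otherwise the file is absent), and the class and method names (`NumaMaps`, `pagesPerNode`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: totals the pages per NUMA node from numa_maps lines.
final class NumaMaps {

    // Matches whitespace-separated fields of the form N<node>=<pages>.
    private static final Pattern NODE_PAGES = Pattern.compile("(?:^|\\s)N(\\d+)=(\\d+)");

    /** Sums the N<node>=<pages> fields over all mapping lines. */
    static Map<Integer, Long> pagesPerNode(List<String> lines) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = NODE_PAGES.matcher(line);
            while (matcher.find()) {
                int node = Integer.parseInt(matcher.group(1));
                long pages = Long.parseLong(matcher.group(2));
                totals.merge(node, pages, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Pass a pid as the first argument, or inspect this JVM by default.
        String pid = args.length > 0 ? args[0] : "self";
        Path maps = Paths.get("/proc", pid, "numa_maps");
        if (!Files.isReadable(maps)) {
            System.out.println(maps + " is not readable on this system");
            return;
        }
        pagesPerNode(Files.readAllLines(maps))
                .forEach((node, pages) -> System.out.println("node " + node + ": " + pages + " pages"));
    }
}
```

Comparing the per-node totals for a process started with and without `numactl --interleave` makes the effect of the policy visible without any extra tooling.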
  26. Numa statistics
  27. Further
      ● http://frankdenneman.nl/2016/07/07/numa-deep-dive-part-1-uma-numa/
      ● http://queue.acm.org/detail.cfm?id=2852078
      ● https://plumbr.eu/java-garbage-collection-handbook
      ● http://mechanical-sympathy.blogspot.in/2013/07/java-garbage-collection-distilled.html
  28. Credits!
      ● http://queue.acm.org/detail.cfm?id=2513149
      ● www.linux-kvm.org/images/7/75/01x07b-NumaAutobalancing.pdf
      ● http://events.linuxfoundation.org/sites/events/files/slides/Normal%20and%20Exotic%20use%20cases%20for%20NUMA%20features.pdf
      ● https://en.wikipedia.org/wiki/Non-uniform_memory_access
      ● https://lihz1990.gitbooks.io/transoflptg/content/02.%E7%9B%91%E6%8E%A7%E5%92%8C%E5%8E%8B%E6%B5%8B%E5%B7%A5%E5%85%B7/sample-output-of-the-numastat-command.png
