GC-stall and Page Scan Attacks
by Linux
Cuong Tran
LinkedIn Performance Group
Agenda
• GC attacks by Linux
• Page scan attacks by Linux
• Recommendations
Examples of
GC attacks by Linux
• 2013-10-05T05:01:04.179+0000:…. : 216982K->9328K(256000K), 0.0666320 secs] 377835K->170188K(768000K), 0.0675850 secs] [Times: user=0.17 sys=0.00, real=3.18 secs]
• 2013-09-19T06:14:03.632+0000: 44372.834: [GC [1 CMS-initial-mark: 703914K(921600K)] 718372K(1433600K), 126.1196340 secs] [Times: user=0.00 sys=127.31, real=126.10 secs]
• GC stopped the world for seconds to minutes but:
– Did little or no real work (CPU time in user mode near 0)
– Either sat waiting or burned cycles in the Linux kernel
GC attacks by Linux
• IO starvation
– Symptom: GC log shows “low user time, low system time, long GC pause”
– Cause: GC threads stuck in the kernel waiting for IO, usually due to
journal commits or file system flushes of changes written by gzip during log rolling

• Memory starvation
– Symptom: GC log shows “low user time, high system time, long GC pause”
– Cause: Memory pressure triggers swapping or
scanning for free memory
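
The two symptom patterns above can be told apart mechanically from the `[Times: ...]` part of a GC log line. The following is a minimal sketch, not a tool from the talk: the function name `classify_gc_pause` and both thresholds (`min_pause`, `low_ratio`) are illustrative assumptions and would need tuning for a real service.

```shell
# classify_gc_pause: read GC log lines on stdin and label each "[Times: ...]" entry.
# min_pause (seconds) and low_ratio are assumed thresholds, not values from the slides.
classify_gc_pause() {
  # Extract "user sys real" from e.g. "[Times: user=0.17 sys=0.00, real=3.18 secs]"
  sed -n 's/.*user=\([0-9.]*\),\{0,1\} *sys=\([0-9.]*\), *real=\([0-9.]*\) secs.*/\1 \2 \3/p' |
  awk -v min_pause=1.0 -v low_ratio=0.5 '{
    u = $1; s = $2; r = $3
    if (r < min_pause)               label = "ok"                 # short pause
    else if (s >= low_ratio * r)     label = "memory-starvation"  # high system time
    else if (u + s < low_ratio * r)  label = "io-starvation"      # little CPU used at all
    else                             label = "busy-gc"            # pause spent doing GC work
    printf "%s user=%s sys=%s real=%s\n", label, u, s, r
  }'
}

# Usage (gc.log is a placeholder path):
#   classify_gc_pause < gc.log | grep -v '^ok'
```

Applied to the two log lines on the examples slide, this sketch labels the 3.18-second pause as IO starvation and the 126-second pause as memory starvation.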

Solutions for GC-attacks
• IO Starvation
– Strategy: Even out the workload to disk drives (flush every 5 s rather
than 30 s)
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500

– In progress: Direct IO with gzip or gzip as-you-go

• Memory Starvation
– Strategy: Pre-allocate memory to JVM heap and protect it
against swapping or scanning
– Turn on the -XX:+AlwaysPreTouch option in the JVM
– sysctl -w vm.swappiness=0 to protect heap and
anonymous memory
– JVM startup has a 2-second delay to allocate all memory (17 GB)
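
As a sketch of the pre-allocation strategy above: `-XX:+AlwaysPreTouch` touches the heap pages the JVM commits at startup, so pairing it with an initial heap size equal to the maximum makes the whole heap resident up front. Setting `-Xms` equal to `-Xmx` is an assumption of this example, not something the slides state, and the helper name `pretouch_heap_flags` is hypothetical.

```shell
# pretouch_heap_flags SIZE: print JVM options that commit and touch the whole heap
# at startup. Using the same SIZE for -Xms and -Xmx is an assumption of this sketch.
pretouch_heap_flags() {
  printf '%s\n' "-Xms$1 -Xmx$1 -XX:+AlwaysPreTouch"
}

# Usage (app.jar is a placeholder; 17g matches the heap size quoted on the slide):
#   java $(pretouch_heap_flags 17g) -jar app.jar
```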

Page scan attacks by Linux
Measured: 7,000,000 scans/sec
Stall: 2+ minutes
Goal: 0 scans/sec
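
One way to measure a direct page scan rate like the one above is to diff the `pgscan_direct*` counters in two `/proc/vmstat` snapshots (tools such as `sar -B` report a similar figure as pgscand/s). This is a rough sketch under assumptions: the function names are made up for illustration, and the exact counter names vary by kernel version (per-zone names such as `pgscan_direct_normal` on older kernels, a single `pgscan_direct` on newer ones).

```shell
# sum_direct_scans: total all pgscan_direct* page counters in a vmstat snapshot (stdin).
# pgscan_direct_throttle counts throttling events, not pages, so it is excluded.
sum_direct_scans() {
  awk '/^pgscan_direct/ && $1 != "pgscan_direct_throttle" { total += $2 }
       END { printf "%.0f\n", total }'
}

# direct_scan_rate BEFORE AFTER SECONDS: pages scanned directly per second
# between two saved snapshots taken SECONDS apart.
direct_scan_rate() {
  before=$(sum_direct_scans < "$1")
  after=$(sum_direct_scans < "$2")
  echo $(( (after - before) / $3 ))
}

# Usage (file names are placeholders):
#   cat /proc/vmstat > vm.before; sleep 10; cat /proc/vmstat > vm.after
#   direct_scan_rate vm.before vm.after 10
```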

Cause: Page Scan Attacks

Transparent Huge Page (THP)
• A Red Hat enhancement for performance
– 2 MB huge pages vs. 4 KB regular pages
– Fewer TLB misses and page table walks
– Only works for anonymous memory (malloc)
– Improves performance by 10% for SPECjbb and app server workloads

• But THP can degrade performance severely
– Collapsing, Compacting, Splitting, Migration
– Very high pgscand/s
– Very busy khugepaged
– Very high system time when process compacts memory or
khugepaged runs

• THP optimization can increase GC stall time by minutes
Cause: Page Scan Attacks

NUMA Optimization
• A Linux optimization for NUMA
– 2 CPU sockets, each having 12 cores and local memory.
– Memory accessible by all 24 cores but local memory is faster
– Linux tries to allocate local memory to application
threads, i.e., from local zone
– Best suited for applications that can fit in one local zone

• NUMA optimization can degrade performance severely
– Very high pgscand/s
– Linux zone-reclaim insists on finding memory in the local
zone although memory is plentiful in the other zone
– Linux migrates memory including THP, creating a vicious cycle of
breaking up 2 MB pages, scanning for free 4 KB pages, and reassembling 4 KB pages into 2 MB pages
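
To see the imbalance described above (one zone exhausted while the other has plenty of free memory), the per-node `MemFree` figures can be pulled from `/sys/devices/system/node/node*/meminfo`. A minimal sketch; the function name `node_free_kb` is an assumption for illustration.

```shell
# node_free_kb: print "<node> <free kB>" for each NUMA node, given per-node
# meminfo text on stdin (lines look like "Node 0 MemFree:  102400 kB").
node_free_kb() {
  awk '$1 == "Node" && $3 == "MemFree:" { print $2, $4 }'
}

# Usage (on a NUMA machine):
#   cat /sys/devices/system/node/node*/meminfo | node_free_kb
```

A large gap between the nodes while pgscand/s is high would be consistent with zone reclaim scanning the local zone instead of allocating from the remote one.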
Cause: Page Scan Attacks

Solutions
• Turn off THP optimization and thus khugepaged
– echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
– Will not affect file IO or memory-mapped files
– Red Hat, Oracle, and Hadoop recommend disabling THP

• Turn off zone-reclaim optimization
– sysctl -w vm.zone_reclaim_mode=0

– Twitter recommends NUMA interleaving
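
Before and after applying the THP change it helps to confirm which mode is active. The `enabled` file lists the available modes and marks the current one with square brackets. The sketch below assumes that on kernels without the Red Hat-specific path the same control lives at `/sys/kernel/mm/transparent_hugepage/enabled`; the function names are hypothetical.

```shell
# thp_mode: print the active THP mode from the content of an "enabled" file
# on stdin, e.g. "always madvise [never]" prints "never".
thp_mode() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# thp_enabled_file: print the first readable THP control file. The redhat_ path is
# the one used on the slide; the second path is the mainline kernel location.
thp_enabled_file() {
  for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
           /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Usage:
#   thp_mode < "$(thp_enabled_file)"
```

For the NUMA interleaving option mentioned above, `numactl --interleave=all` in front of the JVM command is one common way to request it.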
Recommendations
• Gate keepers: SRE and SysOps
• Safe to roll out fixes for GC attacks now
– Linux: Flush changes more frequently and protect heap
• sysctl -w vm.dirty_writeback_centisecs=500
• sysctl -w vm.dirty_expire_centisecs=500
• sysctl -w vm.swappiness=0
– JVM: Give the JVM heap all the memory it needs at startup
• -XX:+AlwaysPreTouch
• Heap size per AutoTune

• Roll out fixes for page scan attacks gradually
– Best for back-end servers
– Linux: Turn off THP and NUMA optimization
• echo never >
/sys/kernel/mm/redhat_transparent_hugepage/enabled
• sysctl -w vm.zone_reclaim_mode=0

– Work with product groups to test on small group of servers before
applying changes to the rest
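
When rolling these settings out to a small group of servers first, a read-only check can confirm that each host actually carries the recommended values. This is a sketch only: the function name `check_vm_settings` is an assumption, it reads files under `/proc/sys/vm` (or a directory passed as an argument) and changes nothing, and `zone_reclaim_mode` is reported as unreadable on kernels built without NUMA support.

```shell
# check_vm_settings [DIR]: compare vm sysctls under DIR (default /proc/sys/vm)
# with the values recommended on the slides. Returns non-zero on any mismatch.
check_vm_settings() {
  dir=${1:-/proc/sys/vm}
  rc=0
  for pair in dirty_writeback_centisecs=500 dirty_expire_centisecs=500 \
              swappiness=0 zone_reclaim_mode=0; do
    name=${pair%%=*}   # text before the first "="
    want=${pair#*=}    # text after the first "="
    have=$(cat "$dir/$name" 2>/dev/null || true)
    if [ "$have" = "$want" ]; then
      echo "ok       vm.$name=$have"
    else
      echo "MISMATCH vm.$name=${have:-unreadable} (want $want)"
      rc=1
    fi
  done
  return $rc
}

# Usage:
#   check_vm_settings || echo "host not yet tuned"
```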
