Linux Huge Pages
Why? How? When?
Agenda
• What are you talking about?
• Linux kernel map
• Memory Allocation
• Paging Model
• Page Fault
• Swapping
• Why Huge Pages
• How to configure
• When to configure
• Summary
Premises
• This is mainly about x86-64
(64-bit Intel and AMD CPUs, roughly from 2004 onward)
• Huge page support differs across hardware architectures; those differences are out of our scope
• We will not explore the MMU, the TLB or the other internals of virtual memory management
• Some images are outdated
(e.g.: Linux kernel 2.6, while the current version at the time of writing is 5.5),
but they still illustrate well the aspects discussed in this presentation
What are you talking about?
Linux kernel map
This is the Linux kernel map for version 2.6.36.
Although it is about 10 years old, it still gives us the big picture
Memory Allocation
[Figure: Memory Allocation]
Paging Model
Page Fault
Swapping
Why Huge Pages
• As we can see, memory management is a complicated process involving many ‘round-trips’
• Huge pages are about allocating larger blocks of memory at once,
thus cutting the ‘round-trips’ associated with small pages
• Huge pages allocated through HugeTLB cannot be swapped out
• A set of 4 KB pages can be replaced by a single 2 MB page (x86-64, or 32-bit with PAE),
a 4 MB page (32-bit without PAE) or even a 1 GB page (x86-64 CPUs with pdpe1gb)
Number of 4 KB pages   Number of huge pages   Huge page equivalence
512                    1                      2 MB (2,048 KB)
1,024                  1                      4 MB (4,096 KB)
262,144                1                      1 GB (1,024 MB or 1,048,576 KB)
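The table boils down to one division: huge page size divided by base page size. A minimal Python sketch, assuming a 4 KB base page; `base_pages_per_huge_page` and `HUGE_PAGE_SIZES_KB` are illustrative names, not part of any library.

```python
# Base page size on x86-64, as reported by `getconf PAGESIZE` (4096 bytes)
BASE_PAGE_KB = 4

# Huge page sizes from the table, expressed in KB
HUGE_PAGE_SIZES_KB = {"2 MB": 2 * 1024, "4 MB": 4 * 1024, "1 GB": 1024 * 1024}


def base_pages_per_huge_page(huge_kb: int, base_kb: int = BASE_PAGE_KB) -> int:
    """How many base pages a single huge page of `huge_kb` KB stands in for."""
    if huge_kb <= 0 or huge_kb % base_kb:
        raise ValueError("huge page size must be a positive multiple of the base page size")
    return huge_kb // base_kb


if __name__ == "__main__":
    # Rebuild the table: one huge page replaces this many 4 KB pages
    for label, kb in HUGE_PAGE_SIZES_KB.items():
        print(f"{label}: {base_pages_per_huge_page(kb):,} x 4 KB pages ({kb:,} KB)")
```

Each of those base pages would otherwise need its own page-table entry and TLB entry; a huge page covers the same range with a single entry.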
Why Huge Pages
• There are 2 huge page variants
  • HugeTLB File System
    • Works as a pseudo filesystem where you need to define the allocation manually
    • We will use this approach
  • Transparent Huge Pages (THP)
    • Works transparently: the Linux kernel decides on its own whether an application gets huge pages,
    but it is not recommended for latency-sensitive applications
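To see which THP policy the kernel is applying on its own, the setting can be read from sysfs. A minimal Python sketch, assuming the usual location /sys/kernel/mm/transparent_hugepage/enabled; `parse_thp_mode` and `current_thp_mode` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# Standard sysfs file exposing the Transparent Huge Pages policy on Linux
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode from a line such as 'always [madvise] never'.

    The kernel lists every possible mode and wraps the selected one in brackets.
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the live THP mode, or None if THP is not available on this system."""
    try:
        return parse_thp_mode(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

`always` lets the kernel use THP for any eligible memory, `madvise` limits it to regions an application marks with madvise(MADV_HUGEPAGE), and `never` disables it. In line with the slide, latency-sensitive setups tend to avoid `always`.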
How to Configure
• Checking if it is possible to enable huge pages
netto@bella:~$ getconf PAGESIZE
4096
netto@bella:~$ cat /proc/cpuinfo | grep -E 'pse|pdpe' | tail -1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2
smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln
pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
getconf PAGESIZE returns the standard (base) page size
for the given CPU architecture, in bytes
/proc/cpuinfo contains the CPU details,
including its feature flags
pse => supports 2 MB huge pages
pdpe1gb => supports 1 GB huge pages
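The same flag check can be scripted without relying on grep alternation. A minimal Python sketch, assuming the x86 /proc/cpuinfo layout (a `flags : ...` line per CPU); `supported_huge_page_sizes` and `FLAG_TO_SIZE` are hypothetical names.

```python
from pathlib import Path
from typing import List

# x86 CPU flag -> huge page size it advertises (as listed on the slide)
FLAG_TO_SIZE = {"pse": "2 MB", "pdpe1gb": "1 GB"}


def supported_huge_page_sizes(cpuinfo_text: str) -> List[str]:
    """Return the huge page sizes advertised in the first 'flags' line.

    Flags are compared as whole words, so 'pse36' does not count as 'pse'.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return [size for flag, size in FLAG_TO_SIZE.items() if flag in flags]
    return []  # no x86 'flags' line (e.g. another architecture)


if __name__ == "__main__":
    print(supported_huge_page_sizes(Path("/proc/cpuinfo").read_text()))
```

Matching whole words avoids the false positive a plain substring search would get from flags such as `pse36`.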
How to Configure
• Installing the required packages to configure huge pages, as root
• WARNING: your distribution might require a slightly different setup
(e.g.: different package manager/package names, fewer steps)
Red Hat / CentOS:
root@bella:~$ yum -y install libhugetlbfs libhugetlbfs-utils
Debian / Ubuntu:
root@bella:~$ apt-get -y install hugepages
How to Configure
• In the following setup, you can select whichever huge page size is more
convenient for your application
# this is the pseudo directory where huge pages will be mapped; it needs to be an existing directory
# the Red Hat configuration differs a little
root@bella:~$ mkdir -p /dev/hugepages
# this can be converted to an /etc/fstab entry
root@bella:~$ mount -t hugetlbfs -o gid=<group id>,pagesize=<2M or 1G>,... none /dev/hugepages
# formula: size required for your scenario / huge page size (2 MB or 1 GB), rounded up
# vm.nr_hugepages sizes the pool of the default huge page size (Hugepagesize in /proc/meminfo)
# in some cases, such as Oracle DB, it is recommended to allocate huge pages only for the SGA
vm.nr_hugepages = <number of pages>
# the same gid used on mount; it must be the group your application runs under
vm.hugetlb_shm_group = <group id>
• Add the vm.* settings above to /etc/sysctl.conf
• Reboot
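The page-count formula is a rounded-up division of the required size by the huge page size. A minimal Python sketch; `nr_hugepages` is a hypothetical helper and the 8 GB requirement is an example value, not a recommendation.

```python
MB = 1024 * 1024
GB = 1024 * MB


def nr_hugepages(required_bytes: int, huge_page_bytes: int = 2 * MB) -> int:
    """Huge pages needed to hold `required_bytes`, rounded up to whole pages."""
    if required_bytes < 0 or huge_page_bytes <= 0:
        raise ValueError("required size must be >= 0 and huge page size > 0")
    # Integer ceiling division avoids floating-point rounding on large sizes
    return (required_bytes + huge_page_bytes - 1) // huge_page_bytes


if __name__ == "__main__":
    # Hypothetical scenario: an application that needs 8 GB backed by 2 MB pages
    print(f"vm.nr_hugepages = {nr_hugepages(8 * GB)}")  # prints: vm.nr_hugepages = 4096
```

With 1 GB pages the same 8 GB needs `nr_hugepages(8 * GB, 1 * GB)`, i.e. 8 pages.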
How to Configure
# if huge pages are correctly set up, at least one pool will be displayed
netto@bella:~$ hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152        0        0        0        *
1073741824        1        1        1
# huge pages of the default size are enabled if HugePages_Total is > 0
netto@bella:~$ cat /proc/meminfo
...
HugePages_Total: <huge pages pool size>
HugePages_Free: <number of huge pages that are not allocated>
HugePages_Rsvd: <number of huge pages that are reserved but not allocated>
HugePages_Surp: <number of surplus huge pages above vm.nr_hugepages>
Hugepagesize: 2048 kB
Hugetlb: 1048576 kB
DirectMap4k: 572400 kB
DirectMap2M: 12943360 kB
DirectMap1G: 19922944 kB
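The /proc/meminfo check can be automated. A minimal Python sketch; `parse_meminfo` and `hugepages_enabled` are hypothetical helpers. Because HugePages_Total only counts the default-size pool, a system like the one above (only a 1 GB pool, 2 MB default) reports 0 there; the sketch therefore also looks at Hugetlb, which covers huge pages of all sizes on kernels that expose it.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse lines such as 'HugePages_Total:  512' or 'Hugepagesize:  2048 kB'
    into {field: number}; values shown in kB stay in kB."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def hugepages_enabled(fields: Dict[str, int]) -> bool:
    """True if the default-size pool is non-empty (HugePages_Total > 0) or,
    on kernels that report it, huge pages of any size exist (Hugetlb > 0)."""
    return fields.get("HugePages_Total", 0) > 0 or fields.get("Hugetlb", 0) > 0


if __name__ == "__main__":
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    print("huge pages enabled:", hugepages_enabled(info))
```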
How to Configure
Application          Where                               Syntax
Oracle JDK/OpenJDK   Command-line argument               -XX:+UseLargePages
MySQL                my.cnf, inside the [mysqld] block   large_pages=ON
PHP                  php.ini, opcache block              opcache.huge_code_pages=1
Python               mmap module                         MADV_HUGEPAGE (via mmap.madvise)
PostgreSQL           postgresql.conf                     huge_pages=on
Docker               Command-line argument               --device=/dev/hugepages:/dev/hugepages
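For the Python row: mmap.madvise() (Python 3.8+) with MADV_HUGEPAGE asks for Transparent Huge Pages on a mapping, so it only has an effect when THP is set to `always` or `madvise`; it does not draw from the hugetlbfs pool configured earlier. A minimal sketch; `map_with_thp_hint` is a hypothetical helper and the 4 MB size is arbitrary.

```python
from __future__ import annotations

import mmap

TWO_MB = 2 * 1024 * 1024


def map_with_thp_hint(size: int = 2 * TWO_MB) -> tuple[mmap.mmap, bool]:
    """Create an anonymous private mapping and ask the kernel to back it with
    Transparent Huge Pages. Returns (mapping, whether the hint was accepted)."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # Linux-only constant
    if advice is None or not hasattr(buf, "madvise"):  # madvise() needs Python 3.8+
        return buf, False
    try:
        buf.madvise(advice)
    except OSError:  # e.g. kernel built without THP support
        return buf, False
    return buf, True


if __name__ == "__main__":
    buf, hinted = map_with_thp_hint()
    buf[:5] = b"hello"  # touching the memory is what actually faults pages in
    print("MADV_HUGEPAGE accepted:", hinted)
    buf.close()
```

An accepted hint is a request, not a guarantee: the kernel still decides whether a 2 MB page is available when the memory is first touched.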
When to Configure
Advantages
• Huge pages can reduce pressure on the TLB/MMU
• Huge pages are not swappable
• Any data-intensive application that properly uses mmap(), madvise(), shmget(), shmat()
and some other calls can benefit from them
• Any memory-bound application can benefit from them
• Useful when latency/response time is critical
Disadvantages
• Internal and external memory fragmentation will be amplified if not configured properly
• Without swapping, memory can run out quickly: “swappability” avoids quick memory starvation,
at some performance cost
• It is a Linux-specific feature, not part of POSIX; other systems such as Solaris, FreeBSD and
even Windows have similar features with a totally different setup
• NUMA (non-uniform memory access) systems may not get all the benefits seen on a UMA system
(hardware with uniform memory access)
• Transparent Huge Pages are not recommended in general
(they have very specific use cases)

• Many other advantages and disadvantages can come up but, most importantly: test!
• You may need to raise the locked-memory limit (memlock) in /etc/security/limits.conf
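The effective locked-memory limit can be checked from inside a process. A minimal Python sketch; `memlock_limits` and `memlock_allows` are hypothetical helpers. Note that limits.conf expresses memlock in KB while getrlimit() reports bytes, and whether the limit applies depends on how the application allocates huge pages (for example, shmget() with SHM_HUGETLB is checked against it when the user is not in vm.hugetlb_shm_group).

```python
import resource
from typing import Tuple


def memlock_limits() -> Tuple[int, int]:
    """(soft, hard) RLIMIT_MEMLOCK for this process, in bytes.

    resource.RLIM_INFINITY means unlimited (memlock unlimited in limits.conf).
    """
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def memlock_allows(required_bytes: int) -> bool:
    """True if the soft memlock limit is unlimited or at least `required_bytes`."""
    soft, _hard = memlock_limits()
    return soft == resource.RLIM_INFINITY or soft >= required_bytes


if __name__ == "__main__":
    # Hypothetical requirement: 8 GB of locked, huge-page-backed memory
    print("memlock OK for 8 GB:", memlock_allows(8 * 1024**3))
```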
Thank you!
Geraldo Netto
geraldo.netto@gmail.com

More Related Content

What's hot

malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in Linux
Adrian Huang
 
Linux Memory
Linux MemoryLinux Memory
Linux Memory
Vitaly Nahshunov
 
spinlock.pdf
spinlock.pdfspinlock.pdf
spinlock.pdf
Adrian Huang
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
shimosawa
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
SHAJANA BASHEER
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
Vadim Nikitin
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
Pankaj Suryawanshi
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Anne Nicolas
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
Sim Janghoon
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
Adrian Huang
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
Teja Bheemanapally
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Adrian Huang
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdf
ycelgemici1
 
Linux memory-management-kamal
Linux memory-management-kamalLinux memory-management-kamal
Linux memory-management-kamal
Kamal Maiti
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
Adrian Huang
 
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
Adrian Huang
 
Linux device drivers
Linux device drivers Linux device drivers
U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoC
Macpaul Lin
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
Adrian Huang
 
linux device driver
linux device driverlinux device driver
linux device driver
Rahul Batra
 

What's hot (20)

malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in Linux
 
Linux Memory
Linux MemoryLinux Memory
Linux Memory
 
spinlock.pdf
spinlock.pdfspinlock.pdf
spinlock.pdf
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdf
 
Linux memory-management-kamal
Linux memory-management-kamalLinux memory-management-kamal
Linux memory-management-kamal
 
Memory Management with Page Folios
Memory Management with Page FoliosMemory Management with Page Folios
Memory Management with Page Folios
 
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
 
Linux device drivers
Linux device drivers Linux device drivers
Linux device drivers
 
U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoC
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 
linux device driver
linux device driverlinux device driver
linux device driver
 

Similar to Linux Huge Pages

PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
Equnix Business Solutions
 
MySQL Oslayer performace optimization
MySQL  Oslayer performace optimizationMySQL  Oslayer performace optimization
MySQL Oslayer performace optimization
Louis liu
 
The Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelThe Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux Kernel
Yasunori Goto
 
Running MySQL on Linux
Running MySQL on LinuxRunning MySQL on Linux
Running MySQL on Linux
Great Wide Open
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
solarisyougood
 
os
osos
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
Joao Galdino Mello de Souza
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016
Colin Charles
 
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld
 
Time For D.I.M.E?
Time For D.I.M.E?Time For D.I.M.E?
Time For D.I.M.E?
Martin Packer
 
Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis
PyData
 
High Performance Hardware for Data Analysis
High Performance Hardware for Data AnalysisHigh Performance Hardware for Data Analysis
High Performance Hardware for Data Analysis
Mike Pittaro
 
LizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-webLizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-web
Szymon Haly
 
LizardFS-WhitePaper-Eng-v4.0 (1)
LizardFS-WhitePaper-Eng-v4.0 (1)LizardFS-WhitePaper-Eng-v4.0 (1)
LizardFS-WhitePaper-Eng-v4.0 (1)
Pekka Männistö
 
Comparison of foss distributed storage
Comparison of foss distributed storageComparison of foss distributed storage
Comparison of foss distributed storage
Marian Marinov
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
xKinAnx
 
Time For DIME
Time For DIMETime For DIME
Time For DIME
Martin Packer
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Splunk
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
RedWireServices
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
DataStax Academy
 

Similar to Linux Huge Pages (20)

PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
 
MySQL Oslayer performace optimization
MySQL  Oslayer performace optimizationMySQL  Oslayer performace optimization
MySQL Oslayer performace optimization
 
The Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelThe Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux Kernel
 
Running MySQL on Linux
Running MySQL on LinuxRunning MySQL on Linux
Running MySQL on Linux
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
os
osos
os
 
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016
 
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
 
Time For D.I.M.E?
Time For D.I.M.E?Time For D.I.M.E?
Time For D.I.M.E?
 
Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis
 
High Performance Hardware for Data Analysis
High Performance Hardware for Data AnalysisHigh Performance Hardware for Data Analysis
High Performance Hardware for Data Analysis
 
LizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-webLizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-web
 
LizardFS-WhitePaper-Eng-v4.0 (1)
LizardFS-WhitePaper-Eng-v4.0 (1)LizardFS-WhitePaper-Eng-v4.0 (1)
LizardFS-WhitePaper-Eng-v4.0 (1)
 
Comparison of foss distributed storage
Comparison of foss distributed storageComparison of foss distributed storage
Comparison of foss distributed storage
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
Time For DIME
Time For DIMETime For DIME
Time For DIME
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
 

Recently uploaded

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
ISH Technologies
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 

Recently uploaded (20)

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 

Linux Huge Pages

  • 1. Linux Huge Pages Why? How? When? 1
  • 2. • What are you talking about? • Linux kernel map • Memory Allocation • Paging Model • Page Fault • Swapping • Why Huge Pages • How to configure • When to configure • Summary 2 Agenda
  • 3. • This is mainly about X86-64 (Intel and AMD CPUs produced after 2004) • There are some differences on huge pages among different hardware architectures that are out of our scope • We will not explore MMU, TLB and all the internals of virtual memory management • Some images are outdated (e.g.: Linux kernel 2.6 while current version is 5.5) but it illustrates very well the aspects discussed in this presentation 3 Premises
  • 4. 4 What are you talking about?
  • 5. 5 This is the Linux kernel map on version 2.6.36 While it is dated by 10 years, it gives us the big picture
  • 10. • As we can see, memory management is complicated process involving many ‘round-trips’ • Huge pages is about allocating larger blocks of memory at once Thus, cutting the ‘round-trips’ associated with small pages • Huge Pages cannot be swapped out • A set of 4 KB pages can turn into a single 2 MB (with PAE), 4 MB or even 1 GB 10 Why Huge Pages Number of Pages (4 KB) Number of Huge Pages Huge Page Equivalence 512 1 2 MB (2048 KB) 1024 1 4 MB (4096 KB) 262.144 1 1 GB (1024 MB or 1.048.576 KB)
  • 11. • There are 2 huge page variants • HugeTLB File System • Works as a pseudo filesystem where you need to manually define the allocation • We will use this approach • Transparent Huge Pages • Works transparently – Linux kernel will decide on its own if the application requires or not huge pages but it is not recommended for latency sensitive applications 11 Why Huge Pages
  • 12. How to Configure
    • Checking if it is possible to enable huge pages
      netto@bella:~$ getconf PAGESIZE
      4096
      netto@bella:~$ grep -E 'pse|pdpe' /proc/cpuinfo | tail -1
      flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
    • getconf PAGESIZE returns the base (standard) page size in bytes
    • /proc/cpuinfo contains CPU details, including the feature flags
    • pse => supports 2 MB huge pages
    • pdpe1gb => supports 1 GB huge pages
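The same checks can be scripted. This Python sketch mirrors the commands on the slide: os.sysconf("SC_PAGE_SIZE") plays the role of getconf PAGESIZE, and the flag lookup mirrors the grep, using whole-word matching so that pse36 is not mistaken for pse. Like tail -1, it keeps the last flags line. It assumes x86 /proc/cpuinfo formatting; other architectures report CPU features differently.

```python
import os
from typing import Dict


def huge_page_flags(cpuinfo_text: str) -> Dict[str, bool]:
    """Report huge page CPU support from /proc/cpuinfo text (x86 format assumed)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            # like `tail -1`, keep the flags of the last CPU listed
            flags = set(line.split(":", 1)[1].split())
    return {
        "pse": "pse" in flags,          # 2 MB huge pages
        "pdpe1gb": "pdpe1gb" in flags,  # 1 GB huge pages
    }


if __name__ == "__main__":
    print("base page size:", os.sysconf("SC_PAGE_SIZE"), "bytes")
    with open("/proc/cpuinfo") as f:
        print(huge_page_flags(f.read()))
```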
  • 13. How to Configure
    • Installing the required packages to configure huge pages (as root)
    • WARNING: your distribution might require a slightly different setup (e.g., a different package manager or package names, fewer steps)
    Red Hat / CentOS:
      root@bella:~$ yum -y install libhugetlbfs libhugetlbfs-utils
    Debian / Ubuntu:
      root@bella:~$ apt-get -y install hugepages
  • 14. How to Configure
    • In the following setup, select the huge page size that best suits your application
      # this is the pseudo directory where huge pages will be mapped; the mount point must exist
      # the Red Hat configuration differs slightly
      root@bella:~$ mkdir -p /dev/hugepages
      # this can be converted to an /etc/fstab entry
      root@bella:~$ mount -t hugetlbfs -o gid=<group id>,pagesize=<2M or 1G>,... none /dev/hugepages
    • Add to /etc/sysctl.conf:
      # formula: memory required for your scenario / huge page size (2 MB or 1 GB), rounded up
      # in some situations, such as Oracle Database, it is recommended to allocate huge pages only for the SGA
      vm.nr_hugepages = <number of pages>
      # the same gid used in the mount options; it must be the group your application runs under
      vm.hugetlb_shm_group = <group id>
    • Reboot
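The sizing rule for vm.nr_hugepages (memory required divided by the huge page size, rounded up to whole pages) can be checked with a few lines of Python. This is a sketch under stated assumptions: the pool uses the default huge page size (2 MB on most x86-64 systems), and the 8 GB requirement and group id 1001 are hypothetical values.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def nr_hugepages(required_bytes: int, huge_page_bytes: int = 2 * MIB) -> int:
    """Huge pages needed to hold required_bytes, rounded up to whole pages."""
    if required_bytes <= 0 or huge_page_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-required_bytes // huge_page_bytes)  # ceiling division


def sysctl_lines(required_bytes: int, gid: int, huge_page_bytes: int = 2 * MIB) -> str:
    """Render the two sysctl.conf settings from the slide (gid = the mount's group)."""
    return (f"vm.nr_hugepages = {nr_hugepages(required_bytes, huge_page_bytes)}\n"
            f"vm.hugetlb_shm_group = {gid}\n")


if __name__ == "__main__":
    # hypothetical scenario: an 8 GB shared memory area, application group id 1001
    print(sysctl_lines(8 * GIB, gid=1001), end="")
```

For 8 GB this yields 4096 pages of 2 MB, or 8 pages of 1 GB.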
  • 15. How to Configure
      # if huge pages are correctly set up, at least one pool will be displayed
      netto@bella:~$ hugeadm --pool-list
            Size  Minimum  Current  Maximum  Default
         2097152        0        0        0        *
      1073741824        1        1        1
      # huge pages are enabled if HugePages_Total is > 0
      # (the HugePages_* fields cover only the default huge page size; Hugetlb counts all sizes, in kB)
      netto@bella:~$ cat /proc/meminfo
      ...
      HugePages_Total:   <huge pages pool size>
      HugePages_Free:    <number of huge pages that are not allocated>
      HugePages_Rsvd:    <number of huge pages that are reserved but not allocated>
      HugePages_Surp:    <number of surplus huge pages above vm.nr_hugepages>
      Hugepagesize:       2048 kB
      Hugetlb:         1048576 kB
      DirectMap4k:      572400 kB
      DirectMap2M:    12943360 kB
      DirectMap1G:    19922944 kB
  • 16. How to Configure
    Application          | Where                                   | Syntax
    Oracle JDK/OpenJDK   | Command-line argument                   | -XX:+UseLargePages
    MySQL                | my.cnf, inside the [mysqld] block       | large_pages=ON
    PHP                  | php.ini, opcache block                  | opcache.huge_code_pages=1
    Python               | mmap module (mmap.madvise, Python 3.8+) | MADV_HUGEPAGE (requests Transparent Huge Pages)
    PostgreSQL           | postgresql.conf                         | huge_pages=ON
    Docker               | Command-line argument                   | --device=/dev/hugepages:/dev/hugepages
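For the Python row, a minimal sketch: it creates an anonymous private mapping with the standard mmap module and calls mmap.madvise(mmap.MADV_HUGEPAGE), available on Linux since Python 3.8. This is only a hint for Transparent Huge Pages, not a hugetlbfs allocation, so the kernel may still use 4 KB pages (for example when THP is set to never). The 16 MB mapping size is an arbitrary assumption.

```python
import mmap
from typing import Tuple

TWO_MIB = 2 * 1024 * 1024  # typical x86-64 huge page size (assumed)


def map_with_thp_hint(size: int = 8 * TWO_MIB) -> Tuple[mmap.mmap, bool]:
    """Create an anonymous private mapping and ask the kernel to back it with
    Transparent Huge Pages. Returns the mapping and whether madvise() succeeded."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    if not (hasattr(buf, "madvise") and hasattr(mmap, "MADV_HUGEPAGE")):
        return buf, False  # Python < 3.8 or a platform without MADV_HUGEPAGE
    try:
        buf.madvise(mmap.MADV_HUGEPAGE)
    except OSError:
        return buf, False  # e.g. kernel built without THP support
    return buf, True


if __name__ == "__main__":
    buf, advised = map_with_thp_hint()
    buf[:5] = b"hello"  # first touch faults the memory in
    print("THP hint accepted:", advised, "| first bytes:", buf[:5])
    buf.close()
```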
  • 17. When to Configure
    Advantages                                               | Disadvantages
    Huge pages can reduce pressure on the TLB/MMU            | Internal and external memory fragmentation can get worse if huge pages are not configured properly
    Huge pages are not swappable                             | "Swappability" helps avoid quick memory starvation, at some performance cost
    Any data-intensive application that properly uses mmap(), madvise(), shmget(), shmat() and similar calls can benefit | Huge pages are not part of POSIX: other systems such as Solaris, FreeBSD and even Windows have similar features with a completely different setup
    Any memory-bound application can benefit                 | NUMA (non-uniform memory access) systems may not get all the benefits seen on UMA (uniform memory access) systems
    When latency/response time is critical                   | Transparent Huge Pages is not recommended in general (it has very specific use cases)
    • Many other advantages and disadvantages can come up, but most importantly: test!
    • You might need to raise memory limits in /etc/security/limits.conf (typically memlock)
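To put "reduce pressure on the TLB" in numbers: TLB reach is the number of TLB entries multiplied by the page size. The Python sketch below uses a hypothetical 64-entry TLB holding a single page size; real CPUs have several TLB levels and often separate entries per page size, so treat the figures as illustrative only.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory addressable without a TLB miss (simplified model)."""
    return entries * page_bytes


if __name__ == "__main__":
    ENTRIES = 64  # hypothetical TLB size; real sizes vary per CPU and per level
    for label, size in (("4 KB", 4 * KIB), ("2 MB", 2 * MIB), ("1 GB", GIB)):
        print(f"{ENTRIES} entries x {label} pages -> {tlb_reach(ENTRIES, size) // KIB:,} KB reach")
```

With these assumptions the reach grows from 256 KB to 131,072 KB (128 MB) with 2 MB pages and 67,108,864 KB (64 GB) with 1 GB pages.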
  • 18. References
    • Operating System Concepts, Silberschatz, Galvin, Gagne, John Wiley & Sons
    • Understanding the Linux Kernel, Daniel Bovet, Marco Cesati, O'Reilly Media, 3rd edition
    • Professional Linux Kernel Architecture, Wolfgang Mauerer, Wrox Press
    • Low-Level Programming, Igor Zhirkov, Apress
    • Systems Performance: Enterprise and the Cloud, Brendan Gregg, Prentice Hall
  • 19. References
    • Configuring huge pages for your PostgreSQL instance, Debian version
    • Performance Tuning: HugePages In Linux
    • KVM - Using Hugepages
    • LinuxMM: HugePages
    • Configuring HugePages for Oracle on Linux (x86-64)
    • How to enable huge page support in a Dockerfile
    • ZGC
    • PostgreSQL and Hugepages: Working with an abundance of memory in modern servers
    • How to configure HugePage using hugeadm (RHEL/CentOS 7)
    • Red Hat 7 Documentation: Configuring HugeTLB Huge Pages
  • 20. References
    • PHP 7 - runtime configuration
    • PostgreSQL 9.4 Resource Consumption
    • Python mmap module
    • 7 easy steps to configure HugePages for your Oracle Database Server
    • Redis latency problems troubleshooting
    • Wikipedia: Linux Kernel
    • Interactive map of Linux Kernel
    • Huge pages part 1 (Introduction)
    • Huge pages part 2: Interfaces
    • Huge pages part 3: Administration
    • Memory part 3: Virtual Memory