SlideShare a Scribd company logo
Tim Vaillancourt
Sr. Technical Operations Architect
Tuning Linux for MongoDB
About Me
• Joined Percona in January 2016
• Sr Technical Operations Architect for MongoDB
• Previous:
• EA DICE (MySQL DBA)
• EA SPORTS (Sys/NoSQL DBA Ops)
• Amazon/AbeBooks Inc (Sys/MySQL+NoSQL DBA Ops)
• Main techs: MySQL, MongoDB, Cassandra, Solr, Redis, queues, etc
• 10+ years tuning Linux for database workloads (off and on)
• Not a kernel-guy, learned from breaking things
Linux
• UNIX-like, mostly POSIX-compliant operating system
• First released on September 17th, 1991 by Linus Torvalds
• 50Mhz CPUs were considered fast
• CPUs had 1 core
• RAM was measured in megabytes
• Ethernet speed was 1 - 10mbps
• General purpose
• It will run on a Raspberry Pi -> Mainframes
• Geared towards many different users and use cases
• Linux 3.2+ is much more efficient
MongoDB
• Document-oriented database first released in 2009
• Thread per connection model
• Non-contiguous memory access pattern
• Storage Engines
• MMAPv1
• Calls ‘mmap()’ to map on-disk data to RAM
• Keeps warm data in Linux filesystem cache
• Highly random I/O pattern
• Scales with RAM and Disk only**
• Cache uses all the RAM it can get
MongoDB
• Storage Engines
• WiredTiger and RocksDB
• Built-in Compression
• Uses combination of in-heap cache and filesystem cache
• In-heap cache: uncompressed pages
• Filesystem cache: compressed pages
• Relatively sequential write patterns, low write overhead
• Scales with RAM, Disk and CPUs
Ulimit
• Allows per-Linux-user resource
constraints
• Number of User-level Processes
• Number of Open Files
• CPU Seconds
• Scheduling Priority
• Others…
• MongoDB
• Should probably have it’s own VM,
container or server
• Creates a process for each connection
Ulimit
• MongoDB (continued)
• Creates an open file for each active data file on disk
• 64,000 open files and 64,000 max processes is a good start
• Read current ulimit: “ulimit -a” (run as mongo user)
• Set ulimit for mongo user in ‘/etc/security/limits.d/‘ or in
‘/etc/security/limits.conf’:
• Restart mongod/mongos after the ulimit change to apply it
Virtual Memory: Dirty Ratio
• Dirty Pages
• Pages stored in-cache, but needs to be written to storage
• VM Dirty Ratio
• Max percent of total memory that can be dirty
• VM stalls and flushes
when this limit is reached
• Start with ’10’, default (30) too high
• VM Dirty Background Ratio
• Separate threshold for
background dirty page flushing
• Flushes without pauses
• Start with ‘3’, default (15) too high
Virtual Memory: Swappiness
• A Linux kernel sysctl setting for preferring
RAM or disk for swap
• Linux default: 60
• To avoid disk-based swap: 1 (not zero!)
• To allow some disk-based swap: 10
• ‘0’ can cause unpredicted behaviour
Virtual Memory: Transparent HugePages
• Introduced in RHEL/CentOS 6, Linux 2.6.38+
• Merges 4kb pages into 2mb HugePages (512x) in background
(Khugepaged process)
• Decreases overall performance when used with MongoDB!
• Disable it
• Add “transparent_hugepage=never” to kernel command-line (GRUB)
• Reboot
NUMA (Non-Uniform Memory Access)
• A memory architecture that takes into
account the locality of memory, caches and
CPUs for lower latency
• MongoDB code base is not NUMA “aware”,
causing unbalanced allocations
• Disable NUMA
• In the server BIOS
• Using ‘numactl’ in mongod init script
BEFORE ‘mongod’ command:
numactl --interleave=all /usr/bin/mongod <other flags>
Block Devices: Type and Layout
• Isolation
• Run Mongod dbPaths on separate volume
• Optionally, run Mongod journal on separate volume
• RAID Level
• RAID 10 == performance/durability sweet spot
• RAID 0 == fast and dangerous
• SSDs
• Benefit MMAPv1 a lot
• Benefit WT and RocksDB a bit less
• Keep about 30% free for internal GC on the SSD
• EBS
• Network-attached can be risky
• JBOD + Replset as Data Redundancy (use at own risk)
• Number of Replset Members
• Read and Write Concern
• Proper Geolocation/Node Redundancy
Block Devices: IO Scheduler
• Algorithm kernel uses to commit reads and
writes to disk
• CFQ
• Linux default
• Perhaps too clever/inefficient for database
workloads
• Deadline
• Best general default IMHO
• Predictable I/O request latencies
• Noop
• Use with virtualisation or (sometimes) with
BBU RAID controllers
Block Devices: Block Read-ahead
• Tuning that causes data ahead of a block on
disk to be read and then cached
• Assumption: there is a sequential read
pattern and something will benefit from the
extra cached blocks
• Risk: too high waste cache space and
increases eviction work
• MongoDB tends to have very random disk
patterns
• A good start for MongoDB volumes is a ’32’
(16kb) read-ahead
Block Devices: Udev rule
/etc/udev/rules.d/60-mongodb-disk.rules:
# set deadline scheduler and 32/16kb read-ahead for /dev/sda
ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16"
• Add file to ‘/etc/udev/rules.d’
• Reboot (or use CLI tools to apply)
Filesystems and Options
• Use XFS or EXT4, not EXT3
• Use XFS only on WiredTiger
• Set ‘noatime’ on MongoDB data volumes in ‘/etc/fstab’:
• Remount the filesystem after an options change, or reboot
Network Stack
• Defaults are not good for > 100mbps Ethernet
• Suggested starting point (add to ‘/etc/sysctl.conf’):
• Run “sysctl -p” as root to reload Network Stack settings
NTPd (Network Time Protocol)
• Replication and Clustering needs consistent
clocks
• Run NTP daemon on all MongoDB and
Monitoring hosts
• Enable on restart
• Use a consistent time source/server
SELinux (Security-Enhanced Linux)
• A kernel-level security access control module
• Modes of SELinux
• Enforcing: Block and log policy violations
• Permissive: Log policy violations only
• Disabled: Completely disabled
• Recommended: Enforcing
• Percona Server for MongoDB 3.2+ RPMs
install an SELinux policy on RedHat/CentOS!
• A “framework” for applying
tunings to Linux
• RedHat/CentOS 7
• Debian added it, not sure on
official status
• Watch my/Percona-Lab GitHub
for profiles in the future!
Tuned
CPUs and Frequency Scaling
• Lots of cores > faster cores
• ‘cpufreq’: a daemon for dynamic scaling of the CPU frequency
• Terrible idea for databases
• Disable or set governor to 100% frequency always, i.e mode: ‘performance’
• Disable any BIOS-level performance/efficiency tuneable
• ENERGY_PERF_BIAS
• A CentOS/RedHat tuning for energy vs performance balance
• RHEL 6 = ‘performance’
• RHEL 7 = ‘normal’ (!)
• Advice: use ‘tuned’ to set to ‘performance’
Monitoring: Percona PMM
• Open-source
monitoring suite
from Percona!
• MongoDB
visualisations by
cluster, shard,
replset, engine, etc
• DB stats groupings
with OS metrics
• Simple deployment
Monitoring: Prometheus + Grafana
• PerconaLab GitHub Repositories
• grafana_mongodb_dashboards
• prometheus_mongodb_exporter
Links
• https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/
• https://docs.mongodb.com/manual/administration/production-notes/
• http://www.brendangregg.com/linuxperf.html ==>
• https://www.percona.com/doc/percona-monitoring-and-management/index.html
• https://github.com/Percona-Lab/grafana_mongodb_dashboards
• https://github.com/Percona-Lab/prometheus_mongodb_exporter
• https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
Questions?
DATABASE PERFORMANCE
MATTERS

More Related Content

What's hot

FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기
Jongwon Kim
 
Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)
Anastasia Lubennikova
 
Oracle GoldenGate Microservices Overview ( with Demo )
Oracle GoldenGate Microservices Overview ( with Demo )Oracle GoldenGate Microservices Overview ( with Demo )
Oracle GoldenGate Microservices Overview ( with Demo )
Mari Kupatadze
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Node.js and the MySQL Document Store
Node.js and the MySQL Document StoreNode.js and the MySQL Document Store
Node.js and the MySQL Document Store
Rui Quelhas
 
twlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsotwlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdso
Viller Hsiao
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
Taro L. Saito
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
MongoDB
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
Ontico
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 
MyDUMPER : Faster logical backups and restores
MyDUMPER : Faster logical backups and restores MyDUMPER : Faster logical backups and restores
MyDUMPER : Faster logical backups and restores
Mydbops
 
14- Tumbling Window Trigger dependency in Azure Data Factory.pptx
14- Tumbling Window Trigger dependency in Azure Data Factory.pptx14- Tumbling Window Trigger dependency in Azure Data Factory.pptx
14- Tumbling Window Trigger dependency in Azure Data Factory.pptx
BRIJESH KUMAR
 
Mongo db 최범균
Mongo db 최범균Mongo db 최범균
Mongo db 최범균
beom kyun choi
 
A simple introduction to redis
A simple introduction to redisA simple introduction to redis
A simple introduction to redis
Zhichao Liang
 
Long live to CMAN!
Long live to CMAN!Long live to CMAN!
Long live to CMAN!
Ludovico Caldara
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLMorgan Tocker
 
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
준철 박
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in Linux
Raghu Udiyar
 
MariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and OptimizationMariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and Optimization
MariaDB plc
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 

What's hot (20)

FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기
 
Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)
 
Oracle GoldenGate Microservices Overview ( with Demo )
Oracle GoldenGate Microservices Overview ( with Demo )Oracle GoldenGate Microservices Overview ( with Demo )
Oracle GoldenGate Microservices Overview ( with Demo )
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Node.js and the MySQL Document Store
Node.js and the MySQL Document StoreNode.js and the MySQL Document Store
Node.js and the MySQL Document Store
 
twlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsotwlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdso
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
MyDUMPER : Faster logical backups and restores
MyDUMPER : Faster logical backups and restores MyDUMPER : Faster logical backups and restores
MyDUMPER : Faster logical backups and restores
 
14- Tumbling Window Trigger dependency in Azure Data Factory.pptx
14- Tumbling Window Trigger dependency in Azure Data Factory.pptx14- Tumbling Window Trigger dependency in Azure Data Factory.pptx
14- Tumbling Window Trigger dependency in Azure Data Factory.pptx
 
Mongo db 최범균
Mongo db 최범균Mongo db 최범균
Mongo db 최범균
 
A simple introduction to redis
A simple introduction to redisA simple introduction to redis
A simple introduction to redis
 
Long live to CMAN!
Long live to CMAN!Long live to CMAN!
Long live to CMAN!
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
 
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in Linux
 
MariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and OptimizationMariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and Optimization
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 

Similar to Tuning Linux for MongoDB

Tuning linux for mongo db
Tuning linux for mongo dbTuning linux for mongo db
Tuning linux for mongo db
Soumya Bhattacharyya
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
MongoDB
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
MongoDB
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin Charles
NETWAYS
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016
Colin Charles
 
Running MySQL on Linux
Running MySQL on LinuxRunning MySQL on Linux
Running MySQL on Linux
Great Wide Open
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildTim Vaillancourt
 
Deployment
DeploymentDeployment
Deployment
rogerbodamer
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems Baruch Osoveskiy
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black Box
Miklos Szel
 
Deployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS LinuxDeployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS Linux
WO Community
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linux
mountpoint.io
 
Apache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling UpApache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling Up
Sander Temme
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL Database
FITC
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
LetsConnect
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
eurobsdcon
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production Deployment
MongoDB
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
BIOVIA
 

Similar to Tuning Linux for MongoDB (20)

Tuning linux for mongo db
Tuning linux for mongo dbTuning linux for mongo db
Tuning linux for mongo db
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin Charles
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Running MySQL on Linux
Running MySQL on LinuxRunning MySQL on Linux
Running MySQL on Linux
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the Wild
 
Deployment
DeploymentDeployment
Deployment
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black Box
 
Deployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS LinuxDeployment of WebObjects applications on CentOS Linux
Deployment of WebObjects applications on CentOS Linux
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linux
 
Apache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling UpApache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling Up
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL Database
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production Deployment
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 

Tuning Linux for MongoDB

  • 1. Tim Vaillancourt Sr. Technical Operations Architect Tuning Linux for MongoDB
  • 2. About Me • Joined Percona in January 2016 • Sr Technical Operations Architect for MongoDB • Previous: • EA DICE (MySQL DBA) • EA SPORTS (Sys/NoSQL DBA Ops) • Amazon/AbeBooks Inc (Sys/MySQL+NoSQL DBA Ops) • Main techs: MySQL, MongoDB, Cassandra, Solr, Redis, queues, etc • 10+ years tuning Linux for database workloads (off and on) • Not a kernel-guy, learned from breaking things
  • 3. Linux • UNIX-like, mostly POSIX-compliant operating system • First released on September 17th, 1991 by Linus Torvalds • 50Mhz CPUs were considered fast • CPUs had 1 core • RAM was measured in megabytes • Ethernet speed was 1 - 10mbps • General purpose • It will run on a Raspberry Pi -> Mainframes • Geared towards many different users and use cases • Linux 3.2+ is much more efficient
  • 4. MongoDB • Document-oriented database first released in 2009 • Thread per connection model • Non-contiguous memory access pattern • Storage Engines • MMAPv1 • Calls ‘mmap()’ to map on-disk data to RAM • Keeps warm data in Linux filesystem cache • Highly random I/O pattern • Scales with RAM and Disk only** • Cache uses all the RAM it can get
  • 5. MongoDB • Storage Engines • WiredTiger and RocksDB • Built-in Compression • Uses combination of in-heap cache and filesystem cache • In-heap cache: uncompressed pages • Filesystem cache: compressed pages • Relatively sequential write patterns, low write overhead • Scales with RAM, Disk and CPUs
  • 6. Ulimit • Allows per-Linux-user resource constraints • Number of User-level Processes • Number of Open Files • CPU Seconds • Scheduling Priority • Others… • MongoDB • Should probably have it’s own VM, container or server • Creates a process for each connection
  • 7. Ulimit • MongoDB (continued) • Creates an open file for each active data file on disk • 64,000 open files and 64,000 max processes is a good start • Read current ulimit: “ulimit -a” (run as mongo user) • Set ulimit for mongo user in ‘/etc/security/limits.d/‘ or in ‘/etc/security/limits.conf’: • Restart mongod/mongos after the ulimit change to apply it
  • 8. Virtual Memory: Dirty Ratio • Dirty Pages • Pages stored in-cache, but needs to be written to storage • VM Dirty Ratio • Max percent of total memory that can be dirty • VM stalls and flushes when this limit is reached • Start with ’10’, default (30) too high • VM Dirty Background Ratio • Separate threshold for background dirty page flushing • Flushes without pauses • Start with ‘3’, default (15) too high
  • 9. Virtual Memory: Swappiness • A Linux kernel sysctl setting for preferring RAM or disk for swap • Linux default: 60 • To avoid disk-based swap: 1 (not zero!) • To allow some disk-based swap: 10 • ‘0’ can cause unpredicted behaviour
  • 10. Virtual Memory: Transparent HugePages • Introduced in RHEL/CentOS 6, Linux 2.6.38+ • Merges 4kb pages into 2mb HugePages (512x) in background (Khugepaged process) • Decreases overall performance when used with MongoDB! • Disable it • Add “transparent_hugepage=never” to kernel command-line (GRUB) • Reboot
  • 11. NUMA (Non-Uniform Memory Access) • A memory architecture that takes into account the locality of memory, caches and CPUs for lower latency • MongoDB code base is not NUMA “aware”, causing unbalanced allocations • Disable NUMA • In the server BIOS • Using ‘numactl’ in mongod init script BEFORE ‘mongod’ command: numactl --interleave=all /usr/bin/mongod <other flags>
  • 12. Block Devices: Type and Layout • Isolation • Run Mongod dbPaths on separate volume • Optionally, run Mongod journal on separate volume • RAID Level • RAID 10 == performance/durability sweet spot • RAID 0 == fast and dangerous • SSDs • Benefit MMAPv1 a lot • Benefit WT and RocksDB a bit less • Keep about 30% free for internal GC on the SSD • EBS • Network-attached can be risky • JBOD + Replset as Data Redundancy (use at own risk) • Number of Replset Members • Read and Write Concern • Proper Geolocation/Node Redundancy
  • 13. Block Devices: IO Scheduler • Algorithm kernel uses to commit reads and writes to disk • CFQ • Linux default • Perhaps too clever/inefficient for database workloads • Deadline • Best general default IMHO • Predictable I/O request latencies • Noop • Use with virtualisation or (sometimes) with BBU RAID controllers
  • 14. Block Devices: Block Read-ahead • Tuning that causes data ahead of a block on disk to be read and then cached • Assumption: there is a sequential read pattern and something will benefit from the extra cached blocks • Risk: too high waste cache space and increases eviction work • MongoDB tends to have very random disk patterns • A good start for MongoDB volumes is a ’32’ (16kb) read-ahead
  • 15. Block Devices: Udev rule /etc/udev/rules.d/60-mongodb-disk.rules: # set deadline scheduler and 32/16kb read-ahead for /dev/sda ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16" • Add file to ‘/etc/udev/rules.d’ • Reboot (or use CLI tools to apply)
  • 16. Filesystems and Options • Use XFS or EXT4, not EXT3 • Use XFS only on WiredTiger • Set ‘noatime’ on MongoDB data volumes in ‘/etc/fstab’: • Remount the filesystem after an options change, or reboot
  • 17. Network Stack • Defaults are not good for > 100mbps Ethernet • Suggested starting point (add to ‘/etc/sysctl.conf’): • Run “sysctl -p” as root to reload Network Stack settings
  • 18. NTPd (Network Time Protocol) • Replication and Clustering needs consistent clocks • Run NTP daemon on all MongoDB and Monitoring hosts • Enable on restart • Use a consistent time source/server
  • 19. SELinux (Security-Enhanced Linux) • A kernel-level security access control module • Modes of SELinux • Enforcing: Block and log policy violations • Permissive: Log policy violations only • Disabled: Completely disabled • Recommended: Enforcing • Percona Server for MongoDB 3.2+ RPMs install an SELinux policy on RedHat/CentOS!
  • 20. • A “framework” for applying tunings to Linux • RedHat/CentOS 7 • Debian added it, not sure on official status • Watch my/Percona-Lab GitHub for profiles in the future! Tuned
  • 21. CPUs and Frequency Scaling • Lots of cores > faster cores • ‘cpufreq’: a daemon for dynamic scaling of the CPU frequency • Terrible idea for databases • Disable or set governor to 100% frequency always, i.e mode: ‘performance’ • Disable any BIOS-level performance/efficiency tuneable • ENERGY_PERF_BIAS • A CentOS/RedHat tuning for energy vs performance balance • RHEL 6 = ‘performance’ • RHEL 7 = ‘normal’ (!) • Advice: use ‘tuned’ to set to ‘performance’
  • 22. Monitoring: Percona PMM • Open-source monitoring suite from Percona! • MongoDB visualisations by cluster, shard, replset, engine, etc • DB stats groupings with OS metrics • Simple deployment
  • 23. Monitoring: Prometheus + Grafana • PerconaLab GitHub Repositories • grafana_mongodb_dashboards • prometheus_mongodb_exporter
  • 24. Links • https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/ • https://docs.mongodb.com/manual/administration/production-notes/ • http://www.brendangregg.com/linuxperf.html ==> • https://www.percona.com/doc/percona-monitoring-and-management/index.html • https://github.com/Percona-Lab/grafana_mongodb_dashboards • https://github.com/Percona-Lab/prometheus_mongodb_exporter • https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/