Enterprise Linux
Kernel Tuning & Customizing for Performance
한진구(HAN, JINKOO)
Email: jkoohan@gmail.com
Nangongbullak (難攻不落, "Impregnable") Open Source Infrastructure Seminar

Agenda
❑ Getting started
❑ Monitoring
❑ Tuning by major component
✓ Memory
✓ Swap/Cache
✓ IO/File system
✓ Networking

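Getting started
❑ Limits of this session
▪ One hour is not enough to cover every aspect of system tuning
=> The focus is on key concepts and basic tuning
❑ Prerequisites for tuning
▪ Tuning first requires an understanding of both the hardware and the software
▪ It also requires an understanding of how systems interact with each other
❑ Considerations when tuning
▪ Always take the user/administrator factor into account
▪ User error? A misunderstood concept?
▪ Do not assume that everyone understands tuning
❑ Cautions when tuning
▪ System tuning is not magic
▪ Hardware upgrades and load distribution are often needed
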
Getting started
Two terms that must be kept distinct when describing system tuning or setting improvement goals:
❑ Low-latency: Latency is a measure of time delay experienced in a system, the precise definition of which depends on the system and the time being measured.[1]
❑ High-throughput: The system throughput or aggregate throughput is the sum of the data rates that are delivered to all terminals in a network or disk-drive.[1]
[1] : wikipedia.org

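Getting started
❑ Tune hardware and firmware first
▪ In many cases, hardware and firmware updates bring better results than software tuning
▪ Refer to the hardware vendor's manual
❑ Disable power-saving features
▪ Disabling power-saving features improves overall performance, latency in particular
❑ Remove unnecessary services
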
Monitoring
Monitoring
❑ System log
# dmesg
# cat /var/log/messages
❑ CPU & NUMA
# lscpu
# x86info              # x86info package
# numactl --hardware
❑ BIOS
# dmidecode
❑ BUS
# lspci                # pciutils package
# lsusb                # usbutils package

Monitoring
❑ vmstat
# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 327088   9380  84424    0    0    88     6 1003   30  1  1 97  1  0
 0  0      0 327080   9380  84412    0    0     0     0  991    4  0  0 100  0  0
 0  0      0 327080   9380  84412    0    0     0     0  991    4  0  0 100  0  0
 0  0      0 327080   9380  84412    0    0     0     0  989    5  0  0 100  0  0
❑ mpstat
# mpstat -P ALL 10
11:07:47 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
11:07:57 PM  all   0.00   0.00   0.05    0.15   0.00   0.30   0.00   0.00  99.49
11:07:57 PM    0   0.10   0.00   0.10    0.61   0.00   0.20   0.00   0.00  98.99
11:07:57 PM    1   0.00   0.00   0.00    0.00   0.00   0.31   0.00   0.00  99.69
11:07:57 PM    2   0.00   0.00   0.00    0.00   0.00   0.41   0.00   0.00  99.59
11:07:57 PM    3   0.00   0.00   0.10    0.00   0.00   0.30   0.00   0.00  99.60
11:07:57 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
11:08:07 PM  all   0.03   0.00   0.08    0.13   0.00   0.33   0.00   0.00  99.44
11:08:07 PM    0   0.10   0.00   0.31    0.51   0.00   0.20   0.00   0.00  98.88
11:08:07 PM    1   0.00   0.00   0.10    0.00   0.00   0.31   0.00   0.00  99.59
11:08:07 PM    2   0.00   0.00   0.00    0.00   0.00   0.31   0.00   0.00  99.69
11:08:07 PM    3   0.00   0.00   0.00    0.00   0.00   0.51   0.00   0.00  99.49

Monitoring
❑ iostat
# iostat -x /dev/sda 10
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.88    0.00    1.94    1.09    0.00   96.09
Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        6.36    0.67  13.77   0.65  527.35   10.02    37.26     0.06   4.51   3.38   4.88
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.30    0.00    0.00   99.67
Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00   0.50    0.00    4.00     8.00     0.00   1.60   0.40   0.02
❑ sar
# sar -q -f /var/log/sa/sa13
....
11:00:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
11:05:01 PM         0       200      0.08      0.09      0.06
11:10:01 PM         0       200      0.30      0.16      0.09
11:15:01 PM         0       200      0.00      0.06      0.07
11:20:01 PM         1       200      0.12      0.06      0.06

Monitoring
❑ netstat
# netstat -natu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
tcp        0      0 192.168.219.117:22      192.168.219.105:33376   ESTABLISHED
tcp        0      0 :::22                   :::*                    LISTEN
tcp        0      0 ::1:25                  :::*                    LISTEN
udp        0      0 0.0.0.0:68              0.0.0.0:*
❑ ethtool
# ethtool -S eth0
NIC statistics:
     rx_packets: 0
     tx_packets: 0
     rx_bytes: 0
     tx_bytes: 0
     ....
     rx_multicast: 0
     tx_multicast: 0
     rx_errors: 0
     ....

Basic Tuning
Basic tuning
❑ Turn off the tickless kernel
▪ The tickless feature helps reduce CPU power consumption
▪ It can sometimes increase system latency
nohz=off
❑ Limit ACPI and Intel C-states
▪ The ACPI standard and Intel define CPU sleep states (C-states) as a power reduction method.
▪ Your servers will make more noise!
processor.max_cstate=1
intel_idle.max_cstate=0

Basic tuning
❑ Turn off 'Transparent Huge Page'
▪ This feature can generally affect both latency and throughput
▪ Therefore, consider carefully whether turning it off actually helps
transparent_hugepage=never
❑ Turn off the 'CGroup' feature
▪ CGroup lets administrators manage system resources such as CPU, memory, and network
▪ It can be a source of added system latency
cgroup_disable=memory

Basic tuning
❑ Check which services are running
# service --status-all
# chkconfig --list | grep on
❑ Disable unused services
# service bluetooth stop
# chkconfig bluetooth off
# yum remove bluez

Basic tuning
If you don't know what to do, use tuned instead.
Tuned is a daemon that monitors the use of system components and dynamically tunes system settings based on that monitoring information.
It includes predefined profiles for specific use cases.
# yum install tuned
# service tuned start
# chkconfig tuned on
# tuned-adm active                   # 'default' profile
# tuned-adm list
# tuned-adm profile [profile_name]

Basic tuning
The predefined profiles (in EL 6)
Profiles can be customized!
# tuned-adm list
- laptop-ac-powersave
- desktop-powersave
- enterprise-storage
- default
- virtual-guest
- throughput-performance
- laptop-battery-powersave
- server-powersave
- latency-performance
- spindown-disk
- virtual-host
# tuned-adm profile latency-performance

Memory Tuning
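Memory addressing overview
[Figure: the MMU (in the CPU) translates a linear (virtual) address into a physical memory address. The address is first looked up in the TLB (Yes: hit; No: TLB miss). The linear address is split into an offset within the PGD, an offset within the PMD, an offset within the PTE, and an offset within the data page; a failed translation results in a page fault.]
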
Problems in large physical memory environments
❑ Physical memory is divided into pages; the default page size is 4 KiB
❑ On systems with a large amount of physical memory, accessing large contiguous regions becomes more expensive, and TLB misses increase sharply
Translation Lookaside Buffer (TLB)
▪ Translating linear addresses into physical addresses takes time, so most processors have a small cache known as a TLB that stores the physical addresses associated with the most recently accessed virtual addresses.
▪ The TLB is a small cache, so large-memory applications can incur high TLB miss rates, and TLB misses are extremely expensive on today's very fast, pipelined CPUs.

Improving performance in large-memory environments - Hugepage
❑ The IA-32 architecture supports 4 KiB, 2 MiB, and 4 MiB pages
❑ The Linux kernel also provides large pages of 2 MB and 1 GB through the HugePage mechanism
❑ Having fewer TLB entries that point to more memory means that a TLB hit is more likely to occur.
Standard HugePage (EL 4, 5, 6)
▪ 2 MB per page
▪ Reserve/Free via /proc/sys/vm/nr_hugepages
▪ Used via hugetlbfs
GB HugePage (EL 6, 7)
▪ 1 GB per page
▪ Reserved at boot time/No freeing
▪ Used via hugetlbfs

Improving performance in large-memory environments - Hugepage
❑ To check whether your CPU supports HugePage
# grep --color pse /proc/cpuinfo        # 2MB
# grep --color pdpe1gb /proc/cpuinfo    # 1GB
❑ Enable/disable HugePage with kernel parameters
hugepagesz=2M hugepages=2048
❑ 2MB HugePages can also be enabled/disabled dynamically via sysctl (/proc/sys/vm/nr_hugepages)
# sysctl -w vm.nr_hugepages=20
❑ To use 1 GB HugePage
default_hugepagesz=1G hugepagesz=1G hugepages=10

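As a rough sizing aid for the hugepages= and vm.nr_hugepages values, the sketch below computes how many huge pages are needed to cover a given amount of memory. The function name hugepages_needed and the 4096 MiB region are illustrative assumptions, not values prescribed by the slides.

```shell
#!/bin/sh
# hugepages_needed SIZE_MIB PAGE_MIB
# Hypothetical helper: number of huge pages of PAGE_MIB MiB needed to
# cover SIZE_MIB MiB, rounded up so a partial last page is still counted.
hugepages_needed() {
    size_mib=$1
    page_mib=$2
    echo $(( (size_mib + page_mib - 1) / page_mib ))
}

# Example: a 4096 MiB region backed by 2 MiB pages.
hugepages_needed 4096 2      # prints 2048
# The same region backed by 1 GiB (1024 MiB) pages.
hugepages_needed 4096 1024   # prints 4
```

The result is only the page count; the pages still have to be reserved through the kernel parameters or vm.nr_hugepages, and the reservation can fail on a fragmented system.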
Improving performance in large-memory environments - Hugepage
❑ Most databases call for HugePages to improve performance
❑ To use HugePages, an application must go through the mmap or shmat/shmget system calls; the mmap system call requires a mounted hugetlbfs
❑ To configure hugetlbfs
# mkdir /mnt/hugepages
# mount -t hugetlbfs hugetlbfs /mnt/hugepages

Improving performance in large-memory environments - Transparent Hugepage
❑ In EL6, Transparent HugePage (THP) is enabled by default
❑ The kernel tries to allocate hugepages when needed, and any process can receive 2MB pages
❑ If no hugepage is available, the kernel falls back to regular 4KB pages
❑ Unlike hugetlbfs pages, THP pages can be swapped; they are split into 4KB pages and then swapped out
❑ No modification is required for applications
❑ When running Big Data or DBMS solutions, consider carefully whether to use it

Improving performance in large-memory environments - Transparent Hugepage
❑ Enable/disable with the kernel parameter
transparent_hugepage=always|never
❑ Dynamically enable/disable
# echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled
# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
❑ Monitor THP
# egrep 'trans|thp' /proc/vmstat        # EL 6.2 or later
nr_anon_transparent_hugepages 2018
thp_fault_alloc 7302
thp_fault_fallback 0
thp_collapse_alloc 401
thp_collapse_alloc_failed 0
thp_split 21

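One way to read the THP counters is to compare successful huge page faults with fallbacks to 4KB pages. The sketch below is a minimal example that assumes the /proc/vmstat "name value" line format shown in the slide; the helper name thp_fallback_pct is hypothetical.

```shell
#!/bin/sh
# thp_fallback_pct: read /proc/vmstat-style "name value" lines on stdin and
# print the percentage of THP page faults that fell back to small pages.
thp_fallback_pct() {
    awk '
        $1 == "thp_fault_alloc"    { alloc = $2 }
        $1 == "thp_fault_fallback" { fallback = $2 }
        END {
            total = alloc + fallback
            if (total == 0) { print "0.00"; exit }   # no THP faults recorded
            printf "%.2f\n", 100 * fallback / total
        }
    '
}

# Sample counters taken from the slide output.
printf '%s\n' 'thp_fault_alloc 7302' 'thp_fault_fallback 0' | thp_fallback_pct
# On a live system: thp_fallback_pct < /proc/vmstat
```

A steadily rising fallback percentage suggests that the kernel often cannot find free 2MB regions, so the expected THP benefit may not materialize.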
Swap/Cache Tuning
Understanding swap
❑ Swap space improves memory efficiency: when free memory runs low, old pages are paged out to disk to free memory for other uses
❑ Inactive anonymous pages are selected for swap-out
❑ Modern systems have large amounts of physical memory. Is swap space obsolete?
▪ Without swap space, anonymous pages cannot be flushed; they stay in memory until they are released, even if they are never used
▪ Flushing pages to swap is actually a bit easier and quicker than flushing them to disk: the code is much simpler, and there are no directory trees to update.

Improving swap space performance
❑ Improving swap performance
▪ One large swap area can result in bad performance. Split the swap area across multiple disks (up to 32 swap areas)
▪ The kernel uses the highest-priority swap area first and round-robins among swap areas of equal priority
❑ Place swap areas on the lowest-numbered partitions of the fastest disks.
❑ Monitor whether a system is swapping or not
# vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1730936   9976  78396    0    0     4     0  968    4  0  1 99  0  0
 0  0      0 1730936   9976  78396    0    0     0     0 3618   14  0  1 99  0  0
 0  0      0 1730068   9984  78388    0    0     0    12 3576   36  0  2 97  1  0

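The swap check can be automated by watching the si (swap-in) and so (swap-out) columns of vmstat. The following is a minimal sketch that assumes the classic vmstat column order shown in the slide (si in column 7, so in column 8); the helper name swap_activity is hypothetical.

```shell
#!/bin/sh
# swap_activity: read `vmstat` output on stdin and report whether any sample
# shows pages swapped in (si, column 7) or swapped out (so, column 8).
swap_activity() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 8 {        # data rows only; skips both header lines
            if ($7 > 0 || $8 > 0) active = 1
        }
        END { print (active ? "swapping" : "no swap activity") }
    '
}

# Sample rows taken from the slide output.
swap_activity <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1730936   9976  78396    0    0     4     0  968    4  0  1 99  0  0
 0  0      0 1730068   9984  78388    0    0     0    12 3576   36  0  2 97  1  0
EOF
# On a live system: vmstat 5 3 | swap_activity
```

Note that the first row printed by vmstat reports averages since boot, so a non-zero value there does not necessarily mean the system is swapping right now.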
Improving swapping performance
❑ The kernel prefers to swap out anonymous pages when (% of memory mapped into page tables) + vm.swappiness >= 100
❑ The default value of this parameter is 60; the higher the value, the more often swap-out occurs
❑ General tuning guide
▪ For batch jobs, increase it
▪ For DBMS and Big Data tasks, set it to 0 or decrease it
# cat /proc/sys/vm/swappiness
60
# echo 10 > /proc/sys/vm/swappiness
Or:
# sysctl -w vm.swappiness=10

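The rule of thumb on this slide can be tried out with a few numbers. The sketch below only evaluates the slide's simplified inequality; it is not the kernel's actual reclaim logic, and the function name prefers_swap is hypothetical.

```shell
#!/bin/sh
# prefers_swap MAPPED_PCT SWAPPINESS
# Evaluates the slide's simplified rule: the kernel prefers swapping out
# anonymous pages when MAPPED_PCT + SWAPPINESS >= 100.
prefers_swap() {
    mapped_pct=$1
    swappiness=$2
    if [ $(( mapped_pct + swappiness )) -ge 100 ]; then
        echo yes
    else
        echo no
    fi
}

prefers_swap 50 60   # prints yes (50 + 60 = 110)
prefers_swap 50 10   # prints no  (50 + 10 = 60)
```

With the default of 60, the threshold is reached once 40% of memory is mapped; lowering the value to 10 moves that point to 90%.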
Understanding cache memory
❑ To reduce service time for slower subsystems (I/O), the kernel uses different types of caches:
❑ Slab Cache :
▪ Stores the various types of data structures the kernel uses; these data structures do not fit into a single page of memory.
▪ Slabs are allocated from memory that has been set aside in advance
❑ Swap Cache :
▪ Keeps track of pages that were previously swapped out and have since been swapped back in.
▪ When a page that was swapped out before needs to be swapped out again, the kernel first checks for a swap cache entry; if one is found, no write to disk takes place

Understanding cache memory
❑ Page Cache (file-backed, not swappable) :
▪ To improve the overall performance of a system, the kernel tends to use as much memory as possible as a cache for data being read from or written to disk.
▪ This data is then reused from RAM without a direct I/O request to the physical disk
❑ In some cases, the page cache causes problems:
▪ The cache size keeps growing, and the page cache cannot be freed as fast as it grows.
▪ System performance drops because the kernel has to search for free pages or swap pages out to free space, despite the large page cache.

Improving cache memory performance
❑ Increase the tendency to reclaim page cache
❑ Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.
# sysctl -w vm.vfs_cache_pressure=120       # increasing it
❑ Increase min_free_kbytes, up to 5% of physical memory; the kernel keeps that amount of memory free
# sysctl -w vm.min_free_kbytes=14066        # increasing it
❑ Let the kernel flush dirty pages earlier
# sysctl -w vm.dirty_background_ratio=10    # decreasing it
▪ Background writeback by the kernel flusher: smaller I/O streams, fewer dirty pages in the cache
# sysctl -w vm.dirty_ratio=20               # decreasing it
▪ Writeback by the application itself, as synchronous writes

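The "up to 5% of physical memory" ceiling for min_free_kbytes can be computed from MemTotal. The sketch below assumes the /proc/meminfo line format "MemTotal: N kB"; the helper name min_free_kbytes_for and the 16384000 kB sample value are illustrative assumptions.

```shell
#!/bin/sh
# min_free_kbytes_for PCT: read /proc/meminfo-style text on stdin and print
# PCT percent of MemTotal, in kB (the unit vm.min_free_kbytes expects).
min_free_kbytes_for() {
    awk -v pct="$1" '$1 == "MemTotal:" { printf "%d\n", $2 * pct / 100 }'
}

# Hypothetical machine with 16384000 kB of RAM: the 5% upper bound.
printf 'MemTotal:       16384000 kB\n' | min_free_kbytes_for 5   # prints 819200
# On a live system: min_free_kbytes_for 5 < /proc/meminfo
```

The printed number is an upper bound to stay under, not a recommended setting; reserving too much free memory takes it away from applications and the page cache.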
Improving cache memory performance
❑ Decrease swappiness
▪ Even when plenty of cache memory could easily be freed, the kernel may swap out data in order to keep pages that are likely to be needed in the near future
▪ With a lower value the kernel is less likely to swap, and thus more likely to write data out to disk
# echo 10 > /proc/sys/vm/swappiness
❑ Reclaim all clean pages
❑ Useful for freeing the cache before running jobs that require a large amount of free space
# echo 1 > /proc/sys/vm/drop_caches

IO/Filesystem Tuning
Understanding the I/O subsystem
❑ Read or write requests are transformed into block device requests that go into a queue.
▪ The I/O subsystem then batches similar requests that arrive within a specific time window and processes them all at once.
❑ Generally, the I/O subsystem does not operate in a true FIFO manner. It processes queued read/write requests according to the selected scheduler algorithm. These algorithms are called elevators because they operate in the same manner as real-life building elevators.
# cat /sys/block/<device>/queue/scheduler
noop anticipatory deadline [cfq]

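In the scheduler file, the active elevator is the name in square brackets. The sketch below extracts it from a line in that format; the helper name active_scheduler is hypothetical.

```shell
#!/bin/sh
# active_scheduler: read the contents of a queue/scheduler file on stdin and
# print the name enclosed in square brackets, i.e. the active elevator.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Sample line taken from the slide output.
echo 'noop anticipatory deadline [cfq]' | active_scheduler   # prints cfq
# On a live system: active_scheduler < /sys/block/<device>/queue/scheduler
```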
Understanding the I/O subsystem
❑ Think about how a hard disk drive works
❑ To improve the overall I/O performance
▪ Re-arrange the requests
▪ Choose wisely when each request is served
[Figure: new I/O requests entering the I/O queue; performance drops when the disk has to seek to the location of each request separately]

Improving I/O subsystem performance
❑ Completely Fair Queuing: cfq
▪ Default I/O scheduler in EL 5 and 6 (in EL 7 it is the default only for SATA disks; deadline is the default for other devices)
▪ Divides all available I/O bandwidth equally among all processes issuing I/O requests.
❑ Deadline: deadline
▪ Suited to large, sequential, read-mostly workloads
▪ Guarantees a response time for each request. Once a request reaches its expiration time, it is serviced immediately
# echo deadline > /sys/block/<device>/queue/scheduler

Improving I/O subsystem performance
❑ Anticipatory: anticipatory
▪ Optimizes systems with small or slow disk subsystems.
▪ Recommended for servers running data processing applications that are not regularly interrupted by external requests.
❑ NOOP: noop
▪ For systems that carry a heavy CPU workload
▪ Places all requests into a simple queue without reordering them
▪ Recommended for virtualized guests
elevator=noop        (kernel boot parameter)

Understanding journaling file systems
❑ A journaling file system keeps a log (journal) of file system changes, which allows quick recovery.
▪ Every change to the file system is first recorded in the journal as a transaction before it is committed to the actual file system.
▪ After a system crash or power failure, the file system is recovered quickly and is less likely to be corrupted.
▪ This is a very important feature in the enterprise market
❑ ext3, ext4, and xfs are journaling file systems
❑ EL6 uses ext4 and EL7 uses xfs as the default file system

Improving journaling file system performance
❑ To improve journal performance
▪ Place the file system's journal on a separate device, such as an SSD.
▪ This reduces the number of accesses to the device that holds the actual file system.
❑ The journal/file system pair must then be managed carefully
Create a new external journal device:
# mkfs.ext4 -O journal_dev -b 4096 /dev/sdj1
Create a new file system that uses the external journal device:
# mkfs.ext4 -J device=/dev/sdj1 -b 4096 /dev/sde1

Understanding file system barriers
❑ A write barrier is used to mitigate the risk of data corruption during power loss.
❑ Storage devices may have write caches. They report I/O as "complete" while the data is still in the cache. If the cache loses power, it also loses the data.
❑ Some storage devices use battery-backed write caches, so the data survives a power failure. However, the cache can change the original metadata ordering: the commit block may be present on disk without the associated transaction in place.
❑ Therefore, ext4 and xfs turn on barriers by default in EL 6 and 7

Improving performance with file system barriers
❑ Enabling write barriers causes a significant performance penalty.
❑ The impact is largest on workloads that create or remove lots of small files, much smaller (close to none) on streaming write workloads, and absent on read workloads.
❑ In general, the write barrier can be disabled if the storage device uses a battery-backed write cache.
❑ Write barriers are also unnecessary whenever the system uses hardware RAID controllers with a battery-backed write cache.
# mount -o nobarrier ....

Networking Tuning
Packet loss caused by degraded network performance
Overrun : usually seen under heavy UDP traffic
Dropped : seen under both heavy UDP and TCP traffic
bond1     Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          inet addr:192.168.10.33  Bcast:192.168.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500
          RX packets:8344569671 errors:0 dropped:0 overruns:46295 frame:0
          TX packets:53614 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2952210470156 (2.6 TiB)  TX bytes:5251386 (5.0 MiB)
eth0      Link encap:Ethernet  HWaddr
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:27051811 errors:0 dropped:696311 overruns:0 frame:0
          TX packets:110147381 errors:0 dropped:0 overruns:0 carrier:0

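The dropped and overrun counters can be pulled out of this output per interface. The sketch below is a minimal example that assumes the older net-tools ifconfig layout shown in the slide ("Link encap:" header line, "RX packets:... dropped:N overruns:M" counters); newer ifconfig versions print a different layout. The helper name rx_loss is hypothetical.

```shell
#!/bin/sh
# rx_loss: read old-style ifconfig output on stdin and print one line per
# interface: "<interface> <RX dropped> <RX overruns>".
rx_loss() {
    awk '
        /Link encap:/ { iface = $1 }             # header line names the interface
        $1 == "RX" && $2 ~ /^packets:/ {
            dropped = 0; overruns = 0
            for (i = 3; i <= NF; i++) {          # fields look like key:value
                split($i, kv, ":")
                if (kv[1] == "dropped")  dropped  = kv[2]
                if (kv[1] == "overruns") overruns = kv[2]
            }
            print iface, dropped, overruns
        }
    '
}

# Sample lines taken from the slide output.
rx_loss <<'EOF'
bond1     Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          RX packets:8344569671 errors:0 dropped:0 overruns:46295 frame:0
          TX packets:53614 errors:0 dropped:0 overruns:0 carrier:0
EOF
# On a live system with net-tools: ifconfig | rx_loss
```

Sampling these counters twice and comparing the values shows whether loss is still occurring or only happened in the past.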
Minimizing network packet loss
❑ Update the kernel and firmware to the latest versions
▪ Many packet loss problems can be resolved with the latest kernel and firmware.
❑ Increase the NIC's ring buffer
▪ Each NIC has its own buffer, called the 'ring buffer'
▪ Increasing its size may avoid overruns as well as packet drops.
# ethtool -g eth1
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
....
# ethtool --set-ring eth1 rx 4096 tx 4096        # set the pre-set maximums

Minimizing network packet loss
❑ Increase the rate at which the receive queues are drained
▪ net.core.dev_weight : how many packets any individual network interface can process during each NAPI poll.
▪ net.core.netdev_budget : maximum number of packets taken from all interfaces in one polling cycle (NAPI poll).
❑ Balance interrupt handling, or pin interrupts to the proper CPUs if needed
# cat /etc/sysctl.conf
# increase it (64 is the usual default)
net.core.dev_weight = 64
# increase it (300 is the usual default)
net.core.netdev_budget = 300

Understanding the TCP window
❑ Under TCP, the receiver must send an ACK for the packets it receives, and the sender must wait for that ACK.
▪ This affects network throughput and CPU utilization
❑ If the network path is long and slow, like a satellite link, or has a large bandwidth, more packets can be in flight between the sender and the receiver at a time.
▪ The TCP window allows the sender to send more packets without waiting for ACKs.
▪ The size of the TCP window varies with the size of the TCP socket buffer.

Understanding the TCP window
❑ If applications fetch packets from the socket buffers slowly, the buffers fill up and packets start to be dropped
❑ Better performance can be obtained by increasing the TCP socket buffer size.
[Figure: a sender and a receiver exchanging packets; the receiver advertises "Receive window : 4", and an ACK carries "receive window 2"]

Improving TCP throughput
❑ Choose the values carefully: each socket consumes memory equal to the default buffer size as soon as it is opened.
❑ Setting the TCP buffer size too large will seriously affect network speed and latency for connections that send small amounts of data (such as HTTP or SSH)
# cat /etc/sysctl.conf
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 8388608
net.core.wmem_default = 8388608
net.ipv4.tcp_rmem = 8192 4194304 8388608
# more aggressive:
# net.ipv4.tcp_rmem = 4096 8388608 67108864

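A common starting point for sizing these buffers is the bandwidth-delay product (BDP), the amount of data that can be in flight on a link. The slides do not name this rule; it is brought in here as a standard sizing aid. The helper name bdp_bytes and the link figures are illustrative assumptions.

```shell
#!/bin/sh
# bdp_bytes MBIT_PER_S RTT_MS
# Bandwidth-delay product in bytes:
#   (MBIT_PER_S * 1,000,000 / 8) bytes/s * (RTT_MS / 1000) s
#   = MBIT_PER_S * RTT_MS * 125
bdp_bytes() {
    mbit=$1
    rtt_ms=$2
    echo $(( mbit * rtt_ms * 125 ))
}

# Hypothetical 1 Gbit/s path with a 70 ms round-trip time.
bdp_bytes 1000 70    # prints 8750000
# Hypothetical 100 Mbit/s LAN with a 10 ms round-trip time.
bdp_bytes 100 10     # prints 125000
```

A maximum buffer size near the BDP lets a single connection keep the link full; much larger values mainly add memory use, which is the risk the slide warns about.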
Thank you

 
Containers with systemd-nspawn
Containers with systemd-nspawnContainers with systemd-nspawn
Containers with systemd-nspawnGábor Nyers
 
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSDEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSFelipe Prado
 
Nagios Conference 2011 - Daniel Wittenberg - Scaling Nagios At A Giant Insur...
Nagios Conference 2011 - Daniel Wittenberg -  Scaling Nagios At A Giant Insur...Nagios Conference 2011 - Daniel Wittenberg -  Scaling Nagios At A Giant Insur...
Nagios Conference 2011 - Daniel Wittenberg - Scaling Nagios At A Giant Insur...Nagios
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
 
Analyze database system using a 3 d method
Analyze database system using a 3 d methodAnalyze database system using a 3 d method
Analyze database system using a 3 d methodAjith Narayanan
 
Guide to alfresco monitoring
Guide to alfresco monitoringGuide to alfresco monitoring
Guide to alfresco monitoringMiguel Rodriguez
 

Similar to 20150918 klug el performance tuning-v1.4 (20)

YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on Solaris
 
test
testtest
test
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
SiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuilding
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
 
Performance Whackamole (short version)
Performance Whackamole (short version)Performance Whackamole (short version)
Performance Whackamole (short version)
 
Introduction to Java Profiling
Introduction to Java ProfilingIntroduction to Java Profiling
Introduction to Java Profiling
 
hacking-embedded-devices.pptx
hacking-embedded-devices.pptxhacking-embedded-devices.pptx
hacking-embedded-devices.pptx
 
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with TracingAnalyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
 
Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE Method
 
QCon London.pdf
QCon London.pdfQCon London.pdf
QCon London.pdf
 
Containers with systemd-nspawn
Containers with systemd-nspawnContainers with systemd-nspawn
Containers with systemd-nspawn
 
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSDEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
 
Nagios Conference 2011 - Daniel Wittenberg - Scaling Nagios At A Giant Insur...
Nagios Conference 2011 - Daniel Wittenberg -  Scaling Nagios At A Giant Insur...Nagios Conference 2011 - Daniel Wittenberg -  Scaling Nagios At A Giant Insur...
Nagios Conference 2011 - Daniel Wittenberg - Scaling Nagios At A Giant Insur...
 
Optimizing Linux Servers
Optimizing Linux ServersOptimizing Linux Servers
Optimizing Linux Servers
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Analyze database system using a 3 d method
Analyze database system using a 3 d methodAnalyze database system using a 3 d method
Analyze database system using a 3 d method
 
Guide to alfresco monitoring
Guide to alfresco monitoringGuide to alfresco monitoring
Guide to alfresco monitoring
 

Recently uploaded

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086anil_gaur
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...soginsider
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 

Recently uploaded (20)

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 

20150918 klug el performance tuning-v1.4

  • 1. Enterprise Linux Kernel Tuning & Customizing for Performance 한진구(HAN, JINKOO) Email: jkoohan@gmail.com
  • 2. Agenda
    難攻不落 ("Impregnable") Open Source Infrastructure Seminar
    ❑ Getting started
    ❑ Monitoring
    ❑ Tuning by major component
      ✓ Memory
      ✓ Swap/Cache
      ✓ I/O and file systems
      ✓ Networking
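  • 3. Getting Started
    ❑ Limits of this session
      ▪ One hour is not enough to cover every aspect of system tuning
        => The focus is on key concepts and basic tuning
    ❑ Prerequisites for tuning
      ▪ Tuning requires an understanding of both the hardware and the software
      ▪ It also requires an understanding of how systems interact with each other
    ❑ Considerations when tuning
      ▪ Always take the user/administrator factor into account
      ▪ User error? Misunderstood concepts?
      ▪ Do not assume that everyone understands tuning
    ❑ Cautions when tuning
      ▪ System tuning is not magic
      ▪ Hardware upgrades and load distribution are often necessary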
  • 4. Getting Started
    Two terms that must be kept distinct when explaining system tuning or setting improvement goals:
    ❑ Low latency - Latency is a measure of time delay experienced in a system, the precise definition of which depends on the system and the time being measured.[1]
    ❑ High throughput - The system throughput or aggregate throughput is the sum of the data rates that are delivered to all terminals in a network or disk drive.[1]
    [1] : wikipedia.org
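  • 5. Getting Started
    ❑ Tune hardware and firmware first
      ▪ In many cases, hardware and firmware updates yield better results than software tuning
      ▪ Consult the hardware vendor's manuals
    ❑ Disable power-saving features
      ▪ Disabling power-saving features improves overall performance, latency in particular
    ❑ Remove unnecessary services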
  • 7. Monitoring
    ❑ System log
      # dmesg
      # cat /var/log/messages
    ❑ CPU & NUMA
      # lscpu
      # x86info               // x86info package
      # numactl --hardware
    ❑ BIOS
      # dmidecode
    ❑ BUS
      # lspci                 // pciutils package
      # lsusb                 // usbutils package
  • 8. Monitoring
    ❑ vmstat
      # vmstat 10
      procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
       0  0      0 327088   9380  84424    0    0    88     6 1003   30  1  1  97  1  0
       0  0      0 327080   9380  84412    0    0     0     0  991    4  0  0 100  0  0
       0  0      0 327080   9380  84412    0    0     0     0  991    4  0  0 100  0  0
       0  0      0 327080   9380  84412    0    0     0     0  989    5  0  0 100  0  0
    ❑ mpstat
      # mpstat -P ALL 10
      11:07:47 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
      11:07:57 PM  all   0.00   0.00   0.05    0.15   0.00   0.30   0.00   0.00  99.49
      11:07:57 PM    0   0.10   0.00   0.10    0.61   0.00   0.20   0.00   0.00  98.99
      11:07:57 PM    1   0.00   0.00   0.00    0.00   0.00   0.31   0.00   0.00  99.69
      11:07:57 PM    2   0.00   0.00   0.00    0.00   0.00   0.41   0.00   0.00  99.59
      11:07:57 PM    3   0.00   0.00   0.10    0.00   0.00   0.30   0.00   0.00  99.60

      11:07:57 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
      11:08:07 PM  all   0.03   0.00   0.08    0.13   0.00   0.33   0.00   0.00  99.44
      11:08:07 PM    0   0.10   0.00   0.31    0.51   0.00   0.20   0.00   0.00  98.88
      11:08:07 PM    1   0.00   0.00   0.10    0.00   0.00   0.31   0.00   0.00  99.59
      11:08:07 PM    2   0.00   0.00   0.00    0.00   0.00   0.31   0.00   0.00  99.69
      11:08:07 PM    3   0.00   0.00   0.00    0.00   0.00   0.51   0.00   0.00  99.49
  • 9. Monitoring
    ❑ iostat
      # iostat -x /dev/sda 10
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 0.88    0.00    1.94    1.09    0.00   96.09
      Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
      sda        6.36    0.67  13.77   0.65  527.35   10.02    37.26     0.06   4.51   3.38   4.88

      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 0.03    0.00    0.30    0.00    0.00   99.67
      Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
      sda        0.00    0.00   0.00   0.50    0.00    4.00     8.00     0.00   1.60   0.40   0.02
    ❑ sar
      # sar -q -f /var/log/sa/sa13
      ....
      11:00:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
      11:05:01 PM         0       200      0.08      0.09      0.06
      11:10:01 PM         0       200      0.30      0.16      0.09
      11:15:01 PM         0       200      0.00      0.06      0.07
      11:20:01 PM         1       200      0.12      0.06      0.06
  • 10. Monitoring
    ❑ netstat
      # netstat -natu
      Active Internet connections (servers and established)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
      tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
      tcp        0      0 192.168.219.117:22      192.168.219.105:33376   ESTABLISHED
      tcp        0      0 :::22                   :::*                    LISTEN
      tcp        0      0 ::1:25                  :::*                    LISTEN
      udp        0      0 0.0.0.0:68              0.0.0.0:*
    ❑ ethtool
      # ethtool -S eth0
      NIC statistics:
           rx_packets: 0
           tx_packets: 0
           rx_bytes: 0
           tx_bytes: 0
           ....
           rx_multicast: 0
           tx_multicast: 0
           rx_errors: 0
           ....
  • 12. Basic Tuning
    ❑ Turn off the tickless kernel
      ▪ The tickless feature helps reduce CPU power consumption
      ▪ It can sometimes increase system latency
      nohz=off
    ❑ Limit ACPI and Intel C-states
      ▪ The ACPI standard and Intel define CPU sleep states (C-states) as a power-reduction method.
      ▪ Your servers will make more noise!
      processor.max_cstate=1 intel_idle.max_cstate=0
  • 13. Basic Tuning
    ❑ Turn off Transparent Huge Pages
      ▪ This feature can affect both latency and throughput
      ▪ Therefore, consider carefully whether turning it off actually helps
      transparent_hugepage=never
    ❑ Turn off the cgroup feature
      ▪ Cgroups let administrators manage system resources such as CPU, memory, and network
      ▪ Cgroup accounting can add delay and hurt system latency
      cgroup_disable=memory
  • 14. Basic Tuning
    ❑ Check which services are running
      # service --status-all
      # chkconfig --list | grep on
    ❑ Disable unused services
      # service bluetooth stop
      # chkconfig bluetooth off
      # yum remove bluez
  • 15. Basic Tuning
    If you don't know what to do, use tuned instead.
    Tuned is a daemon that monitors the use of system components and dynamically tunes system settings based on that monitoring information. It includes predefined profiles for specific use cases.
      # yum install tuned
      # service tuned start
      # chkconfig tuned on
      # tuned-adm active                  // 'default' profile
      # tuned-adm list
      # tuned-adm profile [profile_name]
  • 16. Basic Tuning
    The predefined profiles (in EL 6). Profiles can be customized!
      # tuned-adm list
      - laptop-ac-powersave
      - desktop-powersave
      - enterprise-storage
      - default
      - virtual-guest
      - throughput-performance
      - laptop-battery-powersave
      - server-powersave
      - latency-performance
      - spindown-disk
      - virtual-host
      # tuned-adm profile latency-performance
  • 18. Memory Addressing Overview
    [Diagram: the MMU (in the CPU) looks up a linear (virtual) address in the TLB. Yes (hit): physical memory is accessed directly. No (TLB miss): the linear address is split into offsets within the PGD, PMD, PTE, and data page to walk the page tables; the diagram also shows a page fault path.]
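As a rough illustration of the page-table walk in this diagram, the sketch below splits a linear address into PGD, PMD, and PTE indices plus a page offset. The 2/9/9/12 bit layout (32-bit x86 with PAE and 4 KiB pages) and the function name are assumptions chosen for illustration; the slide does not state a layout.

```python
# Hypothetical layout: 32-bit x86 with PAE and 4 KiB pages (2 + 9 + 9 + 12 bits).
PGD_BITS, PMD_BITS, PTE_BITS, OFFSET_BITS = 2, 9, 9, 12


def split_linear_address(addr):
    """Return (pgd_index, pmd_index, pte_index, offset) for a 32-bit address."""
    if not 0 <= addr < 2 ** 32:
        raise ValueError("address must fit in 32 bits")
    # Lowest bits: byte offset inside the 4 KiB page.
    offset = addr & ((1 << OFFSET_BITS) - 1)
    # Next field up: index into the page table (PTE).
    pte = (addr >> OFFSET_BITS) & ((1 << PTE_BITS) - 1)
    # Next field up: index into the page middle directory (PMD).
    pmd = (addr >> (OFFSET_BITS + PTE_BITS)) & ((1 << PMD_BITS) - 1)
    # Top bits: index into the page global directory (PGD).
    pgd = (addr >> (OFFSET_BITS + PTE_BITS + PMD_BITS)) & ((1 << PGD_BITS) - 1)
    return pgd, pmd, pte, offset
```

Each index selects one entry at its level of the walk; a TLB hit skips all three lookups, which is why the TLB miss rate matters so much in the slides that follow.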
  • 19. Problems in Large Physical Memory Environments
    ❑ Physical memory is divided into pages; the default page size is 4 KiB
    ❑ When a system has a large amount of physical memory, accessing large contiguous regions becomes more expensive and TLB misses increase sharply
    Translation Lookaside Buffer (TLB)
      ▪ Translating linear addresses into physical addresses takes time, so most processors have a small cache known as a TLB that stores the physical addresses associated with the most recently accessed virtual addresses.
      ▪ The TLB is a small cache, so large-memory applications can incur high TLB miss rates, and TLB misses are extremely expensive on today's very fast, pipelined CPUs.
  • 20. Performance Improvements for Large-Memory Environments - HugePage
    ❑ The IA-32 architecture supports 4 KiB, 2 MiB, and 4 MiB pages
    ❑ The Linux kernel likewise provides 2 MB and 1 GB large pages through the HugePage mechanism
    ❑ Having fewer TLB entries that point to more memory means that a TLB hit is more likely to occur.
    Standard HugePage (EL 4, 5, 6)
      ▪ 2 MB per page
      ▪ Reserve/free via /proc/sys/vm/nr_hugepages
      ▪ Used via hugetlbfs
    GB HugePage (EL 6, 7)
      ▪ 1 GB per page
      ▪ Reserved at boot time; cannot be freed
      ▪ Used via hugetlbfs
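The point about TLB entries can be made concrete with a "TLB reach" calculation: the amount of memory that can be addressed without a TLB miss. This is a minimal sketch; the 64-entry TLB is a hypothetical figure, since real TLB sizes vary by CPU and by page size.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries, page_size):
    """Bytes of memory that can be mapped by a TLB with the given entry count."""
    if entries < 0 or page_size <= 0:
        raise ValueError("entries must be >= 0 and page_size must be > 0")
    return entries * page_size


# Hypothetical 64-entry TLB at each page size.
reach_4k = tlb_reach(64, 4 * KIB)   # 256 KiB
reach_2m = tlb_reach(64, 2 * MIB)   # 128 MiB
reach_1g = tlb_reach(64, 1 * GIB)   # 64 GiB
```

With the same number of entries, 2 MiB pages cover 512 times more memory than 4 KiB pages, which is why large working sets see fewer misses with HugePage.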
  • 21. Performance Improvements for Large-Memory Environments - HugePage
    ❑ To check whether your CPU supports HugePage
      # grep --color pse /proc/cpuinfo        // 2MB
      # grep --color pdpe1gb /proc/cpuinfo    // 1GB
    ❑ Enable/disable HugePage with kernel parameters (hugepagesz must precede the hugepages count it applies to)
      hugepagesz=2M hugepages=2048
    ❑ 2 MB HugePages can also be reserved or released at runtime via sysctl (/proc/sys/vm/nr_hugepages)
      # sysctl -w vm.nr_hugepages=20
    ❑ To use 1 GB HugePage
      default_hugepagesz=1G hugepagesz=1G hugepages=10
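To pick a value for hugepages= or vm.nr_hugepages, the region to be backed by huge pages is divided by the page size and rounded up. The helper below is a sketch with a hypothetical name; the region sizes passed to it are examples only.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def hugepages_needed(region_bytes, hugepage_bytes=2 * MIB):
    """Number of huge pages required to back region_bytes (rounded up)."""
    if region_bytes < 0 or hugepage_bytes <= 0:
        raise ValueError("region_bytes must be >= 0 and hugepage_bytes > 0")
    # Ceiling division without floating point.
    return -(-region_bytes // hugepage_bytes)


# hugepages=2048 with 2 MB pages, as on the slide, reserves 4 GiB.
pages_for_4gib = hugepages_needed(4 * GIB)
# hugepages=10 with 1 GB pages reserves 10 GiB.
pages_for_10gib_1g = hugepages_needed(10 * GIB, hugepage_bytes=1 * GIB)
```

Reserved huge pages are unavailable to ordinary allocations, so the count is usually sized to the application's shared memory segment rather than to total RAM.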
  • 22. Performance Improvements for Large-Memory Environments - HugePage
    ❑ Most databases call for HugePage to improve performance
    ❑ To use HugePage, an application must use the mmap or shmat/shmget system calls; the mmap system call requires a mounted hugetlbfs
    ❑ To configure hugetlbfs
      # mkdir /mnt/hugepages
      # mount -t hugetlbfs hugetlbfs /mnt/hugepages
  • 23. Performance Improvements for Large-Memory Environments - Transparent Hugepage
    ❑ In EL 6, Transparent HugePage (THP) is enabled by default
    ❑ The kernel tries to allocate huge pages when needed, and every process receives 2 MB pages
    ❑ If a huge page is not available, the kernel falls back to regular 4 KB pages
    ❑ Unlike hugetlbfs pages, THP pages can be swapped; they are split into 4 KB pages before being swapped out
    ❑ No modification is required for applications
    ❑ Consider its use carefully when running Big Data or DBMS solutions
  • 24. Performance Improvements for Large-Memory Environments - Transparent Hugepage
    ❑ Enable/disable with the kernel parameter
      transparent_hugepage=always|never
    ❑ Dynamically enable/disable
      # echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled
      # echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
    ❑ Monitor THP
      # egrep 'trans|thp' /proc/vmstat       // EL 6.2 or later
      nr_anon_transparent_hugepages 2018
      thp_fault_alloc 7302
      thp_fault_fallback 0
      thp_collapse_alloc 401
      thp_collapse_alloc_failed 0
      thp_split 21
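One way to read the THP counters is to compare successful huge-page faults with fallbacks to 4 KB pages. The sketch below parses text in the /proc/vmstat format shown on the slide; the function names and the "fallback ratio" metric are illustrative assumptions, not something the deck defines.

```python
SAMPLE = """\
nr_anon_transparent_hugepages 2018
thp_fault_alloc 7302
thp_fault_fallback 0
thp_collapse_alloc 401
thp_collapse_alloc_failed 0
thp_split 21
"""


def thp_counters(vmstat_text):
    """Extract THP-related 'name value' pairs from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        if name.startswith("thp_") or "transparent" in name:
            counters[name] = int(value)
    return counters


def thp_fallback_ratio(counters):
    """Fraction of huge-page faults that fell back to regular pages."""
    alloc = counters.get("thp_fault_alloc", 0)
    fallback = counters.get("thp_fault_fallback", 0)
    total = alloc + fallback
    return fallback / total if total else 0.0
```

A ratio near zero, as in the slide's sample, suggests huge pages were available whenever they were requested; a rising ratio usually points to memory fragmentation.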
  • 26. Understanding Swap
    ❑ Swap space improves memory efficiency: when free memory runs short, old pages are paged out to disk to free memory for other uses
    ❑ Inactive anonymous pages are selected for swap-out
    ❑ Recent systems have large amounts of physical memory. Is swap space obsolete?
      ▪ Without swap space, anonymous pages cannot be flushed; they stay in memory until they are released, even when unused
      ▪ Flushing pages to swap is actually a bit easier and quicker than flushing them to disk: the code is much simpler, and there are no directory trees to update.
  • 27. Swap Space Performance Improvements
    ❑ Improving swap performance
      ▪ One large swap area can perform poorly. Split the swap area across multiple disks (max 32 areas)
      ▪ The kernel uses the highest-priority swap area first and round-robins across areas of equal priority
    ❑ Place swap areas on the lowest-numbered partitions of the fastest disks.
    ❑ Monitoring whether a system is swapping
      # vmstat 5
      procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
       r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       0  0      0 1730936   9976  78396    0    0     4     0  968    4  0  1 99  0  0
       0  0      0 1730936   9976  78396    0    0     0     0 3618   14  0  1 99  0  0
       0  0      0 1730068   9984  78388    0    0     0    12 3576   36  0  2 97  1  0
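In the vmstat output, the si and so columns (swap-in and swap-out) are the ones that reveal active swapping. The sketch below, with hypothetical function names, extracts those two columns from captured vmstat text; the sample is the output shown on the slide.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 1730936   9976  78396    0    0     4     0  968    4  0  1 99  0  0
 0  0      0 1730936   9976  78396    0    0     0     0 3618   14  0  1 99  0  0
 0  0      0 1730068   9984  78388    0    0     0    12 3576   36  0  2 97  1  0
"""


def swap_activity(vmstat_text):
    """Return a list of (si, so) tuples, one per vmstat sample row."""
    si_idx = so_idx = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # The column-name row tells us where si and so are.
        if "si" in fields and "so" in fields:
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        # Skip banner lines and anything before the column names are known.
        if si_idx is None or not fields or not all(f.isdigit() for f in fields):
            continue
        samples.append((int(fields[si_idx]), int(fields[so_idx])))
    return samples


def is_swapping(samples):
    """True if any sample shows pages moving to or from swap."""
    return any(si > 0 or so > 0 for si, so in samples)
```

For the slide's sample every row has si = so = 0, so the system is not swapping even though some memory is in use.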
  • 28. Swapping Performance Improvements
    ❑ The kernel prefers to swap out anonymous pages when
      % of memory mapped into page tables + vm.swappiness >= 100
    ❑ The default value of this parameter is 60; the higher the value, the more often pages are swapped out
    ❑ General tuning guide
      ▪ For batch jobs, increase it
      ▪ For DBMS and Big Data tasks, set it to 0 or decrease it
      # cat /proc/sys/vm/swappiness
      60
      # echo 10 > /proc/sys/vm/swappiness
      // Or,
      # sysctl -w vm.swappiness=10
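The rule of thumb on this slide can be written as a one-line predicate. This is only a sketch of the simplified rule as stated on the slide, not the kernel's actual reclaim logic, and the function name is hypothetical.

```python
def prefers_swap_out(mapped_percent, swappiness=60):
    """Apply the slide's rule: mapped % + vm.swappiness >= 100."""
    if not 0 <= mapped_percent <= 100:
        raise ValueError("mapped_percent must be between 0 and 100")
    if not 0 <= swappiness <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    return mapped_percent + swappiness >= 100


# With the default of 60, swapping is preferred once 40% of memory is mapped.
default_case = prefers_swap_out(40, swappiness=60)
# With swappiness lowered to 10, the same system keeps reclaiming cache instead.
tuned_case = prefers_swap_out(40, swappiness=10)
```

Lowering the value to 10 moves the tipping point from 40% to 90% mapped memory, which is the behaviour a DBMS host usually wants.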
  • 29. Understanding Cache Memory
    ❑ To reduce service time for slower subsystems (I/O), the kernel uses different types of caches:
    ❑ Slab cache:
      ▪ Stores the various data structures the kernel uses; these data structures do not fit into a single page of memory.
      ▪ Slabs are allocated from a pre-allocated memory region
    ❑ Swap cache:
      ▪ Keeps track of pages that were previously swapped out and have since been swapped back in.
      ▪ When such a page is to be swapped out again, the swap cache entry is checked first; if an entry is found, no write to disk occurs
  • 30. Understanding Cache Memory
    ❑ Page cache (file-backed, not swappable):
      ▪ To improve the overall performance of a system, the kernel tends to use as much memory as possible as a cache for data being read from or written to disk.
      ▪ This data is reused from RAM without a direct I/O request to the physical disk
    ❑ In some cases, the page cache causes problems:
      ▪ The cache keeps growing and page cache reclaim cannot keep up with the rate of growth.
      ▪ System performance drops because the kernel spends time searching for free pages or swapping pages out to free space, despite the large page cache.
  • 31. Cache Memory Performance Improvements
    ❑ Increase the tendency to reclaim page cache
    ❑ Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer reclaiming dentries and inodes.
    ❑ Increase min_free_kbytes up to 5% of physical memory; the kernel keeps that amount of memory free
    ❑ Let the kernel flush dirty pages earlier
      # sysctl -w vm.dirty_background_ratio=10    // decreasing it
                                                  // written back by the kernel flusher threads: smaller I/O streams, less dirty page cache
      # sysctl -w vm.dirty_ratio=20               // decreasing it
                                                  // written back by the application itself: synchronous writes
      # sysctl -w vm.vfs_cache_pressure=120       // increasing it
      # sysctl -w vm.min_free_kbytes=14066        // increasing it
  • 32. Cache Memory Performance Improvements
    ❑ Decrease swappiness
      ▪ Even when there is plenty of cache memory that could easily be freed, the kernel can swap out data in order to keep pages that are likely to be needed in the near future
      ▪ With a lower value the kernel is less likely to swap, and thus more likely to write data out to disk
      # echo 10 > /proc/sys/vm/swappiness
    ❑ Reclaim all clean pages
      # echo 1 > /proc/sys/vm/drop_caches
    ❑ Useful for freeing the cache before running jobs that require a large amount of free space.
  • 34. Understanding the I/O Subsystem
    ❑ Read or write requests are transformed into block device requests that go into a queue.
      ▪ The I/O subsystem then batches similar requests that arrive within a specific time window and processes them all at once.
    ❑ Generally, the I/O subsystem does not operate in a true FIFO manner. It processes queued read/write requests according to the selected scheduler algorithm. These algorithms are called elevators because they operate in the same manner as real-life building elevators.
      # cat /sys/block/<device>/queue/scheduler
      noop anticipatory deadline [cfq]
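The scheduler file lists every available elevator and marks the active one with square brackets. A small parser makes that convention explicit; the function name is hypothetical and the input is the line shown on the slide.

```python
def parse_schedulers(line):
    """Return (available, active) from a /sys/block/<device>/queue/scheduler line."""
    available = []
    active = None
    for name in line.split():
        # The active scheduler is wrapped in square brackets, e.g. "[cfq]".
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return available, active


available, active = parse_schedulers("noop anticipatory deadline [cfq]")
```

Here `active` is "cfq" and `available` holds all four names, which is how a script could check the current elevator before changing it.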
  • 35. Understanding the I/O Subsystem
    ❑ Think about how a hard disk drive works
      [Diagram: new I/O requests enter the I/O queue; seeking to the location of each request individually degrades performance]
    ❑ To improve overall I/O performance
      ▪ Rearrange the requests
      ▪ Choose wisely when each request is served
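Why rearranging requests helps a rotating disk can be shown with a toy model that counts head movement. This is a simplified SCAN-style sketch, not the kernel's cfq or deadline implementation; the sector numbers and function names are made up for illustration.

```python
def total_seek_distance(start, requests):
    """Sum of head movements when requests are served in the given order."""
    distance, position = 0, start
    for sector in requests:
        distance += abs(sector - position)
        position = sector
    return distance


def elevator_order(start, requests):
    """Serve everything at or above the head on the way up, then sweep back down."""
    upward = sorted(s for s in requests if s >= start)
    downward = sorted((s for s in requests if s < start), reverse=True)
    return upward + downward


# Hypothetical queue of sector numbers with the head at sector 50.
queue = [95, 10, 60, 20, 80]
fifo_cost = total_seek_distance(50, queue)
elevator_cost = total_seek_distance(50, elevator_order(50, queue))
```

In this example arrival order costs 280 units of head travel while the elevator order costs 130, at the price of making the request for sector 10 wait until last.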
  • 36. I/O Subsystem Performance Improvements
    ❑ Completely Fair Queuing - cfq
      ▪ Default I/O scheduler in EL 5 and 6 (EL 7 defaults to deadline except for SATA disks, which use cfq)
      ▪ Divides all available I/O bandwidth equally among all processes issuing I/O requests.
    ❑ Deadline - deadline
      ▪ Suited to large, sequential, read-mostly workloads
      ▪ Guarantees a response time for each request. Once a request reaches its expiration time, it is serviced immediately
      # echo deadline > /sys/block/<device>/queue/scheduler
  • 37. I/O Subsystem Performance Improvements
    ❑ Anticipatory - anticipatory
      ▪ Optimizes systems with small or slow disk subsystems.
      ▪ Recommended for servers running data processing applications that are not regularly interrupted by external requests.
    ❑ NOOP - noop
      ▪ For systems with CPU-heavy workloads
      ▪ All requests go into a simple unordered queue
      ▪ Recommended for virtualized guests
      elevator=noop       // kernel parameter
  • 38. Understanding Journaling File Systems
    ❑ A journaling file system keeps a log book (journal) for the file system, which allows it to be recovered quickly.
      ▪ Any change to the file system is recorded in the journal as a transaction before it is committed to the actual file system.
      ▪ In the event of a system crash or power failure, the file system is recovered quickly and is less likely to be corrupted.
      ▪ This is a very important feature in the enterprise market
    ❑ ext3, ext4, and xfs are journaling file systems
    ❑ EL 6 uses ext4 and EL 7 uses xfs as the default file system
  • 39. Journaling File System Performance Improvements
    ❑ To improve journal performance
      ▪ Place the file system's journal on a separate device such as an SSD.
      ▪ This reduces the number of accesses to the actual file system.
    ❑ The journal/file system pair must then be managed carefully
      // Create a new external journal disk
      # mkfs.ext4 -O journal_dev -b 4096 /dev/sdj1
      // Create a new filesystem with the external journal disk
      # mkfs.ext4 -J device=/dev/sdj1 -b 4096 /dev/sde1
  • 40. Understanding File System Barriers
    ❑ A write barrier is used to mitigate the risk of data corruption during power loss.
    ❑ Storage devices may have write caches. They report I/O as "complete" while the data is still in the cache. If the cache loses power, it also loses the data.
    ❑ Some storage devices use battery-backed write caches, so the data survives a power failure. However, the cache can still change the original metadata ordering: the commit block may be present on disk without its associated transaction in place.
    ❑ Therefore, ext4 and xfs turn on barriers by default in EL 6 and 7
  • 41. File System Barrier Performance Improvements
    ❑ Enabling write barriers causes a significant performance penalty.
    ❑ The penalty mainly affects workloads that create or remove lots of small files; it has much less (close to no) impact on streaming write workloads and no impact on read workloads.
    ❑ In general, the write barrier can be disabled if the storage device uses a battery-backed write cache.
    ❑ Write barriers are also unnecessary whenever the system uses hardware RAID controllers with a battery-backed write cache.
      # mount -o nobarrier ....
  • 43. Packet Loss Caused by Degraded Network Performance
    Overrun : usually seen under heavy UDP traffic
    Dropped : seen under both heavy UDP and TCP traffic
      bond1   Link encap:Ethernet HWaddr 00:AA:BB:CC:DD:EE
              inet addr:192.168.10.33 Bcast:192.168.10.255 Mask:255.255.255.0
              UP BROADCAST RUNNING MASTER MULTICAST MTU:1500
              RX packets:8344569671 errors:0 dropped:0 overruns:46295 frame:0
              TX packets:53614 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:2952210470156 (2.6 TiB) TX bytes:5251386 (5.0 MiB)
      eth0    Link encap:Ethernet HWaddr
              UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
              RX packets:27051811 errors:0 dropped:696311 overruns:0 frame:0
              TX packets:110147381 errors:0 dropped:0 overruns:0 carrier:0
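The dropped and overruns counters can be pulled out of this kind of output programmatically. The sketch below assumes the classic net-tools ifconfig layout shown on the slide (interface name in the first column, indented detail lines); the function name and the trimmed sample text are illustrative.

```python
import re

SAMPLE = """\
bond1     Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          RX packets:8344569671 errors:0 dropped:0 overruns:46295 frame:0
          TX packets:53614 errors:0 dropped:0 overruns:0 carrier:0
eth0      Link encap:Ethernet
          RX packets:27051811 errors:0 dropped:696311 overruns:0 frame:0
          TX packets:110147381 errors:0 dropped:0 overruns:0 carrier:0
"""

_RX_LINE = re.compile(r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)")


def rx_loss(ifconfig_text):
    """Map interface name -> receive packet, dropped, and overrun counters."""
    stats, iface = {}, None
    for line in ifconfig_text.splitlines():
        # A line that starts in column 0 introduces a new interface block.
        if line and not line[0].isspace():
            iface = line.split()[0]
        match = _RX_LINE.search(line)
        if match and iface:
            packets, _errors, dropped, overruns = map(int, match.groups())
            stats[iface] = {"packets": packets, "dropped": dropped, "overruns": overruns}
    return stats
```

For the slide's data this reports 46295 overruns on bond1 and 696311 drops on eth0, the two symptoms the following slides try to reduce.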
  • 44. Performance Tuning to Minimize Network Packet Loss
    ❑ Update the kernel and firmware to the latest versions
      ▪ Many packet loss problems can be resolved with the latest kernel and firmware.
    ❑ Increase the NIC's ring buffer
      ▪ Each NIC has its own buffer, called the 'ring buffer'.
      ▪ Increasing its size may avoid overruns as well as packet drops.
    # ethtool -g eth1
    Pre-set maximums:
    RX: 4096
    RX Mini: 0
    RX Jumbo: 0
    TX: 4096
    ....
    # ethtool --set-ring eth1 rx 4096 // set RX to the pre-set maximum
  • 45. Performance Tuning to Minimize Network Packet Loss
    ❑ Increase the rate at which the receive queues are drained
      ▪ net.core.dev_weight : how many packets an individual network interface can process during each NAPI poll.
      ▪ net.core.netdev_budget : the maximum number of packets taken from all interfaces in one polling cycle (NAPI poll).
    ❑ Balance interrupt handling, or pin interrupts to the proper CPUs if needed
    # cat /etc/sysctl.conf
    net.core.dev_weight = 64 // increase this value
    net.core.netdev_budget = 300 // increase this value as well
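One way to judge whether the receive queues are being drained fast enough is to look at `/proc/net/softnet_stat`, which the slide does not mention. The sketch below assumes the usual layout of that file: one row per CPU, hexadecimal columns, with the first three columns being packets processed, packets dropped, and `time_squeeze` (the number of times a polling cycle ended with work still pending). The sample rows and the helper names are hypothetical.

```python
# Hypothetical sample in the /proc/net/softnet_stat layout (hex columns).
# Column 0 = packets processed, column 1 = dropped, column 2 = time_squeeze.
SAMPLE = """\
00012a4f 00000000 00000003 00000000 00000000 00000000 00000000 00000000 00000000
0000beef 00000010 00000000 00000000 00000000 00000000 00000000 00000000 00000000
"""


def parse_softnet_stat(text):
    """Parse softnet_stat text into a list of per-row counter dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append(
            {
                "row": len(rows),
                "processed": int(fields[0], 16),
                "dropped": int(fields[1], 16),
                "time_squeeze": int(fields[2], 16),
            }
        )
    return rows


def rows_needing_attention(rows):
    """Return rows that show drops or squeezed polling cycles."""
    return [r for r in rows if r["dropped"] > 0 or r["time_squeeze"] > 0]


if __name__ == "__main__":
    # On a Linux host, replace SAMPLE with the contents of
    # /proc/net/softnet_stat.
    for row in rows_needing_attention(parse_softnet_stat(SAMPLE)):
        print(row)
```

A `time_squeeze` counter that keeps growing suggests that raising `net.core.netdev_budget` may help; the counters are cumulative, so compare two readings rather than a single one.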
  • 46. Understanding the TCP Window
    ❑ Under TCP, the receiver must send an ACK for the packets it receives, and the sender must wait for that ACK.
      ▪ This affects network throughput and CPU utilization.
    ❑ If the network path has high latency, like a satellite link, or high bandwidth, more packets can be in flight between the sender and the receiver at any one time.
      ▪ The TCP window allows the sender to send more packets without waiting for ACKs.
      ▪ The size of the TCP window varies with the size of the TCP socket buffer.
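The relationship between window size, latency, and throughput can be made concrete with the bandwidth-delay product: a sender can have at most one window of data in flight per round trip. The following is a minimal sketch using integer arithmetic; the function names and the example link figures (1 Gbit/s at 100 ms, a 65535-byte window at 500 ms) are illustrative assumptions, not values from the slides.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bytes that must be in flight to keep a link full (bandwidth x RTT)."""
    return bandwidth_bits_per_s * rtt_ms // 8000


def max_throughput_bits_per_s(window_bytes, rtt_ms):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms


if __name__ == "__main__":
    # Assumed example: a 1 Gbit/s path with a 100 ms round-trip time.
    print("window needed:", bdp_bytes(1_000_000_000, 100), "bytes")
    # Assumed example: a 65535-byte window over a 500 ms satellite path.
    print("throughput cap:", max_throughput_bits_per_s(65535, 500), "bit/s")
```

In the second example the window, not the link speed, limits throughput to roughly 1 Mbit/s, which is why long or fast paths need larger socket buffers.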
  • 47. Understanding the TCP Window
    ❑ If applications fetch packets from the socket buffers slowly, the buffers fill up and packets start to be dropped.
    ❑ Increasing the TCP socket buffer size gives better performance.
    [Figure: sender/receiver exchange diagrams, labeled "Receive window : 4" and "Ack : receive window 2"]
  • 48. Improving TCP Throughput
    ❑ Set these values carefully, because each socket consumes memory equal to the default buffer size when it is opened.
    ❑ Setting the TCP buffer size too large can seriously hurt network speed and latency for connections that send small amounts of data (such as HTTP or SSH).
    # cat /etc/sysctl.conf
    net.core.rmem_max = 8388608
    net.core.wmem_max = 8388608
    net.core.rmem_default = 8388608
    net.core.wmem_default = 8388608
    net.ipv4.tcp_rmem = 8192 4194304 8388608
    // net.ipv4.tcp_rmem = 4096 8388608 67108864 // more aggressive
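The memory warning above is easy to quantify. The sketch below estimates an upper bound on buffer memory, assuming every open socket fills both its default receive and send buffers; the function name and the figure of 10,000 sockets are illustrative assumptions, while the 8388608-byte defaults are taken from the listing.

```python
def worst_case_buffer_bytes(sockets, rmem_default, wmem_default):
    """Upper bound: every socket fills its default receive and send buffers."""
    return sockets * (rmem_default + wmem_default)


if __name__ == "__main__":
    # Defaults from the slide's sysctl.conf; 10,000 sockets is an assumption.
    total = worst_case_buffer_bytes(10_000, 8_388_608, 8_388_608)
    print(total, "bytes =", total / 2**30, "GiB")
```

With 8 MiB defaults, 10,000 sockets could in the worst case claim about 156 GiB, which illustrates why large defaults suit a few bulk-transfer connections better than many small ones.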