Slide 3 · 難攻不落(난공불락) Open Source Infra Seminar
Limitations of this session
Everything about system tuning cannot be covered in one hour
=> Focus on key concepts and basic tuning
Prerequisites for tuning
Tuning requires an understanding of both the hardware and the software
It also requires an understanding of how the systems interact with each other
Considerations when tuning
User/administrator factors must also be taken into account
User mistakes? Misunderstood concepts?
Never assume that everyone understands tuning
Cautions when tuning
System tuning is not magic
Hardware upgrades and load balancing are often needed as well
Getting started
Slide 4
Two terms that must be kept distinct when explaining system tuning or setting improvement goals:
Low-latency – Latency is a measure of time delay experienced in a system, the
precise definition of which depends on the system and the time being
measured.[1]
High-throughput – The system throughput or aggregate throughput is the sum of
the data rates that are delivered to all terminals in a network or disk-drive.[1]
[1] : wikipedia.org
Getting started
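A toy calculation (all numbers assumed for illustration) shows why the two metrics must be kept separate: the same link can score well on one and poorly on the other.

```shell
# Illustrative numbers only: a satellite-style link moving bulk data.
bytes_moved=1073741824                      # 1 GiB transferred
elapsed_s=8                                 # in 8 seconds
throughput=$(( bytes_moved / elapsed_s ))   # high throughput: 128 MiB/s
rtt_ms=600                                  # yet each request waits ~600 ms
echo "throughput: ${throughput} B/s, per-request latency: ${rtt_ms} ms"
```

A bulk-transfer workload would call this link fast; an interactive one would call it slow.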
Slide 5
Tuning the hardware and firmware first
In many cases, it will bring much better results than software tuning
Refer to the hardware manual
As a trade-off, power-reduction features usually affect overall performance,
especially latency, more than we expect.
Disabling and removing unused services
Getting started
Slide 12
Turn off tickless kernel
Limit ACPI and Intel’s C-State
Turn off ‘Transparent Huge Page’
Turn off ‘CGroup’ feature
Check what services are running
Disable unused services
Basic tuning
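The items in this checklist are mostly boot-time settings; a hedged sketch of how to inspect them on a running system (the kernel parameter names in the comment, such as `nohz=off` and `transparent_hugepage=never`, follow the mainline kernel documentation):

```shell
# Inspect the parameters the kernel was booted with (the relevant flags are
# e.g. nohz=off, processor.max_cstate=1, transparent_hugepage=never,
# cgroup_disable=memory).
cat /proc/cmdline
# List services configured to start at boot (EL 6 style; chkconfig is
# assumed to be present on the target system):
#   chkconfig --list | grep ':on'
```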
Slide 13
If you don’t know what to do, then use tuned instead
Tuned is a daemon that monitors the use of system components and dynamically
tunes system settings based on that monitoring information.
It includes predefined profiles for specific use cases.
Basic tuning
# yum install tuned
# service tuned start
# chkconfig tuned on
# tuned-adm active            # the 'default' profile is active initially
# tuned-adm list
# tuned-adm profile [profile_name]
Slide 14
The predefined profiles (in EL 6)
It is possible to customize the profile
Basic tuning
# tuned-adm list
- laptop-ac-powersave
- desktop-powersave
- enterprise-storage
- default
- virtual-guest
- throughput-performance
- laptop-battery-powersave
- server-powersave
- latency-performance
- spindown-disk
- virtual-host
# tuned-adm profile latency-performance
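As a sketch of the customization step: in EL 6, each tuned profile is a directory of config files (layout assumed from EL 6 tuned, normally under /etc/tune-profiles), and a custom profile is simply a copy of a predefined one. Simulated here in a temp directory so nothing system-wide is touched:

```shell
# Stand-in for /etc/tune-profiles, so this sketch needs no root access.
profiles=$(mktemp -d)
mkdir -p "$profiles/latency-performance"
printf 'ELEVATOR="deadline"\n' > "$profiles/latency-performance/ktune.sysconfig"
# A custom profile is a copy of a predefined one, edited in place and then
# activated with 'tuned-adm profile myprofile'.
cp -a "$profiles/latency-performance" "$profiles/myprofile"
grep ELEVATOR "$profiles/myprofile/ktune.sysconfig"
```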
Slide 16
[Figure: address translation. The MMU (in the CPU) translates a linear (virtual) address by first consulting the TLB; on a hit ("Yes") the physical memory address is returned directly. On a TLB miss ("No") the page tables are walked (offset within PGD → offset within PMD → offset within PTE → offset within the data page), and a page fault is raised if no valid mapping exists.]
Memory addressing overview
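The walk in the figure can be made concrete by splitting an address into its table indices. This sketch uses the x86-64 layout (4 KiB pages, 9-bit indices per level); the sample address is made up, and the level names follow the slide:

```shell
addr=$(( 0x7f3a12345678 ))            # sample 48-bit virtual address
offset=$(( addr & 0xfff ))            # offset within the 4 KiB data page
pte=$(( (addr >> 12) & 0x1ff ))       # index into the PTE level
pmd=$(( (addr >> 21) & 0x1ff ))       # index into the PMD level
pgd=$(( (addr >> 30) & 0x1ff ))       # upper level(s) omitted for brevity
echo "pgd=$pgd pmd=$pmd pte=$pte offset=$offset"
```

Each 9-bit index selects one of 512 entries in its table; the final 12 bits address a byte within the page.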
Slide 17
Physical memory is divided into pages, and the default page size is 4 KiB.
If a system has a large amount of memory and the workload accesses a large,
contiguous memory space, TLB misses will increase rapidly.
The problem in large physical-memory environments
Translation Lookaside Buffer (TLB)
Translating linear addresses into physical addresses takes time, so most processors
have a small cache known as a TLB that stores the physical addresses associated
with the most recently accessed virtual addresses.
The TLB is a small cache, so large-memory applications can incur high TLB miss rates,
and TLB misses are extremely expensive on today’s very fast, pipelined CPUs.
Slide 18
The IA-32 architecture supports 4 KiB, 2 MiB, or 4 MiB pages.
The Linux kernel also supports larger pages – 2 MiB and 1 GiB – through the
HugePage mechanism.
Having fewer TLB entries that point to more memory means that a TLB hit is more
likely to occur.
Performance improvement for large-memory environments – HugePage
Standard HugePage (EL 4, 5, 6)
2 MiB per page
Reserve/free via /proc/sys/vm/nr_hugepages
Used via hugetlbfs
1 GiB HugePage (EL 6, 7)
1 GiB per page
Reserved at boot time / cannot be freed
Used via hugetlbfs
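Why fewer, larger entries help can be seen from the TLB reach, i.e. how much memory the TLB can cover at once (the 512-entry TLB size below is an assumed example, not a spec of any particular CPU):

```shell
tlb_entries=512
reach_4k=$(( tlb_entries * 4096 ))              # coverage with 4 KiB pages
reach_2m=$(( tlb_entries * 2 * 1024 * 1024 ))   # coverage with 2 MiB pages
echo "4 KiB pages: $(( reach_4k / 1048576 )) MiB reach"
echo "2 MiB pages: $(( reach_2m / 1048576 )) MiB reach"
# Reserving standard hugepages at runtime (as root):
#   sysctl vm.nr_hugepages=512
```

With 2 MiB pages the same TLB covers 512× more memory (1 GiB instead of 2 MiB), so hits become far more likely.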
Slide 19
Enabled by default in EL 6 for all applications.
The kernel attempts to allocate hugepages whenever possible, and any process
receives 2 MiB pages if its mmap region is naturally 2 MiB-aligned.
If no hugepages are available, the kernel falls back to regular 4 KiB pages.
THP is also swappable (unlike hugetlbfs): the huge page is broken back into
smaller 4 KiB pages, which are then swapped out normally.
No modification is required for applications.
Use with care when running Big Data or DBMS solutions.
Performance improvement for large-memory environments – Transparent Hugepage
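A quick way to check the THP state before deciding whether it suits a workload (sysfs paths as in mainline kernels; the runtime disable in the comment requires root):

```shell
# Current THP mode: the bracketed word is the active one
# ([always] / [madvise] / [never]).
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || true
# Anonymous memory currently backed by transparent hugepages:
grep AnonHugePages /proc/meminfo
# To disable at runtime (as root), e.g. before starting a DBMS:
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
```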
Slide 21
Understanding swap
Swap space increases the amount of effective memory on a system. As free
memory drops, old pages can be paged out to disk to free memory for
other uses.
Inactive anonymous pages will be selected.
These days systems ship with large amounts of physical memory. Is swap space
obsolete?
Without swap space, anonymous pages can't be flushed; they have to
stay in memory until they're deleted, even if they're never used again.
Flushing pages to swap is actually a bit easier and quicker than flushing them
to disk: the code is much simpler, and there are no directory trees to update.
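The balance between swapping anonymous pages and dropping page cache is tunable; this sketch only reads the current value (the write in the comment requires root):

```shell
# vm.swappiness biases reclaim between anonymous pages (swap out) and page
# cache (drop); lower values keep anonymous pages in RAM longer.
cat /proc/sys/vm/swappiness
# Persist a lower value at runtime (as root):
#   sysctl -w vm.swappiness=10
```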
Slide 22
Understanding cache memory
To reduce service time for slower subsystems (I/O), the kernel uses different
types of caches:
Slab cache:
Stores the various data structures the kernel uses; these structures
do not fit neatly into single pages of memory.
Slabs are allocated from a pre-allocated memory area.
Swap cache:
Tracks pages that were previously swapped out and have since been swapped back in.
If the kernel needs to swap such a page out again and finds an entry in the
swap cache, the page does not need to be written to disk again.
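Both caches can be observed directly (field names as in /proc/meminfo on mainline kernels; the values vary per system):

```shell
# Slab: total slab-cache memory; SwapCached: pages present in the swap cache.
grep -E '^(Slab|SwapCached)' /proc/meminfo
# Per-allocator slab detail (root is usually required):
#   slabtop -o | head
```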
Slide 23
Understanding cache memory
Page cache (file-backed, not swappable):
To improve overall system performance, the kernel tends to use free
memory as a cache for data being read from or written to disk, as much
as possible.
This data can then be re-used from RAM without issuing I/O requests to the disk.
In some cases, the page cache causes problems:
The cache size keeps growing, and pages cannot be freed as fast as the
cache grows.
System performance drops because the kernel is busy seeking free pages or
swapping pages out to free space, in spite of the large page cache.
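The page cache and pending writeback are visible in /proc/meminfo, which helps diagnose the situation above (the drop_caches write in the comment requires root and is for testing only):

```shell
# Cached: page-cache size; Dirty/Writeback: data not yet safely on disk.
grep -E '^(Cached|Dirty|Writeback):' /proc/meminfo
# Drop clean page cache to observe its effect (as root; the cache refills):
#   sync; echo 1 > /proc/sys/vm/drop_caches
```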
Slide 25
Understanding the I/O subsystem
Read or write requests are transformed into block device requests that go into a
queue.
The I/O subsystem then batches similar requests that come within a specific
time window and processes them all at once.
Generally, the I/O subsystem does not operate in a true FIFO manner. It processes
queued read/write requests depending on the selected scheduler algorithms called
elevators because they operate in the same manner that real-life building elevators
do.
# cat /sys/block/<device>/queue/scheduler
noop anticipatory deadline [cfq]
Slide 26
Understanding the I/O subsystem
Think about how a hard disk drive works.
To improve overall I/O performance, the scheduler
re-arranges the requests, and
wisely chooses when each request is served.
[Figure: new I/O requests enter the I/O queue; seeking to the location of
each request individually would drag performance down, so the queue is
reordered before the requests are served.]
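The reordering idea can be shown with a toy example: serving block requests in sector order, rather than arrival order, shortens the total seek distance (the sector numbers are made up):

```shell
# Arrival order of block requests (illustrative sector numbers):
requests="71 10 95 13 62"
# An elevator-style scheduler serves them in sector order to cut seeking:
service_order=$(printf '%s\n' $requests | sort -n | tr '\n' ' ')
echo "service order: ${service_order% }"
```

Real schedulers also merge adjacent requests and weigh fairness and deadlines, but the sweep in one direction is the core of the elevator analogy.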
Slide 27
Improving I/O subsystem performance
Completely Fair Queuing – cfq
Default I/O scheduler in EL 5, 6, 7
Divides all available I/O bandwidth equally among all processes issuing I/O
requests.
Deadline – deadline
For large, sequential, read-mostly workloads
Guarantees a response time for each request; once a request reaches its
expiration time, it is serviced immediately
# echo deadline > /sys/block/<device>/queue/scheduler
Slide 28
Improving I/O subsystem performance
Anticipatory – anticipatory
Optimizes systems with small or slow disk subsystems.
Recommended for servers running data-processing applications that are not
regularly interrupted by external requests.
NOOP – noop
For systems with heavy CPU workloads
Puts all requests into a simple unordered queue
Recommended for virtualized guests
elevator=noop   # kernel boot parameter
Slide 29
Understanding journaling file systems
A journaling file system recovers quickly thanks to a log book kept for the file system.
Any change to the file system is written to the journal as a transaction
before being committed to the actual file system.
In the event of a system crash or power failure, the file system is quickly
recovered and is less likely to be corrupted.
This is a very important feature in the enterprise market.
ext3, ext4, and xfs are journaling file systems.
EL 6 uses ext4 and EL 7 uses xfs as the default file system.
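The write-ahead idea behind journaling can be shown in miniature: record the intended change in a log, apply it, then mark the transaction complete (plain temp files stand in for the journal and the file system):

```shell
journal=$(mktemp); fs=$(mktemp)
echo "append: hello" >> "$journal"   # 1. record the transaction in the journal
echo "hello" >> "$fs"                # 2. commit to the actual file system
: > "$journal"                       # 3. done - clear the journal entry
cat "$fs"
# After a crash between steps 1 and 2, replaying the journal would redo the
# write; recovery only needs to scan the (small) journal, not the whole disk.
```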
Slide 31
Packet loss caused by degraded network performance
Overruns: usually seen under heavy UDP traffic
Drops: seen under both heavy UDP and TCP traffic
bond1 Link encap:Ethernet HWaddr 00:AA:BB:CC:DD:EE
inet addr:192.168.10.33 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500
RX packets:8344569671 errors:0 dropped:0 overruns:46295 frame:0
TX packets:53614 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2952210470156 (2.6 TiB) TX bytes:5251386 (5.0 MiB)
eth0 Link encap:Ethernet HWaddr
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:27051811 errors:0 dropped:696311 overruns:0 frame:0
TX packets:110147381 errors:0 dropped:0 overruns:0 carrier:0
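The same counters can be read without ifconfig. Overruns usually mean the NIC RX ring filled before the kernel drained it; enlarging the ring may help (the interface name `eth0` in the comments is an assumption, and the ethtool commands require root):

```shell
# Per-interface RX/TX error counters ('lo' is used so this runs unprivileged;
# substitute the real interface name):
ip -s link show lo
# Ring buffer sizes, and enlarging the RX ring (as root, name assumed):
#   ethtool -g eth0
#   ethtool -G eth0 rx 4096
```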
Slide 32
Understanding the TCP window
Under TCP, the receiver must send an ACK for every packet it receives,
and the sender must wait for that ACK.
This affects network throughput and CPU utilization.
If the network is long and slow, like a satellite link, or has a large bandwidth,
more packets can be on the link between sender and receiver at a time.
The TCP window allows the sender to send more packets without waiting for ACKs.
The length of the TCP window varies with the size of the TCP socket buffer.
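How large the window needs to be follows from the bandwidth-delay product: the bytes that must be in flight to keep the link full (the 100 Mbit/s / 60 ms figures are examples):

```shell
# Kernel limits on TCP socket buffers: min, default, max in bytes.
cat /proc/sys/net/ipv4/tcp_rmem /proc/sys/net/ipv4/tcp_wmem 2>/dev/null || true
# Bandwidth-delay product for a 100 Mbit/s path with a 60 ms RTT:
bdp=$(( 100000000 / 8 * 60 / 1000 ))   # bytes/s * seconds of delay
echo "BDP: ${bdp} bytes"
```

If the socket buffer (and hence the window) is smaller than the BDP, the sender stalls waiting for ACKs and throughput drops below the link capacity.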
Slide 33
Understanding the TCP window
If an application fetches packets from its socket buffers too slowly, the buffers
fill up and packets start to be dropped.
Better performance can be obtained by increasing the TCP socket buffer size.
[Figure: the sender transmits up to the advertised receive window (e.g. 4
segments) without waiting for ACKs; as the receiver's buffer fills, its ACK
advertises a smaller window (e.g. 2), throttling the sender.]