[Pgday.Seoul 2017] 2. PostgreSQL을 위한 리눅스 커널 최적화 - 김상욱

PostgreSQL을 위한
리눅스 커널 최적화
김상욱
PgDay.Seoul 2017

2
김상욱
• Co-founder and CEO @ Apposha
• 성균관대 컴퓨터공학 박사과정
• 4편의 SCI 저널 저술, 6편의 국제학술대회 논문 발표
• 클라우드/가상화 분야
• 멀티코어 스케줄링 [ASPLOS’13, VEE’14]
• 그룹 기반 메모리 관리 [JPDC’14]
• 데이터베이스/저장장치 분야
• 비휘발성 캐시 관리 [USENIX ATC’15, ApSys’16]
• 리퀘스트 중심 I/O 우선 처리 [FAST’17, HotStorage’17]

Apposha?
3
완전 관리형 서비스
MySQL 대비 5배 성능
상용 제품 대비 1/10 비용
for

Contents
• PostgreSQL 성능 관점에서의 리눅스 커널 최적화
• 백그라운드 작업 영향 최소화
• Full page writes 제거
• Parallel query 처리 최적화
4

Contents
5

PostgreSQL 구조
6
* 내부 프로세스 종류
- Backend (foreground)
- Checkpointer
- Autovacuum workers
- WAL writer
- Writer
- …
Storage Device
Linux Kernel
P1
클라이언트
P2
I/O
P3 P4
요청 응답
I/O I/O I/O
PostgreSQL
DB 성능

PostgreSQL 구조
7
* 내부 프로세스 종류
- Backend (foreground)
- Checkpointer
- Autovacuum workers
- WAL writer
- Writer
- …
Storage Device
Linux Kernel
P1
클라이언트
P2
I/O
P3 P4
요청 응답
I/O I/O I/O
PostgreSQL

PostgreSQL Checkpoint
8
http://www.interdb.jp/pg/pgsql09.html

PostgreSQL Autovacuum
9
Free Space Map
http://bstar36.tistory.com/308

백그라운드 태스크 영향
10
• Dell Poweredge R730
• 32 cores
• 132GB DRAM
• 1 SAS SSD
• PostgreSQL v9.5.6
• 52GB shared_buf
• 10GB max_wal
• TPC-C workload
• 50GB dataset
• 1 hour run
• 50 clients
0
200
400
600
800
1000
1200
0 500 1000 1500 2000 2500 3000
최대응답지연(ms)
경과시간 (초)
Default
최대 1.1초

11
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
1 50 100 150 200 250 300
처리량(trx/sec)
클라이언트 수
Default Aggressive AV
처리량 40%
autovacuum_max_workers
3 => 6
autovacuum_naptime
1min => 15s
autovacuum_vacuum_threshold
50 => 25
autovacuum_analyze_threshold
50 => 10
autovacuum_vacuum_scale_factor
0.2 => 0.1
autovacuum_analyze_scale_factor
0.1 => 0.05
autovacuum_vacuum_cost_delay
20ms => -1
autovacuum_vacuum_cost_limit
-1 => 1000

12
• Dell Poweredge R730
• 32 cores
• 132GB DRAM
• 1 SAS SSD
• 52GB shared_buf
• 10GB max_wal
• TPC-C workload
• 50GB dataset
• 1 hour run
• 50 clients
0
500
1000
1500
2000
2500
0 500 1000 1500 2000 2500 3000
경과시간 (초)
최대 2.5초

I/O 우선순위 적용의 어려움
• 각 계층에서의 독립적인 I/O 처리
13
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction

14
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
admission
control

15
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
Block-level Q
admission
control

16
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
Block-level Q
reorder
FG FG BGBG

17
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
Block-level Q
reorder
FG FG BGBG
Device-internal Q
admission
control

18
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
Block-level Q
reorder
FG FG BGBG
Device-internal QBG FG BGBG
reorder

19
• I/O 우선순위 역전
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Locks
Condition variables

20
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Locks
Condition variablesCondition variables
I/OFG
lock
BG
wait

21
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Locks
I/OFG
lock
BG
wait
I/OFG
lock
BG
wait
FG
wait
wait
BGvar
wake

22
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Locks
I/OFG
lock
BG
wait
I/OFG
lock
BG
wait
FG
wait
wait
BGvar
wake
I/O
FG
wait
wait
BGuser
var
wake
FG
wait

리퀘스트 중심 I/O 우선처리
• 솔루션 v1 (사후대책)
• 전체 계층에서의 우선순위 적용 [FAST’17]
• 동적 우선순위 상속 [USENIX ATC’15, FAST’17]
23
FG
lock
BG I/OFG BG
submit
complete
FG BG
FG
wait
BG
register
BG
inherit
FG BGI/O
submit
complete
wake
CV CV CV
[USENIX ATC’15] Request-Oriented Durable Write Caching for Application Performance
[FAST’17] Enlightening the I/O Path: A Holistic Approach for Application Performance

• 솔루션 v1 (문제점)
24

• 솔루션 v2 (사전예방)
25
Device Driver
Noop CFQ Deadline Apposha I/O Scheduler
Block Layer
Ext4 XFS F2FS
VFS
Apposha Front-End File System
Etc
리눅스 커널 I/O 스택
PageCache
- 우선순위 기반 I/O 스케줄링
- 디바이스 큐 혼잡 제어
- 우선순위 기반 쓰기 I/O 제어
- OS 캐싱 효율성 향상

26
Apposha 최적화 엔진
MongoDB
Library
PostgreSQL
Library
Elasticsearch
Library
V12-M V12-P V12-E
- 태스크 중요도, 파일 접근 패턴 분류
Front-End File System I/O Scheduler
• V12 성능 최적화 엔진

27
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
1 50 100 150 200 250 300
처리량(trx/sec)
클라이언트 수

28
0
2000
4000
6000
8000
10000
12000
1 50 100 150 200 250 300
처리량(trx/sec)
클라이언트 수
Default Aggressive AV V12-P
처리량
71-86%

29
0
500
1000
1500
2000
2500
0 500 1000 1500 2000 2500 3000
경과시간 (초)
Default Aggressive AV V12-P
최대 0.15초

Contents
30

Full Page Writes
31
Tuning PostgreSQL for High Write Workloads – Grant McAlister

Full Page Writes
32

Full Page Writes
33

Full Page Writes
34

Full Page Writes
35

Full Page Writes
36

Full Page Writes
37

Full Page Writes
38

Full Page Writes
• WAL 크기
39
0
2
4
6
8
10
12
14
0 100 200 300 400 500 600
WAL크기(GB)
경과시간 (초)
Default
• 52GB shared_buf
• 1GB max_wal
• TPC-C workload
• 50GB dataset
• 10 min run
• 200 clients

Full Page Writes
40
https://www.toadworld.com/platforms/postgres/b/weblog/archive/2017/06/07/a-first-look-at-
amazon-aurora-with-postgresql-compatibility-benefits-and-drawbacks-part-v

WAL Compression
• WAL 크기
41
0
2
4
6
8
10
12
14
0 100 200 300 400 500 600
WAL크기(GB)
경과시간 (초)
Default WAL compression

WAL Compression
• 성능 영향
42
0
2000
4000
6000
8000
10000
12000
처리량(trx/sec)

WAL Compression
• CPU 사용률
43
0
10
20
30
40
50
60
70
80
90
100
0 100 200 300 400 500 600
CPU사용률(%)
경과시간 (초)

Atomic Write 지원
44
매핑 레이어
PostgreSQL
Linux Kernel

Atomic Write 지원
45
매핑 레이어
DB 파일
open()
DB 쉐도우 파일
PostgreSQL
Linux Kernel

Atomic Write 지원
46
매핑 레이어
DB 파일
write()
DB 쉐도우 파일
PostgreSQL
Linux Kernel

Atomic Write 지원
47
매핑 레이어
DB 파일
read()
DB 쉐도우 파일
PostgreSQL
Linux Kernel

Atomic Write 지원
48
매핑 레이어
DB 파일
fsync()
DB 쉐도우 파일
체크포인트
PostgreSQL
Linux Kernel

Atomic Write 지원
49
매핑 레이어
DB 파일
write()
DB 쉐도우 파일

Atomic Write 지원
50
매핑 레이어
DB 파일
write()
DB 쉐도우 파일

Atomic Write 지원
51
매핑 레이어
DB 파일 DB 쉐도우 파일
read()

Atomic Write 지원
• WAL 크기 (max_wal_size=1GB)
52
0
2
4
6
8
10
12
14
0 100 200 300 400 500 600
WAL크기(GB)
경과시간 (초)
Default WAL compression V12-P

Atomic Write 지원
• 성능 영향
53
0
2000
4000
6000
8000
10000
12000
처리량(trx/sec)
처리량 2X

Atomic Write 지원
• CPU 사용률
54
0
10
20
30
40
50
60
70
80
90
100
0 100 200 300 400 500 600
CPU사용률(%)
경과시간 (초)

Contents
55

Parallel Query
56
https://blog.2ndquadrant.com/whats-new-in-postgres-xl-9-6/

Parallel Query
57
느린 응답
빠른 응답
사용자 쿼리
백엔드
워커
cache
storage
cache
storage
cache
storage
cache
storage
hit hitmiss!
워커 워커 워커
miss!

Parallel Query
• 기존 문제점
58

Parallel Query 처리 최적화
• 성능 영향
59
pgbench –i –s 1000 test1
pgbench –i –s 1000 test2
Q1: SELECT * FROM pgbench_accounts WHERE filler LIKE ‘%x% -d test1
Q2: SELECT * FROM pgbench_accounts WHERE filler LIKE ‘%x% -d test2
QUERY PLAN
--------------------------------------------------------------------------------------
Gather (cost=1000.00..1818916.53 rows=1 width=97)
Workers Planned: 7
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..1817916.43 rows=1 width=97)
Filter: (filler ~~ '%x%'::text)

• 성능 영향
60
0
5
10
15
20
25
30
35
40
Solo Deadline Noop CFQ Task-aware
응답시간(초)
Q1 Q2

• 성능 영향 (디바이스 큐 off)
61
0
20
40
60
80
100
Solo Deadline Noop CFQ Task-aware
응답시간(초)
Q1 Q2 평균 응답
30% 향상

62
H http://apposha.io
F www.facebook.com/apposha
M sangwook@apposha.io

[Pgday.Seoul 2017] 2. PostgreSQL을 위한 리눅스 커널 최적화 - 김상욱

More Related Content

What's hot

Similar to [Pgday.Seoul 2017] 2. PostgreSQL을 위한 리눅스 커널 최적화 - 김상욱

More from PgDay.Seoul

[Pgday.Seoul 2017] 2. PostgreSQL을 위한 리눅스 커널 최적화 - 김상욱