SlideShare a Scribd company logo
Enlightening the I/O Path:
A Holistic Approach for Application Performance
Sangwook Kim13, Hwanju Kim2, Joonwon Lee3, and Jinkyu Jeong3
Apposha1
Dell EMC2
Sungkyunkwan University3
Data-Intensive Applications
2
Relational
Key-value
SearchColumn
Document
Data-Intensive Applications
• Common structure
3
Storage Device
Operating System
T1
Client
T2
I/O
T3 T4
Request Response
I/O I/O I/O
Application
Application
performance
* Example: MongoDB
- Client (foreground)
- Checkpointer
- Log writer
- Eviction worker
- …
Data-Intensive Applications
• Common structure
4
Storage Device
Operating System
T1
Client
T2
I/O
T3 T4
Request Response
I/O I/O I/O
Application
Application
performance
* Example: MongoDB
- Server (client)
- Checkpointer
- Log writer
- Evict worker
- …
Background tasks are problematic
for application performance
Application Impact
5
• Illustrative experiment
• YCSB update-heavy workload against MongoDB
Application Impact
6
• Illustrative experiment
• YCSB update-heavy workload against MongoDB
0
5000
10000
15000
20000
25000
30000
0 200 400 600 800 1000 1200 1400 1600 1800
Operationthroughput
(ops/sec)
Elapsed time (sec)
CFQ
Regular
checkpoint task
30 seconds latency
at 99.99th percentile
0
5000
10000
15000
20000
25000
30000
0 200 400 600 800 1000 1200 1400 1600 1800
Operationthroughput
(ops/sec)
Elapsed time (sec)
CFQ CFQ-IDLE
Application Impact
7
• Illustrative experiment
• YCSB update-heavy workload against MongoDB
I/O priority does
not help
0
5000
10000
15000
20000
25000
30000
0 200 400 600 800 1000 1200 1400 1600 1800
Operationthroughput
(ops/sec)
Elapsed time (sec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO
Application Impact
8
• Illustrative experiment
• YCSB update-heavy workload against MongoDB
State-of-the-art
schedulers do
not help much
What’s the Problem?
• Independent policies in multiple layers
• Each layer processes I/Os w/ limited information
• I/O priority inversion
• Background I/Os can arbitrarily delay foreground tasks
9
What’s the Problem?
• Independent policies in multiple layers
• Each layer processes I/Os w/ limited information
• I/O priority inversion
• Background I/Os can arbitrarily delay foreground tasks
10
Multiple Independent Layers
• Independent I/O processing
11
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Multiple Independent Layers
12
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Buffer Cache
read() write()
admission
control
Abstraction
• Independent I/O processing
Multiple Independent Layers
• Independent I/O processing
13
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Buffer Cache
read() write()
Block-level Q
admission
control
Abstraction
Multiple Independent Layers
• Independent I/O processing
14
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
reorder
FG FG BGBG
Multiple Independent Layers
• Independent I/O processing
15
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
FG FGBG
Device-internal Q
admission
control
Multiple Independent Layers
• Independent I/O processing
16
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Abstraction
Buffer Cache
read() write()
FG FGBG
BG FG BGBG
reorder
What’s the Problem?
• Independent policies in multiple layers
• Each layer processes I/Os w/ limited information
• I/O priority inversion
• Background I/Os can arbitrarily delay foreground tasks
17
I/O Priority Inversion
• Task dependency
18
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Locks
Condition variables
I/O Priority Inversion
• Task dependency
19
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Condition variables
I/OFG
lock
BG
wait
I/O Priority Inversion
• Task dependency
20
Storage Device
Caching Layer
Application
File System Layer
Block Layer
I/OFG
lock
BG
wait
FG
wait
wait
BGvar
wake
I/O Priority Inversion
• Task dependency
21
Storage Device
Caching Layer
Application
File System Layer
Block Layer
I/O
FG
wait
wait
BGuser
var
wake
FG
wait
I/O Priority Inversion
• I/O dependency
22
Storage Device
Caching Layer
Application
File System Layer
Block Layer
Outstanding I/Os
I/O Priority Inversion
• I/O dependency
23
Storage Device
Caching Layer
Application
File System Layer
Block Layer
I/OFG
wait
For ensuring consistency
and/or durability
24
100 ms latency at
99.99th percentile
0
5000
10000
15000
20000
25000
30000
35000
0 200 400 600 800 1000 1200 1400 1600 1800
Operationthroughput
(ops/sec)
Elapsed time (sec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP
• Request-centric I/O prioritization (RCP)
• Critical I/O: I/O in the critical path of request handling
• Policy: holistically prioritizes critical I/Os along the I/O path
Our Approach
Challenges
• How to accurately identify I/O criticality
• How to effectively enforce I/O criticality
25
Critical I/O Detection
• Enlightenment API
• Interface for tagging foreground tasks
• I/O priority inheritance
• Handling task dependency
• Handling I/O dependency
26
I/O Priority Inheritance
• Handling task dependency
• Locks
• Condition variables
27
FG
lock
BG I/OFG
inherit
BG
submit
complete
FG BG
unlock
FG
wait
BG
register
BG
inherit
FG BGI/O
submit
complete
wake
CV CV CV
I/O Priority Inheritance
• Handling I/O dependency
28
I/O Priority Inheritance
• Handling I/O dependency
29
Block Layer
I/OI/O
I/O Priority Inheritance
• Handling I/O dependency
30
Block Layer
Q admission stage
I/OI/O
Sched queueing stage
I/O Priority Inheritance
• Handling I/O dependency
31
Block Layer
PER-DEV
ROOT
NCIO NCIO
NCIO NCIO
Q admission stage
I/OI/O
Sched queueing stage
Non-critical I/O tracking
Descriptor
Location
Resolver
Sector #
I/O Priority Inheritance
• Handling I/O dependency
32
Block Layer
Descriptor
Location
Resolver
Sector #
allocate
Q admission stage
I/O
I/O
I/O
Sched queueing stage
Non-critical I/O tracking
PER-DEV
ROOT
NCIO NCIO
NCIO NCIO
I/O Priority Inheritance
• Handling I/O dependency
33
Block Layer
Q admission stage
I/O
I/O
I/O
Sched queueing stage
Non-critical I/O tracking
update
Descriptor
Location
Resolver
Sector #
PER-DEV
ROOT
NCIO NCIO
NCIO NCIO
I/O Priority Inheritance
• Handling I/O dependency
34
Block Layer
Q admission stage
I/O
I/O
Sched queueing stage
I/O
Non-critical I/O tracking
update Descriptor
Location
Resolver
Sector #
PER-DEV
ROOT
NCIO NCIO
NCIO NCIO
I/O Priority Inheritance
• Handling I/O dependency
35
Block Layer
Q admission stage
I/O
I/O
Sched queueing stage
I/O
Non-critical I/O tracking
update
Descriptor
Location
Resolver
Sector #
PER-DEV
ROOT
NCIO NCIO
NCIO NCIO
I/O Priority Inheritance
• Handling I/O dependency
36
Block Layer
Q admission stage
I/O
I/O
Sched queueing stage
I/O
Non-critical I/O tracking
update
Descriptor
Location
Resolver
Sector #
PER-DEV
ROOT
NCIO NCIO
NCIO NCIO
I/O Priority Inheritance
• Handling I/O dependency
37
Block Layer
Q admission stage
I/O
I/O
Sched queueing stage
I/O
Non-critical I/O tracking
Descriptor
Location
Resolver
Sector #
PER-DEV
ROOT
NCIO NCIO
NCIO NCIO
delete on
completion
Handling Transitive Dependency
• Possible states of dependent task
38
FG
inherit
BG BG
Blocked
on task
I/OFG
inherit
BG
wait wait
Blocked
on I/O
FG
inherit
BG
wait
Blocked at
admission stage
Handling Transitive Dependency
• Recording blocking status
39
FG
inherit
BG BG
Blocked
on task
I/OFG
inherit
BG
wait wait
Blocked
on I/O
FG
inherit
BG
wait
Blocked at
admission stage
Handling Transitive Dependency
• Recording blocking status
40
FG
inherit
BG BG
Task is
recorded
I/OFG
inherit
BG
wait
Blocked
on I/O
FG
inherit
BG
wait
Blocked at
admission stage
inherit
Handling Transitive Dependency
• Recording blocking status
41
FG
inherit
BG BG I/OFG
inherit
BG
reprio
I/O is
recorded
FG
inherit
BG
wait
Blocked at
admission stage
Task is
recorded
inherit
Handling Transitive Dependency
• Recording blocking status
42
FG
inherit
BG BG I/OFG
inherit
BG FG
inherit
BG
retryreprio
I/O is
recorded
Task is
recorded
inherit
Challenges
• How to accurately identify I/O criticality
• Enlightenment API
• I/O priority inheritance
• Recording blocking status
• How to effectively enforce I/O criticality
43
Criticality-Aware I/O Prioritization
• Caching layer
• Apply low dirty ratio for non-critical writes (1% by default)
• Block layer
• Isolate allocation of block queue slots
• Maintain 2 FIFO queues
• Schedule critical I/O first
• Limit # of outstanding non-critical I/Os (1 by default)
• Support queue promotion to resolve I/O dependency
44
Evaluation
• Implementation on Linux 3.13 w/ ext4
• Application studies
• PostgreSQL relational database
• Backend processes as foreground tasks
• I/O priority inheritance on LWLocks (semop)
• MongoDB document store
• Client threads as foreground tasks
• I/O priority inheritance on Pthread mutex and condition vars (futex)
• Redis key-value store
• Master process as foreground task
45
Evaluation
• Experimental setup
• 2 Dell PowerEdge R530 (server & client)
• 1TB Micron MX200 SSD
• I/O prioritization schemes
• CFQ (default), CFQ-IDLE
• SPLIT-A (priority), SPLIT-D (deadline) [SOSP’15]
• QASIO [FAST’15]
• RCP
46
Application Throughput
• PostgreSQL w/ TPC-C workload
47
0
1000
2000
3000
4000
5000
6000
7000
10GB dataset 60GB dataset 200GB dataset
Transactionthroughput
(trx/sec)
CFQ CFQ-IDLE
Application Throughput
• PostgreSQL w/ TPC-C workload
48
0
1000
2000
3000
4000
5000
6000
7000
10GB dataset 60GB dataset 200GB dataset
Transactionthroughput
(trx/sec)
CFQ CFQ-IDLE SPLIT-A
Application Throughput
• PostgreSQL w/ TPC-C workload
49
0
1000
2000
3000
4000
5000
6000
7000
10GB dataset 60GB dataset 200GB dataset
Transactionthroughput
(trx/sec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D
Application Throughput
• PostgreSQL w/ TPC-C workload
50
0
1000
2000
3000
4000
5000
6000
7000
10GB dataset 60GB dataset 200GB dataset
Transactionthroughput
(trx/sec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO
Application Throughput
• PostgreSQL w/ TPC-C workload
51
0
1000
2000
3000
4000
5000
6000
7000
10GB dataset 60GB dataset 200GB dataset
Transactionthroughput
(trx/sec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP
37%
31%
28%
Application Throughput
• Impact on background task
52
0
5
10
15
20
25
30
35
0 200 400 600 800 1000 1200 1400 1600 1800
Transactionlogsize(GB)
Elapsed time (sec)
CFQ CFQ-IDLE QASIO
Application Throughput
• Impact on background task
53
0
5
10
15
20
25
30
35
0 200 400 600 800 1000 1200 1400 1600 1800
Transactionlogsize(GB)
Elapsed time (sec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO
Application Throughput
• Impact on background task
54
Our scheme improves
application throughput
w/o penalizing
background tasks
0
5
10
15
20
25
30
35
0 200 400 600 800 1000 1200 1400 1600 1800
Transactionlogsize(GB)
Elapsed time (sec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP
Application Latency
• PostgreSQL w/ TPC-C workload
55
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
0 1000 2000 3000 4000 5000 6000
CCDFP[X>=x]
Transaction latency (msec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO
100
10-1
10-2
10-3
10-4
10-5
Over 2 sec
at 99.9th
0th
90th
99th
99.9th
99.99th
99.999th
Application Latency
• PostgreSQL w/ TPC-C workload
56
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
0 1000 2000 3000 4000 5000 6000
CCDFP[X>=x]
Transaction latency (msec)
CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP
100
10-1
10-2
10-3
10-4
10-5
300 msec
at 99.999th
Our scheme is effective
for improving tail latency
Over 2 sec
at 99.9th
100
10-1
10-2
10-3
10-4
10-5
0th
90th
99th
99.9th
99.99th
99.999th
Summary of Other Results
• Performance results
• MongoDB: 12%-201% throughput, 5x-20x latency at 99.9th
• Redis: 7%-49% throughput, 2x-20x latency at 99.9th
• Analysis results
• System latency analysis using LatencyTOP
• System throughput vs. Application latency
• Need for holistic approach
57
Conclusions
• Key observation
• All the layers in the I/O path should be considered as a whole with I/O
priority inversion in mind for effective I/O prioritization
• Request-centric I/O prioritization
• Enlightens the I/O path solely for application performance
• Improves throughput and latency of real applications
• Ongoing work
• Practicalizing implementation
• Applying RCP to database cluster with multiple replicas
58
Thank You!
• Questions and comments
• Contact
• sangwook@apposha.io
59

More Related Content

Viewers also liked

Best Practices for Integrating Active Directory with AWS Workloads
Best Practices for Integrating Active Directory with AWS WorkloadsBest Practices for Integrating Active Directory with AWS Workloads
Best Practices for Integrating Active Directory with AWS Workloads
Amazon Web Services
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
Bob Ward
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
Apache Apex
 
2017 iosco research report on financial technologies (fintech)
2017 iosco research report on  financial technologies (fintech)2017 iosco research report on  financial technologies (fintech)
2017 iosco research report on financial technologies (fintech)
Ian Beckett
 
Your App Deserves More – The Art of App Modernization
Your App Deserves More – The Art of App ModernizationYour App Deserves More – The Art of App Modernization
Your App Deserves More – The Art of App Modernization
Klaus Bild
 
Web engineering notes unit 3
Web engineering notes unit 3Web engineering notes unit 3
Web engineering notes unit 3
inshu1890
 
Webinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph DatabasesWebinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph Databases
DataStax
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545
imeldafelicia
 
What’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQLWhat’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQL
Amazon Web Services
 

Viewers also liked (10)

Best Practices for Integrating Active Directory with AWS Workloads
Best Practices for Integrating Active Directory with AWS WorkloadsBest Practices for Integrating Active Directory with AWS Workloads
Best Practices for Integrating Active Directory with AWS Workloads
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
 
2017 iosco research report on financial technologies (fintech)
2017 iosco research report on  financial technologies (fintech)2017 iosco research report on  financial technologies (fintech)
2017 iosco research report on financial technologies (fintech)
 
Your App Deserves More – The Art of App Modernization
Your App Deserves More – The Art of App ModernizationYour App Deserves More – The Art of App Modernization
Your App Deserves More – The Art of App Modernization
 
Web engineering notes unit 3
Web engineering notes unit 3Web engineering notes unit 3
Web engineering notes unit 3
 
Webinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph DatabasesWebinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph Databases
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545Tugas 4 0317-imelda felicia-1412510545
Tugas 4 0317-imelda felicia-1412510545
 
What’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQLWhat’s New in Amazon Aurora for MySQL and PostgreSQL
What’s New in Amazon Aurora for MySQL and PostgreSQL
 

Similar to Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Anne Nicolas
 
Java performance monitoring
Java performance monitoringJava performance monitoring
Java performance monitoring
Simon Ritter
 
Coverage Solutions on Emulators
Coverage Solutions on EmulatorsCoverage Solutions on Emulators
Coverage Solutions on Emulators
DVClub
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
Ender Aydin Orak
 
Datastage Online Training
Datastage Online TrainingDatastage Online Training
Datastage Online Training
Srihitha Technologies
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
Kernel TLV
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
Vinay Kumar Chella
 
Getting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testingGetting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testing
RISC-V International
 
CrikeyCon 2015 - iOS Runtime Hacking Crash Course
CrikeyCon 2015 - iOS Runtime Hacking Crash CourseCrikeyCon 2015 - iOS Runtime Hacking Crash Course
CrikeyCon 2015 - iOS Runtime Hacking Crash Course
eightbit
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
Aneesh Raveendran
 
Wahckon[2] - iOS Runtime Hacking Crash Course
Wahckon[2] - iOS Runtime Hacking Crash CourseWahckon[2] - iOS Runtime Hacking Crash Course
Wahckon[2] - iOS Runtime Hacking Crash Course
eightbit
 
Introduction to Robot Framework – Exove
Introduction to Robot Framework – ExoveIntroduction to Robot Framework – Exove
Introduction to Robot Framework – Exove
Exove
 
Java Concurrency and Performance | Multi Threading | Concurrency | Java Conc...
Java Concurrency and Performance | Multi Threading  | Concurrency | Java Conc...Java Concurrency and Performance | Multi Threading  | Concurrency | Java Conc...
Java Concurrency and Performance | Multi Threading | Concurrency | Java Conc...
Anand Narayanan
 
Adobe AEM - From Eventing to Job Processing
Adobe AEM - From Eventing to Job ProcessingAdobe AEM - From Eventing to Job Processing
Adobe AEM - From Eventing to Job Processing
Carsten Ziegeler
 
Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...
Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...
Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...
Perficient
 
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log InsightVMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld
 
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
Anne Nicolas
 
Understanding and Measuring I/O Performance
Understanding and Measuring I/O PerformanceUnderstanding and Measuring I/O Performance
Understanding and Measuring I/O Performance
Glenn K. Lockwood
 

Similar to Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17) (20)

Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
 
Java performance monitoring
Java performance monitoringJava performance monitoring
Java performance monitoring
 
Coverage Solutions on Emulators
Coverage Solutions on EmulatorsCoverage Solutions on Emulators
Coverage Solutions on Emulators
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
Datastage Online Training
Datastage Online TrainingDatastage Online Training
Datastage Online Training
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
 
Getting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testingGetting started with RISC-V verification what's next after compliance testing
Getting started with RISC-V verification what's next after compliance testing
 
CrikeyCon 2015 - iOS Runtime Hacking Crash Course
CrikeyCon 2015 - iOS Runtime Hacking Crash CourseCrikeyCon 2015 - iOS Runtime Hacking Crash Course
CrikeyCon 2015 - iOS Runtime Hacking Crash Course
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
 
Wahckon[2] - iOS Runtime Hacking Crash Course
Wahckon[2] - iOS Runtime Hacking Crash CourseWahckon[2] - iOS Runtime Hacking Crash Course
Wahckon[2] - iOS Runtime Hacking Crash Course
 
Introduction to Robot Framework – Exove
Introduction to Robot Framework – ExoveIntroduction to Robot Framework – Exove
Introduction to Robot Framework – Exove
 
Java Concurrency and Performance | Multi Threading | Concurrency | Java Conc...
Java Concurrency and Performance | Multi Threading  | Concurrency | Java Conc...Java Concurrency and Performance | Multi Threading  | Concurrency | Java Conc...
Java Concurrency and Performance | Multi Threading | Concurrency | Java Conc...
 
Adobe AEM - From Eventing to Job Processing
Adobe AEM - From Eventing to Job ProcessingAdobe AEM - From Eventing to Job Processing
Adobe AEM - From Eventing to Job Processing
 
Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...
Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...
Integrating Oracle Argus Safety with other Clinical Systems Using Argus Inter...
 
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log InsightVMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
VMworld 2013: Deep Dive into vSphere Log Management with vCenter Log Insight
 
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
Kernel Recipes 2018 - Live (Kernel) Patching: status quo and status futurus -...
 
Understanding and Measuring I/O Performance
Understanding and Measuring I/O PerformanceUnderstanding and Measuring I/O Performance
Understanding and Measuring I/O Performance
 

Recently uploaded

Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
kalichargn70th171
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
Alina Yurenko
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 

Recently uploaded (20)

Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 

Enlightening the I/O Path: A Holistic Approach for Application Performance (USENIX FAST'17)

  • 1. Enlightening the I/O Path: A Holistic Approach for Application Performance Sangwook Kim13, Hwanju Kim2, Joonwon Lee3, and Jinkyu Jeong3 Apposha1 Dell EMC2 Sungkyunkwan University3
  • 3. Data-Intensive Applications • Common structure 3 Storage Device Operating System T1 Client T2 I/O T3 T4 Request Response I/O I/O I/O Application Application performance * Example: MongoDB - Client (foreground) - Checkpointer - Log writer - Eviction worker - …
  • 4. Data-Intensive Applications • Common structure 4 Storage Device Operating System T1 Client T2 I/O T3 T4 Request Response I/O I/O I/O Application Application performance * Example: MongoDB - Server (client) - Checkpointer - Log writer - Evict worker - … Background tasks are problematic for application performance
  • 5. Application Impact 5 • Illustrative experiment • YCSB update-heavy workload against MongoDB
  • 6. Application Impact 6 • Illustrative experiment • YCSB update-heavy workload against MongoDB 0 5000 10000 15000 20000 25000 30000 0 200 400 600 800 1000 1200 1400 1600 1800 Operationthroughput (ops/sec) Elapsed time (sec) CFQ Regular checkpoint task 30 seconds latency at 99.99th percentile
  • 7. 0 5000 10000 15000 20000 25000 30000 0 200 400 600 800 1000 1200 1400 1600 1800 Operationthroughput (ops/sec) Elapsed time (sec) CFQ CFQ-IDLE Application Impact 7 • Illustrative experiment • YCSB update-heavy workload against MongoDB I/O priority does not help
  • 8. 0 5000 10000 15000 20000 25000 30000 0 200 400 600 800 1000 1200 1400 1600 1800 Operationthroughput (ops/sec) Elapsed time (sec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO Application Impact 8 • Illustrative experiment • YCSB update-heavy workload against MongoDB State-of-the-art schedulers do not help much
  • 9. What’s the Problem? • Independent policies in multiple layers • Each layer processes I/Os w/ limited information • I/O priority inversion • Background I/Os can arbitrarily delay foreground tasks 9
  • 10. What’s the Problem? • Independent policies in multiple layers • Each layer processes I/Os w/ limited information • I/O priority inversion • Background I/Os can arbitrarily delay foreground tasks 10
  • 11. Multiple Independent Layers • Independent I/O processing 11 Storage Device Caching Layer Application File System Layer Block Layer Abstraction
  • 12. Multiple Independent Layers 12 Storage Device Caching Layer Application File System Layer Block Layer Buffer Cache read() write() admission control Abstraction • Independent I/O processing
  • 13. Multiple Independent Layers • Independent I/O processing 13 Storage Device Caching Layer Application File System Layer Block Layer Buffer Cache read() write() Block-level Q admission control Abstraction
  • 14. Multiple Independent Layers • Independent I/O processing 14 Storage Device Caching Layer Application File System Layer Block Layer Abstraction Buffer Cache read() write() reorder FG FG BGBG
  • 15. Multiple Independent Layers • Independent I/O processing 15 Storage Device Caching Layer Application File System Layer Block Layer Abstraction Buffer Cache read() write() FG FGBG Device-internal Q admission control
  • 16. Multiple Independent Layers • Independent I/O processing 16 Storage Device Caching Layer Application File System Layer Block Layer Abstraction Buffer Cache read() write() FG FGBG BG FG BGBG reorder
  • 17. What’s the Problem? • Independent policies in multiple layers • Each layer processes I/Os w/ limited information • I/O priority inversion • Background I/Os can arbitrarily delay foreground tasks 17
  • 18. I/O Priority Inversion • Task dependency 18 Storage Device Caching Layer Application File System Layer Block Layer Locks Condition variables
  • 19. I/O Priority Inversion • Task dependency 19 Storage Device Caching Layer Application File System Layer Block Layer Condition variables I/OFG lock BG wait
  • 20. I/O Priority Inversion • Task dependency 20 Storage Device Caching Layer Application File System Layer Block Layer I/OFG lock BG wait FG wait wait BGvar wake
  • 21. I/O Priority Inversion • Task dependency 21 Storage Device Caching Layer Application File System Layer Block Layer I/O FG wait wait BGuser var wake FG wait
  • 22. I/O Priority Inversion • I/O dependency 22 Storage Device Caching Layer Application File System Layer Block Layer Outstanding I/Os
  • 23. I/O Priority Inversion • I/O dependency 23 Storage Device Caching Layer Application File System Layer Block Layer I/OFG wait For ensuring consistency and/or durability
  • 24. 24 100 ms latency at 99.99th percentile 0 5000 10000 15000 20000 25000 30000 35000 0 200 400 600 800 1000 1200 1400 1600 1800 Operationthroughput (ops/sec) Elapsed time (sec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP • Request-centric I/O prioritization (RCP) • Critical I/O: I/O in the critical path of request handling • Policy: holistically prioritizes critical I/Os along the I/O path Our Approach
  • 25. Challenges • How to accurately identify I/O criticality • How to effectively enforce I/O criticality 25
  • 26. Critical I/O Detection • Enlightenment API • Interface for tagging foreground tasks • I/O priority inheritance • Handling task dependency • Handling I/O dependency 26
  • 27. I/O Priority Inheritance • Handling task dependency • Locks • Condition variables 27 FG lock BG I/OFG inherit BG submit complete FG BG unlock FG wait BG register BG inherit FG BGI/O submit complete wake CV CV CV
  • 28. I/O Priority Inheritance • Handling I/O dependency 28
  • 29. I/O Priority Inheritance • Handling I/O dependency 29 Block Layer I/OI/O
  • 30. I/O Priority Inheritance • Handling I/O dependency 30 Block Layer Q admission stage I/OI/O Sched queueing stage
  • 31. I/O Priority Inheritance • Handling I/O dependency 31 Block Layer PER-DEV ROOT NCIO NCIO NCIO NCIO Q admission stage I/OI/O Sched queueing stage Non-critical I/O tracking Descriptor Location Resolver Sector #
  • 32. I/O Priority Inheritance • Handling I/O dependency 32 Block Layer Descriptor Location Resolver Sector # allocate Q admission stage I/O I/O I/O Sched queueing stage Non-critical I/O tracking PER-DEV ROOT NCIO NCIO NCIO NCIO
  • 33. I/O Priority Inheritance • Handling I/O dependency 33 Block Layer Q admission stage I/O I/O I/O Sched queueing stage Non-critical I/O tracking update Descriptor Location Resolver Sector # PER-DEV ROOT NCIO NCIO NCIO NCIO
  • 34. I/O Priority Inheritance • Handling I/O dependency 34 Block Layer Q admission stage I/O I/O Sched queueing stage I/O Non-critical I/O tracking update Descriptor Location Resolver Sector # PER-DEV ROOT NCIO NCIO NCIO NCIO
  • 35. I/O Priority Inheritance • Handling I/O dependency 35 Block Layer Q admission stage I/O I/O Sched queueing stage I/O Non-critical I/O tracking update Descriptor Location Resolver Sector # PER-DEV ROOT NCIO NCIO NCIO NCIO
  • 36. I/O Priority Inheritance • Handling I/O dependency 36 Block Layer Q admission stage I/O I/O Sched queueing stage I/O Non-critical I/O tracking update Descriptor Location Resolver Sector # PER-DEV ROOT NCIO NCIO NCIO NCIO
  • 37. I/O Priority Inheritance • Handling I/O dependency 37 Block Layer Q admission stage I/O I/O Sched queueing stage I/O Non-critical I/O tracking Descriptor Location Resolver Sector # PER-DEV ROOT NCIO NCIO NCIO NCIO delete on completion
  • 38. Handling Transitive Dependency • Possible states of dependent task 38 FG inherit BG BG Blocked on task I/OFG inherit BG wait wait Blocked on I/O FG inherit BG wait Blocked at admission stage
  • 39. Handling Transitive Dependency • Recording blocking status 39 FG inherit BG BG Blocked on task I/OFG inherit BG wait wait Blocked on I/O FG inherit BG wait Blocked at admission stage
  • 40. Handling Transitive Dependency • Recording blocking status 40 FG inherit BG BG Task is recorded I/OFG inherit BG wait Blocked on I/O FG inherit BG wait Blocked at admission stage inherit
  • 41. Handling Transitive Dependency • Recording blocking status 41 FG inherit BG BG I/OFG inherit BG reprio I/O is recorded FG inherit BG wait Blocked at admission stage Task is recorded inherit
  • 42. Handling Transitive Dependency • Recording blocking status 42 FG inherit BG BG I/OFG inherit BG FG inherit BG retryreprio I/O is recorded Task is recorded inherit
  • 43. Challenges • How to accurately identify I/O criticality • Enlightenment API • I/O priority inheritance • Recording blocking status • How to effectively enforce I/O criticality 43
  • 44. Criticality-Aware I/O Prioritization • Caching layer • Apply low dirty ratio for non-critical writes (1% by default) • Block layer • Isolate allocation of block queue slots • Maintain 2 FIFO queues • Schedule critical I/O first • Limit # of outstanding non-critical I/Os (1 by default) • Support queue promotion to resolve I/O dependency 44
  • 45. Evaluation • Implementation on Linux 3.13 w/ ext4 • Application studies • PostgreSQL relational database • Backend processes as foreground tasks • I/O priority inheritance on LWLocks (semop) • MongoDB document store • Client threads as foreground tasks • I/O priority inheritance on Pthread mutex and condition vars (futex) • Redis key-value store • Master process as foreground task 45
  • 46. Evaluation • Experimental setup • 2 Dell PowerEdge R530 (server & client) • 1TB Micron MX200 SSD • I/O prioritization schemes • CFQ (default), CFQ-IDLE • SPLIT-A (priority), SPLIT-D (deadline) [SOSP’15] • QASIO [FAST’15] • RCP 46
  • 47. Application Throughput • PostgreSQL w/ TPC-C workload 47 0 1000 2000 3000 4000 5000 6000 7000 10GB dataset 60GB dataset 200GB dataset Transactionthroughput (trx/sec) CFQ CFQ-IDLE
  • 48. Application Throughput • PostgreSQL w/ TPC-C workload 48 0 1000 2000 3000 4000 5000 6000 7000 10GB dataset 60GB dataset 200GB dataset Transactionthroughput (trx/sec) CFQ CFQ-IDLE SPLIT-A
  • 49. Application Throughput • PostgreSQL w/ TPC-C workload 49 0 1000 2000 3000 4000 5000 6000 7000 10GB dataset 60GB dataset 200GB dataset Transactionthroughput (trx/sec) CFQ CFQ-IDLE SPLIT-A SPLIT-D
  • 50. Application Throughput • PostgreSQL w/ TPC-C workload 50 0 1000 2000 3000 4000 5000 6000 7000 10GB dataset 60GB dataset 200GB dataset Transactionthroughput (trx/sec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO
  • 51. Application Throughput • PostgreSQL w/ TPC-C workload 51 0 1000 2000 3000 4000 5000 6000 7000 10GB dataset 60GB dataset 200GB dataset Transactionthroughput (trx/sec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP 37% 31% 28%
  • 52. Application Throughput • Impact on background task 52 0 5 10 15 20 25 30 35 0 200 400 600 800 1000 1200 1400 1600 1800 Transactionlogsize(GB) Elapsed time (sec) CFQ CFQ-IDLE QASIO
  • 53. Application Throughput • Impact on background task 53 0 5 10 15 20 25 30 35 0 200 400 600 800 1000 1200 1400 1600 1800 Transactionlogsize(GB) Elapsed time (sec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO
  • 54. Application Throughput • Impact on background task 54 Our scheme improves application throughput w/o penalizing background tasks 0 5 10 15 20 25 30 35 0 200 400 600 800 1000 1200 1400 1600 1800 Transactionlogsize(GB) Elapsed time (sec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP
  • 55. Application Latency • PostgreSQL w/ TPC-C workload 55 1.E-05 1.E-04 1.E-03 1.E-02 1.E-01 1.E+00 0 1000 2000 3000 4000 5000 6000 CCDFP[X>=x] Transaction latency (msec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO 100 10-1 10-2 10-3 10-4 10-5 Over 2 sec at 99.9th 0th 90th 99th 99.9th 99.99th 99.999th
  • 56. Application Latency • PostgreSQL w/ TPC-C workload 56 1.E-05 1.E-04 1.E-03 1.E-02 1.E-01 1.E+00 0 1000 2000 3000 4000 5000 6000 CCDFP[X>=x] Transaction latency (msec) CFQ CFQ-IDLE SPLIT-A SPLIT-D QASIO RCP 100 10-1 10-2 10-3 10-4 10-5 300 msec at 99.999th Our scheme is effective for improving tail latency Over 2 sec at 99.9th 100 10-1 10-2 10-3 10-4 10-5 0th 90th 99th 99.9th 99.99th 99.999th
  • 57. Summary of Other Results • Performance results • MongoDB: 12%-201% throughput, 5x-20x latency at 99.9th • Redis: 7%-49% throughput, 2x-20x latency at 99.9th • Analysis results • System latency analysis using LatencyTOP • System throughput vs. Application latency • Need for holistic approach 57
  • 58. Conclusions • Key observation • All the layers in the I/O path should be considered as a whole with I/O priority inversion in mind for effective I/O prioritization • Request-centric I/O prioritization • Enlightens the I/O path solely for application performance • Improves throughput and latency of real applications • Ongoing work • Practicalizing implementation • Applying RCP to database cluster with multiple replicas 58
  • 59. Thank You! • Questions and comments • Contact • sangwook@apposha.io 59