1
Block I/O Layer Tracing:
blktrace
Gelato – Cupertino, CA
April 2006
Alan D. Brunelle
Hewlett-Packard Company
Open Source and Linux Organization
Scalability & Performance Group
Alan.Brunelle@hp.com
2
Introduction
● Blktrace – overview of a new Linux capability
– Ability to see what's going on inside the block I/O 
layer
● “You can't count what you can't measure”
– Kernel implementation
– Description of user applications 
● Sample Output & Analysis
3
Problem Statement
● Need to know the specific operations performed 
upon each I/O submitted into the block I/O layer
● Who?
– Kernel developers in the I/O path:
● Block I/O layer, I/O scheduler, software RAID, file system, ...
– Performance analysis engineers – HP OSLO S&P...
4
Block I/O Layer (simplified)
Applications
File Systems...
Page Cache
Block I/O Layer: Request Queues
Pseudo devices (MD/DM - optional)
Physical devices
5
iostat
● The iostat utility provides information about the request queues associated with specific devices
– Average I/O time on queue, number of merges, number of blocks read/written, ...
● However, it does not provide detailed information on a per-I/O basis
6
Blktrace – to the rescue!
● Developed and maintained by Jens Axboe (block I/O layer 
maintainer)
– My additions included adding threads & utility splitting, DM remap 
events, blkrawverify utility, binary dump feature, testing,  
kernel/utility patches, and documentation.
● Low-overhead, configurable kernel component which emits events for specific operations performed on each I/O entering the block I/O layer
● Set of tools which extract and format these events
However, blktrace is not an analysis tool!
7
Feature List
● Provides detailed block layer information concerning individual I/Os
● Low-overhead kernel tracing mechanism
– Observed less than a 2% hit to application performance in relatively stressful I/O situations
● Configurable:
– Specify 1 or more physical or logical devices (including MD and DM (LVM2))
– User-selectable events – can specify filter at event acquisition and/or when formatting output
● Supports both “live” and “playback” tracing
8
Events Captured
● Request queue entry allocated
● Sleep during request queue allocation
● Request queue insertion
● Front/back merge of I/O on request queue
● Re-queue of a request
● Request issued to underlying block device
● Request queue plug/unplug op
● I/O split/bounce operation
● I/O remap
– MD or DM
● Request completed
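In blkparse output these events appear as single-letter action codes. The mapping below is a minimal Python sketch of that correspondence. The letters are taken from the blkparse documentation rather than from these slides, so treat them as an assumption to check against your blkparse version; `describe_action` is a hypothetical helper name.

```python
# Action letters as documented for blkparse. This is an assumption relative
# to the slides, which list the events only in prose.
ACTIONS = {
    "Q": "I/O queued (entered the block layer)",
    "G": "request queue entry allocated (get request)",
    "S": "sleep during request queue allocation",
    "I": "request inserted onto the request queue",
    "M": "back merge with a request on the queue",
    "F": "front merge with a request on the queue",
    "R": "request re-queued",
    "D": "request issued to the underlying block device",
    "P": "request queue plugged",
    "U": "request queue unplugged",
    "T": "request queue unplugged by timer",
    "X": "I/O split",
    "B": "I/O bounced",
    "A": "I/O remapped (MD or DM)",
    "C": "request completed",
}


def describe_action(code):
    """Return a description for a blkparse action letter (hypothetical helper)."""
    return ACTIONS.get(code, "unknown action " + repr(code))


if __name__ == "__main__":
    for letter in "QGIDC":
        print(letter, "-", describe_action(letter))
```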
9
blktrace: General Architecture
[Diagram: in kernel space, I/Os passing through the block I/O layer (request queue) emit events into relay channels; in user space, blktrace reads the relay channels and passes the events to blkparse. Labels on the diagram: per dev, per CPU.]
10
blktrace Utilities
● blktrace: Device configuration and event extraction utility
– Store events in long-term storage
– Or pipe to the blkparse utility for live tracing
● Also: networking feature to send events to another machine for parsing
● blkparse: Event formatting utility
– Supports textual or binary dump output
11
blktrace: Event Output
% blktrace -d /dev/sda -o - | blkparse -i -
8,0 3 1 0.000000000 697 G W 223490 + 8 [kjournald]
8,0 3 2 0.000001829 697 P R [kjournald]
8,0 3 3 0.000002197 697 Q W 223490 + 8 [kjournald]
8,0 3 4 0.000005533 697 M W 223498 + 8 [kjournald]
8,0 3 5 0.000008607 697 M W 223506 + 8 [kjournald]
...
8,0 3 10 0.000024062 697 D W 223490 + 56 [kjournald]
8,0 1 11 0.009507758 0 C W 223490 + 56 [0]
Fields, left to right: Dev <mjr,mnr>, CPU, Sequence Number, Time Stamp, PID, Event, Start block + number of blocks, Process
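For post-processing, each formatted line can be split back into these fields. The following is a minimal Python sketch that handles only the line shapes shown on this slide; real blkparse output has other variants (and a configurable format), so the regular expression is an assumption, and `Event` and `parse_event` are hypothetical names. The read/write column is not labelled on the slide; it is called `rwbs` here.

```python
import re
from typing import NamedTuple, Optional


class Event(NamedTuple):
    major: int
    minor: int
    cpu: int
    seq: int
    time: float               # seconds since the start of the trace
    pid: int
    action: str               # event letter, e.g. Q, M, D, C
    rwbs: str                 # read/write flags column (unlabelled on the slide)
    sector: Optional[int]     # start block; absent on plug events
    nblocks: Optional[int]    # number of blocks; absent on plug events
    process: str


# Matches only the line shapes shown in the sample output (an assumption).
LINE_RE = re.compile(
    r"^\s*(\d+),(\d+)\s+(\d+)\s+(\d+)\s+(\d+\.\d+)\s+(\d+)\s+"
    r"([A-Z])\s+([A-Z]+)\s+(?:(\d+)\s*\+\s*(\d+)\s+)?\[(.*)\]\s*$"
)


def parse_event(line):
    """Parse one formatted event line; return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    major, minor, cpu, seq, time, pid, action, rwbs, sector, nblocks, proc = m.groups()
    return Event(
        int(major), int(minor), int(cpu), int(seq), float(time), int(pid),
        action, rwbs,
        int(sector) if sector is not None else None,
        int(nblocks) if nblocks is not None else None,
        proc,
    )


if __name__ == "__main__":
    print(parse_event("8,0 3 10 0.000024062 697 D W 223490 + 56 [kjournald]"))
```

Lines that do not fit the pattern, such as the `...` elision above, return `None` and can be skipped by the caller.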
12
blktrace: Summary Output
CPU0 (sdao):
Reads Queued: 0, 0KiB Writes Queued: 77,382, 5,865MiB
Read Dispatches: 0, 0KiB Write Dispatches: 7,329, 3,020MiB
Reads Requeued: 0 Writes Requeued: 6
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0 Write Merges: 68,844
Read depth: 2 Write depth: 65
IO unplugs: 414 Timer unplugs: 414
...
CPU3 (sdao):
Reads Queued: 105, 18KiB Writes Queued: 14,541, 2,578MiB
Read Dispatches: 22, 60KiB Write Dispatches: 6,207, 1,964MiB
Reads Requeued: 0 Writes Requeued: 1,408
Reads Completed: 22, 60KiB Writes Completed: 12,300, 5,059MiB
Read Merges: 83 Write Merges: 10,968
Read depth: 2 Write depth: 65
IO unplugs: 287 Timer unplugs: 287
Total (sdao):
Reads Queued: 105, 18KiB Writes Queued: 92,546, 8,579MiB
Read Dispatches: 22, 60KiB Write Dispatches: 13,714, 5,059MiB
Reads Requeued: 0 Writes Requeued: 1,414
Reads Completed: 22, 60KiB Writes Completed: 12,300, 5,059MiB
Read Merges: 83 Write Merges: 80,246
IO unplugs: 718 Timer unplugs: 718
Throughput (R/W): 0KiB/s / 39,806KiB/s
Events (sdao): 324,011 entries
Skips: 0 forward (0 - 0.0%)
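Callouts on the slide: per-CPU details (the CPU blocks), per-device details (the Total block), average throughput (the Throughput line), and which CPU writes were submitted on versus completed on.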
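The device totals in this sample can be cross-checked with simple arithmetic. The Python sketch below tests two relations that happen to hold for the numbers shown: queued I/Os minus merges equals completed requests, and dispatches minus requeues equals completed requests. These are observations about this one sample, not documented invariants of blkparse, and `accounting_gaps` is a hypothetical helper name.

```python
# Totals for sdao, copied from the "Total (sdao)" block of the summary output.
TOTALS = {
    "read": {"queued": 105, "dispatched": 22, "requeued": 0,
             "completed": 22, "merged": 83},
    "write": {"queued": 92546, "dispatched": 13714, "requeued": 1414,
              "completed": 12300, "merged": 80246},
}


def accounting_gaps(t):
    """Return (queued - merged - completed, dispatched - requeued - completed).

    Both values are zero when the counters are mutually consistent under the
    two relations assumed above.
    """
    return (
        t["queued"] - t["merged"] - t["completed"],
        t["dispatched"] - t["requeued"] - t["completed"],
    )


if __name__ == "__main__":
    for direction, totals in TOTALS.items():
        print(direction, accounting_gaps(totals))
```

A non-zero gap in a trace of your own may simply mean I/Os were still in flight when tracing stopped, so it is a prompt for a closer look rather than an error.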
13
blktrace: Event Storage Choices
● Physical disk backed file system
– Pros: large/permanent amount of storage available; supported by all kernels
– Cons: potentially higher system impact; may negatively impact devices being watched (if 
storing on the same bus that other devices are being watched on...)
● RAM disk backed file system
– Pros: predictable system impact (RAM allocated at boot); less impact to I/O subsystem
– Cons: limited/temporary storage size; removes RAM from system (even when not tracing); 
may require reboot/kernel build
● TMPFS
– Pros: less impact to I/O subsystem; included in most kernels; only utilizes system RAM while 
events are stored
– Cons: limited/temporary storage; impacts system predictability – RAM “removed” as events 
are stored – could affect application being “watched” 
14
blktrace: Analysis Aid
● As noted previously, blktrace does not analyze 
the data; it is responsible for storing and 
formatting events
● Need to develop post-processing analysis tools
– Can work on formatted output or binary data stored by blktrace itself
– Example: btt – block trace timeline
15
Practical blktrace
● Here at HP OSLO S&P, we are investigating I/O 
scalability at various levels
– Including the efficiency of various hardware configurations 
and the effects on I/O performance caused by software RAID 
(MD and DM)
● blktrace enables us to determine scalability issues within 
the block I/O layer and the overhead costs induced when 
utilizing software RAID
16
Life of an I/O (simplified)
● I/O enters block layer – it can be:
– Remapped onto another device (MD, DM)
– Split into 2 separate I/Os (alignment, size, ...)
– Added to the request queue
– Merged with a previous entry on the queue
All I/Os end up on a request queue at some point
● At some later time, the I/O is issued to a device driver and submitted to a device
● Later, the I/O is completed by the device and its driver
17
btt: Life of an I/O
● Q2I – time it takes to process an I/O prior to it being 
inserted or merged onto a request queue
– Includes split, and remap time
● I2D – time the I/O is “idle” on the request queue
● D2C – time the I/O is “active” in the driver and on the 
device
● Q2I + I2D + D2C = Q2C 
– Q2C: Total processing time of the I/O
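The decomposition above can be sketched for a single I/O as follows. This is a minimal Python illustration, not btt's implementation: `phases` is a hypothetical helper, and the insert timestamp in the usage example is made up, because the sample trace shown earlier elides the events between the merges and the issue. The Q, D and C timestamps are taken from that sample.

```python
from typing import NamedTuple


class Phases(NamedTuple):
    q2i: float   # queued -> inserted or merged onto the request queue
    i2d: float   # "idle" time on the request queue
    d2c: float   # "active" time in the driver and on the device
    q2c: float   # total processing time


def phases(q, i, d, c):
    """Split one I/O's lifetime into phases from four timestamps (seconds)."""
    if not (q <= i <= d <= c):
        raise ValueError("timestamps must be ordered Q <= I <= D <= C")
    return Phases(q2i=i - q, i2d=d - i, d2c=c - d, q2c=c - q)


if __name__ == "__main__":
    # Q, D and C come from the sample trace; the insert time is hypothetical.
    p = phases(q=0.000002197, i=0.000003000, d=0.000024062, c=0.009507758)
    print(f"Q2I={p.q2i:.9f} I2D={p.i2d:.9f} D2C={p.d2c:.9f} Q2C={p.q2c:.9f}")
```

For this I/O almost all of the time falls in D2C, that is, in the driver and on the device.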
18
btt: Partial Output
DEV         #Q     #D  Ratio  BLKmin  BLKavg  BLKmax     Total
-------  -----  -----  -----  ------  ------  ------  --------
[ 8, 0]  92827  12401    7.5       1     109    1024  10120441
[ 8, 1]  93390  13676    6.8       1     108    1024  10150343
[ 8, 2]  92366  13052    7.1       1     109    1024  10119302
[ 8, 3]  92278  13616    6.8       1     109    1024  10119043
[ 8, 4]  92651  13736    6.7       1     109    1024  10119903

DEV              Q2I          I2D          D2C          Q2C
-------  -----------  -----------  -----------  -----------
[ 8, 0]  0.049697430  0.302734498  0.074038617  0.400079555
[ 8, 1]  0.031665593  0.050032148  0.058669682  0.125934697
[ 8, 2]  0.035651772  0.031035436  0.047311659  0.096735504
[ 8, 3]  0.021047776  0.011161007  0.038519804  0.060975408
[ 8, 4]  0.028985217  0.008397228  0.034344640  0.058160497

DEV         Q2I     I2D     D2C
-------  ------  ------  ------
[ 8, 0]   11.7%   71.0%   17.4%
[ 8, 1]   22.6%   35.6%   41.8%
[ 8, 2]   31.3%   27.2%   41.5%
[ 8, 3]   29.8%   15.8%   54.5%
[ 8, 4]   40.4%   11.7%   47.9%
Callouts: Merge Ratio = #Q / #D; SCSI bus, target: low- to high-priority; “Software” time; Driver/Device time; Avg I/O time; Excessive “idle” time on request queue
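The derived columns can be reproduced from the raw ones. The Python sketch below recomputes the merge ratio (#Q / #D) and the percentage breakdown for the first two devices. The percentages in this sample appear to be each phase's share of Q2I + I2D + D2C rather than of the printed Q2C column (for [ 8, 0] the three phases sum to about 0.426, not 0.400), so that denominator is an inference from the numbers, not something the slides state. The function names are hypothetical.

```python
# Rows copied from the btt tables: device -> (#Q, #D, Q2I, I2D, D2C).
ROWS = {
    (8, 0): (92827, 12401, 0.049697430, 0.302734498, 0.074038617),
    (8, 1): (93390, 13676, 0.031665593, 0.050032148, 0.058669682),
}


def merge_ratio(n_queued, n_dispatched):
    """Incoming I/Os per dispatched request, rounded as in the table."""
    return round(n_queued / n_dispatched, 1)


def phase_percentages(q2i, i2d, d2c):
    """Share of each phase in the sum of the three phases, in percent."""
    total = q2i + i2d + d2c
    return tuple(round(100.0 * part / total, 1) for part in (q2i, i2d, d2c))


if __name__ == "__main__":
    for dev, (nq, nd, q2i, i2d, d2c) in ROWS.items():
        print(dev, merge_ratio(nq, nd), phase_percentages(q2i, i2d, d2c))
```

With these inputs the results match the Ratio column and the percentage table for both devices, including the 71.0% I2D share on [ 8, 0].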
19
btt: Q&C Activity
● btt also generates “activity” data – indicating 
ranges where processes and devices are actively 
handling various events (block I/O entered, I/O 
inserted/merged, I/O issued, I/O complete, ...)
● This data can be plotted (e.g. xmgrace) to see 
patterns and extract information concerning 
anomalous behavior 
20
btt: I/O Scheduler Example
[Plot of “Q” activity and “C” activity over time. Callouts: mkfs & pdflush “fight” for device; I/O delayed by mkfs activity.]
21
btt: I/O Scheduler - Explained
● Characterizing I/O stack
● Noticed very long I2D times for certain processes
● Graph shows continuous stream of I/Os...
– ...at the device level
– ...for the mkfs.ext3 process
● Graph shows significant lag for pdflush daemon
– Last I/O enters the block I/O layer at around 19 seconds
– But: the last batch of I/Os isn't completed until 14 seconds later!
● Cause? Anticipatory scheduler – allows mkfs.ext3 to proceed, 
holding off pdflush I/Os
22
Resources
● Kernel sources:
– Patch for Linux 2.6.14-rc3 (or later, up to 2.6.17)
– Linux 2.6.17 (or later) – built in
● Utilities & documentation (& kernel patches)
– rsync://rsync.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git
– See documentation in doc directory
● Mailing list: linux-btrace@vger.kernel.org
