SlideShare a Scribd company logo
New sendfile(2)
Gleb Smirnoff
glebius@FreeBSD.org
FreeBSD Storage Summit
Netflix
20 February 2015
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 1 / 23
History of sendfile(2) Before sendfile(2)
Miserable life w/o sendfile(2)
while ((cnt = read(filefd, buf, (u_int)blksize))
write(netfd, buf, cnt) == cnt)
byte_count += cnt;
send_data() в src/libexec/ftpd/ftpd.c,
FreeBSD 1.0, 1993
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 2 / 23
History of sendfile(2) sendfile(2) introduced
sendfile(2) introduced
int
sendfile(int fd, int s, off_t offset, size_t nbytes, .. );
1997: HP-UX 11.00
1998: FreeBSD 3.0 and Linux 2.2
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 3 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Further optimisations:
2004: SF_NODISKIO flag
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Further optimisations:
2004: SF_NODISKIO flag
2006: inner cycle, working on sbspace() bytes
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
History of sendfile(2) sendfile(2) in FreeBSD
sendfile(2) in FreeBSD
First implementation - mapping userland cycle to the
kernel:
read(filefd) → VOP_READ(vnode)
write(netfd) → sosend(socket)
blksize → PAGE_SIZE
Further optimisations:
2004: SF_NODISKIO flag
2006: inner cycle, working on sbspace() bytes
2013: sending a shared memory descriptor data
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
What’s not right with sendfile(2) blocking on I/O
Problem #1: blocking on I/O
Algorithm of a modern HTTP-server:
1 Take yet another descriptor from kevent(2)
2 Do write(2)/read(2)/sendfile(2) on it
3 Go to 1
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
What’s not right with sendfile(2) blocking on I/O
Problem #1: blocking on I/O
Algorithm of a modern HTTP-server:
1 Take yet another descriptor from kevent(2)
2 Do write(2)/read(2)/sendfile(2) on it
3 Go to 1
Bottleneck: any syscall time.
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
What’s not right with sendfile(2) blocking on I/O
Attempts to solve problem #1
Separate I/O contexts: processes, threads
Apache
nginx 2
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
What’s not right with sendfile(2) blocking on I/O
Attempts to solve problem #1
Separate I/O contexts: processes, threads
Apache
nginx 2
SF_NODISKIO + aio_read(2)
nginx
Varnish
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
What’s not right with sendfile(2) blocking on I/O
More attempts . . .
aio_mlock(2) instead of aio_read(2)
aio_sendfile(2) ???
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 7 / 23
What’s not right with sendfile(2) control over
Problem #2: control over VM
VOP_READ() leaves pages in VM cache
VOP_READ() [for UFS] does readahead
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
What’s not right with sendfile(2) control over
Problem #2: control over VM
VOP_READ() leaves pages in VM cache
VOP_READ() [for UFS] does readahead
Not easy to prevent it doing that!
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
New sendfile(2) implementation above pager
waht if VOP_GETPAGES()?
VOP_READ() → VOP_GETPAGES()
Pros:
sendfile() already works on pages
implementations for vnode and shmem converge
control over VM is now easier task
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
New sendfile(2) implementation above pager
waht if VOP_GETPAGES()?
VOP_READ() → VOP_GETPAGES()
Pros:
sendfile() already works on pages
implementations for vnode and shmem converge
control over VM is now easier task
Cons
Losing readahead heuristics
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
New sendfile(2) implementation above pager
waht if VOP_GETPAGES()?
VOP_READ() → VOP_GETPAGES()
Pros:
sendfile() already works on pages
implementations for vnode and shmem converge
control over VM is now easier task
Cons
Losing readahead heuristics
But no one used them!
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
New sendfile(2) VOP_GETPAGES_ASYNC()
VOP_GETPAGES_ASYNC()
int
VOP_GETPAGES(struct vnode *vp, vm_page_t *ma,
int count, int reqpage);
1 Initialize buf(9)
2 buf->b_iodone = bdone;
3 bstrategy(buf);
4 bwait(buf); /* sleeps until I/O completes */
5 return;
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
New sendfile(2) VOP_GETPAGES_ASYNC()
VOP_GETPAGES_ASYNC()
int
VOP_GETPAGES_ASYNC(struct vnode *vp,
vm_page_t *ma, int count, int reqpage,
vop_getpages_iodone_t *iodone, void *arg);
1 Initialize buf(9)
2 buf->b_iodone = vnode_pager_async_iodone;
3 bstrategy(buf);
4 return;
vnode_pager_async_iodone calls iodone() .
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
New sendfile(2) non-blocking sendfile(2)
naive non-blocking sendfile(2)
In kern_sendfile():
1 nios++;
2 VOP_GETPAGES_ASYNC(sendfile_iodone);
In sendfile_iodone():
1 nios--;
2 if (nios) return;
3 sosend();
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 11 / 23
New sendfile(2) non-blocking sendfile(2)
the problem of naive implementation
sendfile(filefd, sockfd, ..);
write(sockfd, ..);
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 12 / 23
New sendfile(2) “not ready” data in socket buffers
socket buffer
mbuf mbuf mbuf mbuf mbuf mbuf
struct sockbuf
struct mbuf *sb_mb
struct mbuf *sb_mbtail
u_int sb_cc
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 13 / 23
New sendfile(2) “not ready” data in socket buffers
socket buffer with “not ready” data
mbuf mbuf mbuf mbuf mbuf mbuf
page page
struct sockbuf
struct mbuf *sb_mb
struct mbuf *sb_fnrdy
struct mbuf *sb_mbtail
u_int sb_acc
u_int sb_ccc
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 14 / 23
New sendfile(2) final implementation
non-blocking sendfile(2)
In kern_sendfile():
1 nios++;
2 VOP_GETPAGES_ASYNC(sendfile_iodone);
3 sosend(NOT_READY);
In sendfile_iodone():
1 nios--;
2 if (nios) return;
3 soready();
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 15 / 23
New sendfile(2) comparison with old sendfile(2)
traffic
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 16 / 23
New sendfile(2) comparison with old sendfile(2)
CPU idle
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 17 / 23
New sendfile(2) comparison with old sendfile(2)
profiling sendfile(2) in head
aio_daemon 13.64%
sys_sendfile 7.40%
t4_intr 5.66%
xpt_done 1.04%
pagedaemon 4.16%
scheduler 5.28%
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 18 / 23
New sendfile(2) comparison with old sendfile(2)
profiling new sendfile(2)
sys_sendfile 16.9%
t4_intr 8.17%
xpt_done 9.91%
pagedaemon 6.54%
scheduler 3.58%
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
New sendfile(2) comparison with old sendfile(2)
profiling new sendfile(2)
sys_sendfile 16.9% (vm_page_grab 9.24% !!)
t4_intr 8.17% (tcp_output() 2.07% !!)
xpt_done 9.91% (m_freem() 3.11% !!)
pagedaemon 6.54%
scheduler 3.58%
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
New sendfile(2) comparison with old sendfile(2)
what did change?
New code always sends full socket buffer
Which is good for TCP (as protocol)
Which hurts VM, mbuf allocator,
and unexpectedly TCP stack
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
New sendfile(2) comparison with old sendfile(2)
what did change?
New code always sends full socket buffer
Which is good for TCP (as protocol)
Which hurts VM, mbuf allocator,
and unexpectedly TCP stack
Will fix that!
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
New sendfile(2) comparison with old sendfile(2)
old sendfile(2) @ Netflix
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
New sendfile(2) comparison with old sendfile(2)
new sendfile(2) @ Netflix
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
New sendfile(2) plans and problems
TODO list
Problems:
VM & I/O overcommit
ZFS
SCTP
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
New sendfile(2) plans and problems
TODO list
Problems:
VM & I/O overcommit
ZFS
SCTP
Future plans:
sendfile(2) doing TLS
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
New sendfile(2)
Questions?
Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 23 / 23

More Related Content

What's hot

Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOF
Jeff Squyres
 
[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM
Douglas Chen
 
Global Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the SealGlobal Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the Seal
Tzung-Bi Shih
 
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Kohei Tokunaga
 
Move from C to Go
Move from C to GoMove from C to Go
Move from C to Go
Yu-Shuan Hsieh
 
Startup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image DistributionStartup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image Distribution
Kohei Tokunaga
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
Andrea Righi
 
Streams for the Web
Streams for the WebStreams for the Web
Streams for the Web
Domenic Denicola
 
Go 1.8 Release Party
Go 1.8 Release PartyGo 1.8 Release Party
Go 1.8 Release Party
Rodolfo Carvalho
 
CMake best practices
CMake best practicesCMake best practices
CMake best practices
Henry Schreiner
 
The overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz SnapshotterThe overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
Kohei Tokunaga
 
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Kohei Tokunaga
 
Ddev workshop t3dd18
Ddev workshop t3dd18Ddev workshop t3dd18
Ddev workshop t3dd18
Jigal van Hemert
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Anne Nicolas
 
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Kohei Tokunaga
 
不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)
Douglas Chen
 
Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance InconsistenciesTracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Shane McIntosh
 
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy PullingFaster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
Kohei Tokunaga
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaX
Kernel TLV
 

What's hot (20)

Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOF
 
[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM[COSCUP 2021] A trip about how I contribute to LLVM
[COSCUP 2021] A trip about how I contribute to LLVM
 
Global Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the SealGlobal Interpreter Lock: Episode I - Break the Seal
Global Interpreter Lock: Episode I - Break the Seal
 
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略してcontainerdでコンテナを高速に起動する
 
Move from C to Go
Move from C to GoMove from C to Go
Move from C to Go
 
Startup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image DistributionStartup Containers in Lightning Speed with Lazy Image Distribution
Startup Containers in Lightning Speed with Lazy Image Distribution
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
Streams for the Web
Streams for the WebStreams for the Web
Streams for the Web
 
Go 1.8 Release Party
Go 1.8 Release PartyGo 1.8 Release Party
Go 1.8 Release Party
 
CMake best practices
CMake best practicesCMake best practices
CMake best practices
 
The overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz SnapshotterThe overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
The overview of lazypull with containerd Remote Snapshotter & Stargz Snapshotter
 
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動するStargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
Stargz Snapshotter: イメージのpullを省略しcontainerdでコンテナを高速に起動する
 
ocelot
ocelotocelot
ocelot
 
Ddev workshop t3dd18
Ddev workshop t3dd18Ddev workshop t3dd18
Ddev workshop t3dd18
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
 
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
Build and Run Containers With Lazy Pulling - Adoption status of containerd St...
 
不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)不深不淺,帶你認識 LLVM (Found LLVM in your life)
不深不淺,帶你認識 LLVM (Found LLVM in your life)
 
Tracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance InconsistenciesTracing Software Build Processes to Uncover License Compliance Inconsistencies
Tracing Software Build Processes to Uncover License Compliance Inconsistencies
 
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy PullingFaster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaX
 

Similar to New sendfile

WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)
Igalia
 
How to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux worldHow to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux world
Arun Kumar Thondapu
 
The State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdfThe State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdf
egypt
 
A tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approachesA tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approaches
Sandro Gauci
 
DockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるDockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐる
Kohei Tokunaga
 
Pledge in OpenBSD
Pledge in OpenBSDPledge in OpenBSD
Pledge in OpenBSD
Giovanni Bechis
 
Rust & Python : Python WA October meetup
Rust & Python : Python WA October meetupRust & Python : Python WA October meetup
Rust & Python : Python WA October meetup
John Vandenberg
 
Metasploitable
MetasploitableMetasploitable
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetes
William Stewart
 
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
Fantix King 王川
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
Cosimo Streppone
 
Continuous Integration for Fun and Profit
Continuous Integration for Fun and ProfitContinuous Integration for Fun and Profit
Continuous Integration for Fun and Profit
inovex GmbH
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integration
Anne Nicolas
 
Git workflows automat-it
Git workflows automat-itGit workflows automat-it
Git workflows automat-it
Automat-IT
 
HTTP and 5G (fixed1)
HTTP and 5G (fixed1)HTTP and 5G (fixed1)
HTTP and 5G (fixed1)
dynamis
 
General Pr0ken File System
General Pr0ken File SystemGeneral Pr0ken File System
General Pr0ken File System
Positive Hack Days
 
GTPing, How To
GTPing, How ToGTPing, How To
GTPing, How To
Kentaro Ebisawa
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch Tuesday
Ivanti
 
2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - Février2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - Février
Ivanti
 
Patch Tuesday de Febrero
Patch Tuesday de FebreroPatch Tuesday de Febrero
Patch Tuesday de Febrero
Ivanti
 

Similar to New sendfile (20)

WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)WebKit Security Updates (GUADEC 2016)
WebKit Security Updates (GUADEC 2016)
 
How to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux worldHow to keep Eclipse on the bleeding edge in the Linux world
How to keep Eclipse on the bleeding edge in the Linux world
 
The State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdfThe State of the Metasploit Framework.pdf
The State of the Metasploit Framework.pdf
 
A tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approachesA tale of two RTC fuzzing approaches
A tale of two RTC fuzzing approaches
 
DockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるDockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐる
 
Pledge in OpenBSD
Pledge in OpenBSDPledge in OpenBSD
Pledge in OpenBSD
 
Rust & Python : Python WA October meetup
Rust & Python : Python WA October meetupRust & Python : Python WA October meetup
Rust & Python : Python WA October meetup
 
Metasploitable
MetasploitableMetasploitable
Metasploitable
 
Kubernetes laravel and kubernetes
Kubernetes   laravel and kubernetesKubernetes   laravel and kubernetes
Kubernetes laravel and kubernetes
 
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
About Those Python Async Concurrent Frameworks - Fantix @ OSTC 2014
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
 
Continuous Integration for Fun and Profit
Continuous Integration for Fun and ProfitContinuous Integration for Fun and Profit
Continuous Integration for Fun and Profit
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integration
 
Git workflows automat-it
Git workflows automat-itGit workflows automat-it
Git workflows automat-it
 
HTTP and 5G (fixed1)
HTTP and 5G (fixed1)HTTP and 5G (fixed1)
HTTP and 5G (fixed1)
 
General Pr0ken File System
General Pr0ken File SystemGeneral Pr0ken File System
General Pr0ken File System
 
GTPing, How To
GTPing, How ToGTPing, How To
GTPing, How To
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch Tuesday
 
2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - Février2024 Français Patch Tuesday - Février
2024 Français Patch Tuesday - Février
 
Patch Tuesday de Febrero
Patch Tuesday de FebreroPatch Tuesday de Febrero
Patch Tuesday de Febrero
 

Recently uploaded

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 

Recently uploaded (20)

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 

New sendfile

  • 1. New sendfile(2) Gleb Smirnoff glebius@FreeBSD.org FreeBSD Storage Summit Netflix 20 February 2015 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 1 / 23
  • 2. History of sendfile(2) Before sendfile(2) Miserable life w/o sendfile(2) while ((cnt = read(filefd, buf, (u_int)blksize)) write(netfd, buf, cnt) == cnt) byte_count += cnt; send_data() в src/libexec/ftpd/ftpd.c, FreeBSD 1.0, 1993 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 2 / 23
  • 3. History of sendfile(2) sendfile(2) introduced sendfile(2) introduced int sendfile(int fd, int s, off_t offset, size_t nbytes, .. ); 1997: HP-UX 11.00 1998: FreeBSD 3.0 and Linux 2.2 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 3 / 23
  • 4. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 5. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Further optimisations: 2004: SF_NODISKIO flag Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 6. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Further optimisations: 2004: SF_NODISKIO flag 2006: inner cycle, working on sbspace() bytes Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 7. History of sendfile(2) sendfile(2) in FreeBSD sendfile(2) in FreeBSD First implementation - mapping userland cycle to the kernel: read(filefd) → VOP_READ(vnode) write(netfd) → sosend(socket) blksize → PAGE_SIZE Further optimisations: 2004: SF_NODISKIO flag 2006: inner cycle, working on sbspace() bytes 2013: sending a shared memory descriptor data Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 4 / 23
  • 8. What’s not right with sendfile(2) blocking on I/O Problem #1: blocking on I/O Algorithm of a modern HTTP-server: 1 Take yet another descriptor from kevent(2) 2 Do write(2)/read(2)/sendfile(2) on it 3 Go to 1 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
  • 9. What’s not right with sendfile(2) blocking on I/O Problem #1: blocking on I/O Algorithm of a modern HTTP-server: 1 Take yet another descriptor from kevent(2) 2 Do write(2)/read(2)/sendfile(2) on it 3 Go to 1 Bottleneck: any syscall time. Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 5 / 23
  • 10. What’s not right with sendfile(2) blocking on I/O Attempts to solve problem #1 Separate I/O contexts: processes, threads Apache nginx 2 Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
  • 11. What’s not right with sendfile(2) blocking on I/O Attempts to solve problem #1 Separate I/O contexts: processes, threads Apache nginx 2 SF_NODISKIO + aio_read(2) nginx Varnish Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 6 / 23
  • 12. What’s not right with sendfile(2) blocking on I/O More attempts . . . aio_mlock(2) instead of aio_read(2) aio_sendfile(2) ??? Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 7 / 23
  • 13. What’s not right with sendfile(2) control over Problem #2: control over VM VOP_READ() leaves pages in VM cache VOP_READ() [for UFS] does readahead Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
  • 14. What’s not right with sendfile(2) control over Problem #2: control over VM VOP_READ() leaves pages in VM cache VOP_READ() [for UFS] does readahead Not easy to prevent it doing that! Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 8 / 23
  • 15. New sendfile(2) implementation above pager waht if VOP_GETPAGES()? VOP_READ() → VOP_GETPAGES() Pros: sendfile() already works on pages implementations for vnode and shmem converge control over VM is now easier task Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
  • 16. New sendfile(2) implementation above pager waht if VOP_GETPAGES()? VOP_READ() → VOP_GETPAGES() Pros: sendfile() already works on pages implementations for vnode and shmem converge control over VM is now easier task Cons Losing readahead heuristics Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
  • 17. New sendfile(2) implementation above pager waht if VOP_GETPAGES()? VOP_READ() → VOP_GETPAGES() Pros: sendfile() already works on pages implementations for vnode and shmem converge control over VM is now easier task Cons Losing readahead heuristics But no one used them! Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 9 / 23
  • 18. New sendfile(2) VOP_GETPAGES_ASYNC() VOP_GETPAGES_ASYNC() int VOP_GETPAGES(struct vnode *vp, vm_page_t *ma, int count, int reqpage); 1 Initialize buf(9) 2 buf->b_iodone = bdone; 3 bstrategy(buf); 4 bwait(buf); /* sleeps until I/O completes */ 5 return; Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
  • 19. New sendfile(2) VOP_GETPAGES_ASYNC() VOP_GETPAGES_ASYNC() int VOP_GETPAGES_ASYNC(struct vnode *vp, vm_page_t *ma, int count, int reqpage, vop_getpages_iodone_t *iodone, void *arg); 1 Initialize buf(9) 2 buf->b_iodone = vnode_pager_async_iodone; 3 bstrategy(buf); 4 return; vnode_pager_async_iodone calls iodone() . Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 10 / 23
  • 20. New sendfile(2) non-blocking sendfile(2) naive non-blocking sendfile(2) In kern_sendfile(): 1 nios++; 2 VOP_GETPAGES_ASYNC(sendfile_iodone); In sendfile_iodone(): 1 nios--; 2 if (nios) return; 3 sosend(); Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 11 / 23
  • 21. New sendfile(2) non-blocking sendfile(2) the problem of naive implementation sendfile(filefd, sockfd, ..); write(sockfd, ..); Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 12 / 23
  • 22. New sendfile(2) “not ready” data in socket buffers socket buffer mbuf mbuf mbuf mbuf mbuf mbuf struct sockbuf struct mbuf *sb_mb struct mbuf *sb_mbtail u_int sb_cc Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 13 / 23
  • 23. New sendfile(2) “not ready” data in socket buffers socket buffer with “not ready” data mbuf mbuf mbuf mbuf mbuf mbuf page page struct sockbuf struct mbuf *sb_mb struct mbuf *sb_fnrdy struct mbuf *sb_mbtail u_int sb_acc u_int sb_ccc Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 14 / 23
  • 24. New sendfile(2) final implementation non-blocking sendfile(2) In kern_sendfile(): 1 nios++; 2 VOP_GETPAGES_ASYNC(sendfile_iodone); 3 sosend(NOT_READY); In sendfile_iodone(): 1 nios--; 2 if (nios) return; 3 soready(); Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 15 / 23
  • 25. New sendfile(2) comparison with old sendfile(2) traffic Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 16 / 23
  • 26. New sendfile(2) comparison with old sendfile(2) CPU idle Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 17 / 23
  • 27. New sendfile(2) comparison with old sendfile(2) profiling sendfile(2) in head aio_daemon 13.64% sys_sendfile 7.40% t4_intr 5.66% xpt_done 1.04% pagedaemon 4.16% scheduler 5.28% Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 18 / 23
  • 28. New sendfile(2) comparison with old sendfile(2) profiling new sendfile(2) sys_sendfile 16.9% t4_intr 8.17% xpt_done 9.91% pagedaemon 6.54% scheduler 3.58% Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
  • 29. New sendfile(2) comparison with old sendfile(2) profiling new sendfile(2) sys_sendfile 16.9% (vm_page_grab 9.24% !!) t4_intr 8.17% (tcp_output() 2.07% !!) xpt_done 9.91% (m_freem() 3.11% !!) pagedaemon 6.54% scheduler 3.58% Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 19 / 23
  • 30. New sendfile(2) comparison with old sendfile(2) what did change? New code always sends full socket buffer Which is good for TCP (as protocol) Which hurts VM, mbuf allocator, and unexpectedly TCP stack Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
  • 31. New sendfile(2) comparison with old sendfile(2) what did change? New code always sends full socket buffer Which is good for TCP (as protocol) Which hurts VM, mbuf allocator, and unexpectedly TCP stack Will fix that! Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 20 / 23
  • 32. New sendfile(2) comparison with old sendfile(2) old sendfile(2) @ Netflix Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
  • 33. New sendfile(2) comparison with old sendfile(2) new sendfile(2) @ Netflix Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 21 / 23
  • 34. New sendfile(2) plans and problems TODO list Problems: VM & I/O overcommit ZFS SCTP Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
  • 35. New sendfile(2) plans and problems TODO list Problems: VM & I/O overcommit ZFS SCTP Future plans: sendfile(2) doing TLS Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 22 / 23
  • 36. New sendfile(2) Questions? Gleb Smirnoff glebius@FreeBSD.org New sendfile(2) 20 February 2015 23 / 23