IOCP vs EPOLL
Performance Comparison
Seungmo Koo
@sm9kr
kr.linkedin.com/in/sm9kr
Test Configuration
• Dummy clients and the test server are connected over a GbE link
• Clients send random data packets; the server relays each packet back (echo)
• Client-side measurement: server throughput (send/receive Mbps)
• Server-side measurement: CPU usage (overall % and per-core %)
Test Environment - Server
• Intel i7-3770k, 16GB RAM, Realtek PCIe Gigabit Ethernet
• Disabled CPU-frequency scaling
• Performance Test Program
– Simple packet relay (echo) server using Boost.Asio 1.53 (a minimal sketch follows this list)
• Boost.Asio uses IOCP on Windows and EPOLL on Linux
– I/O threads: 8
– Client sessions: 10000
– Buffer size per session: read 4096, write 4096
• Performance Check Program
– Linux: htop & sar
– Windows: perfmon
• Operating System
– Linux: Ubuntu Server 13.04 64-bit, kernel 3.8.0-23,
with max-socket tuning to support 10000 concurrent sessions
– Windows: Windows Server 2012 64-bit
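The relay server described above is essentially an asynchronous echo loop. Below is a minimal sketch of such a server with Boost.Asio; the port number (5001) and class names are illustrative assumptions, not taken from the deck:

    #include <boost/asio.hpp>
    #include <boost/enable_shared_from_this.hpp>
    #include <boost/shared_ptr.hpp>
    #include <boost/thread.hpp>
    #include <cstddef>

    using boost::asio::ip::tcp;

    // One echo session: read up to 4096 bytes, write them back, repeat.
    class session : public boost::enable_shared_from_this<session> {
    public:
        explicit session(boost::asio::io_service& io) : socket_(io) {}
        tcp::socket& socket() { return socket_; }
        void start() { do_read(); }

    private:
        void do_read() {
            boost::shared_ptr<session> self = shared_from_this();
            socket_.async_read_some(boost::asio::buffer(data_, sizeof(data_)),
                [self](const boost::system::error_code& ec, std::size_t n) {
                    if (!ec) self->do_write(n);   // relay whatever arrived
                });
        }
        void do_write(std::size_t n) {
            boost::shared_ptr<session> self = shared_from_this();
            boost::asio::async_write(socket_, boost::asio::buffer(data_, n),
                [self](const boost::system::error_code& ec, std::size_t) {
                    if (!ec) self->do_read();     // then wait for the next packet
                });
        }
        tcp::socket socket_;
        char data_[4096];                         // 4096-byte buffer per session
    };

    // Accept loop: hand each new connection its own session object.
    void do_accept(tcp::acceptor& acceptor) {
        boost::shared_ptr<session> s(new session(acceptor.get_io_service()));
        acceptor.async_accept(s->socket(),
            [&acceptor, s](const boost::system::error_code& ec) {
                if (!ec) s->start();
                do_accept(acceptor);              // keep accepting new sessions
            });
    }

    int main() {
        boost::asio::io_service io;
        tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), 5001));
        do_accept(acceptor);

        // 8 I/O threads share one io_service, as in the test setup; Asio
        // dispatches completions via IOCP on Windows and EPOLL on Linux.
        boost::thread_group threads;
        for (int i = 0; i < 8; ++i)
            threads.create_thread([&io] { io.run(); });
        threads.join_all();
    }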
Test Environment - Client
• Mac mini server (late 2012)
– Intel i7 quad-core, 16GB RAM, Gigabit Ethernet
• Dummy Client Program
– Simple packet generator using Boost.Asio 1.53 (a single-session sketch follows this list)
– # of client sessions: 10000
– I/O threads: 8
– Buffer size per session: read 4096, write 4096
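For clarity, here is a synchronous, single-session sketch of the dummy client's send/receive loop; the real client ran 10000 asynchronous sessions over 8 I/O threads. The host, port, and payload size are illustrative assumptions:

    #include <boost/asio.hpp>
    #include <cstdlib>

    int main() {
        using boost::asio::ip::tcp;
        boost::asio::io_service io;
        tcp::socket socket(io);
        socket.connect(tcp::endpoint(
            boost::asio::ip::address::from_string("127.0.0.1"), 5001));

        char out[4096], in[4096];
        for (;;) {
            for (std::size_t i = 0; i < sizeof(out); ++i)
                out[i] = static_cast<char>(std::rand() & 0xff); // random payload
            boost::asio::write(socket, boost::asio::buffer(out)); // send to server
            boost::asio::read(socket, boost::asio::buffer(in));   // read the echo
        }
    }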
Performance Test
• Two Cases
– NAGLE: Nagle’s algorithm ON
– NODELAY: Nagle’s algorithm OFF (see the snippet after this list)
• Dummy Client Program
– Measures server throughput
– Sends random data to the server and receives the echoed data back for 600 seconds
• Test Server
– Measures server CPU usage for 600 seconds
• Three Measurement Runs
– The median result is used
– All three runs were practically identical
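The two cases differ only in the per-socket TCP_NODELAY option; with Boost.Asio it is set as below (the helper name is illustrative):

    #include <boost/asio.hpp>

    // Apply the NODELAY case to a connected socket; pass false for NAGLE.
    void set_nagle_mode(boost::asio::ip::tcp::socket& socket, bool nodelay) {
        socket.set_option(boost::asio::ip::tcp::no_delay(nodelay));
    }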
Performance Evaluation
• No Session Drop
– Both EPOLL and IOCP kept all 10000 sessions alive throughout each test
• Normalized Throughput
– EPOLL and IOCP delivered virtually identical throughput
[Chart: Normalized throughput (0–100) for EPOLL vs IOCP in the NODELAY and NAGLE cases]
Performance Evaluation
• CPU Utilization
– Average across the 8 cores
– Mostly kernel time, with a small amount of user time
– IOCP used noticeably less CPU than EPOLL
[Chart: Average CPU usage (0–14%) for EPOLL vs IOCP in the NODELAY and NAGLE cases]
Performance Evaluation
• Average CPU Utilization Per Core (NODELAY mode)
– Results were similar in the NAGLE and NODELAY cases
– Compared with IOCP, under EPOLL:
• One CPU core consistently shows high utilization
• The other cores stay close to the average
[Chart: Average CPU usage per core (0–70%), cores 0 through 7, EPOLL vs IOCP. Annotations: part of the per-core variation is a Hyper-Threading effect and can be ignored; under EPOLL, NIC receive processing runs on only one core (see “RSS queue”).]
Update: New Experiment with RSS option
• Average CPU Utilization Per Core (NAGLE mode)
– Using the RSS queue (a.k.a. NIC multi-queue)
– Server HW: Mac mini server, late 2012 (Broadcom BCM57766 NIC)
– Server OS: Windows Server 2012 and Ubuntu Server 13.04
– Performance
• Throughput: EPOLL approximately equal to IOCP
• Average CPU usage: virtually the same (EPOLL 7.38%, IOCP 6.8%)
[Chart: Average CPU usage per core (0–20%) with RSS (NIC multi-queue), cores 0 through 7, EPOLL vs IOCP]
Summary
• Throughput
– Little difference between IOCP and EPOLL
• CPU usage
– Without RSS (multi-queue)
• IOCP was more efficient than EPOLL in CPU utilization
• EPOLL showed consistently higher CPU utilization than IOCP
– With RSS
• IOCP and EPOLL are about the same in CPU usage
When building a high-performance server on Linux, use an
RSS-capable (multi-queue) NIC (a quick check follows this slide).
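On Linux, one rough way to check whether a NIC exposes multiple RSS queues is to count its per-queue interrupts in /proc/interrupts (ethtool reports this as well). A minimal sketch; the interface name eth0 and the driver's per-queue IRQ naming are assumptions:

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        std::ifstream in("/proc/interrupts");
        std::string line;
        int queues = 0;
        while (std::getline(in, line)) {
            // Multi-queue drivers typically register per-queue IRQs with
            // names like "eth0-TxRx-0", "eth0-rx-1", and so on.
            if (line.find("eth0-") != std::string::npos)
                ++queues;
        }
        std::cout << "per-queue IRQs for eth0: " << queues << "\n";
        // More than one IRQ suggests RSS (multi-queue) is available.
    }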
Reference: RSS Queue
Linux: NIC Multi-queue Support
Windows: NIC Receive Side Scaling
http://msdn.microsoft.com/en-us/library/windows/hardware/ff556942(v=vs.85).aspx
