IOCP vs EPOLL
Performance Comparison
Seungmo Koo
@sm9kr
kr.linkedin.com/in/sm9kr
Test Configuration
• Dummy clients send random data packets to the test server over a GbE link
• Test server relays each packet back to the client (echo)
• Client-side measurement
– Server throughput (send/receive Mbps)
• Server-side measurement
– CPU usage (overall % and per-core %)
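The client-side throughput figure is a bytes-to-megabits conversion over the measurement window. A minimal sketch follows; the function name and the sample byte count are hypothetical and are not taken from the slides.

```python
def throughput_mbps(total_bytes, elapsed_seconds):
    """Convert a byte count over an interval into megabits per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # 8 bits per byte, 1,000,000 bits per megabit
    return total_bytes * 8 / elapsed_seconds / 1_000_000


# Hypothetical example: 75,000,000 bytes echoed back during a 600-second run.
print(throughput_mbps(75_000_000, 600))
```

Send and receive throughput would each be computed this way from their own byte counters.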
Test Environment - Server
• Intel i7-3770k, 16GB RAM, Realtek PCIe Gigabit Ethernet
• Disabled CPU-frequency scaling
• Performance Test Program
– Simple packet relay (echo) server using Boost.Asio 1.53
• Boost.Asio uses IOCP on Windows while it uses EPOLL on Linux
– I/O threads: 8
– Client sessions: 10000
– Buffer size per session: read 4096, write 4096
• Performance Check Program
– Linux: htop & sar
– Windows: perfmon
• Operating System
– Linux: Ubuntu Linux Server 13.04 64bit, kernel 3.8.0-23
+ max socket tuning
– Windows: Windows Server 2012 64bit
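For readers unfamiliar with the readiness model behind EPOLL, the echo relay can be sketched with Python's standard `selectors` module, which picks epoll on Linux when it is available. This is only an illustration of the relay idea, not the Boost.Asio test program from the slides; `echo_ready` is a hypothetical helper, and a connected socket pair stands in for a real client session.

```python
import selectors
import socket

BUFFER_SIZE = 4096  # per-session read buffer size quoted on the slide


def echo_ready(sel, timeout=1.0):
    """Echo back whatever is readable right now; return total bytes relayed."""
    relayed = 0
    for key, _mask in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(BUFFER_SIZE)
        if data:
            conn.sendall(data)  # relay the payload back to the sender
            relayed += len(data)
        else:
            # An empty read means the peer closed the session.
            sel.unregister(conn)
            conn.close()
    return relayed


# A connected socket pair stands in for one client session.
server_side, client_side = socket.socketpair()
selector = selectors.DefaultSelector()
selector.register(server_side, selectors.EVENT_READ)

client_side.sendall(b"random payload")
echo_ready(selector)
print(client_side.recv(BUFFER_SIZE))

selector.close()
server_side.close()
client_side.close()
```

IOCP differs in that the application is notified when an I/O operation has completed rather than when a socket is ready; Boost.Asio hides that difference behind one asynchronous API.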
Test Environment - Client
• Mac mini Server (Late 2012)
– Intel i7 quad-core, 16GB RAM, Gigabit Ethernet
• Dummy Client Program
– Simple packet generator using Boost.Asio 1.53
– # of Clients (session): 10000
– I/O threads: 8
– Buffer size per session: read 4096, write 4096
Performance Test
• Two Cases
– NAGLE: Nagle’s algorithm ON
– NODELAY: Nagle’s algorithm OFF
• Dummy Client Program
– Measures server throughput
– Sends random data to the server and receives the echoed data back
for 600 seconds
• Test Server
– Measures server CPU usage for 600 seconds
• Three measurements per case
– The median result is used
• In practice, the three runs of each test gave nearly identical results.
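The two test cases differ only in the `TCP_NODELAY` socket option. A minimal sketch of toggling it with Python's standard `socket` module follows; the helper names are hypothetical. In Boost.Asio the equivalent setting is the `boost::asio::ip::tcp::no_delay` socket option, though the slides do not show how the test programs set it.

```python
import socket


def set_nagle(sock, enabled):
    """Turn Nagle's algorithm on (NAGLE case) or off (NODELAY case)."""
    # TCP_NODELAY = 1 disables Nagle's algorithm; 0 leaves it enabled.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def nagle_enabled(sock):
    """Report whether Nagle's algorithm is currently active on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_nagle(sock, False)      # NODELAY case
print(nagle_enabled(sock))
set_nagle(sock, True)       # NAGLE case
print(nagle_enabled(sock))
sock.close()
```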
Performance Evaluation
• No Session Drop
– Both EPOLL and IOCP kept 10000 sessions alive during a test
• Normalized Throughput
– Throughput was nearly identical for EPOLL and IOCP
[Figure: Normalized throughput (axis 0 to 100) for EPOLL and IOCP in NODELAY and NAGLE modes]
Performance Evaluation
• CPU Utilization
– Average of 8-core usage
– Mostly kernel time, with a small amount of user time
– IOCP used less CPU than EPOLL
[Figure: Average CPU usage (axis 0% to 14%) for EPOLL and IOCP in NODELAY and NAGLE modes]
Performance Evaluation
• Average CPU Utilization Per Core (NODELAY mode)
– Results in NAGLE mode were similar to those in NODELAY mode
– EPOLL compared with IOCP
• One CPU core shows consistently high utilization
• The other cores stay close to the average utilization
[Figure: Average CPU usage per core (%), axis 0 to 70, for EPOLL and IOCP, cores 0 to 7]
Chart annotations:
– Hyper-Threading effect: can be ignored
– NIC receive processing runs on only one core (see “RSS queue”)
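A per-core imbalance like this can be flagged automatically from sampled per-core percentages. The sketch below is one way to do it, assuming a simple rule: a core is "hot" if its usage exceeds a multiple of the mean of the other cores. The function name, the threshold, and the sample values are all hypothetical; the numbers are not read from the chart.

```python
from statistics import mean


def hot_cores(per_core_pct, factor=2.0):
    """Return indices of cores whose usage exceeds `factor` times the mean of the others."""
    hot = []
    for i, usage in enumerate(per_core_pct):
        others = per_core_pct[:i] + per_core_pct[i + 1:]
        if others and usage > factor * mean(others):
            hot.append(i)
    return hot


# Hypothetical per-core percentages (a list), not values taken from the slides.
sample = [60.0, 5.0, 6.0, 4.0, 5.0, 6.0, 5.0, 4.0]
print(hot_cores(sample))
```

With receive processing pinned to one core, that core would be expected to stand out in such a check, while an evenly spread load returns an empty list.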
Update: New Experiment with RSS option
• Average CPU Utilization Per Core (NAGLE mode)
– Using RSS queue (a.k.a. NIC multi-queue)
– Server HW: Mac-mini 2012 server (Broadcom BCM57766 NIC)
– Server OS: Windows Server 2012 and Ubuntu Server 13.04
– Performance
• Throughput: approximately equal for EPOLL and IOCP
• Average CPU usage: virtually the same (EPOLL 7.38%, IOCP 6.8%)
[Figure: Average CPU usage per core (%) with RSS (NIC multi-queue), axis 0 to 20, for EPOLL and IOCP, cores 0 to 7]
Summary
• Throughput
– There was little difference between IOCP and EPOLL
• CPU usage
– Without RSS (Multi-queue)
• IOCP was more efficient than EPOLL in CPU utilization
• EPOLL showed consistently higher CPU utilization than IOCP
– With RSS mode
• IOCP and EPOLL are about the same in CPU usage
When building a high-performance server on Linux,
use a NIC that supports RSS (multi-queue).
Reference: RSS Queue
Linux: NIC Multi-queue Support
Windows: NIC Receive Side Scaling
http://msdn.microsoft.com/en-us/library/windows/hardware/ff556942(v=vs.85).aspx
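On Linux, one quick way to see whether an interface exposes multiple receive queues is to count the `rx-N` entries under `/sys/class/net/<interface>/queues`. A minimal sketch, assuming that sysfs layout is present; the function name and the interface name `eth0` are hypothetical.

```python
import os
import re


def count_rx_queues(iface, sysfs_root="/sys/class/net"):
    """Count rx-N entries under <sysfs_root>/<iface>/queues; 0 if the path is missing."""
    queues_dir = os.path.join(sysfs_root, iface, "queues")
    try:
        entries = os.listdir(queues_dir)
    except OSError:
        return 0
    return sum(1 for name in entries if re.fullmatch(r"rx-\d+", name))


# "eth0" is a placeholder; substitute the interface under test.
print(count_rx_queues("eth0"))
```

A result of 1 suggests a single receive queue, which matches the situation where one core handles all NIC receive processing.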
