Netlink Performance Optimization
- Kalimuthu Velappan
- Dhanasekar Rathinavel
Introduction
• Netlink messaging architecture
• System scaling issue
• Proposed solution
  • Netlink socket filtering
  • CBPF/EBPF – Micro code assembly
  • EBPF – Clang/LLVM integration, restricted C coding
• PoC
  • TeamD scaling issue verification
  • Performance measurements
• Application integration
  • Proposed model
• Q & A
Netlink Messaging Framework
• SONiC mainly uses the NETLINK_ROUTE family for interface notifications
• It is a broadcast domain
● All network interface updates are grouped under the NETLINK_ROUTE family.
● Each netdevice notifies the NETLINK subsystem about changes in its port properties.
● The NETLINK subsystem posts a message (a packet holding a “struct nlmsghdr”) to the socket receive queue of every registered application.
● Each application then reads the message from its receive queue.
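A minimal sketch of how an application registers for these notifications, assuming plain rtnetlink sockets rather than any SONiC wrapper. The function names `make_link_addr` and `open_link_socket` are illustrative, not taken from the slides.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Build the bind address that joins the link multicast group. */
struct sockaddr_nl make_link_addr(void)
{
    struct sockaddr_nl addr;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = RTMGRP_LINK;   /* RTM_NEWLINK / RTM_DELLINK broadcasts */
    return addr;
}

/* Open a NETLINK_ROUTE socket subscribed to link notifications.
 * Returns the socket fd, or -1 on error (errno is set). */
int open_link_socket(void)
{
    struct sockaddr_nl addr = make_link_addr();
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Every process that binds this way receives every link notification, which is the broadcast behaviour described above.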
Application Interaction with Netlink
[Diagram]
• User space: Vlanmgr, Portmgrd, and other apps create or update the NetDevice properties.
• Kernel space: Network interfaces (Bridge, Vlan, Eth, PO etc.) -> Device Driver -> NETLINK subsystem.
• RTM_NEWLINK/RTM_DELLINK messages are multicast to all registered apps interested in NETLINK_ROUTE family updates: Teamd, STPd, and other apps.
Without Filter
[Diagram]
• Example: Vlanmgrd adds Ethernet0 to 4K VLANs
  <<config vlan member range add 2 4094 Ethernet0>>
• Kernel space: the NetDevices (4K VLANs, Ethernet0) send 8190 messages through the NETLINK subsystem.
• User space: Teamsyncd, STPd, and UDLD each receive all 8190 messages.
With eBPF Filter
[Diagram]
• Vlanmgrd adds Ethernet0 to 4K VLANs
  <<config vlan member range add 2 4094 Ethernet0>>
• Kernel space: the NetDevices (4K VLANs, Ethernet0) send 8190 messages through the NETLINK subsystem.
• Teamsyncd and STPd are bound to an eBPF filter that drops all VLAN-member add notifications: 8190 messages are dropped for each of them.
• UDLD still receives all 8190 messages.
Netlink Message Format
Every attribute change on an interface generates an RTM_NEWLINK message that carries all of the interface's attributes.
Layout of sk_buff->data for one nlmsghdr (carries the attributes of a single netdevice):
nlmsghdr (nlmsg_type == RTM_NEWLINK/RTM_DELLINK)
ifinfomsg
Attribute-1: T = IFLA_ADDRESS, V = MAC
Attribute-2: T = IFLA_IFNAME, V = if_name
Attribute-3: T = IFLA_LINKINFO, V = Nested TLVs
    T = IFLA_INFO_KIND, V = Team/Vlan
    T = IFLA_INFO_SLAVE_KIND, V = Team/Vlan
    TLV-N
Attribute-N
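The layout can be walked in user space with the standard rtnetlink macros. The sketch below looks up the nested IFLA_INFO_KIND value of a link message; `find_attr` and `link_kind` are hypothetical helper names, and the code assumes the kernel's usual NUL-terminated kind string.

```c
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Find the first attribute of the given type in a run of TLVs. */
const struct rtattr *find_attr(const struct rtattr *rta, int len,
                               unsigned short type)
{
    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        if ((rta->rta_type & NLA_TYPE_MASK) == type)
            return rta;
    return NULL;
}

/* Return the IFLA_INFO_KIND string ("team", "vlan", ...) of an
 * RTM_NEWLINK/RTM_DELLINK message, or NULL when it is absent. */
const char *link_kind(const struct nlmsghdr *nlh)
{
    const struct ifinfomsg *ifi;
    const struct rtattr *info, *kind;
    int len;

    if (nlh->nlmsg_type != RTM_NEWLINK && nlh->nlmsg_type != RTM_DELLINK)
        return NULL;
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
        return NULL;

    ifi = NLMSG_DATA(nlh);
    len = (int)(nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)));

    /* Top level: Attribute-1 .. Attribute-N follow the ifinfomsg. */
    info = find_attr(IFLA_RTA(ifi), len, IFLA_LINKINFO);
    if (!info)
        return NULL;

    /* Nested level: the payload of IFLA_LINKINFO is another TLV run. */
    kind = find_attr(RTA_DATA(info), (int)RTA_PAYLOAD(info), IFLA_INFO_KIND);
    return kind ? (const char *)RTA_DATA(kind) : NULL;
}
```

A filter that wants to match on the kind has to do the same two-level walk, which is why attribute lookup support matters for in-kernel filtering.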
NetLink Dump
• A netlink dump continues until the complete DB has been sent to the application.
• Each sk_buff->data of a dump reply carries several messages back to back: nlmsghdr-1, nlmsghdr-2, nlmsghdr-3, ... nlmsghdr-N.
• Each dump reply has the NLM_F_MULTI flag set, and the last dump message has type NLMSG_DONE. The filter uses this to trap all dump replies.
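The dump check can be sketched in user-space C as below. `is_dump_reply` and `count_messages` are illustrative names; the second function only shows how several headers packed into one buffer are iterated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <linux/netlink.h>

/* A message belongs to a dump if it is multipart or terminates one. */
bool is_dump_reply(const struct nlmsghdr *nlh)
{
    return (nlh->nlmsg_flags & NLM_F_MULTI) || nlh->nlmsg_type == NLMSG_DONE;
}

/* Count the messages packed back to back in one receive buffer and
 * report through *done whether NLMSG_DONE was among them. */
int count_messages(const void *buf, size_t n, bool *done)
{
    const struct nlmsghdr *nlh;
    int len = (int)n;
    int count = 0;

    *done = false;
    for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        count++;
        if (nlh->nlmsg_type == NLMSG_DONE)
            *done = true;
    }
    return count;
}
```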
SONiC Netlink Message - Scaling Issue
• Every net device has multiple attributes
• Any attribute change generates a netlink notification message
• An application has to process all the netlink messages generated by all the net devices.
• There is no way to register only for a specific interface or a specific attribute change.
• When 4K VLANs are configured per port
  • It generates ~8K netlink messages
• On a scaled system
  • Each process registers for kernel link notifications
  • Each process suffers from the same bursty notification issue as seen with Teamd
  • Easily more than 1M unnecessary messages are broadcast across the system.
• Applications are not able to process all the messages during config reload and system reboot
• When the socket queue fills up, messages are dropped with an ENOBUFS error. There is no way to retrieve the lost notifications.
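Since the dropped notifications themselves cannot be recovered, a common fallback is to re-read the current link state with a dump after ENOBUFS. This is a sketch of that fallback, not something the slides prescribe; `link_dump_req`, `make_link_dump_req`, and `resync_if_overrun` are hypothetical names.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

struct link_dump_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifi;
};

/* Fill in an RTM_GETLINK request that asks for a full link dump. */
void make_link_dump_req(struct link_dump_req *req, unsigned int seq)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifi));
    req->nlh.nlmsg_type = RTM_GETLINK;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->ifi.ifi_family = AF_UNSPEC;
}

/* Call with the errno of a failed recv().
 * Returns 1 if a resync dump was requested, 0 if the error was not an
 * overrun, -1 if sending the request failed. */
int resync_if_overrun(int fd, int recv_errno, unsigned int seq)
{
    struct link_dump_req req;

    if (recv_errno != ENOBUFS)
        return 0;
    make_link_dump_req(&req, seq);
    return send(fd, &req, req.nlh.nlmsg_len, 0) < 0 ? -1 : 1;
}
```

The resync costs a full dump of its own, so it treats the symptom; filtering in the kernel avoids filling the queue in the first place.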
Netlink Filter
• Berkeley Packet Filter (BPF)
• Interface to execute micro assembly in the kernel as a minimal VM
• The assembly filter code is executed for every received packet
• The return value decides whether to accept or drop the packet
• It is executed in the context of the netlink message sender
• Filter execution has little impact on CPU performance
Netlink socket filtering – CBPF/EBPF
• CBPF/EBPF
  • Micro code assembly
  • Performance – Optimized flow
  • Easy to attach filter
• Limitations
  • No loops
  • Limited set of registers
  • Jump tracing is very hard to debug
  • No local storage (arrays/maps) in CBPF
  • No NLATTR helper function in EBPF
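[Diagram]
User space:
fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)
struct bpf_insn prog[] = {
    BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),      /* R6 = skb, required by LD_ABS */
    BPF_LD_ABS(BPF_B, 14 + 9),                /* R0 = byte at the protocol offset */
    BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 17, 2),   /* not UDP (17): jump to drop */
    BPF_MOV64_IMM(BPF_REG_0, 0xFFFF),         /* 0xFFFF - ACCEPT */
    BPF_EXIT_INSN(),
    BPF_MOV64_IMM(BPF_REG_0, 0),              /* 0 - DROP */
    BPF_EXIT_INSN(),
};
setsockopt(fd, SOL_SOCKET, SO_ATTACH_BPF, ...)
recvmsg(fd, ...)    /* receive netlink messages on the socket fd */
Kernel space:
BPF verifier -> BPF JIT compiler -> BPF in native code, run by the Netlink subsystem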
Netlink socket filtering – Clang/LLVM
• Clang/LLVM
  • Restricted C
  • Array and Hash map support
  • Easy to write and debug the filter code
• Limitations
  • Not an optimized instruction flow
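[Diagram]
User space:
fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)
SEC("socket") int bpf_prog1(struct __sk_buff *skb)
{
    /* load_half() converts from network byte order, but netlink headers
     * are in host byte order, so the mask is swapped to match
     * (bpf_htons() as provided by libbpf's bpf_endian.h). */
    uint16_t flags = load_half(skb, offsetof(struct nlmsghdr, nlmsg_flags));
    if (flags & bpf_htons(NLM_F_MULTI))
        return ACCEPT_PKT;
    else
        return DROP_PKT;
}
Clang/LLVM compilation -> filter-obj.bpf
load_and_attach(fd, SO_ATTACH_BPF, ...)
recvmsg(fd, ...)    /* receive netlink messages on the socket fd */
Kernel space:
BPF verifier -> BPF JIT compiler -> BPF in native code, run by the Netlink subsystem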
PoC with TeamD
• Arlo [ JIRA-7122 ] is fixed
  • Verified that the ENOBUFS issue is not seen with the 4K VLAN sanity suite.
• Thanks to Madhukar for
  • Helping to understand the teamd filter requirements
  • Validating the PoC filter
FILTER DROP COUNT
Process                    Dropped in kernel   Trapped to application   Dropped %
Teamd (per port-channel)   79814               238                      99.7%
teamsyncd                  214510              42696                    83.4%
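The Dropped % column is consistent with dropped / (dropped + trapped). That reading is an assumption checked against the two rows above, sketched here with integer arithmetic in tenths of a percent; `drop_permille` is an illustrative name.

```c
#include <stdint.h>

/* Share of messages dropped in the kernel, in tenths of a percent,
 * rounded to the nearest unit. Returns 0 when nothing was seen. */
unsigned int drop_permille(uint64_t dropped, uint64_t trapped)
{
    uint64_t total = dropped + trapped;

    if (total == 0)
        return 0;
    return (unsigned int)((dropped * 1000 + total / 2) / total);
}
```

For Teamd this gives 997 (99.7%) and for teamsyncd 834 (83.4%).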
Design for PoC verification
• Added a kernel patch with nlattr and nested nlattr search helper functions
• Customized EBPF filter logic for TeamD
• Clang/LLVM compiler integration
[Diagram]
User space: fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); the application receives netlink messages on the socket fd and can access the Hash MAP DB.
Kernel space: the BPF Filter sits between the Netlink subsystem and the socket and uses the Hash MAP DB.
Hash MAP DB
KEY / IFINDEX   VALUE / Attributes
1               [ s:1, f:2, v:3 ]
64              [ s:1, f:3, v:7 ]
23              [ s:1, f:5, v:6 ]
Application Integration Proposal
• EBPF assembly filter
• Clang/LLVM based filter
• Customized BPF filter library
• BCC – python integrated
EBPF assembly filter
• 11 Register set
• Kernel helper functions
• Kernel trace printk
• Array/Hash map APIs
• Tail calls
• Redirects
EBPF Register   Description
R0              Return value from in-kernel function, and exit value for eBPF program
R1 ~ R5         Arguments from eBPF program to in-kernel function
R6 ~ R9         Callee saved registers that in-kernel function will preserve
R10             Read-only frame pointer to access stack
Clang/LLVM
• Clang/LLVM compiler integration
  • Build infra for compilation of application specific filter
• libsbpf.so - library
  • Application interface
  • Loads the ebpf object into kernel
  • Attaches the ebpf filter code into application socket
• Application
  • App user will write a custom filter for their needs
[Diagram]
• BPF Filter build framework: MyFilter [ My filter logic – myfilter.c ] -> BPF bytecode compiler -> myfilter.o
• Application: attach_filter(fd, "myfilter.o")
• libsbpf.so: attach_filter(fd, fobj), load_filter(fd, fobj), filter callback
Customized EBPF library for SONiC (Idea)
• Set of BPF filter rules and actions
• Rules can be
  • Offset lookup and match
  • Attribute lookup and match
  • Nested attribute lookup and match
  • Save result into a variable
• Action can be
  • Accept
  • Drop
  • Jump to Nth rule
Label      Rule     Offset  Mask  Exp   Action
FCHECK     OFFSET   0x20    0xFF  0xaa  ACCEPT
NLCHECK    NLMATCH  0x56    0xFE  0xbb  GOTO NESTCHECK
DROP       DROP     0x00    0x00  0x00  DROP
NESTCHECK  NAMATCH  0x89    0xAF  0xcc  ACCEPT
RETURN     DROP     0x00    0x00  0x00  DROP
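To make the rule-table idea concrete, here is a user-space model of how such a table could be evaluated. Everything in it is hypothetical: the type names, the single-byte mask/expect semantics, and the restriction to OFFSET and DROP rules (attribute and nested-attribute matching are left out). Jumps are forward only, mirroring the no-loops constraint of BPF.

```c
#include <stddef.h>
#include <stdint.h>

enum rule_kind { RULE_OFFSET, RULE_DROP };          /* RULE_DROP always matches */
enum rule_action { ACT_ACCEPT, ACT_DROP, ACT_GOTO };

struct rule {
    enum rule_kind kind;
    size_t offset;            /* byte offset into the message */
    uint8_t mask;             /* applied to the byte at offset */
    uint8_t expected;         /* value the masked byte must equal */
    enum rule_action action;  /* taken when the rule matches */
    size_t target;            /* rule index for ACT_GOTO */
};

/* Evaluate the table top to bottom. Returns 1 to accept, 0 to drop.
 * A rule that does not match falls through to the next rule. */
int run_rules(const struct rule *rules, size_t nrules,
              const uint8_t *msg, size_t len)
{
    size_t pc = 0;

    while (pc < nrules) {
        const struct rule *r = &rules[pc];
        int matched = r->kind == RULE_DROP ||
                      (r->offset < len &&
                       (msg[r->offset] & r->mask) == r->expected);

        if (!matched) {
            pc++;
            continue;
        }
        if (r->action == ACT_ACCEPT)
            return 1;
        if (r->action == ACT_DROP)
            return 0;
        if (r->target <= pc)      /* backward jump: refuse, treat as drop */
            return 0;
        pc = r->target;
    }
    return 0;                     /* fell off the end: drop */
}
```

A library along these lines would compile the table into BPF instructions rather than interpret it, but the control flow per message would be the same.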
BPF Possibilities
• Time-critical protocol packets can be generated from the kernel.
• Statistics collection
• Custom user code injection
• And much more …
Thank You
