This document provides an overview of sockets in the Linux kernel networking stack. It discusses the socket() system call and how it relates to struct socket and struct sock in the kernel. It covers the different socket types, such as SOCK_STREAM and SOCK_DGRAM. UDP is explained in detail, along with how packets are received via udp_rcv() in the kernel from both userspace and other kernel components. Control messages and UDP errors are also briefly mentioned.
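As a userspace-side illustration of the SOCK_DGRAM socket type mentioned above, here is a minimal sketch in Python. It is not taken from the slides; the function name `udp_roundtrip`, the loopback address, and the 2-second timeout are assumptions made for the example.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback and return what the receiver reads."""
    # AF_INET + SOCK_DGRAM asks the kernel for a UDP socket.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        receiver.settimeout(2.0)          # do not block forever if the datagram is lost
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data


if __name__ == "__main__":
    print(udp_roundtrip(b"hello"))
```

Each `sendto()` produces one datagram and each `recvfrom()` consumes one, which is the message-boundary behaviour that distinguishes SOCK_DGRAM from the byte stream of SOCK_STREAM.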
The Linux booting process involves multiple stages:
1) The BIOS loads the first stage boot loader from the MBR which finds and loads the second stage boot loader.
2) The second stage boot loader loads the Linux kernel and initial RAM disk. It then passes control to the kernel.
3) The kernel initializes hardware, mounts the root filesystem, and loads the init process to perform further system initialization.
This document discusses the internals of the ixgbe driver, which is the Intel 10 Gigabit Ethernet driver for Linux. It describes how the driver handles transmission and reception of packets using ring buffers and NAPI. It also discusses how the driver supports eXpress Data Path (XDP) by using a separate ring for XDP packets and adjusting page reference counts to support XDP operations like redirection. Upcoming features that will improve XDP support for this driver are also mentioned.
The document discusses various data structures and functions related to network packet processing in the Linux kernel socket layer. It describes the sk_buff structure that is used to pass packets between layers. It also explains the net_device structure that represents a network interface in the kernel. When a packet is received, the interrupt handler will raise a soft IRQ for processing. The packet will then traverse various protocol layers like IP and TCP to be eventually delivered to a socket and read by a userspace application.
Using the new extended Berkeley Packet Filter capabilities in Linux to improve the performance of auditing security-relevant kernel events around network, file and process actions.
SOSCON 2019.10.17
What are the methods for packet processing on Linux, and how fast is each of them? In this presentation, we will learn how to handle packets on Linux (user space, socket filter, netfilter, tc) and compare their performance by analyzing where in the network stack each kind of processing happens (the hook point). We will also discuss packet processing using XDP, an in-kernel fast path recently added to the Linux kernel. eXpress Data Path (XDP) is a high-performance programmable network data path within the Linux kernel. XDP sits at the lowest level of the network stack accessible to software, the point at which the driver receives the packet. By using the eBPF infrastructure at this hook point, the network stack can be extended without modifying the kernel.
Daniel T. Lee (Hoyeon Lee)
@danieltimlee
Daniel T. Lee currently works as a Software Engineer at Kosslab and contributes to the Linux kernel BPF project. He is interested in cloud, Linux networking, and tracing technologies, and likes to analyze the kernel's internals using BPF technology.
This presentation includes basic information about sockets, socket buffers, client-server programs, and the relationship between them.
The files included in the ppt for the variables are taken from linux-2.6.10.
In case of any queries, contact souravpunoriyar@gmail.com
The Linux kernel is undergoing the most fundamental architecture evolution in history and is becoming a microkernel. Why is the Linux kernel evolving into a microkernel? The potentially biggest fundamental change ever happening to the Linux kernel. This talk covers how companies like Facebook and Google use BPF to patch 0-day exploits, how BPF will change the way features are added to the kernel forever, and how BPF is introducing a new type of application deployment method for the Linux kernel.
This document discusses how eBPF (extended Berkeley Packet Filter) can be used for kernel tracing. It provides an overview of BPF and eBPF, how eBPF programs are compiled and run in the kernel, the use of BPF maps, and how eBPF enables new possibilities for dynamic kernel instrumentation through techniques like Kprobes and ftrace.
- The document discusses Linux network stack monitoring and configuration. It begins with definitions of key concepts like RSS, RPS, RFS, LRO, GRO, DCA, XDP and BPF.
- It then provides an overview of how the network stack works from the hardware interrupts and driver level up through routing, TCP/IP and to the socket level.
- Monitoring tools like ethtool, ftrace and /proc/interrupts are described for viewing hardware statistics, software stack traces and interrupt information.
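To make the /proc/interrupts part of the monitoring list concrete, here is a small Python sketch that sums the per-CPU counters for each IRQ line. The function name `parse_interrupts` and the `SAMPLE` text are hypothetical and only mimic the general layout of the file; real output has more rows and columns and varies between kernels.

```python
def parse_interrupts(text: str) -> dict:
    """Return the total interrupt count per IRQ label, summed over all CPUs."""
    lines = text.strip().splitlines()
    num_cpus = len(lines[0].split())      # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        # Only the first num_cpus fields are counters; rows such as ERR
        # may carry fewer counters than there are CPUs.
        for field in rest.split()[:num_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


# Hypothetical sample in the style of /proc/interrupts on a 2-CPU machine.
SAMPLE = """\
           CPU0       CPU1
  0:         36          0   IO-APIC   2-edge      timer
 24:       1200       3400   PCI-MSI 524288-edge   eth0-rx-0
ERR:          0
"""

if __name__ == "__main__":
    for irq, total in parse_interrupts(SAMPLE).items():
        print(irq, total)
```

On a Linux machine the same function can be fed the contents of /proc/interrupts; comparing two snapshots taken a few seconds apart shows which IRQs (for example NIC receive queues) are active and on which CPUs they land.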
The document provides an overview of the initialization phase of the Linux kernel. It discusses how the kernel enables paging to transition from physical to virtual memory addresses. It then describes the various initialization functions that are called by start_kernel to initialize kernel features and architecture-specific code. Some key initialization tasks discussed include creating an identity page table, clearing BSS, and reserving BIOS memory.
The document discusses Linux networking architecture and covers several key topics:
It first describes the basic structure and layers of the Linux networking stack including the network device interface, network layer protocols like IP, transport layer, and sockets. It then discusses how network packets are managed in Linux through the use of socket buffers and associated functions. The document also provides an overview of the data link layer and protocols like Ethernet, PPP, and how they are implemented in Linux.
The document discusses analyzing Linux kernel crash dumps. It covers various ways to gather crash data like serial console, netconsole, kmsg dumpers, Kdump, and Pstore. It then discusses analyzing the crashed kernel using tools like ksymoops, crash utility, and examining the backtrace, kernel logs, processes, and file descriptors. The document provides examples of gathering data from Pstore and using commands like bt, log, and ps with the crash utility to extract information from a crash dump.
netfilter is a framework provided by the Linux kernel that allows various networking-related operations to be implemented in the form of customized handlers.
iptables is a user-space application program that allows a system administrator to configure the tables provided by the Linux kernel firewall (implemented as different netfilter modules) and the chains and rules it stores.
Many systems use iptables/netfilter, Linux's native packet filtering/mangling framework since Linux 2.4, be it home routers or sophisticated cloud network stacks.
In this session, we will talk about the netfilter framework and its facilities, explain how basic filtering and mangling use-cases are implemented using iptables, and introduce some less common but powerful extensions of iptables.
Shmulik Ladkani, Chief Architect at Nsof Networks.
Long time network veteran and kernel geek.
Shmulik started his career at Jungo (acquired by NDS/Cisco) implementing residential gateway software, focusing on embedded Linux, Linux kernel, networking and hardware/software integration.
Some billions of forwarded packets later, Shmulik left his position as Jungo's lead architect and joined Ravello Systems (acquired by Oracle) as tech lead, developing a virtual data center as a cloud-based service, focusing around virtualization systems, network virtualization and SDN.
Recently he co-founded Nsof Networks, where he's been busy architecting network infrastructure as a cloud-based service, gazing at internet routes in astonishment, and playing the chkuku.
The document discusses Linux device trees and how they are used to describe hardware configurations. Some key points:
- A device tree is a data structure that describes hardware connections and configurations. It allows the same kernel to support different hardware.
- Device trees contain nodes that represent devices, with properties like compatible strings to identify drivers. They describe things like memory maps, interrupts, and bus attachments.
- The kernel uses the device tree passed by the bootloader to identify and initialize hardware. Drivers match based on compatible properties.
- Device tree files with .dts extension can be compiled to binary blobs (.dtb) and overlays (.dtbo) used at boot time to describe hardware.
The document discusses Ethernet device drivers in Linux. It describes the driver architecture, Ethernet packet format, driver development process, and important data structures like net_device and sk_buff. It provides examples of initializing a driver, probing hardware, uploading/downloading data using callbacks, interrupt handling, and buffer management. The key steps are registering the driver, allocating network devices, setting callbacks, and using sk_buff objects to transfer packets between layers.
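The Ethernet packet format mentioned above can be illustrated with a short Python sketch that splits a raw frame into its 14-byte header (destination MAC, source MAC, EtherType) and payload. This is only an illustration of the wire layout, not driver code from the slides; the names `parse_ethernet` and `format_mac` are made up for the example, and VLAN tags are not handled.

```python
import struct

# Destination MAC (6 bytes), source MAC (6 bytes), EtherType (2 bytes, big-endian).
ETH_HEADER = struct.Struct("!6s6sH")


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet(frame: bytes):
    """Return (dst_mac, src_mac, ethertype, payload) for an untagged frame."""
    if len(frame) < ETH_HEADER.size:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = ETH_HEADER.unpack_from(frame)
    return format_mac(dst), format_mac(src), ethertype, frame[ETH_HEADER.size:]


if __name__ == "__main__":
    # Broadcast frame carrying EtherType 0x0800 (IPv4) and a dummy payload.
    frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
    print(parse_ethernet(frame))
```

In the kernel, a driver hands the same bytes to the stack inside an sk_buff, and the EtherType is what selects the next protocol layer.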
About the author: Priya Autee is a software engineer at Intel working on various leading-edge IA features, and an Intel(R) RDT expert. She is focused on prototyping and researching open source APIs like DPDK, Intel(R) RDT etc. to support NFV/compute-sensitive requirements on Intel Architecture. She holds a Master's in Computer Science from Arizona State University, Arizona.
Agenda:
In this session, Shmulik Ladkani discusses the kernel's net_device abstraction, its interfaces, and how net-devices interact with the network stack. The talk covers many of the software network devices that exist in the Linux kernel, the functionalities they provide and some interesting use cases.
Speaker:
Shmulik Ladkani is a Tech Lead at Ravello Systems.
Shmulik started his career at Jungo (acquired by NDS/Cisco) implementing residential gateway software, focusing on embedded Linux, Linux kernel, networking and hardware/software integration.
51966 coffees and billions of forwarded packets later, with millions of homes running his software, Shmulik left his position as Jungo’s lead architect and joined Ravello Systems (acquired by Oracle) as tech lead, developing a virtual data center as a cloud service. He's now focused around virtualization systems, network virtualization and SDN.
The document discusses developing network device drivers for embedded Linux. It covers key topics like socket buffers, network devices, communicating with network protocols and PHYs, buffer management, and differences between Ethernet and WiFi drivers. The outline lists these topics and others like throughput and considerations. Prerequisites include C skills, Linux knowledge, and an understanding of networking and embedded driver development.
The document discusses ioremap and mmap functions in Linux for mapping physical addresses into the virtual address space. Ioremap is used when physical addresses are larger than the virtual address space size. It maps physical addresses to virtual addresses that can be accessed by the CPU. Mmap allows a process to map pages of a file into virtual memory. It is useful for reducing memory copies and improving performance of file read/write operations. The document outlines the functions, flags, and flows of ioremap, mmap, and implementing a custom mmap file operation for direct physical memory mapping.
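The userspace side of mmap described above can be sketched in Python with the standard `mmap` module. This shows only an ordinary file mapping, not ioremap and not a driver's custom mmap file operation; the helper names `uppercase_in_place` and `demo` and the temporary file are assumptions made for the example.

```python
import mmap
import os
import tempfile


def uppercase_in_place(path: str) -> None:
    """Modify a file through a shared memory mapping, without read()/write()."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:   # length 0 maps the whole file
            view[:] = view[:].upper()            # same length, so slice assignment is valid
            view.flush()                         # push dirty pages back to the file


def demo() -> bytes:
    """Create a temporary file, edit it through the mapping, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"mapped bytes")
        uppercase_in_place(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The change made through the mapping is visible to a later regular read of the file, which is the copy-avoiding property the summary refers to.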
In this talk Jiří Pírko discusses the design and evolution of the VLAN implementation in Linux, the challenges and pitfalls as well as hardware acceleration and alternative implementations.
Jiří Pírko is a major contributor to kernel networking and the creator of libteam for link aggregation.
FOSDEM15 SDN developer room talk
DPDK performance
How to not just do a demo with DPDK
The Intel DPDK provides a platform for building high performance Network Function Virtualization applications. But it is hard to get high performance unless certain design tradeoffs are made. This talk focuses on the lessons learned in creating the Brocade vRouter using DPDK. It covers some of the architecture, locking and low level issues that all have to be dealt with to achieve 80 Million packets per second forwarding.
The document discusses algorithms used in the DPDK libraries for fast lookups. It describes the characteristics and usage of the hash, LPM, and ACL libraries. The hash library uses cuckoo hashing for tables like FDB and host tables. The LPM library uses a modified DIR-24-8-BASIC algorithm for IPv4 and IPv6 route tables. The ACL library classifies entries using techniques like scalar, SSE, and AVX2 based on multi-bit tries. Examples of lookups and inserts are provided for each library.
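To show what a longest-prefix-match lookup returns, here is a deliberately simple Python sketch. It is not the DIR-24-8 algorithm and not the DPDK `rte_lpm` API: it keeps one dictionary per prefix length and probes from /32 down to /0. The class name `LpmTable` and the next-hop strings are hypothetical, and only IPv4 is handled.

```python
import ipaddress


class LpmTable:
    """Toy IPv4 longest-prefix-match table: one exact-match dict per prefix length."""

    def __init__(self):
        self._by_len = [dict() for _ in range(33)]   # prefix lengths 0..32

    def add(self, cidr: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(cidr)            # raises ValueError if host bits are set
        self._by_len[net.prefixlen][int(net.network_address)] = next_hop

    def lookup(self, addr: str):
        ip = int(ipaddress.IPv4Address(addr))
        # Try the most specific prefix first; the first hit is the longest match.
        for plen in range(32, -1, -1):
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            hop = self._by_len[plen].get(ip & mask)
            if hop is not None:
                return hop
        return None


if __name__ == "__main__":
    table = LpmTable()
    table.add("0.0.0.0/0", "default-gw")
    table.add("10.0.0.0/8", "hop-a")
    table.add("10.1.0.0/16", "hop-b")
    print(table.lookup("10.1.2.3"), table.lookup("10.2.0.1"), table.lookup("8.8.8.8"))
```

This version needs up to 33 dictionary probes per lookup. DIR-24-8 style tables trade memory for speed by indexing a large array with the top 24 bits of the address, so most lookups finish in one or two memory accesses.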
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP (Thomas Graf)
This talk will start with a deep dive and hands-on examples of BPF, possibly the most promising low-level technology to address challenges in application and network security, tracing, and visibility. We will discuss how BPF evolved from a simple bytecode language to filter raw sockets for tcpdump to a JITable virtual machine capable of universally extending and instrumenting both the Linux kernel and user space applications. The introduction is followed by a concrete example of how the Cilium open source project applies BPF to solve networking, security, and load balancing for highly distributed applications. We will discuss and demonstrate how Cilium with the help of BPF can be combined with distributed system orchestration such as Docker to simplify security, operations, and troubleshooting of distributed applications.
This document provides an introduction to eBPF and XDP. It discusses the history of BPF and how it evolved into eBPF. Key aspects of eBPF covered include the instruction set, JIT compilation, verifier, helper functions, and maps. XDP is introduced as a way to program the data plane using eBPF programs attached early in the receive path. Example use cases and performance benchmarks for XDP are also mentioned.
When your whole system is unresponsive, how do you investigate the failure?
We'll see how to get a memory dump for offline analysis with kdump,
then how to analyze it with the crash utility,
and finally how to use crash on a running system to modify kernel memory (at your own risk!).
Here are some useful GDB commands for debugging:
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints (i b) - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout to assembly view
Note: When you view the slide deck via a web browser, the screenshots may be blurred. You can download the slides and view them offline, where the screenshots are clear.
This presentation gives an overview of the Data Plane Development Kit (DPDK) and its basics. It is part of a Network Programming Series.
First, the presentation focuses on the network performance challenges of modern systems by comparing modern CPUs with modern 10 Gbps Ethernet links. Then it touches on the memory hierarchy and kernel bottlenecks.
The following part explains the main DPDK techniques, like polling, bursts, hugepages and multicore processing.
The DPDK overview explains how a DPDK application is initialized and run, and touches on lockless queues (rte_ring), memory pools (rte_mempool), memory buffers (rte_mbuf), hashes (rte_hash), cuckoo hashing, the longest prefix match library (rte_lpm), poll mode drivers (PMDs) and the kernel NIC interface (KNI).
At the end, there are a few DPDK performance tips.
Tags: access time, burst, cache, dpdk, driver, ethernet, hub, hugepage, ip, kernel, lcore, linux, memory, pmd, polling, rss, softswitch, switch, userspace, xeon
PASTE: A Network Programming Interface for Non-Volatile Main Memory (micchie)
The document describes PASTE, a network programming interface for non-volatile main memory. PASTE aims to enable durable zero-copy networking to NVMM and help applications organize persistent data structures on NVMM. It allows applications to use TCP/IP while protecting data. PASTE offers high performance even without native NVMM support by leveraging techniques from modern network stacks. Evaluation shows PASTE outperforms existing write-ahead logging and B+tree implementations for NVMM storage and is applicable to systems like Redis, packet logging middleboxes, and scalable NFV backends.
Linux internet server security and configuration tutorialannik147
The document provides steps to secure a web server, including:
1. Reducing exposed network services by commenting out unused services in configuration files like /etc/inetd.conf and restarting daemons;
2. Configuring firewall rules using iptables or ipchains to block unnecessary ports;
3. Removing unneeded users and network services from startup.
This document provides an overview of setting up an Intel IoT Developer Kit including the hardware components, installing software, and running sample codes. It discusses the Galileo and Edison boards, microSD cards, IDEs, MRAA and UPM libraries, and connecting devices. It also demonstrates how to set up environments for C/C++ with Eclipse, JavaScript with XDK, and Arduino, and describes where to find documentation and sample codes for getting started with the kits and sensors.
HKG18-110 - net_mdev: Fast path user space I/OLinaro
Session ID: HKG18-110
Session Name: HKG18-110 - net_mdev: Fast path user space I/O
Speaker: Ilias Apalodimas
Track: Networking
★ Session Summary ★
User space I/O offers significant speedup potential for data plane and other high-performance applications, but at the high cost of writing and maintaining separate device drivers. Building on the existing kernel mediated device framework originally introduced to support GPUs, net_mdev extends this support to network I/O, requiring only minor changes to existing kernel drivers. Applications, in turn, need only provide "mini drivers" to handle the performance I/O paths in user space while leaving control operations in the kernel.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-110/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-110.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-110.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Networking
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
A brief overview of how the ubiquitous 'iptables' command works, under the hood. This is by no means complete, and is intended as a primer. I presented these slides in addition to an interactive demo as a guest lecturer at a computer network security class at DePaul University, Chicago IL.
Let's trace Linux Lernel with KGDB @ COSCUP 2021Jian-Hong Pan
https://coscup.org/2021/en/session/39M73K
https://www.youtube.com/watch?v=L_Gyvdl_d_k
Engineers have plenty of debug tools for user space programs development, code tracing, debugging and analyzing. Except “printk”, do we have any other debug tools for Linux kernel development? The “KGDB” mentioned in Linux kernel document provides another possibility.
This talk shares how to experiment with KGDB in a virtual machine, and then uses GDB + OpenOCD + JTAG + Raspberry Pi in a real environment as the demo.
How to Use GSM/3G/4G in Embedded Linux SystemsToradex
The number of embedded devices that are connected to the internet is growing each day. Nowadays, they are installed majorly using a wireless connection. They need mobile network coverage to be connected to the internet. Read our next blog which tells you about the various configurations to connect a device such as Colibri iMX6S with the Colibri Evaluation Board running Linux to the internet through the PPP (Point-to-Point Protocol) link. Read More: https://www.toradex.com/blog/how-to-use-gsm-3g-4g-in-embedded-linux-systems
"Session ID: SFO17-102
Session Name: Deploy STM32 family on Zephyr - SFO17-102
Speaker: Erwan Gouriou
Track: LITE
★ Session Summary ★
Objects:
-Quick intro on STM32 offer
-Strategy used to minimize code and maintenance effort and break silos
-Status on supported drivers
Slides:
-STM32 families and SoCs (highlight number of refs (>900) and need for mutualization)
-SoC naming conventions
-ST boards
-STM32Cube
-Initial deployment in Zephyr
-STM32Cube introduction and introduction in Zephyr
*HAL vs LL
*Information conveyed by CMSIS files
-Driver deployment strategy
*CMSIS (generic defines)
*LL/HAL
-Simplification brought by driver init code and pinmux generated by Device tree
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-102/
Presentation:
Video:
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
---------------------------------------------------
Keyword:
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...PROIDEA
Users of modern Linux containerization technologies are frequently at a loss as to what kind of security guarantees are delivered by the tools they use. Typical questions range from "Can these be used to isolate software with known security shortcomings and a rich history of security vulnerabilities?" to even "Can I use such a technique to isolate user-generated and potentially hostile assembler payloads?"
Modern Linux OS code-base as well as independent authors provide a plethora of options for those who desire to make sure that their computational loads are solidly confined. Potential users can choose from solutions ranging from Docker-like confinement projects, through Xen hypervisors, seccomp-bpf and ptrace-based sandboxes, to isolation frameworks based on hardware virtualization (e.g. KVM).
The talk will discuss the techniques available today, with a focus on (frequently overstated) promises regarding their strength. In the end, as they say: “Many speed bumps don’t make a wall”
The document provides an overview of the OSI model, TCP/IP protocols, Cisco IOS modes, router components, cabling, router management, LAN switching concepts, IP addressing, routing protocols, and IPv6 migration methods. It summarizes key topics for the CCNA exam in 10 sentences or less per section.
The Linux Terminal Server Project (LTSP) allows inexpensive thin client computers to connect to a server running Linux, distributing the server's processing power to multiple desktop clients over the network; it works by booting the thin clients from the server using PXE, NFS and NBD to access the server's filesystem and run applications remotely; the document provides information on installing, configuring and troubleshooting an LTSP server and thin clients.
Note I only need the last 3 sub-questions ( e, f and g) 3. Firew.pdfezonesolutions
Note: I only need the last 3 sub-questions (e, f and g). 3. Firewall Design (55pts) Design a
firewall for your Linux machine using the iptables packet filtering module. It is likely that iptables
came pre-installed with the Linux distribution you are using. In the event you are using an old
version of the Linux kernel, you may need to upgrade it for iptables to work. Your homework
consists of writing iptables rules to do the following: (a) Place no restriction on outbound
packets. (b) Allow for SSH access (port 22) to your machine from only the fiu.edu domain. (c)
Assuming you are running an HTTPD server on your machine that can make available your
entire home directory to the outside world, write a rule that allows only a single IP address in the
internet to access your machine for the HTTP service. (d) Permit Auth/Ident (port 113) that is
used by some services like SMTP. (e) Accept the ICMP Echo requests (as used by ping)
coming from the outside. (f) Respond back with TCP RST or ICMP unreachable for incoming
requests for blocked ports. (g) Block all input packets from the cnn.com domain and respond back
with destination unreachable error message for all incoming SYN packets from the cnn.com
domain.
Solution
(e) Echo Request:
Ping operates by sending Internet Control Message Protocol (ICMP) echo request packets to the
target host and waiting for an ICMP echo reply. It measures the round-trip time from
transmission to reception, reporting errors and packet loss.
Ping is a computer network administration software utility used to test the reachability of a host
on an Internet Protocol (IP) network.
Packet InterNet Gopher, is a computer network administration utility used to test the reachability
of a host on an Internet Protocol (IP) network and to measure the total round-trip time for
messages sent from the originating host to a destination computer and back.
Ping operates by sending Internet Control Message Protocol (ICMP) Echo Request packets to the
target host and waiting for an ICMP Echo Reply. The program reports errors, packet loss, and a
statistical summary of the results, typically including the minimum, maximum, the mean round-
trip times, and standard deviation of the mean.
The Internet Control Message Protocol (ICMP) is a supporting protocol in the Internet protocol
suite. It is used by network devices, like routers, to send error messages and operational
information indicating, for example, that a requested service is not available or that a host or
router could not be reached. ICMP differs from transport protocols such as TCP and UDP in that
it is not typically used to exchange data between systems, nor is it regularly employed by end-
user network applications (with the exception of some diagnostic tools like ping and traceroute).
The Internet Control Message Protocol (ICMP) has many messages that are identified by a
“type” field. You need to use 0 and 8 ICMP code types.
=> Zero (0) is for echo-reply
=> Eight (8) is for echo-request.
To .
This document discusses network considerations for Real Application Clusters (RAC). It describes the different network types used, including public, private, storage, and backup networks. It discusses protocols like TCP and UDP used for different traffic. It also covers concepts like network architecture, layers, MTU, jumbo frames, and tools for monitoring network performance like netstat, ping, and traceroute.
This document provides an agenda and overview for a hands-on lab on using DPDK in containers. It introduces Linux containers and how they use fewer system resources than VMs. It discusses how containers still use the kernel network stack, which is not ideal for SDN/NFV usages, and how DPDK can be used in containers to address this. The hands-on lab section guides users through building DPDK and Open vSwitch, configuring them to work with containers, and running packet generation and forwarding using testpmd and pktgen Docker containers connected via Open vSwitch.
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterAnne Nicolas
NDIV is a young, very simple, yet efficient network traffic diverter. Its purpose is to help build network applications that intercept packets at line rate with a very low processing overhead. A first example application is a stateless HTTP server reaching line rate on all packet sizes.
Willy Tarreau, HaproxyTech
1. Linux Kernel Networking –
advanced topics (5)
Sockets in the kernel
Rami Rosen
ramirose@gmail.com
Haifux, August 2009
www.haifux.org
All rights reserved.
2. Linux Kernel Networking (5)-
advanced topics
● Note:
● This lecture is a sequel to the following 4
lectures I gave in Haifux:
1) Linux Kernel Networking lecture
– http://www.haifux.org/lectures/172/
– slides:http://www.haifux.org/lectures/172/netLec.pdf
2) Advanced Linux Kernel Networking -
Neighboring Subsystem and IPSec lecture
– http://www.haifux.org/lectures/180/
– slides:http://www.haifux.org/lectures/180/netLec2.pdf
3. Linux Kernel Networking (5)-
advanced topics
3) Advanced Linux Kernel Networking -
IPv6 in the Linux Kernel lecture
● http://www.haifux.org/lectures/187/
– Slides: http://www.haifux.org/lectures/187/netLec3.pdf
4) Wireless in Linux
http://www.haifux.org/lectures/206/
– Slides: http://www.haifux.org/lectures/206/wirelessLec.pdf
4. ● Table of contents:
– The socket() system call.
– UDP protocol.
– Control Messages.
– Appendixes.
● Note: All code examples in this lecture refer to
the recent 2.6.30 version of the Linux kernel.
6. ● In user space, we have application, session and presentation
layers (TCP/IP refers to all 3 as the application layer)
● creating a socket from user space is done by
the socket() system call:
– int socket (int family, int type, int protocol);
– From man 2 socket:
– RETURN VALUE
– On success, a file descriptor for the new socket is returned.
– For open() system call (for files), we also get a file descriptor
as the return value.
– “Everything is a file” Unix paradigm.
● The first parameter, family, is also sometimes referred to as “domain”.
7. ● The family is PF_INET for IPv4 or PF_INET6 for IPv6.
– The family is PF_PACKET for Packet sockets, which operate
at the device driver layer. (Layer 2).
● pcap library for Linux uses PF_PACKET sockets:
– pcap library is in use by sniffers such as tcpdump.
● Also hostapd uses PF_PACKET sockets:
● (hostapd is a wireless access point management project)
● From hostapd:
– drv->monitor_sock = socket(PF_PACKET, SOCK_RAW,
htons(ETH_P_ALL));
8. ● Type:
– SOCK_STREAM and SOCK_DGRAM are the most commonly used
types.
● SOCK_STREAM for TCP, SCTP, BLUETOOTH.
● SOCK_DGRAM for UDP.
● SOCK_RAW for RAW sockets.
● There are cases where the type can be either
SOCK_STREAM or SOCK_DGRAM; for example, Unix
domain sockets (AF_UNIX).
– Protocol: usually 0 (IPPROTO_IP is 0, see:
include/linux/in.h).
– For SCTP, the protocol is IPPROTO_SCTP:
● sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
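The family/type/protocol combinations above can be tried from user space. The following is a minimal sketch, not code from the lecture: `created_socket_type()` is a hypothetical helper name, and it assumes a Linux system where `getsockopt()` with `SO_TYPE` reports the type the kernel recorded for the socket.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the lecture): create a socket, ask the
 * kernel which type it recorded for it (SO_TYPE), and close it again.
 * Returns the type (e.g. SOCK_DGRAM), or -1 on failure. */
int created_socket_type(int family, int type, int protocol)
{
    int fd = socket(family, type, protocol);
    if (fd < 0)
        return -1;

    int actual = -1;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &actual, &len) < 0)
        actual = -1;

    close(fd);
    return actual;
}
```

Calling it with `(AF_UNIX, SOCK_STREAM, 0)` and `(AF_UNIX, SOCK_DGRAM, 0)` illustrates that one family can offer both types, while `(AF_INET, SOCK_DGRAM, 0)` lets the kernel pick UDP as the default protocol.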
10. ● SCTP: Stream Control Transmission Protocol.
● For every socket which is created by a userspace
application, there is a corresponding socket struct and
sock struct in the kernel.
● This system call eventually invokes the sock_create()
method in the kernel.
– An instance of struct socket is created (include/linux/net.h)
– struct socket has only 8 members; struct sock has more than 20,
and is one of the biggest structures in the networking stack. You
can easily be confused between them. So the convention is this:
– sock always refers to struct socket.
– sk always refers to struct sock.
12. ● The state can be
– SS_FREE
– SS_UNCONNECTED
– SS_CONNECTING
– SS_CONNECTED
– SS_DISCONNECTING
● These states are not layer 4 states (like TCP_ESTABLISHED
or TCP_CLOSE).
● The sk_protocol member of struct sock equals the third
parameter (protocol) of the socket() system call.
14. ● Note: The inet_dgram_ops and inet_sockraw_ops differ only in
the .poll member:
– in inet_dgram_ops it is udp_poll().
– in inet_sockraw_ops, it is datagram_poll().
15. ● Diagram: struct inet_sock contains a struct sock (sk), followed by fields such as:
struct ip_options *opt;
__u8 tos;
__u8 recverr:1;
__u8 hdrincl:1;
.....
inet_sk(struct sock *sk) => returns the inet_sock which contains sk
16. ● struct sock has three queues: rx, tx and err.
– sk_receive_queue (rx): sk_buff, sk_buff, sk_buff, ...
– sk_write_queue (tx): sk_buff, sk_buff, sk_buff, ...
– sk_error_queue (err): sk_buff, sk_buff, sk_buff, ...
● Each queue has a lock (spinlock)
17. ● skb_queue_tail() : Adding to the queue
● skb_dequeue() : removing from the queue
– With MSG_PEEK, this is done in two stages:
– skb_peek()
– __skb_unlink(). (to remove the sk_buff from the
queue).
● For the error queue: sock_queue_err_skb() adds
to its tail (include/net/sock.h). Eventually, it also calls
skb_queue_tail().
● Errors can be ICMP errors or EMSGSIZE errors.
● For more about errors, see APPENDIX F: UDP errors.
18. UDP and TCP
● No explicit connection setup is done with UDP.
– In TCP there is a preliminary connection setup.
● Packets can be lost in UDP (there is no
retransmission mechanism in the kernel). TCP
on the other hand is reliable (there is a
retransmission mechanism).
● Most of the Internet traffic is TCP (like http,
ssh).
– UDP is for audio/video (RTP)/streaming.
● Note: streaming with VLC is by UDP (RTP).
● Streaming via YouTube is tcp (http).
19. The udp header
● There are only a few UDP-based servers, such as DNS, NTP,
DHCP, TFTP and more.
● For DHCP, it is quite natural to be UDP (Since many times with
DHCP, you don't have a source address, which is a must for TCP).
● TCP implementation is much more complex
– The TCP header is much bigger than UDP header.
The udp header: include/linux/udp.h
struct udphdr {
	__be16 source;
	__be16 dest;
	__be16 len;
	__sum16 check;
};
20. ● UDP packet = UDP header + payload
● All members are 2 bytes (16 bits)
| source port | dest port |
| len         | checksum  |
| Payload                 |
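To make the 8-byte layout concrete, here is a small user-space sketch that decodes the four 16-bit header fields from a raw buffer. It is an illustration only: `struct udp_fields`, `read_be16()` and `parse_udp()` are hypothetical names, not kernel code, and the sketch assumes the buffer starts at the UDP header and that fields are in network (big-endian) byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HEADER_LEN 8   /* four 16-bit fields */

/* Hypothetical host-byte-order view of the UDP header fields. */
struct udp_fields {
    uint16_t source;
    uint16_t dest;
    uint16_t len;    /* header + payload, in bytes */
    uint16_t check;
};

/* Read one big-endian (network order) 16-bit value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the header at the start of pkt.
 * Returns the payload length in bytes, or -1 if the buffer is too
 * short or the len field is inconsistent with the buffer size. */
int parse_udp(const uint8_t *pkt, size_t pkt_len, struct udp_fields *out)
{
    if (pkt_len < UDP_HEADER_LEN)
        return -1;

    out->source = read_be16(pkt);
    out->dest   = read_be16(pkt + 2);
    out->len    = read_be16(pkt + 4);
    out->check  = read_be16(pkt + 6);

    if (out->len < UDP_HEADER_LEN || out->len > pkt_len)
        return -1;
    return out->len - UDP_HEADER_LEN;
}
```

The len field counts the header as well as the payload, which is why the payload length is `len - 8`.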
21. Receiving packets in UDP from
kernel
● UDP kernel sockets can get traffic either from userspace or
from kernel.
Diagram (receive path, bottom to top):
Layer 2 (Ethernet)
-> IPv4 - layer 3: NF_INET_LOCAL_IN hook; ip_local_deliver_finish() calls udp_rcv()
-> UDP - layer 4: sock_queue_rcv_skb()
-> UDP sockets (USER SPACE)
22. ● From user space, you can receive udp traffic with
three system calls:
– recv() (when the socket is connected)
– recvfrom()
– recvmsg()
● All three are handled by udp_recvmsg() in the kernel.
● Note that each of these 3 calls takes a flags parameter (the fourth
parameter of recv() and recvfrom(), the third of recvmsg()); however,
this parameter is NOT changed upon return. If you are interested in
returned flags, you must use recvmsg() and read the msg.msg_flags
member afterwards.
23. ● For example, suppose you have a client-server udp
application, and the sender sends a packet which is longer
than what the client had allocated for its input buffer. The kernel
then truncates the packet and sets the MSG_TRUNC flag. In
order to retrieve it, you should use something like:
recvmsg(udpSocket, &msg, flags);
if (msg.msg_flags & MSG_TRUNC)
printf("MSG_TRUNC\n");
● A new recvmmsg() system call for receiving
multiple messages was suggested recently
(by Arnaldo Carvalho de Melo)
● The recvmmsg() will reduce the overhead caused by multiple
system calls of recvmsg() in the usual case.
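The truncation case described above can be reproduced on the loopback interface. This is a hedged sketch rather than code from the lecture: `udp_recv_was_truncated()` is a hypothetical helper, and it assumes 127.0.0.1 is usable and that an ephemeral port can be bound. The 2-second receive timeout only prevents the example from blocking forever.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send sent_len bytes over loopback UDP and receive
 * them with recvmsg() into a buffer of buf_len bytes.
 * Returns 1 if the kernel set MSG_TRUNC in msg_flags, 0 if not, -1 on error. */
int udp_recv_was_truncated(size_t sent_len, size_t buf_len)
{
    char payload[256] = {0}, buf[256];
    struct timeval timeout = { .tv_sec = 2, .tv_usec = 0 };
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    struct iovec iov;
    struct msghdr msg;
    int result = -1;

    if (sent_len > sizeof(payload) || buf_len > sizeof(buf))
        return -1;

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    /* Bind the receiver to 127.0.0.1 and let the kernel pick the port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &addr_len) < 0)
        goto out;
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));

    if (sendto(tx, payload, sent_len, 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* The flags argument is input only; output flags land in msg_flags. */
    iov.iov_base = buf;
    iov.iov_len = buf_len;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (recvmsg(rx, &msg, 0) < 0)
        goto out;

    result = (msg.msg_flags & MSG_TRUNC) ? 1 : 0;
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return result;
}
```

Sending 32 bytes into an 8-byte buffer should report truncation, while the reverse should not.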
24. Receiving packets in UDP from user
space
● UDP kernel sockets can get traffic either from userspace or
from kernel.
Diagram (receive path, top to bottom):
USER SPACE: recv(), recvfrom() and recvmsg() system calls on UDP sockets
-> KERNEL, UDP - layer 4: udp_recvmsg()
-> __skb_recv_datagram(): reads from sk->sk_receive_queue
Below UDP: IPv4 - layer 3, Layer 2 (Ethernet)
25. Receiving packets - udp_rcv()
● udp_rcv() is the handler for all UDP packets
from the IP layer. It handles all incoming
packets in which the protocol field in the ip
header is IPPROTO_UDP (17) after ip layer
finished with them.
See the udp_protocol definition: (net/ipv4/af_inet.c)
struct net_protocol udp_protocol = {
.handler = udp_rcv,
.err_handler = udp_err,
...
};
26. ● In the same way we have :
– raw_rcv() as a handler for raw packets.
– tcp_v4_rcv() as a handler for TCP packets.
– icmp_rcv() as a handler for ICMP packets.
● Kernel implementation: the proto_register()
method (net/core/sock.c) registers a struct proto
such as udp_prot; the net_protocol handlers above are
registered with inet_add_protocol().
27. udp_rcv() implementation:
● For broadcasts and multicast – there is a special
treatment:
if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
return __udp4_lib_mcast_deliver(net, skb, uh,
saddr, daddr, udptable);
● Then perform a lookup in a hashtable of struct sock.
– Hash key is created from destination port in the udp header.
– If there is no entry in the hashtable, then there is no sock
listening on this UDP destination port => so send ICMP
back: (of port unreachable).
– icmp_send(skb, ICMP_DEST_UNREACH,
ICMP_PORT_UNREACH, 0);
28. udp_rcv()
● In this case, a corresponding SNMP MIB
counter is incremented
(UDP_MIB_NOPORTS).
● UDP_INC_STATS_BH(net, UDP_MIB_NOPORTS, proto ==
IPPROTO_UDPLITE);
● You can see it by:
netstat -s
.....
Udp:
...
35 packets to unknown port received.
29. udp_rcv() - contd
● Or, by:
● cat /proc/net/snmp | grep Udp:
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 14 35 0 30 0 0
● If there is a sock listening on the destination
port, call udp_queue_rcv_skb().
– Eventually calls sock_queue_rcv_skb().
● Which adds the packet to the sk_receive_queue by
skb_queue_tail()
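The NoPorts counter shown in the /proc/net/snmp output above can also be read programmatically by matching a column name in the header line with the same column in the value line. The following is a minimal sketch under that assumption. `snmp_field()` is a hypothetical helper, and reading the two "Udp:" lines from the file is left to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given a header line and the matching value line in
 * /proc/net/snmp format, return the value in the column named `field`,
 * or -1 if the header has no such column. */
long snmp_field(const char *header_line, const char *value_line,
                const char *field)
{
    char name[64], value[64];
    int hoff = 0, voff = 0, hn = 0, vn = 0;

    /* Walk both lines token by token; %n reports the characters consumed. */
    while (sscanf(header_line + hoff, "%63s%n", name, &hn) == 1 &&
           sscanf(value_line + voff, "%63s%n", value, &vn) == 1) {
        if (strcmp(name, field) == 0)
            return strtol(value, NULL, 10);
        hoff += hn;
        voff += vn;
    }
    return -1;
}
```

With the two sample lines from the slide, looking up "NoPorts" yields 35, which matches the "35 packets to unknown port received" line in the netstat -s output.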
31. ● udp_recvmsg():
● Calls __skb_recv_datagram() , for receiving
one sk_buff.
– The __skb_recv_datagram() may block.
– Eventually, what __skb_recv_datagram() does is
read one sk_buff from the sk_receive_queue
queue.
● memcpy_toiovec() performs the actual copy to
user space by invoking copy_to_user().
● One of the parameters of udp_recvmsg() is a
pointer to struct msghdr. Let's take a look:
32. MSGHDR
From include/linux/socket.h:
struct msghdr {
void *msg_name; /* Socket name */
int msg_namelen; /* Length of name */
struct iovec *msg_iov; /* Data blocks */
__kernel_size_t msg_iovlen; /* Number of blocks */
void *msg_control;
__kernel_size_t msg_controllen; /* Length of cmsg list */
unsigned msg_flags;
};
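The msg_iov / msg_iovlen pair is what lets one datagram be assembled from, or scattered into, several buffers. The sketch below shows this from user space. It is an illustration only: `scatter_gather_roundtrip()` is a hypothetical helper, and it uses an AF_UNIX datagram socketpair instead of UDP purely to stay self-contained. The struct msghdr handling is the same for a UDP socket.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send "hello" and "world" as ONE datagram built from
 * two iovec blocks, then receive it scattered into `first` (5 bytes) and
 * `rest`. Both output buffers are NUL-terminated by the initial memset.
 * Returns the number of bytes received, or -1 on error. */
ssize_t scatter_gather_roundtrip(char first[6], char rest[16])
{
    int sv[2];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Gather: two separate blocks become a single 10-byte datagram. */
    struct iovec out[2] = {
        { .iov_base = "hello", .iov_len = 5 },
        { .iov_base = "world", .iov_len = 5 },
    };
    struct msghdr tx;
    memset(&tx, 0, sizeof(tx));
    tx.msg_iov = out;
    tx.msg_iovlen = 2;

    /* Scatter: the first 5 bytes go to `first`, the remainder to `rest`. */
    memset(first, 0, 6);
    memset(rest, 0, 16);
    struct iovec in[2] = {
        { .iov_base = first, .iov_len = 5 },
        { .iov_base = rest,  .iov_len = 15 },
    };
    struct msghdr rx;
    memset(&rx, 0, sizeof(rx));
    rx.msg_iov = in;
    rx.msg_iovlen = 2;

    if (sendmsg(sv[0], &tx, 0) == 10)
        n = recvmsg(sv[1], &rx, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

msg_name and msg_control are left as NULL here; they only matter when the peer address or ancillary data is needed.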
33. Control messages (ancillary
messages)
● The msg_control member of msghdr represents a control
message.
– Sometimes you need to perform some special
things. For example, getting to know what was the
destination address of a received packet.
● Sometimes there is more than one address on a machine
(and also you can have multiple addresses on the same
nic).
– How can we know the destination address of the ip
header in the application?
– struct cmsghdr (/usr/include/bits/socket.h)
represents a control message.
34. ● cmsghdr members can mean different things based on the type
of socket.
● There is a set of macros for handling cmsghdr like
CMSG_FIRSTHDR(), CMSG_NXTHDR(), CMSG_DATA(),
CMSG_LEN() and more.
● There are no control messages for TCP sockets.
35. Socket options:
In order to tell the socket to get the information about the packet
destination, we should call setsockopt().
● setsockopt() and getsockopt() - set and get options on a socket.
– Both methods return 0 on success and -1 on error.
● Prototype: int setsockopt(int sockfd, int level, int optname,...
There are two levels of socket options:
To manipulate options at the sockets API level: SOL_SOCKET
To manipulate options at a protocol level, that protocol number
should be used;
– for example, for UDP it is IPPROTO_UDP or SOL_UDP
(both equal 17); see include/linux/in.h and include/linux/socket.h
● SOL_IP is 0.
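As a small illustration of the two levels, the sketch below sets an integer option and reads it back with getsockopt(). `set_then_get_int_option()` is a hypothetical helper, not part of any API. The sketch passes IPPROTO_IP as the IP level, relying on the statement above that SOL_IP and IPPROTO_IP are both 0.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: on a fresh UDP socket, set an int-valued option at
 * the given level and read it back.
 * Returns the value reported by getsockopt(), or -1 on failure. */
int set_then_get_int_option(int level, int optname, int value)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int readback = -1;
    socklen_t len = sizeof(readback);

    /* Both calls return 0 on success and -1 on error. */
    if (setsockopt(fd, level, optname, &value, sizeof(value)) < 0 ||
        getsockopt(fd, level, optname, &readback, &len) < 0)
        readback = -1;

    close(fd);
    return readback;
}
```

For example, `set_then_get_int_option(SOL_SOCKET, SO_BROADCAST, 1)` exercises the sockets API level, and `set_then_get_int_option(IPPROTO_IP, IP_TTL, 42)` exercises the IP protocol level.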
36. ● There are currently 19 Linux socket options and one
additional option for BSD compatibility.
See Appendix B for a full list of socket options.
● There is an option called IP_PKTINFO.
– We will set the IP_PKTINFO option on a socket in the
following example.
37. // from /usr/include/bits/in.h
#define IP_PKTINFO 8 /* bool */
/* Structure used for IP_PKTINFO. */
struct in_pktinfo
{
int ipi_ifindex; /* Interface index */
struct in_addr ipi_spec_dst; /* Routing destination address */
struct in_addr ipi_addr; /* Header destination address */
};
38. const int on = 1;
sockfd = socket(AF_INET, SOCK_DGRAM,0);
if (setsockopt(sockfd, SOL_IP, IP_PKTINFO, &on,
sizeof(on))<0)
perror("setsockopt");
...
...
...
39. When calling recvmsg(), we will parse the msghdr like this:
for (cmptr=CMSG_FIRSTHDR(&msg); cmptr!=NULL;
cmptr=CMSG_NXTHDR(&msg,cmptr))
{
if (cmptr->cmsg_level == SOL_IP && cmptr->cmsg_type ==
IP_PKTINFO)
{
pktinfo = (struct in_pktinfo*)CMSG_DATA(cmptr);
printf("destination=%s\n", inet_ntop(AF_INET, &pktinfo->ipi_addr,
str, sizeof(str)));
}
}
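The fragments above leave out the msghdr setup. Here is a minimal end-to-end sketch, assuming a Linux host with a working loopback interface: it sends one datagram to itself and reports the destination address found in the IP_PKTINFO control message. The function name pktinfo_loopback_demo() and the one-second receive timeout are choices made for this illustration only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes struct in_pktinfo with this */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 and writes the dotted-quad destination
 * address into dst, or -1 on any failure. */
static int pktinfo_loopback_demo(char *dst, socklen_t dstlen)
{
    const int on = 1;
    int rc = -1;
    char data[16];
    union {                     /* union keeps the control buffer aligned */
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } ctl;
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;
    struct cmsghdr *c;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                          /* let the kernel pick a port */
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &alen) < 0 ||
        sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) != 4)
        goto out;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;                  /* where the kernel puts cmsgs */
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(fd, &msg, 0) < 0)
        goto out;

    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));
            if (inet_ntop(AF_INET, &pi.ipi_addr, dst, dstlen) != NULL)
                rc = 0;
        }
    }
out:
    close(fd);
    return rc;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes; on a typical Linux host the reported address should be the loopback address the datagram was sent to.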
40. ● In the kernel, this calls ip_cmsg_recv() in
net/ipv4/ip_sockglue.c. (which eventually calls
ip_cmsg_recv_pktinfo()).
● You can in this way retrieve other fields of the ip
header:
– For getting the TTL:
● setsockopt(sockfd, SOL_IP, IP_RECVTTL, &on,
sizeof(on)).
● But note: the control message arrives with cmsg_type == IP_TTL.
– For getting ip_options:
● setsockopt() with IP_OPTIONS.
41. ● Note: you cannot get/set ip_options in Java
app.
42. Sending packets in UDP
● From user space, you can send udp traffic with
three system calls:
– send() (when the socket is connected).
– sendto()
– sendmsg()
● All three are handled by udp_sendmsg() in the kernel.
● udp_sendmsg() is much simpler than its TCP
counterpart, tcp_sendmsg().
● udp_sendpage() is called when user space calls
sendfile() (to copy a file into a udp socket).
– sendfile() can be used also to copy data between one file
descriptor and another.
43. – udp_sendpage() invokes udp_sendmsg().
● udp_sendpage() will work only if the nic supports
Scatter/Gather (NETIF_F_SG feature is supported).
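As a minimal sketch of the three user-space calls listed above, the function below sends the same payload with sendto(), sendmsg() and send() to a receiver socket on the loopback interface and counts how many datagrams arrive. The name send_three_ways(), the payload and the one-second receive timeout are assumptions of this example.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of datagrams received (3 expected). */
static int send_three_ways(void)
{
    struct sockaddr_in dst;
    socklen_t len = sizeof(dst);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    char payload[] = "hello", buf[16];
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
    struct msghdr msg;
    int received = 0;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    if (bind(rx, (struct sockaddr *)&dst, sizeof(dst)) < 0 ||
        getsockname(rx, (struct sockaddr *)&dst, &len) < 0)
        goto out;

    /* 1) sendto(): the destination is passed with each call */
    sendto(tx, payload, sizeof(payload), 0,
           (struct sockaddr *)&dst, sizeof(dst));

    /* 2) sendmsg(): the destination is carried in msg_name */
    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &dst;
    msg.msg_namelen = sizeof(dst);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    sendmsg(tx, &msg, 0);

    /* 3) send(): usable once connect() has fixed the destination */
    if (connect(tx, (struct sockaddr *)&dst, sizeof(dst)) == 0)
        send(tx, payload, sizeof(payload), 0);

    while (received < 3 && recv(rx, buf, sizeof(buf), 0) > 0)
        received++;
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return received;
}
```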
44. Example – udp client
#include <stdio.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <string.h>
int main()
{
int s;
struct sockaddr_in target;
int res;
char buf[10];
46. ● For comparison, there is a tcp client in appendix C
● The source port of the UDP packet here is
chosen randomly in the kernel.
● What if we want to send from a specific port?
You can bind to a specific source port (888 in this example) by
adding:
source.sin_family = AF_INET;
source.sin_port = htons(888);
source.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(s, (struct sockaddr*)&source, sizeof(struct
sockaddr_in)) == -1)
perror("bind");
47. ● You cannot bind to privileged ports (ports lower
than 1024) when you are not root!
– Trying to do this will give:
– “Permission denied” (EACCES).
– You can allow a non-root binary to bind to privileged ports
by running, as root (you will need at least a 2.6.24
kernel):
– setcap 'cap_net_bind_service=+ep' udpclient
– This sets the CAP_NET_BIND_SERVICE
capability.
48. ● You cannot bind on a port which is already
bound.
– Trying to do this will give:
– “Address already in use” (EADDRINUSE)
● You cannot bind twice or more with the same
UDP socket (even if you change the port).
– You will get “bind: Invalid argument” error in such
case (EINVAL)
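Both bind() failures described above can be reproduced on the loopback interface. This is a minimal sketch; the helper names bind_loopback(), local_port() and bind_error_demo() are invented for this illustration, and the sockets are bound to 127.0.0.1 so that no privileges are needed.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind fd to 127.0.0.1:port (host byte order); return 0 or the errno value. */
static int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in a;

    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return bind(fd, (struct sockaddr *)&a, sizeof(a)) == 0 ? 0 : errno;
}

/* Return the local port of fd in host byte order, or 0 on failure. */
static unsigned short local_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);

    if (getsockname(fd, (struct sockaddr *)&a, &len) < 0)
        return 0;
    return ntohs(a.sin_port);
}

/* Reports the errno seen when (a) a second socket binds to a port that is
 * already in use and (b) an already bound socket calls bind() again. */
static void bind_error_demo(int *in_use_err, int *rebind_err)
{
    int s1 = socket(AF_INET, SOCK_DGRAM, 0);
    int s2 = socket(AF_INET, SOCK_DGRAM, 0);

    bind_loopback(s1, 0);                       /* kernel picks a free port */
    *in_use_err = bind_loopback(s2, local_port(s1));
    *rebind_err = bind_loopback(s1, 0);         /* s1 is already bound */
    close(s1);
    close(s2);
}
```

On Linux the first value is expected to be EADDRINUSE and the second EINVAL, matching the two error messages quoted above.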
49. ● If you call connect() on an unbound UDP socket
and then bind(), you will also get the EINVAL
error. The reason is that connect() on an
unbound socket calls inet_autobind() to
bind it automatically (to a
random port). So after connect(), the socket is
bound, and calling bind() again fails
with EINVAL (since the socket is already
bound).
● Binding in the kernel for UDP is implemented in
inet_bind() and inet_autobind()
– (in IPV6: inet6_bind() )
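The autobind behaviour can be observed from user space with getsockname(). The sketch below connects an unbound UDP socket to an arbitrary loopback port (nothing is sent), records the port the kernel chose, and then tries bind(). The function name connect_then_bind() and the peer port 9 are assumptions of this example.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno of the late bind() (0 if it
 * unexpectedly succeeded), or -1 if the setup itself failed. */
static int connect_then_bind(unsigned short *autobound_port)
{
    struct sockaddr_in peer, local;
    socklen_t len = sizeof(local);
    int err = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9);                   /* arbitrary; nothing is sent */
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(s, (struct sockaddr *)&peer, sizeof(peer)) < 0 ||
        getsockname(s, (struct sockaddr *)&local, &len) < 0)
        goto out;
    *autobound_port = ntohs(local.sin_port);    /* chosen during connect() */

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(0);
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    err = bind(s, (struct sockaddr *)&local, sizeof(local)) == 0 ? 0 : errno;
out:
    close(s);
    return err;
}
```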
50. Non local bind
● What happens if we try to bind on a non local address ? (a non
local address can be for example, an address of interface which
is temporarily down)
– We get EADDRNOTAVAIL error:
– “bind: Cannot assign requested address.”
– However, if we set
/proc/sys/net/ipv4/ip_nonlocal_bind to 1, by
– echo "1" > /proc/sys/net/ipv4/ip_nonlocal_bind
– Or adding in /etc/sysctl.conf:
net.ipv4.ip_nonlocal_bind=1
– The bind() will succeed, but it may sometimes break
applications.
51. ● What happens if, in the above udp client example, we
set a broadcast address as the destination (instead of
192.168.0.121), thus:
inet_aton("255.255.255.255",&target.sin_addr);
● We will get an EACCES error (“Permission denied”) from sendto().
● For UDP broadcast to work, we have to add:
int flag = 1;
if (setsockopt (s, SOL_SOCKET, SO_BROADCAST,&flag,
sizeof(flag)) < 0)
perror("setsockopt");
52. UDP socket options
● For IPPROTO_UDP/SOL_UDP level, we have
two socket options:
● UDP_CORK socket option.
– Added in Linux kernel 2.5.44.
int state=1;
setsockopt(s, IPPROTO_UDP, UDP_CORK, &state,
sizeof(state));
for (j=0;j<1000;j++)
sendto(s,buf1,...);
state=0;
setsockopt(s, IPPROTO_UDP, UDP_CORK, &state,
sizeof(state));
53. ● The above code fragment will call
udp_sendmsg() 1000 times without actually
sending anything on the wire (in the usual case,
without setsockopt() with UDP_CORK,
1000 packets would be sent).
● Only after the second setsockopt() is called,
with UDP_CORK and state=0, is one packet
sent on the wire.
● Kernel implementation: when using
UDP_CORK, udp_sendmsg() passes
MSG_MORE to ip_append_data().
54. – Implementation detail: UDP_CORK may be missing from the glibc header
(/usr/include/netinet/udp.h) on older systems; in that case, add to your
program:
– #define UDP_CORK 1
● UDP_ENCAP socket option.
– For usage with IPSEC.
● Used, for example, in ipsec-tools.
● Note: UDP_ENCAP does not appear yet in the man page
of udp (UDP_CORK does appear).
● Note that there are other socket options at the
SOL_SOCKET level which you can get/set on
UDP sockets: for example, SO_NO_CHECK (to
disable the UDP checksum on transmitted packets). (see Appendix E).
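The UDP_CORK behaviour described earlier can be checked on the loopback interface: three small writes made while the socket is corked should arrive as a single datagram once the cork is removed. This is a minimal sketch; the function name cork_demo(), the 2-byte chunks and the one-second receive timeout are assumptions of this example.

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#ifndef UDP_CORK
#define UDP_CORK 1      /* fallback for headers that lack the definition */
#endif

/* Hypothetical demo: returns the size of the first datagram received
 * after three corked 2-byte sends, or -1 on failure. */
static int cork_demo(void)
{
    struct sockaddr_in dst;
    socklen_t len = sizeof(dst);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    char buf[64];
    int state = 1, n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    if (bind(rx, (struct sockaddr *)&dst, sizeof(dst)) < 0 ||
        getsockname(rx, (struct sockaddr *)&dst, &len) < 0 ||
        connect(tx, (struct sockaddr *)&dst, sizeof(dst)) < 0)
        goto out;

    /* cork: data is accumulated instead of being sent */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &state, sizeof(state)) < 0)
        goto out;
    send(tx, "ab", 2, 0);
    send(tx, "cd", 2, 0);
    send(tx, "ef", 2, 0);

    /* uncork: the accumulated data goes out as one datagram */
    state = 0;
    setsockopt(tx, IPPROTO_UDP, UDP_CORK, &state, sizeof(state));

    n = (int)recv(rx, buf, sizeof(buf), 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Note that the total corked payload still has to fit into one UDP datagram, so very large loops will eventually fail with an error instead of accumulating indefinitely.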
55. ● SO_DONTROUTE (equivalent to MSG_DONTROUTE in send()).
● The SO_DONTROUTE option tells “don't send via a gateway,
only send to directly connected hosts.”
● Adding:
– int one = 1;
– setsockopt(s, SOL_SOCKET, SO_DONTROUTE, &one, sizeof(one));
– And sending the packet to a host on a different network will
cause a “Network is unreachable” error
(ENETUNREACH).
– The same will happen when the MSG_DONTROUTE flag is set
in sendto().
● SO_SNDBUF.
● getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len), with int sndbuf and socklen_t len = sizeof(sndbuf).
56. ● Suppose we want to receive ICMP errors with the UDP client
example (like ICMP destination unreachable/port unreachable).
● How can we achieve this ?
● First, we should set this socket option:
– int val=1;
– setsockopt(s, SOL_IP, IP_RECVERR,(char*)&val, sizeof(val));
57. ● Then, we should add a call to a method like this
for receiving error messages:
int recv_err(int s)
{
int res;
char cbuf[512];
struct iovec iov;
struct msghdr msg;
struct cmsghdr *cmsg;
struct sock_extended_err *e;
struct icmphdr icmph;
struct sockaddr_in target;
62. ● The destination UDP port must not be 0.
● If we try a destination port of 0, udp_sendmsg()
returns an EINVAL error.
– The destination port is embedded inside the
msghdr parameter (in fact, msg->msg_name
points to a sockaddr_in; sin_port in sockaddr_in
is the destination port number).
● MSG_OOB is the only illegal flag for UDP.
udp_sendmsg() returns EOPNOTSUPP if this flag is
passed (it is permitted only for SOCK_STREAM).
● MSG_OOB is also illegal in AF_UNIX.
63. ● OOB stands for “Out Of Band data”.
● The MSG_OOB flag is permitted in TCP.
– It enables sending one byte of data in urgent mode.
– (telnet , “ctrl/c” for example).
● The destination must be either:
– specified in the msghdr (the name field in msghdr).
– Or the socket is connected.
● sk->sk_state == TCP_ESTABLISHED
– Notice that though this is UDP, we use TCP semantics here.
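The two error cases described for udp_sendmsg() (destination port 0 and the MSG_OOB flag) are visible from user space as errno values of sendto(). A minimal sketch, assuming a Linux host; the helper name udp_send_errno() is invented for this illustration.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send one byte to 127.0.0.1:port with the given
 * flags and return 0 on success or the errno value on failure. */
static int udp_send_errno(unsigned short port, int flags)
{
    struct sockaddr_in dst;
    int err;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return errno;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    err = sendto(s, "x", 1, flags,
                 (struct sockaddr *)&dst, sizeof(dst)) < 0 ? errno : 0;
    close(s);
    return err;
}
```

Calling udp_send_errno(0, 0) is expected to yield EINVAL, and udp_send_errno(9, MSG_OOB) is expected to yield EOPNOTSUPP.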
64. Sending packets in UDP (contd)
● In case the socket is not connected, we should
find a route to it; this is done by calling
ip_route_output_flow().
● In case it is connected, we use the route from
the sock (sk_dst_cache member of sk, which is
an instance of dst_entry).
– When the connect() system call was invoked,
ip4_datagram_connect() found the route with
ip_route_connect() and set sk->sk_dst_cache in
sk_dst_set().
● Moving the packet to Layer 3 (IP layer) is done
by ip_append_data().
65. ● In TCP, moving the packet to Layer 3 is done
with ip_queue_xmit().
– What's the difference ?
● UDP does not handle fragmentation;
ip_append_data() does handle fragmentation.
– TCP handles fragmentation in layer 4. So no need
for ip_append_data().
66. ● ip_queue_xmit() is (naturally) a simpler method.
● Basically what the udp_sendmsg() method
does is:
● Finds the route for the packet by
ip_route_output_flow()
● Sends the packet with
ip_local_out(skb)
67. Asynchronous I/O
● There is support for Asynchronous I/O in UDP
sockets.
● This means that instead of polling to know if
there is data (by select(), for example), the
kernel sends a SIGIO signal in such a case.
68. ● Using Asynchronous I/O UDP in a user space
application is done in three stages:
– 1) Adding a SIGIO signal handler by calling
sigaction() system call
– 2) Calling fcntl() with F_SETOWN and the pid of our
process to tell the kernel that our process is the owner of the
socket (so that SIGIO signals will be delivered to it).
Several processes can access a socket. If we do not call
fcntl() with F_SETOWN, there can be ambiguity as to which
process will get the SIGIO signal. For example, if we call
fork() the owner of the SIGIO is the parent; but we can call,
in the child, fcntl(s,F_SETOWN, getpid()).
– 3) Setting flags: calling fcntl() with F_SETFL and
O_NONBLOCK | FASYNC.
70. handler.sa_handler = SIGIOHandler;
sigfillset(&handler.sa_mask);
handler.sa_flags = 0;
sigaction(SIGIO, &handler, 0);
fcntl(servSocket,F_SETOWN, getpid());
fcntl(servSocket,F_SETFL, O_NONBLOCK | FASYNC);
● The fcntl() which sets the O_NONBLOCK | FASYNC flags
invokes sock_fasync() in net/socket.c to register the socket
for asynchronous notification.
– The SIGIOHandler() method will be called when there is
data (since a SIGIO signal was generated); it should call
recvmsg().
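The three stages can be put together in one self-contained sketch that sends a datagram to itself over loopback and waits for SIGIO. Unlike the fragment above, the handler here only sets a flag and the datagram is read afterwards; that is a choice of this example, as are the names on_sigio() and sigio_demo() and the one-second wait. O_ASYNC is used in place of FASYNC (both name the same flag).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static volatile sig_atomic_t sigio_seen;

/* Signal handler: only record that SIGIO arrived. */
static void on_sigio(int signo)
{
    (void)signo;
    sigio_seen = 1;
}

/* Hypothetical demo: returns the number of bytes read after SIGIO was
 * delivered, or -1 on failure. */
static int sigio_demo(void)
{
    struct sigaction sa;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[16];
    int i, n = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    sigio_seen = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigio;
    sigfillset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)                    /* stage 1 */
        goto out;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)                  /* stage 2 */
        goto out;
    if (fcntl(fd, F_SETFL, O_NONBLOCK | O_ASYNC) < 0)       /* stage 3 */
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        goto out;
    sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr));

    for (i = 0; i < 100 && !sigio_seen; i++)
        usleep(10000);              /* wait up to about one second */
    if (sigio_seen)
        n = (int)recv(fd, buf, sizeof(buf), 0);
out:
    close(fd);
    return n;
}
```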
71. Appendix B : Socket options
● Socket options by protocol:
IP protocol (SOL_IP) 19 socket options:
IP_TOS IP_TTL
IP_HDRINCL IP_OPTIONS
IP_ROUTER_ALERT IP_RECVOPTS
IP_RETOPTS IP_PKTINFO
IP_PKTOPTIONS IP_MTU_DISCOVER
IP_RECVERR IP_RECVTTL
IP_RECVTOS IP_MTU
IP_FREEBIND IP_IPSEC_POLICY
IP_XFRM_POLICY IP_PASSSEC
IP_TRANSPARENT
Note: For BSD compatibility there is IP_RECVRETOPTS (which is identical to
IP_RETOPTS).
72. ● AF_UNIX:
– SO_PASSCRED for AF_UNIX sockets.
– Note: For historical reasons these socket options are specified with a
SOL_SOCKET type even though they are PF_UNIX specific.
● UDP:
– UDP_CORK (IPPROTO_UDP level).
● RAW:
– ICMP_FILTER
● TCP:
– TCP_CORK
– TCP_DEFER_ACCEPT
– TCP_INFO
– TCP_KEEPCNT
78. tcp client - contd.
● If on the other side (192.168.0.121 in this example) there is no
TCP server listening on this port (853) you will get this error for
the connect() system call:
– connect: Connection refused.
● You can send data on this socket by adding, for example:
const char *message = "mymessage";
int length;
length = strlen(message)+1;
res = write(sd, message, length);
● write() is the same as send(), but with no flags.
79. Appendix D : ICMP options
● These are ICMP options you can set with
setsockopt on RAW ICMP socket: (see
/usr/include/netinet/ip_icmp.h)
ICMP_ECHOREPLY
ICMP_DEST_UNREACH
ICMP_SOURCE_QUENCH
ICMP_REDIRECT
ICMP_ECHO
ICMP_TIME_EXCEEDED
ICMP_PARAMETERPROB
ICMP_TIMESTAMP
81. APPENDIX E: flags for send/receive
MSG_OOB
MSG_PEEK
MSG_DONTROUTE
MSG_TRYHARD - Synonym for MSG_DONTROUTE for DECnet
MSG_CTRUNC
MSG_PROBE - Do not send. Only probe path f.e. for MTU
MSG_TRUNC
MSG_DONTWAIT - Nonblocking io
MSG_EOR - End of record
MSG_WAITALL - Wait for a full request
82. MSG_FIN
MSG_SYN
MSG_CONFIRM - Confirm path validity
MSG_RST
MSG_ERRQUEUE - Fetch message from error queue
MSG_NOSIGNAL - Do not generate SIGPIPE
MSG_MORE 0x8000 - Sender will send more.
83. Example: set and get an option
● This simple example demonstrates how to set and get an IP layer option:
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>
int main()
{
    int s;
    int opt;
    int res;
    int one = 1;
    socklen_t size = sizeof(opt);
    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s<0) {
        perror("socket");
        return 1;
    }
    /* enable IP_RECVERR, then read the option back */
    res = setsockopt(s, SOL_IP, IP_RECVERR, &one, sizeof(one));
    if (res==-1)
        perror("setsockopt");
    res = getsockopt(s, SOL_IP, IP_RECVERR,&opt,&size);
    if (res==-1)
        perror("getsockopt");
    printf("opt = %d\n",opt);
    close(s);
    return 0;
}
85. Example: record route option
● This example shows how to send a record route
option.
#define NROUTES 9
int main()
{
int s;
int optlen=0;
struct sockaddr_in target;
int res;
89. ● Another metric:
– cat /proc/net/udp
– The last column in: drops
● Represents sk->sk_drops.
● Incremented in __udp_queue_rcv_skb()
– net/ipv4/udp.c
● When do RcvbufErrors occur ?
– The total number of bytes queued in sk_receive_queue
queue of a socket is sk->sk_rmem_alloc.
– The total allowed memory of a socket is sk->sk_rcvbuf.
● It can be retrieved with getsockopt() using SO_RCVBUF.
90. ● Each time a packet is received, sk->sk_rmem_alloc
is incremented by skb->truesize:
– skb->truesize is the size (in bytes) allocated for the data of
the skb plus the size of the sk_buff structure itself.
– This incrementation is done in skb_set_owner_r()
...
atomic_add(skb->truesize, &sk->sk_rmem_alloc);
...
– see: include/net/sock.h
91. ● When the packet is freed by kfree_skb(), we decrement
sk->sk_rmem_alloc by skb->truesize; this is done in
sock_rfree():
● sock_rfree()
...
atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
...
Immediately in the beginning of sock_queue_rcv_skb(), we
have this check:
if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
(unsigned)sk->sk_rcvbuf) {
err = -ENOMEM;
92. ● When returning -ENOMEM, this notifies the
caller to drop the packet.
● This is done in __udp_queue_rcv_skb() method:
static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
...
if ((rc = sock_queue_rcv_skb(sk, skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM) {
UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_RCVBUFERRORS,
is_udplite);
atomic_inc(&sk->sk_drops);
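The receive-side accounting can be summarised with a small user-space model. This is not kernel code: struct rx_model, model_queue() and model_free() are invented for this illustration, and they mirror only the comparison and the counters quoted above (sk_rmem_alloc, sk_rcvbuf, sk_drops).

```c
/* Hypothetical user-space model of the per-socket receive accounting. */
struct rx_model {
    unsigned int rmem_alloc;    /* stands in for sk->sk_rmem_alloc */
    unsigned int rcvbuf;        /* stands in for sk->sk_rcvbuf */
    unsigned int drops;         /* stands in for sk->sk_drops */
};

/* Mirrors the quoted check: refuse the packet when
 * rmem_alloc + truesize >= rcvbuf, and count the drop.
 * Returns 0 if the packet was queued, -1 if it was dropped. */
static int model_queue(struct rx_model *m, unsigned int truesize)
{
    if (m->rmem_alloc + truesize >= m->rcvbuf) {
        m->drops++;
        return -1;
    }
    m->rmem_alloc += truesize;  /* what skb_set_owner_r() accounts for */
    return 0;
}

/* Mirrors sock_rfree(): give the memory back when the packet is freed. */
static void model_free(struct rx_model *m, unsigned int truesize)
{
    m->rmem_alloc -= truesize;
}
```

With rcvbuf set to 1000, two packets of truesize 400 are queued and a third is dropped (800 + 400 >= 1000); after one packet is freed, there is room again.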
93. ● The default size of sk->sk_rcvbuf is sysctl_rmem_default,
which is initialized to SK_RMEM_MAX.
● It equals
● (sizeof(struct sk_buff) + 256) * 256
● See: SK_RMEM_MAX definition in
net/core/sock.c
● This can be viewed and modified by:
– /proc/sys/net/core/rmem_default entry.
– getsockopt()/setsockopt() with SO_RCVBUF.
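A minimal sketch of reading and changing the receive buffer size with SO_RCVBUF follows. The function name rcvbuf_roundtrip() is invented for this illustration. On Linux the value read back is usually larger than the value requested, because the kernel doubles it to leave room for bookkeeping overhead, and the request is capped by /proc/sys/net/core/rmem_max.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read SO_RCVBUF on a fresh UDP socket, request a
 * new size, and read the option back.
 * Returns 0 on success, -1 on failure. */
static int rcvbuf_roundtrip(int requested, int *before, int *after)
{
    socklen_t len = sizeof(int);
    int rc = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, before, &len) < 0)
        goto out;
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0)
        goto out;
    len = sizeof(int);
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, after, &len) < 0)
        goto out;
    rc = 0;
out:
    close(s);
    return rc;
}
```

The same pattern applies to SO_SNDBUF for the send buffer.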
94. ● For the send queue (sk_write_queue), we have in
ip_append_data() a call to sock_alloc_send_skb(), which
eventually invokes sock_alloc_send_pskb().
● In sock_alloc_send_pskb(), we perform this check:
...
if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)
...
● If it is true, everything is fine.
● If not, we end with setting SOCK_ASYNC_NOSPACE and
SOCK_NOSPACE flags of the socket:
set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
95. ● In udp_sendmsg(), we check the SOCK_NOSPACE flag. If it is
set, we increment the UDP_MIB_SNDBUFERRORS counter.
● sock_alloc_send_pskb() calls skb_set_owner_w().
● In skb_set_owner_w(), we have:
...
atomic_add(skb->truesize, &sk->sk_wmem_alloc);
...
96. When the packet is freed by kfree_skb(), we decrement
sk_wmem_alloc, in sock_wfree() method:
sock_wfree()
...
atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
...
97. Tips
● To find out which sockets a process uses:
● ls -l /proc/[pid]/fd|grep socket|cut -d: -f3|sed 's/\[//;s/\]//'
● Each number returned is the inode number of a socket.
● Information about these sockets can be obtained from
– netstat -ae
● After starting a process which creates a socket, you can see
that the sock_inode_cache object count was incremented by one:
● more /proc/slabinfo | grep sock
● sock_inode_cache 476 485 768 5 1 : tunables 0 0
0 : slabdata 97 97 0
● The first number, 476, is the number of active objects.