2. Agenda
● System calls in the networking world
● Client server model
● Linux networking stack
● Evolution of networking stack
● Driver Interface
● Introduction to Wifi Stack
● Wifi stack as an example
● Future...?
3. Simple router Control plane
[Figure: Simple router architecture. Control plane (CP) modules 1 to n run as processes in user space; the network driver and data plane run in kernel space; an intercard communication mechanism (interconnect protocol) links the cards.]
4. Problem definition
●
All CP modules communicate with each other using IPC.
●
Control plane / data plane communication happens over a
high-speed network link.
●
Line cards can interact with other line cards or with control
cards.
●
And the router crashes???
5. Things to look out for...
●
Check whether the kernel and the network driver are alive; look at the kernel log and any crash dump.
●
See whether a particular IRQ is "screaming" (firing continuously) in
/proc/interrupts.
●
/proc/sys/net/* : networking information
●
Check top and free output: is any process hogging the CPU or memory?
●
Check ps to see that the expected processes/threads are alive:
status of the CP processes.
●
Try to get some information from /proc/net/stat/nf_conntrack
to see whether a particular type of error packet is being
reported.
●
Check firewall rules (iptables -L), interfaces (ifconfig) and routes (route).
●
Kernel/Application log indicating any error:
/var/log/syslog
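As a starting point for the /proc/interrupts check, the sketch below parses one line of that file and sums its per-CPU counters. It assumes the common layout (IRQ label, one counter column per CPU, then the interrupt chip and device names); irq_line_total() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one /proc/interrupts line, e.g.
 *   " 24:   1200   3400   PCI-MSI 512000-edge   eth0"
 * Copies the IRQ label ("24") into irq (buffer of size n) and returns the
 * sum of the per-CPU counters, or -1 if the line has no "label:" prefix. */
long long irq_line_total(const char *line, char *irq, size_t n)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL || n == 0)
        return -1;

    /* Skip leading blanks, then copy the label that precedes ':'. */
    const char *start = line;
    while (start < colon && isspace((unsigned char)*start))
        start++;
    size_t len = (size_t)(colon - start);
    if (len >= n)
        len = n - 1;
    memcpy(irq, start, len);
    irq[len] = '\0';

    /* Sum the numeric columns; stop at the first non-numeric token,
     * which is the interrupt chip name (e.g. "IO-APIC", "PCI-MSI"). */
    long long total = 0;
    const char *p = colon + 1;
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        char *end;
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}
```

Reading the file twice, a second or so apart, and comparing the totals per IRQ label shows which interrupt is climbing abnormally fast.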
6. Going deeper...
●
Check files, sockets owned by each process.
●
Inspect /proc/$PID/*, e.g. fd and wchan.
●
/proc/net/tcp, /proc/net/udp
●
netstat -apeen
●
lsof (-i for networking)
●
Socket status, e.g. "Socket operation on non-socket" (ENOTSOCK)
●
Kernel modules to dump information from data structures
like task_struct and struct net_device
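For the socket checks above, the following sketch decodes an address column of /proc/net/tcp and tells sockets apart from other descriptors. It assumes a little-endian host (x86, most ARM) for the address byte order; decode_proc_tcp_addr() and fd_is_socket() are hypothetical helpers. fd_is_socket() relies on getsockopt() failing with ENOTSOCK, the errno behind the "Socket operation on non-socket" message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Decode an address field from /proc/net/tcp such as "0100007F:1F90".
 * The IPv4 address is printed as a 32-bit hex word holding the address in
 * network byte order, so on a little-endian host (assumed here) the octets
 * appear reversed. The port is printed in host byte order.
 * Returns 0 on success, -1 on a malformed field. */
int decode_proc_tcp_addr(const char *field, char *ip, size_t n, unsigned *port)
{
    unsigned addr;
    if (sscanf(field, "%8X:%4X", &addr, port) != 2)
        return -1;
    /* Little-endian assumption: the lowest byte is the first octet. */
    snprintf(ip, n, "%u.%u.%u.%u", addr & 0xFF, (addr >> 8) & 0xFF,
             (addr >> 16) & 0xFF, (addr >> 24) & 0xFF);
    return 0;
}

/* Returns 1 if fd is a socket, 0 if the kernel answers ENOTSOCK,
 * and -1 for any other error (e.g. EBADF for a closed descriptor). */
int fd_is_socket(int fd)
{
    int type;
    socklen_t len = sizeof type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == 0)
        return 1;
    return errno == ENOTSOCK ? 0 : -1;
}
```

For example, a local_address of 0100007F:1F90 decodes to 127.0.0.1:8080.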
7. Knowing the full stack...
●
To understand the complete picture, knowledge of the whole path
down into the kernel is necessary:
– User space applications
– Threads, socket types
– Kernel interface through system calls
– TCP IP stack inside the kernel
– Interaction with network device driver.
– And the kernel subsystems.
11. Socket() in kernel
●
For every socket which is created by a userspace
application, there is a corresponding socket struct and
sock struct in the kernel
int socket (int family, int type, int protocol);
●
SOCK_STREAM: TCP, SOCK_DGRAM: UDP, SOCK_RAW: raw IP access.
●
This system call eventually invokes the sock_create() method in the
kernel.
●
struct socket { /* ONLY important members */
    socket_state state;
    short type;
    unsigned long flags;
    struct fasync_struct *fasync_list;
    wait_queue_head_t wait;
    struct file *file;
    struct sock *sk;
    const struct proto_ops *ops;
};
13. bind() in kernel
●
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
●
This system call eventually invokes the inet_bind() method in the
kernel.
●
The bind system call associates a local network transport address with a
socket. For a client process, it is not mandatory to issue a bind call. The
kernel takes care of doing an implicit binding when the client process
issues the connect system call.
●
The kernel function sys_bind() does the following:
– sock = sockfd_lookup_light(fd, &err, &fput_needed);
– sock->ops->bind(sock, (struct sockaddr *)address, addrlen);
●
Point to note:
– Binding to privileged ports (<1024) requires root privileges (CAP_NET_BIND_SERVICE).
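A minimal sketch of an explicit bind(), assuming a loopback interface: binding to port 0 asks the kernel to choose a free ephemeral port, which getsockname() then reports. bind_ephemeral_loopback() is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1 with port 0 so the kernel picks a free
 * ephemeral port, then read the chosen port back with getsockname().
 * Returns the port in host byte order, or -1 on error. */
int bind_ephemeral_loopback(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* let the kernel choose */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    int port = -1;
    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);
    close(fd);
    return port;
}
```

Using a fixed port below 1024 instead makes bind() fail with EACCES for an unprivileged process, which is the privileged-port rule above.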
14. listen() in kernel
●
int listen(int sockfd, int backlog);
●
The backlog argument defines the maximum length to which the queue of
pending connections for sockfd may grow.
●
Linux uses two queues, a SYN queue (or incomplete connection queue)
and an accept queue (or complete connection queue). Connections in
state SYN RECEIVED are added to the SYN queue and later moved to the
accept queue when their state changes to ESTABLISHED, i.e. when the
ACK packet in the 3-way handshake is received. As the name implies, the
accept call is then implemented simply to consume connections from the
accept queue. In this case, the backlog argument of the listen syscall
determines the size of the accept queue.
●
SYN queue: its size is specified by a system-wide setting:
– /proc/sys/net/ipv4/tcp_max_syn_backlog.
●
Accept queue: its size is specified by the application (the backlog argument).
●
Implementation is in inet_listen() kernel function.
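The accept queue can be observed from user space. This sketch, assuming a working loopback interface, connects several clients before calling accept(): each connect() completes because the kernel finishes the handshake on its own, and the connections wait in the accept queue until accept() drains them. fill_and_drain_accept_queue() and MAX_CLIENTS are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 16   /* hypothetical limit for this demo */

/* Listen on an ephemeral loopback port, connect nclients sockets BEFORE
 * calling accept(), then drain the accept queue. Returns the number of
 * connections accept() retrieved, or -1 on a setup error. */
int fill_and_drain_accept_queue(int backlog, int nclients)
{
    int clients[MAX_CLIENTS];
    int connected = 0, accepted = -1;
    if (nclients < 0 || nclients > MAX_CLIENTS)
        return -1;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);    /* port 0: ephemeral */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, backlog) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) != 0)
        goto out;

    /* No accept() yet: each connect() still returns once the handshake
     * completes; the server side keeps the connection in lfd's queue. */
    while (connected < nclients) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            goto out;
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
            close(fd);
            goto out;
        }
        clients[connected++] = fd;
    }

    /* accept() now only dequeues already-established connections. */
    accepted = 0;
    while (accepted < nclients) {
        int afd = accept(lfd, NULL, NULL);
        if (afd < 0)
            break;
        close(afd);
        accepted++;
    }
out:
    for (int i = 0; i < connected; i++)
        close(clients[i]);
    close(lfd);
    return accepted;
}
```

Keep nclients at or below the backlog: once the accept queue is full the kernel drops further SYNs, and a blocking connect() would stall while it retransmits.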
15. connect() in kernel
●
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
●
Calls inet_autobind() to pick an available source port if the socket is not yet bound.
●
Fills in the destination in inet_sock and calls inet_stream_connect() or
inet_dgram_connect() (for IPv4).
●
Routing is done by the ip_route_connect() function (L3).
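For a datagram socket the effect of connect() is visible without any peer. In this sketch, assuming a loopback interface, an unbound UDP socket is connected to 127.0.0.1:9; no packet is sent, but getsockname() afterwards shows the source port the kernel autobound and the source address chosen by the route lookup. udp_connect_autobind() is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect an unbound UDP socket to 127.0.0.1:9. Stores the local port seen
 * before connect() in *port_before and the local IPv4 address chosen by the
 * kernel in ip. Returns the autobound local port, or -1 on error. */
int udp_connect_autobind(char *ip, size_t n, int *port_before)
{
    struct sockaddr_in local, dst;
    socklen_t len = sizeof local;
    int port = -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Unbound socket: getsockname() reports port 0. */
    if (getsockname(fd, (struct sockaddr *)&local, &len) != 0)
        goto out;
    *port_before = ntohs(local.sin_port);

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);                        /* no packet is sent */
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* connect() only records the destination, routes, and autobinds. */
    len = sizeof local;
    if (connect(fd, (struct sockaddr *)&dst, sizeof dst) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, ip, (socklen_t)n) != NULL)
        port = ntohs(local.sin_port);
out:
    close(fd);
    return port;
}
```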
16. accept() in kernel
●
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
●
This system call eventually invokes the inet_accept() method in the
kernel.
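To make the accept() semantics concrete, this sketch (assuming a loopback interface; accept_reports_client_addr() is a hypothetical name) checks that accept() returns a new descriptor, distinct from the listening socket, and that the peer address it fills in equals the client's local address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if accept() gave a new fd whose reported peer matches the
 * client's local address, 0 if not, -1 on a setup error. */
int accept_reports_client_addr(void)
{
    struct sockaddr_in srv, peer, cli;
    socklen_t len = sizeof srv;
    int result = -1, cfd = -1, afd = -1;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);     /* port 0: ephemeral */
    if (bind(lfd, (struct sockaddr *)&srv, sizeof srv) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) != 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&srv, sizeof srv) != 0)
        goto out;

    /* accept() returns a NEW socket for this connection and fills in the
     * peer's address; lfd keeps listening. */
    len = sizeof peer;
    afd = accept(lfd, (struct sockaddr *)&peer, &len);
    if (afd < 0)
        goto out;

    len = sizeof cli;
    if (getsockname(cfd, (struct sockaddr *)&cli, &len) != 0)
        goto out;
    result = afd != lfd && peer.sin_port == cli.sin_port &&
             peer.sin_addr.s_addr == cli.sin_addr.s_addr;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return result;
}
```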
21. close() in kernel
●
int shutdown(int sockfd, int how);
●
int close(int sockfd);
●
shutdown() can bring down the connection in one direction only (half-duplex). At
that point, the queues associated with the socket are not purged and the descriptor
is not released. Hence, it is still necessary to call close().
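A half-close can be sketched with a connected stream socket pair. The sketch uses AF_UNIX via socketpair() rather than TCP to stay self-contained; shutdown(SHUT_WR) has the same one-directional effect on a connected TCP socket. half_close_demo() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* After shutdown(a, SHUT_WR) the peer b reads EOF, yet b can still send
 * to a. Returns 0 if that behaviour is observed, -1 otherwise. */
int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (shutdown(sv[0], SHUT_WR) == 0 &&           /* a: no more sends   */
        read(sv[1], buf, sizeof buf) == 0 &&       /* b sees EOF         */
        write(sv[1], "pong", 4) == 4 &&            /* b -> a still works */
        read(sv[0], buf, sizeof buf) == 4 &&
        memcmp(buf, "pong", 4) == 0)
        ok = 0;

    /* shutdown() did not release the descriptors: close() is still needed. */
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```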
23. Socket Data Structures
●
For every socket which is created by a user space application, there
is a corresponding struct socket and struct sock in the kernel.
●
struct socket: include/linux/net.h
– Data common to the BSD socket layer
– Has only 8 members
– Any variable “sock” always refers to a struct socket
●
struct sock : include/net/sock.h
– Data common to the Network Protocol layer (i.e., AF_INET)
– Any variable “sk” always refers to a struct sock.
24. AF Interface
●
Main data structures
– struct net_proto_family
– struct proto_ops
●
Key function
sock_register(struct net_proto_family *ops)
●
Each address family:
– Implements struct net_proto_family.
– Calls the function sock_register() when the protocol family is
initialized.
– Implements struct proto_ops, which binds the BSD socket
layer to the protocol family layer.
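The registration pattern can be modelled in user space. This is a hypothetical sketch: the fake_* names only mimic the roles of struct net_proto_family, sock_register() and the per-family create() hook that attaches a proto_ops table; they are not kernel code, and the kernel's real table, error handling and locking differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define FAKE_NPROTO 64   /* hypothetical table size */

struct fake_socket {
    int family;
    const char *ops_name;   /* stands in for const struct proto_ops *ops */
};

struct fake_net_proto_family {
    int family;
    int (*create)(struct fake_socket *sock);
};

static const struct fake_net_proto_family *families[FAKE_NPROTO];

/* Model of sock_register(): one entry per address family. */
int fake_sock_register(const struct fake_net_proto_family *ops)
{
    if (ops->family < 0 || ops->family >= FAKE_NPROTO)
        return -ENOBUFS;
    if (families[ops->family] != NULL)
        return -EEXIST;
    families[ops->family] = ops;
    return 0;
}

/* Model of the socket-creation dispatch: look up the family and let its
 * create() hook attach the family's operations to the socket. */
int fake_sock_create(int family, struct fake_socket *sock)
{
    if (family < 0 || family >= FAKE_NPROTO || families[family] == NULL)
        return -EAFNOSUPPORT;
    sock->family = family;
    return families[family]->create(sock);
}

/* Example family whose create() hook selects its proto_ops. */
static int fake_inet_create(struct fake_socket *sock)
{
    sock->ops_name = "fake_inet_stream_ops";
    return 0;
}

const struct fake_net_proto_family fake_inet_family = { AF_INET, fake_inet_create };
```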
32. sk_buff
●
Kernel buffer that stores packets.
– Contains headers for all network layers.
●
Creation
– Application sends data to socket.
– Packet arrives at network interface.
●
Copying
– Copied from user space to kernel space.
– Copied from kernel space to the NIC.
– Send: headroom is reserved with skb_reserve() and headers are prepended with skb_push().
– Receive: the data pointer moves from header to header (skb_pull()).
33. sk_buff (cont...)
●
sk_buff represents data and headers.
●
sk_buff API (examples)
– sk_buff allocation is done with alloc_skb() or
dev_alloc_skb();
– drivers use dev_alloc_skb();
– freeing is done with kfree_skb() and dev_kfree_skb().
●
unsigned char* data : points to the current header.
●
skb_pull(skb, len) removes data from the start of a buffer by
advancing data to data+len and decreasing len.
●
Almost always sk_buff instances appear as “skb” in the
kernel code
35. sk_buff functions
●
skb_reserve()
Prototype
void skb_reserve(struct sk_buff *skb, unsigned int len);
Description
Adjusts headroom: reserves space for headers and is only allowed on an empty buffer. When
setting up receive packets that an Ethernet device will DMA into,
skb_reserve(skb, NET_IP_ALIGN) is called. This makes it so that, after the
14-byte Ethernet header, the protocol header is aligned on at least a 4-byte
boundary.
36. sk_buff functions
●
skb_push()
Prototype
unsigned char *skb_push(struct sk_buff *skb, unsigned int
len);
Description
Adds data to the start of a buffer. skb_push() decrements 'skb->data'
and increments 'skb->len', e.g. when adding the Ethernet header
in front of the IP and TCP headers.
38. sk_buff functions
●
skb_put()
Prototype
unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
Description
Adds data to a buffer. skb_put() advances 'skb->tail' by the
specified number of bytes and also increments 'skb->len' by the same
number of bytes. Make sure that enough tailroom is
available; otherwise skb_over_panic() is triggered.
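The pointer arithmetic of skb_reserve(), skb_put(), skb_push() and skb_pull() can be modelled in user space. struct mini_skb and the mini_* functions are hypothetical; they keep the invariant head <= data <= tail <= end with len == tail - data (the model has no paged data), abort() stands in for skb_over_panic()/skb_under_panic(), and NET_IP_ALIGN is assumed to be 2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MINI_NET_IP_ALIGN 2    /* assumed value of NET_IP_ALIGN */
#define MINI_ETH_HLEN     14

struct mini_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;          /* tail - data */
};

struct mini_skb *mini_alloc_skb(unsigned int size)
{
    struct mini_skb *skb = malloc(sizeof *skb);
    if (skb == NULL)
        return NULL;
    skb->head = malloc(size);
    if (skb->head == NULL) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    skb->len = 0;
    return skb;
}

void mini_free_skb(struct mini_skb *skb) { free(skb->head); free(skb); }

/* Like skb_reserve(): create headroom in an EMPTY buffer. */
void mini_reserve(struct mini_skb *skb, unsigned int len)
{
    assert(skb->len == 0 && (size_t)(skb->end - skb->tail) >= len);
    skb->data += len;
    skb->tail += len;
}

/* Like skb_put(): grow the data area at the tail; returns the old tail. */
unsigned char *mini_put(struct mini_skb *skb, unsigned int len)
{
    unsigned char *old = skb->tail;
    if ((size_t)(skb->end - skb->tail) < len)
        abort();                        /* cf. skb_over_panic() */
    skb->tail += len;
    skb->len += len;
    return old;
}

/* Like skb_push(): prepend a header; returns the new data pointer. */
unsigned char *mini_push(struct mini_skb *skb, unsigned int len)
{
    if ((size_t)(skb->data - skb->head) < len)
        abort();                        /* cf. skb_under_panic() */
    skb->data -= len;
    skb->len += len;
    return skb->data;
}

/* Like skb_pull(): strip a header; NULL if len exceeds the data. */
unsigned char *mini_pull(struct mini_skb *skb, unsigned int len)
{
    if (len > skb->len)
        return NULL;
    skb->data += len;
    skb->len -= len;
    return skb->data;
}
```

A transmit path would reserve room for all headers, put the payload, then push one header per layer (here IP, then Ethernet); the receive path pulls each header as it is consumed. With a 2-byte reserve, the IP header starts at offset 2 + 14 = 16, which is why it ends up 4-byte aligned.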
40. Network device drivers
●
net_device registration
●
hard_start_xmit function pointer
●
Interrupt handler for packet reception
●
Bus Interaction (e.g. PCI)
●
NAPI context
41. net_device structure
●
net_device represents a network interface card.
●
It is used to represent physical or virtual devices. e.g.
loopback devices, bonding devices used for load
balancing or high availability.
●
Device-specific state (for a physical NIC or a virtual device) is kept in the private data area of the
device (the void *priv member of net_device), accessed with netdev_priv().
42. net_device structure (cont...)
●
unsigned int mtu – Maximum Transmission
Unit: the maximum size of frame the device
can handle.
●
unsigned int flags (interface flags such as IFF_UP) and dev_addr[6] (the MAC address).
●
void *ip_ptr: IPv4 specific data. This pointer is
assigned to a pointer to in_device in
inetdev_init() (net/ipv4/devinet.c)
●
struct in_device contains a member named
cnf (an instance of struct ipv4_devconf).
Writing to /proc/sys/net/ipv4/conf/all/forwarding changes the forwarding setting in this configuration for all devices.
43. Packet Transmission
●
TCP/IP stack calls dev_queue_xmit function to
queue the packet in the device queue.
●
The device driver has a Tx handler registered
as hard_start_xmit() function pointer.
●
This function hands the packet to the hardware for transmission over the
wire or air and returns; the hardware later signals completion (typically through a Tx-complete interrupt).
●
This completion is generally used to
free the sk_buff associated with the packet.
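The transmit/completion split can be modelled with a ring of buffers. This is a hypothetical user-space sketch: mini_xmit() stands in for the hard_start_xmit handler (queue and return without waiting), and mini_tx_complete() for the completion path that frees buffers once the hardware reports them sent. Real drivers use sk_buffs, DMA descriptors and locking that are omitted here.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_SIZE 8   /* power of two, so unsigned wrap-around stays correct */

struct tx_ring {
    void *slot[RING_SIZE];
    unsigned int head;   /* next slot the driver fills */
    unsigned int tail;   /* next slot the hardware completes */
};

/* Queue a buffer for transmission and return immediately. Returns -1 when
 * the ring is full (a real driver would typically stop its queue). */
int mini_xmit(struct tx_ring *r, void *buf)
{
    if (r->head - r->tail == RING_SIZE)
        return -1;
    r->slot[r->head % RING_SIZE] = buf;
    r->head++;
    return 0;
}

/* Completion path: the hardware reports `done` frames sent; free them. */
unsigned int mini_tx_complete(struct tx_ring *r, unsigned int done)
{
    unsigned int freed = 0;
    while (freed < done && r->tail != r->head) {
        free(r->slot[r->tail % RING_SIZE]);
        r->slot[r->tail % RING_SIZE] = NULL;
        r->tail++;
        freed++;
    }
    return freed;
}

unsigned int mini_in_flight(const struct tx_ring *r) { return r->head - r->tail; }
```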
44. Packet Transmission (cont...)
●
Routing for outgoing packets is done by
ip_route_output_key().
●
So a routing lookup also happens in the case of
transmission.
●
If the packet is for a remote host, dst->output is
set to ip_output().
●
ip_output() calls ip_finish_output()
– via the NF_IP_POST_ROUTING netfilter hook
45. Packet Reception
●
When working in the interrupt-driven model, the
NIC driver registers an interrupt handler for the IRQ
of the device by calling request_irq().
●
This interrupt handler will be called when a
frame is received.
●
The same interrupt handler will be called
when transmission of a frame is finished and
under other conditions like errors.
●
The interrupt handler should verify the interrupt
cause.
●
Control is transferred to the TCP/IP stack using
netif_rx() or netif_rx_ni().
46. Packet Reception (cont...)
●
In the interrupt handler, an sk_buff is allocated by
calling dev_alloc_skb(), and eth_type_trans() is called.
eth_type_trans() sets skb->protocol and advances the data pointer of
the sk_buff past the Ethernet header to the IP header
(equivalent to skb_pull(skb, ETH_HLEN)).
48. Physical (Ethernet) [L1]
●
The NIC generates an Interrupt Request (IRQ).
●
The card driver's Interrupt Service Routine (ISR) disables interrupts, then:
– Allocates a new sk_buff structure.
– Fetches the packet data from the card buffer into the freshly
allocated sk_buff (using DMA).
– Invokes netif_rx().
– When netif_rx() returns, interrupts are re-enabled and the ISR terminates.
49. The picture:
[Figure: Journey of a packet. Ethernet driver, low-level packet Rx (netif_rx()) -> deferred packet reception (net_rx_action()) -> layer 3: AF_INET/IP (ip_rcv()), AF_PACKET (packet_rcv()), others (*_rcv()) -> TCP/UDP/ICMP processing (tcp_rcv(), udp_rcv(), icmp_rcv()) -> socket level (data_ready()) -> receiving process (wake_up_interruptible()).]
50. TCP/IP stack
●
Minimize copying
●
Zero copy technique
●
Page remapping
●
Branch optimization
●
Avoid process migration or cache misses
●
Avoid dynamic assignment of interrupts
to different CPUs
●
Combine operations within the same
layer to minimize passes over the data
52. Wifi Programming
Steps for programming the wireless extensions:
• Open a network socket
(PF_INET, SOCK_DGRAM).
• Set up the wireless request using struct iwreq.
Set the device name.
Set the wireless request data.
Set subioctl_no.
• Invoke the device ioctl.
• Wait for the response. [Blocking call]
• Wireless events are received over a netlink socket
(PF_NETLINK).
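The steps above map onto a single wireless-extensions request. This sketch assumes the <linux/wireless.h> user-space header is installed; wext_get_name() is a hypothetical helper that issues SIOCGIWNAME, which returns the wireless protocol name (for example "IEEE 802.11"). On a non-wireless interface such as lo the ioctl fails.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/wireless.h>

/* Returns 0 and copies the protocol name into name on success, or -1 if
 * the ioctl fails (e.g. the device has no wireless extensions). */
int wext_get_name(const char *ifname, char *name, size_t n)
{
    struct iwreq wrq;
    assert(n > 0);

    int fd = socket(PF_INET, SOCK_DGRAM, 0);       /* 1. open a socket */
    if (fd < 0)
        return -1;

    memset(&wrq, 0, sizeof wrq);                   /* 2. set up struct iwreq */
    strncpy(wrq.ifr_name, ifname, IFNAMSIZ - 1);   /*    device name */

    int rc = -1;
    if (ioctl(fd, SIOCGIWNAME, &wrq) == 0) {       /* 3./4. blocking ioctl */
        strncpy(name, wrq.u.name, n - 1);
        name[n - 1] = '\0';
        rc = 0;
    }
    close(fd);
    return rc;
}
```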
53. Wifi kernel handling
Kernel space handling:
* The kernel ioctl handler transfers control to the ioctl handler of
the wireless device driver.
* The driver invokes the appropriate wireless extension call based
on the ioctl command.
* The wireless extension call transfers control to the wireless
firmware using a special command interface over the
USB/SDIO/MMC bus.
* The wireless driver can receive events from the firmware
(e.g. a Link_Loss event).
54. Driver firmware interface
What is firmware?
* Firmware is wireless networking software that runs on the
wireless chipset.
* The wireless device driver downloads the firmware to the
wireless chipset upon initialization.
* All low-level wireless operations are performed by the
firmware.
* It works in two modes:
Synchronous: request/response protocol.
Asynchronous: events from the FW.
* The firmware resides in /lib/firmware/
e.g. /lib/firmware/iwl-3945.ucode
55. Need for NGW
●
Next Generation Wireless
●
Centralized control for all wireless work
●
Drivers implement small set of configuration
methods
●
Semantics follow the flows in the IEEE 802.11 specifications
●
Various modes of operation:
Station, AP, Monitor, IBSS, WDS, Mesh, P2P
56. Mac80211, cfg80211
●
Mac80211 is a Linux kernel subsystem
●
Implements shared code for SoftMAC and half-MAC
devices
●
Contains the MLME (Media Access Control (MAC) Layer Management Entity):
Authenticate, Deauthenticate, Associate, Disassociate,
Reassociate, Beacon, Probe
●
Cfg80211 is the layer between user space and
mac80211.