Introduction to Socket Programming

Prof NB Venkateswarlu
B.Tech(SVU), M.Tech(IIT-K), Ph.D(BITS, Pilani), PDF(U of Leeds,UK)
ISTE Visiting Fellow 2010-11
AITAM, Tekkali

 A Small Dose of Questions To Know You
 Little Briefing about Unix Internals
 Recapitulation of What is Internet
 Variety of Addresses involved
 Socket Concepts
 Related System Calls
 Simple TCP Client and Server in action
 Simple UDP Client and Server in action
 What is DNS

 What is the difference between Data
Communications and Computer Networks?
 What is firmware?
 Why do we need to split a message?
 Why do we require so many levels of control?
(Is the network system reliable?)
 What are physical and logical addresses?
 What is the conceptual difference between
DLL and NLL?

 What is fork()?
 What is a signal?
 What are a process and a thread?
 What is a device driver?
 What is a daemon?
 What is exec()?
 What are locks?

 A collection of
interconnected networks
 Networks: Different depts,
labs, etc.
 Router: node that connects
distinct networks
 Host: network endpoints
(computer, PDA, light
switch, …)
 Together, an independently
administered entity
◦ Enterprise, ISP, etc.
Internet[work]
EE ME
CS

 Many differences
between networks
◦ Address formats
◦ Performance –
bandwidth/latency
◦ Packet size
◦ Loss
rate/pattern/handling
◦ Routing
 How to translate
and inter-operate?
Internet[work]
802.3 Frame
relay
ATM

 Internet vs. internet
 The Internet: the interconnected set of
networks of the Internet Service Providers
(ISPs) and end-networks, providing data
communications services.
◦ Network of internetworks, and more
◦ About 17,000 different ISP networks make up the
Internet
◦ Many other “end” networks
◦ 100,000,000s of hosts

 Links can be
◦ Wired or wireless
Node Link Node

R
R
R
RRH
H
H
H
R
RH
R
Routers send packet
towards destination
H: Hosts
R: Routers

Because of Noise Conditions of the Channels
Noise is rated as: 1 in 10^5

Packets
Better Link Utilization

 Short bursts: buffer
 Buffer sizes vary from network to network, so
fragmentation takes place
 What if buffer overflows?
◦ Packets dropped
◦ Sender adjusts rate until load = resources  “congestion
control”
Problem: Network Overload
Solution: Buffering and Congestion Control

Problem: Packet size
Solution: Fragment data across packets
• On Ethernet, max packet is 1.5KB
• Typical web page is 10KB
GETindex.html
GET index.html

 Implements an
agreement between
parties on how
communication
should take place
Friendly greeting
Muttered reply
Destination?
Madison
Thank you

 Each protocol offers interfaces
◦ One to higher-level protocols on the same end hosts
 Expects one from the layers on which it builds
 Interface characteristics, e.g. IP service model
◦ A “peer interface” to a counterpart on destinations
 Syntax and semantics of communications
 (Assumptions about) data formats
 Protocols build upon each other
◦ Adds value, improves functionality overall
 E.g., a reliable protocol running on top of IP
◦ Reuse, avoid re-writing
 E.g., OS provides TCP, so apps don’t have to rewrite

 Protocols are the key to interoperability.
◦ Networks are very heterogeneous:
◦ The hardware/software of communicating parties are often
not built by the same vendor
◦ Yet they can communicate because they use the same
protocol
 Actually implementations could be different
 But must adhere to same specification
 Protocols exist at many levels.
◦ Application level protocols
◦ Protocols at the hardware level
Ethernet: 3com, etc.
Routers: cisco, juniper etc.
App: Email, AIM, IE etc.
Hardware/link
Network
Application

 One or more protocols implement the functionality
in a layer
◦ Only horizontal (among peers) and vertical (in a host)
communication
 Protocols/layers can be implemented and modified
in isolation
 Each layer offers a service to the higher layer, using
the services of the lower layer.
 “Peer” layers on different systems communicate via
a protocol.
◦ higher level protocols (e.g. TCP/IP, Appletalk) can run on
multiple lower layers
◦ multiple higher level protocols can share a single physical
network

Application
(plus
libraries)
TCP/UDP
IP
Data link
Physical
Application
Presentation
Session
Transport
Network
Data link
Physical

FTP HTTP TFTPNV
TCP UDP
IP
NET1 NET2 NETn… Network protocols implemented by a
comb of hw and sw.
Interconnection of n/w technologies
into a single logical n/w
Two transport protocols: provide
logical channels to apps
App protocols
Note: No strict layering.
App writers can define apps that run on any lower level protocols.

UDP TCP
Data Link
Physical
Applications
The Hourglass Model
Waist
The waist: minimal, carefully chosen functions.
Facilitates interoperability and rapid evolution
FTP HTTP TFTPNV
TCP UDP
IP
NET1 NET2 NETn…

Bridge/Switch Router/GatewayHost Host
Application
Transport
Network
Link
Physical

Get index.html
Connection ID
Source/Destination
Link Address
User A User B
Header

 Multiple choices at each layer
 How to know which one to pick?
FTP HTTP TFTPNV
TCP UDP
IP
NET1 NET2 NETn…
TCP/UDPIP
Many
Networks

 Multiple
implementations of each
layer
◦ How does the receiver
know what version/module
of a layer to use?
 Packet header includes a
demultiplexing field
◦ Used to identify the right
module for next layer
◦ Filled in by the sender
◦ Used by the receiver
 Multiplexing occurs at
multiple layers. E.g., IP,
TCP, …
IP
TCP
IP
TCP
V/HL TOS Length
ID Flags/Offset
TTL Prot. H. Checksum
Source IP address
Destination IP address
Options..

TCP
 Reliable – guarantee
delivery
 Byte stream – in-order
delivery
 Checksum for validity
 Setup connection followed
by data transfer
Telephone Call
• Guaranteed delivery
• In-order delivery
• Setup connection followed
by conversation
Example TCP applications
Web, Email, Telnet

Example UDP applications
Multimedia, voice over IP
UDP
• No guarantee of delivery
• Not necessarily in-order
delivery
• No validity guaranteed
• Must address each
independent packet
Postal Mail
• Unreliable
• Not necessarily in-order
delivery
• Must address each reply

Application             Data loss       Bandwidth                         Time Sensitive
file transfer           no loss         elastic                           no
e-mail                  no loss         elastic                           no
web documents           no loss         elastic                           no
real-time audio/video   loss-tolerant   audio: 5Kb-1Mb, video: 10Kb-5Mb   yes, 100’s msec
stored audio/video      loss-tolerant   same as above                     yes, few secs
interactive games       loss-tolerant   few Kbps                          yes, 100’s msec
financial apps          no loss         elastic                           yes and no

Byte Order
Different computers may have different internal representation
of 16 / 32-bit integer (called host byte order).
Examples
Big-Endian byte order (e.g., used by Motorola 68000):
Little-Endian byte order (e.g., used by Intel 80x86):

◦ TCP/IP specifies a network byte order which is the big-
endian byte order.
◦ For some WinSock functions, their arguments (i.e., the
parameters to be passed to these functions) must be
stored in network byte order.
◦ WinSock provides functions to convert between host
byte order and network byte order:

Processes
• A process has
 text: machine instructions
(may be shared by other processes)
 data
 stack
• Process may execute either in user mode or in kernel
mode.
• Process information is stored in two places:
 Process table
 User table

User mode and Kernel mode
• At any given instant a computer running the Unix system
is either executing a process or the kernel itself is running
• The computer is in user mode when it is executing
instructions in a user process and it is in kernel mode
when it is executing instructions in the kernel.
• Executing System call ==> User mode to Kernel mode
perform I/O operations
system clock interrupt

Process Table
• Process table: an entry in process table has the following
information:
 process state:
A. running in user mode or kernel mode
B. Ready in memory or Ready but swapped
C. Sleep in memory or sleep and swapped
 PID: process id
 UID: user id
 scheduling information
 signals that have been sent to the process but not yet handled
 a pointer to per-process-region table
• There is a single process table for the entire system

User Table (u area)
• Each process has only one private user table.
• User table contains information that must be accessible
while the process is in execution.
 A pointer to the process table slot
 parameters of the current system call, return values
error codes
 file descriptors for all open files
 current directory and current root
 process and file size limits.
• User table is an extension of the process table.

u area
Active process
resident
swappable
data
stack
text
Process
table
Per-process
region table
Region
table
Kernel
address
space
user
address
space

Shared Program Text and
Software Libraries
• Many programs, such as shell, are often being
executed by several users simultaneously.
• The text (program) part can be shared.
• In order to be shared, a program must be compiled using
a special option that arranges the process image so that
the variable part(data and stack) and the fixed part (text)
are cleanly separated.
• An extension to the idea of sharing text is sharing
libraries.
• Without shared libraries, all the executing programs
contain their own copies.

Active process
data
stack
text
Process
table
Per-process
region table
Region
table
data
stack
text
Reference
count = 2

System Call
• A process accesses system resources through system calls.
• System calls for
 Process Control:
fork: create a new process
wait: allow a parent process to synchronize its
execution with the exit of a child process.
exec: invoke a new program.
exit: terminate process execution
 File system:
File: open, read, write, lseek, close
inode: chdir, chown, chmod, stat, fstat
others: pipe, dup, mount, umount, link, unlink

System call: fork()
• fork: the only way for a user to create a process in Unix
operating system.
• The process that invokes fork is called parent process
and the newly created process is called child process.
• The syntax of fork system call:
newpid = fork();
• On return from fork system call, the two processes have
identical copies of their user-level context except for the
return value pid.
• In parent process, newpid = child process id
• In child process, newpid = 0;

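/* forkEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main(void)
{
    pid_t fpid;
    printf("Before forking ...\n");
    fpid = fork();   /* returns 0 in the child, the child's PID in the parent */
    if (fpid == 0) {
        printf("Child Process fpid=%d\n", (int)fpid);
    } else {
        printf("Parent Process fpid=%d\n", (int)fpid);
    }
    printf("After forking fpid=%d\n", (int)fpid);   /* executed by both processes */
    return 0;
}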
$ cc forkEx1.c -o forkEx1
$ forkEx1
Before forking ...
Child Process fpid=0
After forking fpid=0
Parent Process fpid=14707
$

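/* forkEx2.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
int main(void)
{
    pid_t fpid;
    printf("Before forking ...\n");
    system("ps");    /* process list before the fork */
    fpid = fork();
    system("ps");    /* run once by the parent and once by the child */
    printf("After forking fpid=%d\n", (int)fpid);
    return 0;
}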
$ forkEx2
Before forking ...
PID TTY TIME CMD
14759 pts/9 0:00 tcsh
14778 pts/9 0:00 sh
14777 pts/9 0:00 forkEx2
PID TTY TIME CMD
14781 pts/9 0:00 sh
14759 pts/9 0:00 tcsh
14782 pts/9 0:00 sh
14780 pts/9 0:00 forkEx2
14777 pts/9 0:00 forkEx2
$ PID TTY TIME CMD
14781 pts/9 0:00 sh
14759 pts/9 0:00 tcsh
14780 pts/9 0:00 forkEx2
$ ps
PID TTY TIME CMD
14759 pts/9 0:00 tcsh
$

System Call: getpid() getppid()
• Each process has a unique process id (PID).
• PID is an integer, typically in the range 0 through 65535.
• Kernel assigns the PID when a new process is created.
• Processes can obtain their PID by calling getpid().
• Each process has a parent process and a corresponding
parent process ID.
• Processes can obtain their parent’s PID by calling
getppid().

/* pid.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main(void)
{
    printf("pid=%d ppid=%d\n", (int)getpid(), (int)getppid());
    return 0;
}
$ cc pid.c -o pid
$ pid
pid=14935
ppid=14759
$

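/* forkEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main(void)
{
    pid_t fpid;
    printf("Before forking ...\n");
    if ((fpid = fork()) == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    }
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           (int)fpid, (int)getpid(), (int)getppid());
    return 0;
}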

$ forkEx3
Before forking ...
Parent Process fpid=14942 pid=14941 ppid=14759
After forking fpid=14942 pid=14941 ppid=14759
$ Child Process fpid=0 pid=14942 ppid=1
$ ps
PID TTY TIME CMD
14759 pts/9 0:00 tcsh

System Call: wait()
• wait system call allows a parent process to wait
for the demise of a child process.
• See forkEx4.c

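/* forkEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
    pid_t fpid;
    int status;
    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    } else {
        /* parent: nothing to do before waiting */
    }
    wait(&status);   /* the parent blocks here until the child exits */
    return 0;
}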

$ forkEx4
Before forking ...
Child Process fpid=0 pid=14980 ppid=14979
$

System Call: exec()
• exec() system call invokes another program by replacing
the current process
• No new process table entry is created for exec() program.
Thus, the total number of processes in the system isn’t
changed.
• Six different exec functions:
execlp, execvp, execl, execv, execle, execve,
(see man page for more detail.)
• exec system call allows a process to choose its successor.

int execl(file_name, arg0 [, arg1, ..., argn], NULL)
char *file_name, *arg0, *arg1, ..., *argn;
int execv(file_name, argv)
char *file_name, *argv[];
int execle(file_name, arg0 [, arg1, ..., argn], NULL, envp)
char *file_name, *arg0, *arg1, ..., *argn, *envp[];
int execve(file_name, argv, envp)
char *file_name, *argv[], *envp[];
int execlp(file_name, arg0 [, arg1, ..., argn], NULL)
char *file_name, *arg0, *arg1, ..., *argn;
int execvp(file_name, argv)
char *file_name, *argv[];

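/* execEx1.c */
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    printf("Before execing ...\n");
    execl("/bin/date", "date", (char *)NULL);   /* the argument list must end with a null pointer */
    printf("After exec\n");   /* reached only if execl() fails */
    return 1;
}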
$ execEx1
Before execing ...
Sun May 9 16:39:17 CST 1999
$

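/* execEx2.c */
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
int main(void)
{
    pid_t fpid;
    printf("Before execing ...\n");
    fpid = fork();
    if (fpid == 0) {
        execl("/bin/date", "date", (char *)NULL);   /* the child becomes date */
    }
    printf("After exec and fpid=%d\n", (int)fpid);  /* normally printed by the parent only */
    return 0;
}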
$ execEx2
Before execing ...
After exec and fpid=14903
$ Sun May 9 16:47:08 CST 1999
$

Handling Signal
• A signal is a notification sent to a process by another process or by the kernel.
• Signals are sometimes called “software interrupts”.
• Signals usually occur asynchronously.
• Signals can be sent
A. by one process to another (or to itself)
B. by the kernel to a process.
• Unix signals are content-free. That is, the only thing that
can be said about a signal is whether or not it has arrived.
Handling Signal
• Most signals have predefined meanings:
A. SIGHUP (hangup): when a terminal is closed, the hangup
signal is sent to every process attached to that controlling terminal.
B. SIGINT (interrupt): politely asks a process to terminate.
C. SIGQUIT (quit): asks a process to terminate and produce a
core dump.
D. SIGKILL (kill): forces a process to terminate.
• See sigEx1.c

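/* sigEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
    pid_t fpid;
    int status;   /* a real int; passing an uninitialized pointer to wait() is a bug */
    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
        for(;;); /* loop forever */
    } else {
        /* parent: falls through to wait() */
    }
    wait(&status); /* wait for child process */
    return 0;
}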

$ cc sigEx1.c -o sigEx1
$ sigEx1 &
Before forking ...
Child Process fpid=0 pid=14989 ppid=14988
$ ps
PID TTY TIME CMD
14988 pts/9 0:00 sigEx1
14759 pts/9 0:01 tcsh
14989 pts/9 0:09 sigEx1
$ kill -9 14989
$ ps
...

Scheduling Processes
• On a time-sharing system, the kernel allocates the CPU to
a process for a period of time (a time slice or time quantum),
preempts the process and schedules another one when the
time slice expires, and reschedules the process to continue
execution at a later time.
• The scheduler uses a round-robin with multilevel feedback
algorithm to choose which process to execute:
A. The kernel allocates the CPU to a process for a time slice.
B. It preempts a process that exceeds its time slice.
C. It feeds the process back into one of several priority queues.

Process Priority
swapper
wait for Disk IO
wait for buffer
wait for inode
...
wait for child exit
User level 0
User level 1
User level n
...
Kernel Mode
User Mode
ProcessesPriority Levels

Process Scheduling
(Unix System V)
• Consider 3 processes A, B, C under the following
assumptions:
A. They are created simultaneously with initial priority 60.
B. The clock interrupts the system 60 times per second.
C. These processes make no system calls.
D. No other processes are ready to run.
E. CPU usage calculation: CPU = decay(CPU) = CPU/2
F. Process priority calculation: priority = CPU/2 + 60.
G. The rescheduling calculation is done once per second.

Process A
Priority CPU count
Process B
Priority CPU count
Process C
Priority CPU count
60 0
…
60
75 30
67 15
63 7
…
67
76 33
60 0
60 0
…
60
75 30
67 15
63 7
...
60 0
60 0
60 0
…
60
75 30
67 15
1
2
3
4
0

Booting
• When the computer is powered on or rebooted, a short
built-in program (possibly stored in ROM) reads the first
block or two of the disk into memory. These blocks
contain a loader program, which was placed on the disk
when the disk was formatted.
• The loader is started. The loader searches the root
directory for /unix or /root/unix and loads the file into
memory.
• The kernel starts to execute.

The first processes
• The kernel initializes its internal data structures:
it constructs linked list of free inodes, regions, page table
• The kernel creates u area and initializes slot 0 of process
table
• Process 0 is created
• Process 0 forks, invoking the fork algorithm directly
from the Kernel. Process 1 is created.
• In kernel mode, Process 1 creates user-level context
(regions) and copy code (/etc/init) to the new region.
• Process 1 calls exec (executes init).

init process
• The init process is a process dispatcher: it spawns
processes and allows users to log in.
• Init reads /etc/inittab and spawns getty.
• When a user logs in successfully, getty goes through a login
procedure and execs a login shell.
• Init executes the wait system call, monitoring the death
of its child processes and of processes orphaned
by an exiting parent.

Init fork/execs a getty program to manage the line
Getty prints the “login:” message and waits for someone to log in
The login process prints the password message, reads the password, then checks the password
The shell runs programs for the user until the user logs off
When the shell dies, init wakes up and fork/execs a getty for the line

File Subsystem
• A file system is a collection of files and directories on
a disk or tape in standard UNIX file system format.
• Each UNIX file system contains four major parts:
A. boot block:
B. superblock:
C. i-node table:
D. data block: file storage

File System Layout
Block 0: bootstrap
Block 1: superblock
Block 2
Block n
...
Block n+1
The last Block
...
Block 2 - n:i-nodes
Block n+1 - last:Files

Boot Block
• A boot block may contain several physical blocks.
• Note that a physical block contains 512 bytes
(or 1 KB or 2 KB).
• A boot block contains a short loader program for
booting.
• It is blank on other file systems.

Superblock
• The superblock contains key information about a file system
• Superblock information:
A. Size of the file system and status:
label: name of this file system
size: the number of logical blocks
date: the last modification date of the superblock.
B. information about i-nodes
the number of i-nodes
the number of free i-nodes
C. information about data blocks: free data blocks.
• The information in the superblock is loaded into memory.

I-nodes
• i-node: index node (information node)
• i-list: the list of i-nodes
• i-number: the index of i-list.
• The size of an i-node: 64 bytes.
• i-node 0 is reserved.
• i-node 1 is the root directory.
• i-node structure: next page

I-node structure
mode
owner
timestamp
Size
Block count
Direct blocks
0-9
Double indirect
Triple indirect
Single indirect
Data block
Data block
Data block
Indirect block
...
Data block
Data block
Data block
...
Indirect block
Indirect block
Indirect block
...
Reference count

I-node structure
• mode: A. type: file, directory, pipe, symbolic link
B. Access: read/write/execute (owner, group, others)
• owner: who owns this i-node (file, directory, ...)
• timestamp: creation, modification, access time
• size: the number of bytes
• block count: the number of data blocks
• direct blocks: pointers to the data blocks
• Single indirect: pointer to a block that holds
pointers to data blocks (128 data blocks).
• Double indirect: (128*128=16384 data blocks)
• Triple indirect: (128*128*128=2097152 data blocks)

Data Block
• A data block has 512 bytes.
A. Some file systems have 1 KB or 2 KB blocks.
B. See the effect of block size (next page)
• A data block may contain file data or directory
data.
• File: a stream of bytes.
• Directory format:
i-# Next size File name pad

Report.txt
home
john
bin
find
alex jenny
notes
grep
i-# Next 10 Report.txt pad i-# Next 3
bin pad i-# Next 5 notes pad 0 Next

Boot Block
SuperBlock
i-node
i-node
i-node
i-node
...
...
...
Current Dir
Report.txt
source
notes
...
...
...
...
i-nodes
Data
Blocks
Report.txt
home
kc
source
find
alex
notes
grep
Device driver
&
Hardware
controlCurrent
directory
inode
u area
i-node
i-node
i-node
...
...
In-core
inodes

In-core inode table
• The UNIX system keeps regular files and directories on block
devices such as disk or tape.
• Such disk space is called the physical device address space.
• The kernel deals with the file system on a logical level
(logical device address space) rather than with disks.
• The disk driver translates logical addresses into physical
device addresses.
• The in-core (memory-resident) inode table stores
inode information in kernel space.

In-core inode table
• An in-core inode contains
A. all the information of the inode on disk.
B. status of the in-core inode:
inode is locked,
inode data changed,
file data changed.
C. the logical device number of the file system.
D. inode number
E. reference count

File table
• The kernel has a global data structure, called the file table,
to store information about file access.
• Each entry in the file table contains:
A. a pointer to an entry in the in-core inode table
B. the offset of the next read or write in the file
C. access rights (r/w) allowed to the opening process.
D. reference count.

User File Descriptor table
• Each process has a user file descriptor table to identify
all its open files.
• An entry in the user file descriptor table points to an entry
of the kernel’s global file table.
• Entry 0: standard input
• Entry 1: standard output
• Entry 2: standard error

System Call: open
• open: A process may open an existing file to read or write
• syntax:
fd = open(pathname, mode);
A. pathname is the name of the file to be opened
B. mode: read/write (O_RDONLY, O_WRONLY, O_RDWR)
• Example

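/* openEx1.c */
#include <stdio.h>
#include <fcntl.h>
int main(void)
{
    int fd1, fd2, fd3;
    printf("Before open ...\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./openEx1.c", O_WRONLY);
    fd3 = open("/etc/passwd", O_RDONLY);   /* a second, independent open of the same file */
    printf("fd1=%d fd2=%d fd3=%d \n", fd1, fd2, fd3);
    return 0;
}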
$ cc openEx1.c -o openEx1
$ openEx1
Before open ...
fd1=3 fd2=4 fd3=5
$

…
CNT=2
/etc/passwd
CNT=1
./openEx2.c
in-core
inodes
Pointer to
Descriptor table
U area
User file
descriptor
table
0
1
2
3
4
5
6
7
.
.
.
...
...
CNT=1 R
CNT=1 W
...
CNT=1 R
file table
...
...
...

System Call: read
• read: A process may read from an opened file
• syntax:
n = read(fd, buffer, count);
A. fd: file descriptor
B. buffer: where the data read is stored
C. count: the maximum number of bytes to read
D. n: the number of bytes actually read (0 at end of file, -1 on error)
• Example

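/* openEx2.c */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
    int fd1;
    char buf1[20] = {0}, buf2[20] = {0};   /* zero-filled, so always NUL-terminated */
    printf("=======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd1, buf2, 19);                   /* continues at the current file offset */
    printf("fd1=%d buf2=%s \n", fd1, buf2);
    printf("=======\n");
    return 0;
}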
$ openEx2
=======
fd1=3 buf1=root:x:0:1:Super-Us
fd1=3 buf2=er:/:/sbin/sh
daemo
=======
$

#include <stdio.h>
#include <fcntl.h>
int main(void)
{
    int fd1, fd2, fd3;
    char buf1[20], buf2[20];
    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    printf("======\n");
    return 0;
}
$ openEx3
======
======
$

…
CNT=2
/etc/passwd
...
in-core
inodes
Descriptor
table
U area
User file
descriptor
table
0
1
2
3
4
5
6
7
.
.
.
...
...
CNT=1 R
...
...
CNT=1 R
file table
...
...
...

System Call: dup
• dup: copy a file descriptor into the first free slot of the
user file descriptor table.
• syntax:
newfd = dup(fd);
Example

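/* openEx4.c */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
    int fd1, fd2;
    char buf1[20] = {0}, buf2[20] = {0};
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = dup(fd1);          /* fd2 shares fd1's file table entry, including the offset */
    read(fd1, buf1, 19);
    read(fd2, buf2, 19);     /* continues where the read on fd1 stopped */
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}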
$ openEx4
======
fd2=4 buf2=er:/:/sbin/sh
daemo
======
$

…
CNT=1
/etc/passwd
...
in-core
inodes
Descriptor
table
U area
User file
descriptor
table
0
1
2
3
4
5
6
7
.
.
.
...
...
CNT=2 R
...
...
...
file table
...
...
...

System Call: creat
• creat: A process may create a new file with the creat system
call
• syntax:
fd = creat(pathname, mode);
A. pathname: file name
B. mode: permission bits of the new file (e.g., 0644)
Example

System Call: close
• close: A process may close a file by close system
call
• syntax:
close(fd);
Example

System Call: write
• write: A process may write data to an opened file
• syntax:
n = write(fd, buffer, count);
A. fd: file descriptor
B. buffer: the data to be written
C. count: the number of bytes to write
• Example

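/* creatEx1.c */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
int main(void)
{
    int fd1;
    char *buf1="I am a string\n";
    char *buf2="second line\n";
    printf("======\n");
    /* the second argument of creat() is the permission mode, not an open flag */
    fd1 = creat("./testCreat.txt", 0644);
    printf("fd1=%d buf1=%s", fd1, buf1);
    /* write exactly the string lengths; counts of 20 and 30 would read
       past the end of each string and produce a 50-byte file */
    write(fd1, buf1, strlen(buf1));
    write(fd1, buf2, strlen(buf2));
    close(fd1);
    chmod("./testCreat.txt", 0666);
    printf("======\n");
    return 0;
}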

$ cc creatEx1.c -o creatEx1
$ creatEx1
======
fd1=3 buf1=I am a string
======
$ ls -l testCreat.txt
-rw-rw-rw- 1 cheng staff 50 May 10 20:37 testCreat.txt
$ more testCreat.txt
...

System Call: stat/fstat
• stat/fstat: A process may query the status of a file:
file type, file owner, access permissions, file size, number
of links, inode number, access time.
• syntax:
stat(pathname, statbuffer); fstat(fd, statbuffer);
A. pathname: file name
B. statbuffer: struct stat that receives the data
C. fd: file descriptor
Example

/* statEx1.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
int main(void)
{
    int fd1, fd2;
    struct stat bufStat1, bufStat2;
    printf("======\n");
    /* assumption: the slide does not show which file fd1 refers to */
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./statEx1", O_RDONLY);
    fstat(fd1, &bufStat1); fstat(fd2, &bufStat2);
    printf("fd1=%d inode no=%ld block size=%ld blocks=%ld\n",
           fd1, (long)bufStat1.st_ino, (long)bufStat1.st_blksize, (long)bufStat1.st_blocks);
    printf("fd2=%d inode no=%ld block size=%ld blocks=%ld\n",
           fd2, (long)bufStat2.st_ino, (long)bufStat2.st_blksize, (long)bufStat2.st_blocks);
    printf("======\n");
    return 0;
}

$ cc statEx1.c -o statEx1
$ statEx1
======
fd1=3 inode no=21954 block size=8192 blocks=6
fd2=4 inode no=190611 block size=8192 blocks=
======
...

System Call: link/unlink
• link: create a hard link to an existing file; unlink: remove a link
• syntax:
link(sourceFile, targetFile); unlink(file);
A. sourceFile, targetFile, file: file names
Example:
Lab exercise: write a C program that uses the link/unlink
system calls. Use ls -l to see the reference count.

System Call: chdir
• chdir: A process may change its current
directory
• syntax:
chdir(pathname);
Example

#include <stdlib.h>
#include <unistd.h>
int main(void)
{
    chdir("/usr/bin");
    system("ls -l");   /* the shell started by system() inherits the new current directory */
    return 0;
}
$ ls -l /usr/bin
$

 pipe(int a[])
 FILE* popen(char *command, char *mode)
 pclose(FILE*)
 mknod(char *, S_IFIFO|0644, 0)
 mknod filename p
 mkfifo filename

Signal Description
SIGABRT Process abort signal.
SIGALRM Alarm clock.
SIGFPE Erroneous arithmetic operation.
SIGHUP Hangup.
SIGILL Illegal instruction.
SIGINT Terminal interrupt signal.
SIGKILL Kill (cannot be caught or ignored).
SIGPIPE Write on a pipe with no one to read it.
SIGQUIT Terminal quit signal.
SIGSEGV Invalid memory reference.
SIGTERM Termination signal.
SIGUSR1 User-defined signal 1.
SIGUSR2 User-defined signal 2.
SIGCHLD Child process terminated or stopped.
SIGCONT Continue executing, if stopped.
SIGSTOP Stop executing (cannot be caught or ignored).
SIGTSTP Terminal stop signal.
SIGTTIN Background process attempting read.
SIGTTOU Background process attempting write.
SIGBUS Bus error.
SIGPOLL Pollable event.
SIGPROF Profiling timer expired.
SIGSYS Bad system call.
SIGTRAP Trace/breakpoint trap.
SIGURG High bandwidth data is available at a socket.
SIGVTALRM Virtual timer expired.
SIGXCPU CPU time limit exceeded.
SIGXFSZ File size limit exceeded.

void (*signal(int signo, void (*f)(int)))(int);
Signal number
Handler

#include <stdio.h> /* standard I/O functions */
#include <unistd.h> /* standard unix functions, like getpid() */
#include <sys/types.h> /* various type definitions, like pid_t */
#include <signal.h> /* signal name macros, and the signal() prototype */
/* first, here is the signal handler */
void catch_int(int sig_num)
{
    (void)sig_num;
    /* re-set the signal handler again to catch_int, for next time */
    signal(SIGINT, catch_int);
    /* and print the message; write() is async-signal-safe, printf() is not */
    write(STDOUT_FILENO, "Don't do that\n", 14);
}
int main(void)
{
    /* set the INT (Ctrl-C) signal handler to 'catch_int' */
    signal(SIGINT, catch_int);
    /* now, lets get into an infinite loop of doing nothing. */
    for ( ;; )
        pause();
}

Signal sets
Signal sets are data types (structures) that represent multiple signals. The following functions
are used to manipulate them.
int sigemptyset(sigset_t *set);
This function initializes the signal set pointed to by set so that it contains no signals.
int sigfillset(sigset_t *set);
This function fills the signal set pointed to by set so that it contains all signals.
int sigaddset(sigset_t *set, int signo);
This function adds a signal (with signal number signo) to the signal set pointed to by set.
int sigdelset(sigset_t *set, int signo);
This function removes a signal (with signal number signo) from the signal set pointed to by set.
int sigismember(const sigset_t *set, int signo);
This function checks whether a signal (with signal number signo) is in the signal set pointed to by set.
int sigpending(sigset_t *set);
This function stores, in the signal set pointed to by set, the signals that are blocked from delivery
and currently pending.
int sigsuspend(const sigset_t *set);
This function temporarily replaces the signal mask of the process with the signal set pointed to by set
and suspends the process until a signal is caught or until a signal occurs that terminates the process.

SIG_BLOCK
SIG_UNBLOCK
SIG_SETMASK

struct sigaction {
    void (*sa_handler)(int); /* pointer to function, or SIG_DFL or SIG_IGN */
    sigset_t sa_mask;        /* additional signals to be blocked during
                                execution of the handler */
    int sa_flags;            /* special flags and options */
};

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
/* layout expected by msgsnd()/msgrcv(): a long message type followed by the text */
struct message { long mtype; char mtext[32]; };
int main(int argc, char* argv[])
{
    struct message msg, recv_msg;
    int rc;
    /* create a private message queue, with access only to the owner. */
    int queue_id = msgget(IPC_PRIVATE, 0600);
    if (queue_id == -1) { perror("main: msgget"); exit(1); }
    printf("message queue created, queue id '%d'.\n", queue_id);
    msg.mtype = 1;
    strcpy(msg.mtext, "hello world");
    rc = msgsnd(queue_id, &msg, strlen(msg.mtext)+1, 0);
    if (rc == -1) { perror("main: msgsnd"); exit(1); }
    printf("message placed on the queue successfully.\n");
    rc = msgrcv(queue_id, &recv_msg, sizeof(recv_msg.mtext), 0, 0);
    if (rc == -1) { perror("main: msgrcv"); exit(1); }
    printf("msgrcv: received message: mtype '%ld'; mtext '%s'\n", recv_msg.mtype, recv_msg.mtext);
    msgctl(queue_id, IPC_RMID, NULL); /* remove the queue so it does not outlive the program */
    return 0;
}

FTP [21]
HTTP [80]
SMTP [25]
Telnet [23]
192.168.19.1
192.168.19.3
192.168.19.2
192.168.19.2 [21]192.168.19. [21]192.168.19.2[21]192.168.19.2 [21]
198.163.197.4
198.163.197.4 [x]
192.168.19.0
Internet

412-268-8000
ext.123
Central Number
Applications/Servers
Web
Port 80
Mail
Port 25
Exchange
Area Code
412-268-8000
ext.654
IP Address
Network No.
Host Number
Telephone No
15-441 Students Clients
Professors at CMU
Network ProgrammingTelephone Call
Port No.Extension

◦ Port numbers are used to identify
“entities” on a host
◦ Port numbers can be
 Well-known (port 0-1023)
 Dynamic or private (port 1024-65535)
◦ Servers/daemons usually use well-
known ports
 Any client can identify the
server/service
 HTTP = 80, FTP = 21, Telnet = 23, ...
 /etc/services defines well-known ports
◦ Clients usually use dynamic ports
 Assigned by the kernel at run time
TCP/UDP
IP
Ethernet Adapter
NTP
daemon
Web
server
port 123 port 80

Consider Railway Station
Counter 0: Platform
Tickets
Counter 1: Enquiries
Counter 2: Reservations
----
---
Counter 8: Current
Reservations

 Each host machine has an IP address
 When a packet arrives at a host
medellin.cs.columbia.edu
(128.59.21.14)
cluster.cs.columbia.edu
(128.59.21.14,
128.59.16.7, 128.59.16.5,
128.59.16.4)
newworld.cs.umass.edu
(128.119.245.93)

 Transfer file to/from remote host
 Client/server model
◦ Client: side that initiates transfer (either to/from remote)
◦ Server: remote host
 ftp: RFC 959
 ftp server: port 21
[Figure: the user at a host works through the FTP user interface and the FTP client; files are transferred between the local file system and the remote file system behind the FTP server.]

 Ftp client contacts ftp server
at port 21, specifying TCP as
transport protocol
 Two parallel TCP connections
opened:
◦ Control: exchange
commands, responses
between client, server.
“out of band control”
◦ Data: file data to/from server
[Figure: FTP client and FTP server linked by a TCP control connection (port 21) and a TCP data connection (port 20).]

The interface that the OS provides to its
networking subsystem
[Figure: two hosts, each with an OS network stack (application layer, transport layer (TCP/UDP), network layer (IP), link layer (e.g. Ethernet), physical layer), connected through the Internet.]
Sockets as means for inter-process communication (IPC)
[Figure: a client process and a server process, each attached through a socket to its OS network stack; the two stacks communicate over the Internet.]

 Address the machine on the network
◦ By IP address
 Address the process
◦ By the “port”-number
 The pair of IP-address + port – makes up a “socket-address”
Connection socket pair: (128.2.194.242:3479, 208.216.181.15:80)
Client host address: 128.2.194.242
Client socket address: 128.2.194.242:3479
Server host address: 208.216.181.15
Server socket address: 208.216.181.15:80 (server on port 80)
Note: 3479 is an ephemeral port allocated by the kernel
Note: 80 is a well-known port associated with Web servers

 Examples of client programs
◦ Web browsers, ftp, telnet, ssh
 How does a client find the server?
◦ The IP address in the server socket address identifies the
host
◦ The (well-known) port in the server socket address
identifies the service, and thus implicitly identifies the
server process that performs that service.
◦ Examples of well known ports
 Port 7: Echo server
 Port 23: Telnet server
 Port 25: Mail server
 Port 80: Web server

[Figure: the server host 128.2.194.242 runs a Web server (port 80) and an echo server (port 7). A client's service request for 128.2.194.242:80 is handed by the kernel to the Web server; a service request for 128.2.194.242:7 is handed to the echo server.]

 Servers are long-running processes
(daemons).
◦ Created at boot-time (typically) by the init process
(process 1)
◦ Run continuously until the machine is turned off.
 Each server waits for requests to arrive on a
well-known port associated with a particular
service.
◦ Port 7: echo server
◦ Port 23: telnet server
◦ Port 25: mail server
◦ Port 80: HTTP server
 Other applications should choose between 1024 and
65535
See /etc/services for a
comprehensive list of the
services available on a
Linux machine.

 What is a socket?
◦ To the kernel, a socket is an endpoint of communication.
◦ To an application, a socket is a file descriptor that lets the
application read/write from/to the network.
 Remember: All Unix I/O devices, including networks, are
modeled as files.
 Clients and servers communicate with each other by
reading from and writing to socket descriptors.
 The main distinction between regular file I/O and
socket I/O is how the application “opens” the
socket descriptors.

 Endpoint Address
◦ Generic Endpoint Address
 The socket abstraction accommodates many protocol
families.
 It supports many address families.
 It defines the following generic endpoint address:
 ( address family, endpoint address in that family )
 Data type for generic endpoint address:
◦ TCP/IP Endpoint Address
 For TCP/IP, an endpoint address is composed of the
following items:
 Address family is AF_INET (Address Family for InterNET).
 Endpoint address in that family is composed of an IP
address and a port number.

 The IP address identifies a particular computer, while the
port number identifies a particular application running on
that computer.
 The TCP/IP endpoint address is a special instance of the
generic one:
 Port Number
 A port number identifies an application running on a
computer.
 When a client program is executed, the system (WinSock on Windows, the
kernel on Unix) chooses an unused port number for it.
 Each server program must have a pre-specified port
number, so that the client can contact the server.

 The port number is composed of 16 bits, and its possible
values are used in the following manner:
 0 - 1023: For well-known server applications.
 1024 - 49151: For user-defined server applications
(typical range to be used is 1024 - 5000).
 49152 - 65535: For client programs.
 Port numbers for some well-known server applications:
 WWW server using TCP: 80
 Telnet server using TCP: 23
 SMTP (email) server using TCP: 25
 SNMP server using UDP: 161.

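Unix File Descriptor Table
[Figure: descriptor table with entries 0 to 4; each entry points to the data structure of an open file (e.g. "data structure for file 0"). Entries 0, 1 and 2 are standard input, standard output and standard error.]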

Socket Descriptor Data Structure
[Figure: descriptor table with entries 0 to 4; one entry points to the data structure of a socket, which holds:]
Family: PF_INET
Service: SOCK_STREAM
Local IP: 111.22.3.4
Remote IP: 123.45.6.78
Local Port: 2249
Remote Port: 3726

 Hierarchical vs. flat
◦ Wisconsin / Madison / UW-Campus / Aditya
vs.
Aditya:123-45-6789
◦ Ethernet addresses are flat
 What information would routers need to route to Ethernet
addresses?
◦ Hierarchical structure crucial for designing scalable binding from interface
name to route
◦ Route to a general area, then to a specific location
 What type of Hierarchy?
◦ How many levels?
◦ Same hierarchy depth for everyone?
 Address broken in segments of increasing specificity
◦ Uniform for everybody: needs centralized management
◦ Non-uniform: more flexible, needs careful decentralized management

 Fixed length: 32 bits
 Total address space: 2^32, about 4 billion addresses
 Initial class-ful structure (1981)
◦ Class A: 128 networks, 16M hosts
◦ Class B: 16K networks, 64K hosts
◦ Class C: 2M networks, 256 hosts

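Classful address formats (32 bits):
Class A: leading bit 0; network ID in the first 8 bits, host ID in the remaining 24
Class B: leading bits 10; network ID in the first 16 bits, host ID in the remaining 16
Class C: leading bits 110; network ID in the first 24 bits, host ID in the remaining 8
Class D: leading bits 1110; multicast addresses
Class E: leading bits 1111; reserved for experiments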

 Address would specify prefix for forwarding table
◦ Simple lookup
 www.cmu.edu address 128.2.11.43
◦ Class B address – class + network is 128.2
◦ Lookup 128.2 in forwarding table
◦ Prefix – part of address that really matters for routing
 Forwarding table contains
◦ List of class+network entries
◦ A few fixed prefix lengths (8/16/24)
 Large tables
◦ 2 Million class C networks

 Original goal: network part would uniquely identify
a single physical network
 Inefficient address space usage
◦ Class A & B networks too big
 Also, very few LANs have close to 64K hosts
 Easy for networks to (claim to) outgrow class-C
◦ Each physical network must have one network number
 Routing table size is too high
 Need simple way to reduce the number of network
numbers assigned
◦ Subnetting: Split up single network address ranges
◦ Fixes routing table size problem, partially

 Add another “floating” layer to hierarchy
 Variable length subnet masks
◦ Could subnet a class B into several chunks
[Figure: a class B address is split into Network | Host; with subnetting the host part is divided further, giving Network | Subnet | Host.]
Subnet mask: 11111111 11111111 11111111 00000000

 Assume an organization was assigned
address 150.100 (class B)
 Assume < 100 hosts per subnet
(department)
 How many host bits do we need?
◦ Seven
 What is the network mask?
◦ 11111111 11111111 11111111 10000000
◦ 255.255.255.128

• Host configured with IP address and subnet mask
• Subnet number = IP (AND) Mask
• Forwarding table entry: (Subnet number, subnet mask)  Outgoing I/F
D = destination IP address
For each forwarding table entry (SN, SM  OI)
    D1 = SM & D
    if (D1 == SN)
        Deliver on OI and stop
If no entry matched
    Forward to default router

 Address space depletion
◦ In danger of running out of classes A and B
◦ Why?
 Class C too small for most domains
 Very few class A – very careful about giving them out
 Class B poses greatest problem
◦ Class B sparsely populated
 But people refuse to give it back

 Allows arbitrary split between network & host part
of address
◦ Do not use classes to determine network ID
◦ Use common part of address as network number
◦ Allows handing out arbitrary sized chunks of address space
◦ E.g., addresses 192.4.16 - 192.4.31 have the first 20 bits in
common. Thus, we use these 20 bits as the network number
 192.4.16/20
 Enables more efficient usage of address space (and
router tables)
◦ Use single entry for range in forwarding tables
◦ Combine forwarding entries when possible

 Network is allocated 8 contiguous chunks
of 256-host addresses 200.10.0.0 to
200.10.7.255
◦ Allocation uses 3 bits of class C space
◦ Remaining 21 bits are network number, written
as 200.10.0.0/21
 Replaces 8 class C routing entries with 1
combined entry
◦ Routing protocols carry prefix with destination
network address

Network (network portion):
 Get allocated portion of ISP’s address space:
ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20
Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
...
Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23

 How does an ISP get block of addresses?
◦ From Regional Internet Registries (RIRs)
 ARIN (North America, Southern Africa), APNIC (Asia-
Pacific), RIPE (Europe, Northern Africa), LACNIC (South
America)
 How about a single host?
◦ Hard-coded by system admin in a file
◦ DHCP: Dynamic Host Configuration Protocol:
dynamically get address: “plug-and-play”
 Host broadcasts “DHCP discover” msg
 DHCP server responds with “DHCP offer” msg
 Host requests IP address: “DHCP request” msg
 DHCP server sends address: “DHCP ack” msg

Provider is given 201.10.0.0/21
201.10.0.0/22 201.10.4.0/24 201.10.5.0/24 201.10.6.0/23
Provider
CIDR implications:
Longest prefix match
Route aggregation

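[Figure: datagram forwarding. The packet from the sender carries the destination address R. Routers R1, R2 and R3 each have ports 1 to 4 and a forwarding table entry for R (R to port 3, R to port 4, R to port 3) that selects the output port toward the receiver.]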

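[Figure: source routing. The packet leaves the sender carrying the full route R1, R2, R3, R. Routers R1, R2 and R3 each have ports 1 to 4; the remaining route shrinks hop by hop (R2, R3, R; then R3, R; then R) until the packet reaches the receiver.]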

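[Figure: virtual circuits. The links on the path from sender to receiver through routers R1, R2 and R3 carry VC numbers 5, 7, 2 and 6; the router tables map (input port, incoming VC) to (output port, outgoing VC), with entries 1,5 to 3,7; 1,7 to 4,2; and 2,2 to 3,6.]
• Network picks a path
• Assigns VC numbers for flow on each link
• Populates forwarding table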

 Routing Gets Packet to Correct Local Network
◦ Based on IP address
◦ Router sees that destination address is of local machine
 Still Need to Get Packet to Host
◦ Using link-layer protocol
◦ Need to know hardware address
 Same Issue for Any Local Communication
◦ Find local machine, given its IP address
[Figure: several hosts on LAN 1 behind a router that connects to the WAN; addresses shown are 128.2.198.222 and 128.2.254.36, and the arriving packet has Destination = 128.2.198.222.]

◦ Diagrammed for Ethernet (6-byte MAC addresses)
 Low-Level Protocol
◦ Operates only within local network
◦ Determines mapping from IP address to hardware (MAC)
address
◦ Mapping determined dynamically
 No need to statically configure tables
 Only requirement is that each host know its own IP address
ARP message fields: op | Sender MAC address | Sender IP Address | Target MAC address | Target IP Address
• op: Operation
– 1: request
– 2: reply
• Sender
– Host sending ARP message
• Target
– Intended receiver of message

 Requestor
◦ Fills in own IP and MAC address as “sender”
 Why include its MAC address?
 Mapping
◦ Fills desired host IP address in target IP address
 Sending
◦ Send to MAC address ff:ff:ff:ff:ff:ff
 Ethernet broadcast
ARP request fields:
• op: Operation
– 1: request
• Sender
– Host that wants to determine MAC address of another machine
• Target
– Other machine

 Responder becomes “sender”
◦ Fill in own IP and MAC address
◦ Set requestor as target
◦ Send to requestor’s MAC address
ARP reply fields:
• op: Operation
– 2: reply
• Sender
– Host with desired IP address
• Target
– Original requestor

 Host 128.2.209.100 when plugged into CS ethernet
 Dest 128.2.209.100  routing to same machine
 Dest 128.2.0.0  other hosts on same ethernet
 Dest 127.0.0.0  special loopback address
 Dest 0.0.0.0  default route to rest of Internet
◦ Main CS router: gigrouter.net.cs.cmu.edu (128.2.254.36)
Destination     Gateway        Genmask           Iface
128.2.209.100   0.0.0.0        255.255.255.255   eth0
128.2.0.0       0.0.0.0        255.255.0.0       eth0
127.0.0.0       0.0.0.0        255.0.0.0         lo
0.0.0.0         128.2.254.36   0.0.0.0           eth0

 IP address, netmask, gateway, hostname, etc., etc.
◦ Type by hand!!!
 IPv4 option 1: RARP (Reverse ARP)
◦ Data-link protocol
 Uses ARP format. New opcodes: “Request reverse”, “reply reverse”
◦ Send query: Request-reverse [ether addr], server responds with IP
 Used primarily by diskless nodes, when they first initialize, to
find their Internet address
 IPv4 option 2: DHCP
◦ Dynamic Host Configuration Protocol
◦ RARP is fine for assigning an IP, but is very limited
◦ DHCP can provide all the info necessary

 DHCPOFFER
◦ IP addressing information
◦ Boot file/server information (for network booting)
◦ DNS name servers
◦ Lots of other stuff - protocol is extensible; half of the options
reserved for local site definition and use.
DHCPDISCOVER - broadcast
DHCPOFFER
DHCPREQUEST
DHCPACK

 Lease-based assignment
◦ Clients can renew: Servers really should preserve this
information across client & server reboots.
 Provide host configuration information
◦ Not just IP address stuff.
◦ NTP servers, IP config, link layer config,…
 Use:
◦ Generic config for desktops/dial-in/etc.
 Assign IP address/etc., from pool
◦ Specific config for particular machines
 Central configuration management

Network Layer
Goal: allow host to dynamically obtain its IP address from
network server when it joins network
Can renew its lease on address in use
Allows reuse of addresses (only hold address while connected and
“on”)
Support for mobile users who want to join network (more shortly)
DHCP overview:
◦ host broadcasts “DHCP discover” msg [optional]
◦ DHCP server responds with “DHCP offer” msg [optional]
◦ host requests IP address: “DHCP request” msg
◦ DHCP server sends address: “DHCP ack” msg

[Figure: network with hosts 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2 and 223.1.3.27 (hosts A, B and E labelled) and a DHCP server; an arriving DHCP client needs an address in this network.]

DHCP server: 223.1.2.5, arriving client. Messages in time order:
DHCP discover
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 0.0.0.0
  transaction ID: 654
DHCP offer
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 654
  Lifetime: 3600 secs
DHCP request
  src: 0.0.0.0, 68
  dest: 255.255.255.255, 67
  yiaddr: 223.1.2.4
  transaction ID: 655
  Lifetime: 3600 secs
DHCP ACK
  src: 223.1.2.5, 67
  dest: 255.255.255.255, 68
  yiaddr: 223.1.2.4
  transaction ID: 655
  Lifetime: 3600 secs

DHCP: more than IP address
DHCP can return more than just allocated IP
address on subnet:
 address of first-hop router for client
 name and IP address of DNS server
 network mask (indicating network versus host
portion of address)

DHCP: example
 connecting laptop needs its
IP address, addr of first-
hop router, addr of DNS
server: use DHCP
[Figure: the DHCP message travels down the client's protocol stack (DHCP, UDP, IP, Eth, Phy) and up the same stack at the router (168.1.1.1), which runs the DHCP server.]
 DHCP request encapsulated in UDP, encapsulated in IP,
encapsulated in 802.3 Ethernet
 Ethernet frame broadcast (dest: FFFFFFFFFFFF) on LAN,
received at router running DHCP server
 Ethernet demux’ed to IP, IP demux’ed to UDP,
UDP demux’ed to DHCP

DHCP: example
 DHCP server formulates DHCP ACK containing client’s IP address,
IP address of first-hop router for client, name & IP address of DNS server
 encapsulation at DHCP server, frame forwarded to client,
demux’ing up to DHCP at client
 client now knows its IP address, name and IP address of DNS server,
IP address of its first-hop router
[Figure: the DHCP ACK travels down the protocol stack (DHCP, UDP, IP, Eth, Phy) at the router, which runs the DHCP server, and up the same stack at the client.]

DHCP: wireshark
output (home LAN)
Message type: Boot Reply (2)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 192.168.1.101 (192.168.1.101)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 192.168.1.1 (192.168.1.1)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP ACK
Option: (t=54,l=4) Server Identifier = 192.168.1.1
Option: (t=1,l=4) Subnet Mask = 255.255.255.0
Option: (t=3,l=4) Router = 192.168.1.1
Option: (6) Domain Name Server
Length: 12; Value: 445747E2445749F244574092;
IP Address: 68.87.71.226;
IP Address: 68.87.73.242;
IP Address: 68.87.64.146
Option: (t=15,l=20) Domain Name = "hsd1.ma.comcast.net."
reply
Message type: Boot Request (1)
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 0.0.0.0 (0.0.0.0)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 0.0.0.0 (0.0.0.0)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP Request
Option: (61) Client identifier
Length: 7; Value: 010016D323688A;
Option: (t=50,l=4) Requested IP Address = 192.168.1.101
Option: (t=12,l=5) Host Name = "nomad"
Option: (55) Parameter Request List
Length: 11; Value: 010F03062C2E2F1F21F92B
1 = Subnet Mask; 15 = Domain Name
3 = Router; 6 = Domain Name Server
44 = NetBIOS over TCP/IP Name Server
……
request

 Serverless (“Stateless”). No manual config at all.
◦ Only configures addressing items, NOT other host things
 Use DHCP for such things
 Link-local address
◦ 1111 1110 10 :: 64 bit interface ID (usually from Ethernet
addr)
 (fe80::/64 prefix)
◦ Uniqueness test (“anyone using this address?”)
◦ Router contact (solicit, or wait for announcement)
 Contains globally unique prefix
 Usually: Concatenate this prefix with local ID -> globally
unique IPv6 ID

 DNS Design
 DNS Today

 Need naming to identify resources
 Once identified, resource must be located
 How to name resource?
◦ Naming hierarchy
 How do we efficiently locate resources?
◦ DNS: name  location (IP address)
 Challenge: How do we scale these to the
wide area?

Lookup a Central DNS?
 Single point of failure
 Traffic volume
 Distant centralized database
 Single point of update
 Doesn’t scale!

Why not use /etc/hosts?
 Original Name to Address Mapping
◦ Flat namespace
◦ Lookup mapping in /etc/hosts
◦ Downloaded regularly
 Count of hosts was increasing: machine per
domain  machine per user
◦ Many more downloads
◦ Many more updates

 Basically a wide-area distributed database of
name to IP mappings
 Goals:
◦ Scalability
◦ Decentralized maintenance
◦ Robustness
◦ Global scope
 Names mean the same thing everywhere
◦ Don’t need
 Atomicity
 Strong consistency

 Conceptually, programmers can view the
DNS database as a collection of millions of
host entry structures:
◦ in_addr is a struct consisting of 4-byte IP address
 Functions for retrieving host entries from
DNS:
◦ gethostbyname: query key is a DNS host name.
◦ gethostbyaddr: query key is an IP address.
/* DNS host entry structure */
struct hostent {
char *h_name; /* official domain name of host */
char **h_aliases; /* null-terminated array of domain names */
int h_addrtype; /* host address type (AF_INET) */
int h_length; /* length of an address, in bytes */
char **h_addr_list; /* null-terminated array of in_addr structs */
};

DNS message format:
Header (12 bytes):
Identification | Flags
No. of Questions | No. of Answer RRs
No. of Authority RRs | No. of Additional RRs
Questions (variable number of questions): name, type fields for a query
Answers (variable number of resource records): RRs in response to query
Authority (variable number of resource records): records for authoritative servers
Additional Info (variable number of resource records): additional “helpful” info that may be used

 Identification
◦ Used to match up request/response
 Flags
◦ 1-bit to mark query or response
◦ 1-bit to mark authoritative or not
◦ 1-bit to request recursive resolution
◦ 1-bit to indicate support for recursive resolution

RR format: (class, name, value, type, ttl)
• DB contains tuples called resource records (RRs)
– Classes = Internet (IN), Chaosnet (CH), etc.
– Each class defines value associated with type
FOR IN class:
 Type=A
◦ name is hostname
◦ value is IP address
 Type=NS
◦ name is domain (e.g. foo.com)
◦ value is name of authoritative name server for this domain
• Type=CNAME
– name is an alias name for some “canonical” (the real) name
– value is canonical name
• Type=MX
– value is hostname of mailserver associated with name

 Different kinds of mappings are possible:
◦ Simple case: 1-1 mapping between domain name and IP
addr:
 kittyhawk.cmcl.cs.cmu.edu maps to 128.2.194.242
◦ Multiple domain names map to the same IP address:
 eecs.mit.edu and cs.mit.edu both map to 18.62.1.6
◦ Single domain name maps to multiple IP addresses:
 aol.com and www.aol.com map to multiple IP addrs.
◦ Some valid domain names don’t map to any IP address:
 for example: cs.wisc.edu

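[Figure: DNS name tree. The root (.) has children edu, net, org, uk and com; below edu are gwu, ucb, wisc, cmu and mit; further down are cs and ee, and below them wail.]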
• Each node in hierarchy
stores a list of names that
end with same suffix
• Suffix = path up tree
• E.g., given this tree, where
would following be stored:
• Fred.com
• Fred.edu
• Fred.wisc.edu
• Fred.cs.wisc.edu
• Fred.cs.cmu.edu

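[Figure: DNS name tree. The root has children edu, net, org, uk, com and ca; below edu are gwu, ucb, cmu, bu and mit; further down are cs and ece, and below them cmcl. Three example zones are marked: a single node, a subtree and the complete tree.]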
• Zone = contiguous section
of name space
• E.g., Complete tree, single
node or subtree
• A zone has an associated
set of name servers
• Must store list of names
and tree links

 Zones are created by convincing owner
node to create/delegate a subzone
◦ Records within zone store multiple redundant
name servers
◦ Primary/master name server updated manually
◦ Secondary/redundant servers updated by zone
transfer of name space
 Zone transfer is a bulk transfer of the “configuration” of a
DNS server – uses TCP to ensure reliability
 Example:
◦ CS.WISC.EDU created by WISC.EDU administrators
◦ Who creates WISC.EDU or .EDU?

 Responsible for
“root” zone
 Approx. 13 root
name servers
worldwide
◦ Currently {a-m}.root-servers.net
 Local name servers
contact root servers
when they cannot
resolve a name
◦ Configured with
well-known root
servers

 Each host has a resolver
◦ Typically a library that applications can link to
◦ Resolver contacts name server
◦ Local name servers hand-configured (e.g.
/etc/resolv.conf)
 Name servers
◦ Either responsible for some zone or…
◦ Local servers
 Do lookup of distant host names for local hosts
 Typically answer queries about local zone

 Steps for resolving www.wisc.edu
◦ Application calls gethostbyname() (RESOLVER)
◦ Resolver contacts local name server (S1)
◦ S1 queries root server (S2) for (www.wisc.edu)
◦ S2 returns NS record for wisc.edu (S3)
◦ What about A record for S3?
 This is what the additional information section is for
(PREFETCHING)
◦ S1 queries S3 for www.wisc.edu
◦ S3 returns A record for www.wisc.edu
 Can return multiple A records  what does this
mean?

Recursive query:
 Server goes out and
searches for more info
(recursive)
 Only returns final
answer or “not found”
Iterative query:
 Server responds with as
much as it knows
(iterative)
 “I don’t know this name,
but ask this server”
Workload impact on
choice?
 Local server typically
does recursive
 Root/distant server
does iterative
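[Figure: iterated query. The requesting host surf.eurecom.fr asks its local name server dns.eurecom.fr (1) for gaia.cs.umass.edu. The local name server queries the root name server (2, 3), the intermediate name server dns.umass.edu (4, 5) and the authoritative name server dns.cs.umass.edu (6, 7), then returns the answer to the host (8).]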

 Are all servers/names likely to be equally popular?
◦ Why might this be a problem? How can we solve this problem?
 DNS responses are cached
◦ Quick response for repeated translations
◦ Other queries may reuse some parts of lookup
 NS records for domains
 DNS negative queries are cached
◦ Don’t have to repeat past mistakes
◦ E.g. misspellings, search strings in resolv.conf
 Cached data periodically times out
◦ Lifetime (TTL) of data controlled by owner of data
◦ TTL passed with every record

[Figure: lookup of www.cs.wisc.edu. The client's resolver asks the local DNS server, which contacts the root & edu DNS server, the ns1.wisc.edu DNS server and the ns1.cs.wisc.edu DNS server.]

[Figure: lookup of ftp.cs.wisc.edu. Shown are the client, the local DNS server, the root & edu DNS server, the wisc.edu DNS server and the cs.wisc.edu DNS server.]

 DNS servers are replicated
◦ Name service available if ≥ one replica is up
◦ Queries can be load balanced between replicas
 UDP used for queries
◦ Need reliability  must implement this on top of UDP!
◦ Why not just use TCP?
 Try alternate servers on timeout
◦ Exponential backoff when retrying same server
 Same identifier for all queries
◦ Don’t care which server responds

 Task
◦ Given IP address, find its name
◦ When is this needed?
 Method
◦ Maintain separate hierarchy based
on IP names
◦ Write 128.2.194.242 as
242.194.2.128.in-addr.arpa
 Why is the address reversed?
 Managing
◦ Authority manages IP addresses
assigned to it
◦ E.g., CMU manages name space
2.128.in-addr.arpa
[Figure: under the unnamed root, the name tree holds both edu / cmu / cs / cmcl / kittyhawk (128.2.194.242) and arpa / in-addr / 128 / 2 / 194 / 242.]

 Name servers can add additional data to
response
 Typically used for prefetching
◦ CNAME/MX/NS typically point to another host name
◦ Responses include address of host referred to in
“additional section”

 Generic Top Level Domains (gTLD) = .com,
.net, .org, etc…
 Country Code Top Level Domain (ccTLD) =
.us, .ca, .fi, .uk, etc…
 Root server ({a-m}.root-servers.net) also
used to cover gTLD domains
◦ Load on root servers was growing quickly!
◦ Moving .com, .net, .org off root servers was
clearly necessary to reduce load  done Aug
2000

 .info  general info
 .biz  businesses
 .aero  air-transport industry
 .coop  business cooperatives
 .name  individuals
 .pro  accountants, lawyers, and physicians
 .museum  museums
 Only new ones active so far = .info, .biz,
.name

 No centralized caching per site
◦ Each machine runs own caching local server
◦ Why is this a problem?
◦ How many hosts do we need to share cache?  recent studies suggest
10-20 hosts
 Hit rate for DNS = 80%  1 - (#DNS/#connections)
◦ Is this good or bad?
 Most Internet traffic is Web
◦ What does a typical page look like?  average of 4-5 embedded
objects  needs 4-5 transfers
◦ This alone accounts for 80% hit rate!
 Lower TTLs for A records does not affect performance
 DNS performance really relies more on NS-record caching

Socket API
 introduced in BSD4.1 UNIX,
1981
 explicitly created, used,
released by apps
 client/server paradigm
 two types of transport
service via socket API:
◦ unreliable datagram
◦ reliable, byte stream-
oriented
socket: a host-local, application-created, OS-controlled interface
(a “door”) into which application process can both send and
receive messages to/from another application process
Goal: learn how to build client/server applications that communicate
using sockets

Server and Client exchange messages over the network through a
common Socket API
[Figure: server and clients each run in user space on top of the Socket API; ports, TCP/UDP and IP sit in kernel space, above the Ethernet adapter hardware.]

Socket: a door between application process and
end-end-transport protocol (UDP or TCP)
TCP service: reliable transfer of bytes from one
process to another
[Figure: on each host or server, the process and its socket are controlled by the application developer, while TCP with its buffers and variables is controlled by the operating system. The two hosts communicate through the internet.]

Client must contact server
 server process must first
be running
 server must have created
socket (door) that
welcomes client’s contact
Client contacts server by:
 creating client-local TCP
socket
 specifying IP address, port
number of server process
 When client creates socket:
client TCP establishes
connection to server TCP
 When contacted by client,
server TCP creates new
socket for server process to
communicate with client
◦ allows server to talk with
multiple clients
◦ source port numbers used
to distinguish clients
(more in Chap 3)
Application viewpoint: TCP provides reliable, in-order
transfer of bytes (“pipe”) between client and server

 A stream is a sequence of
characters that flow into or
out of a process.
 An input stream is attached
to some input source for
the process, e.g., keyboard
or socket.
 An output stream is
attached to an output
destination, e.g., monitor or
socket.

Example client-server
app:
1) client reads line from
standard input (inFromUser
stream) , sends to server via
socket (outToServer
stream)
2) server reads line from
socket
3) server converts line to
uppercase, sends back to
client
4) client reads, prints
modified line from socket
(inFromServer stream)
[Figure: client process. The inFromUser input stream reads from the keyboard and results go to the monitor. Through the client TCP socket (clientSocket), the outToServer output stream sends to the network and the inFromServer input stream receives from the network.]

Server (running on hostid):
1) create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
2) wait for incoming connection request: connectionSocket = welcomeSocket.accept()
3) read request from connectionSocket
4) write reply to connectionSocket
5) close connectionSocket
Client:
1) create socket, connect to hostid, port=x: clientSocket = Socket() (TCP connection setup with the server)
2) send request using clientSocket
3) read reply from clientSocket
4) close clientSocket

UDP: no “connection” between
client and server
 no handshaking
 sender explicitly attaches IP
address and port of
destination to each packet
 server must extract IP
address, port of sender
from received packet
UDP: transmitted data may be
received out of order, or
lost
UDP provides unreliable transfer
of groups of bytes (“datagrams”)

Server (running on hostid):
1) create socket, port=x, for incoming request: serverSocket = DatagramSocket()
2) read request from serverSocket
3) write reply to serverSocket, specifying client host address, port number
Client:
1) create socket: clientSocket = DatagramSocket()
2) create datagram with address (hostid, port=x), send datagram request using clientSocket
3) read reply from clientSocket
4) close clientSocket

[Figure: client process. The inFromUser input stream reads from the keyboard and results go to the monitor. Through the client UDP socket (clientSocket), sendPacket carries a UDP packet to the network and receivePacket carries a UDP packet from the network.]
Output: sends packet (TCP sent “byte stream”)
Input: receives packet (TCP received “byte stream”)

 This contains the protocol specific addressing
information that is passed from the user
process to the kernel and vice versa
 Each protocol supported by a socket
implementation has its own socket
address structure sockaddr_suffix,
where suffix represents the protocol family
Ex: sockaddr_in – Internet/IPv4 socket address structure
sockaddr_ipx – IPX socket address structure

 The generic socket address structure
sockaddr
{
address family
protocol specific data
};
 The internet/IPv4 socket address structure
sockaddr_in
{
sin_family Internet address family
sin_port Transport layer Port Number
in_addr sin_addr IP address;
sin_zero[8] Padding ;
};

 int8_t: signed 8-bit integer, <sys/types.h>
 uint8_t: unsigned 8-bit integer, <sys/types.h>
 sa_family_t: address family of socket address structure, <sys/socket.h>
 socklen_t: length of socket address structure, <sys/socket.h>
 in_addr_t: IPv4 address, normally uint32_t, <netinet/in.h>
 in_port_t: TCP/UDP port, normally uint16_t, <netinet/in.h>

 Byte ordering
◦ Network byte order
◦ Host byte order
◦ htons(), htonl() convert host to network order; ntohs(), ntohl() convert back (s = 16-bit short, l = 32-bit long)
 Memory content initialization
◦ memset(buffer,value,buffersize)
 Data copying and comparison
◦ memcpy(dest,src,num_of_bytes)
◦ memcmp(buffer1,buffer2,num_of_bytes)
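The byte-ordering helpers above can be sketched as follows. This is a minimal illustration assuming a POSIX system; the function names first_wire_byte and roundtrip_ok and the sample values are made up for this example and do not come from the slides.

```c
#include <arpa/inet.h>   /* htons, htonl, ntohs, ntohl */
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Return the first byte in memory of a 16-bit port after htons().
 * Network byte order is big-endian, so the most significant byte
 * comes first regardless of the host's own byte order. */
unsigned char first_wire_byte(uint16_t host_port)
{
    uint16_t net_port = htons(host_port);
    unsigned char bytes[2];
    memcpy(bytes, &net_port, sizeof(net_port));
    return bytes[0];
}

/* Converting to network order and back must give the original value. */
int roundtrip_ok(uint16_t port, uint32_t addr)
{
    return ntohs(htons(port)) == port && ntohl(htonl(addr)) == addr;
}
```

For port 80 (0x0050) the first byte on the wire is 0x00 on every host, which is why sin_port is always filled in with htons().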

 IP address notation conversion
◦ Integer notation
◦ Dotted decimal notation
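 status inet_aton(ddstring_pointer,address_pointer)
 Returns 1 on success, 0 on error
 ddstring_pointer inet_ntoa(address)
 Takes a struct in_addr by value; the result points to a static buffer
 address inet_addr(ddstring_pointer)
 Deprecated: its error value (in_addr_t) –1 is also the valid address 255.255.255.255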
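A small sketch of the two conversion directions, assuming a POSIX system. The wrappers parse_ipv4 and format_ipv4 are hypothetical helpers written for this example; only inet_aton(), inet_ntoa(), htonl() and ntohl() are real library calls.

```c
#include <arpa/inet.h>    /* inet_aton, inet_ntoa, htonl, ntohl */
#include <netinet/in.h>   /* struct in_addr */
#include <stdint.h>
#include <string.h>

/* Parse a dotted-decimal string; store the address in host byte order.
 * Returns 1 on success and 0 on a malformed string, like inet_aton(). */
int parse_ipv4(const char *dotted, uint32_t *host_order_out)
{
    struct in_addr addr;
    if (inet_aton(dotted, &addr) == 0)
        return 0;
    *host_order_out = ntohl(addr.s_addr);
    return 1;
}

/* Format a host-order address as dotted decimal into buf.
 * Returns 1 on success, 0 if buf is too small. */
int format_ipv4(uint32_t host_order, char *buf, size_t buflen)
{
    struct in_addr addr;
    const char *s;

    addr.s_addr = htonl(host_order);
    s = inet_ntoa(addr);              /* points to a static buffer */
    if (strlen(s) + 1 > buflen)
        return 0;
    strcpy(buf, s);
    return 1;
}
```

Copying the result of inet_ntoa() right away matters because the next call overwrites the same static buffer.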

 sockfd socket(domain, type, protocol)
◦ domain is the protocol/address family AF_INET,AF_IPX..
◦ type is the type of service: SOCK_DGRAM, SOCK_STREAM, …
◦ protocol is the specific protocol that is supported by the
protocol family specified (as param1); 0 selects the default
◦ Returns a fresh socket descriptor on success, –1 on error
 status close(sockfd)
◦ Releases the descriptor; on a STREAM socket, data already queued is
still sent before the connection is terminated
◦ Returns 0 on success, –1 on error

 status bind(sockfd,ptr_to_sockaddr,sockaddr_size)
◦ Associates the sockaddr with sockfd
◦ The rules for successful binding depend on the protocol
family of the socket(specified during call to socket)
◦ Necessary before receiving connections on a STREAM socket
 status listen(sockfd,backlog)
◦ Notifies the willingness to accept connections
◦ backlog: maximum number of established connections
not yet handed to their respective user
processes (by calls to accept)
◦ On unbound sockets an implicit bind is done with
INADDR_ANY and a random port as the address and
port parameters respectively
* Above calls return –1 on error
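The bind and listen calls described above can be combined into one helper. This is a sketch under stated assumptions: make_listener and local_port are hypothetical names, the socket is bound to the loopback address so the example stays local, and port 0 asks the kernel to pick a free port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1 and put it in listening state.
 * Pass port 0 to let the kernel choose a free port.
 * Returns the descriptor, or -1 on error. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the local port (host byte order) a socket is bound to, or -1. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

A server that must accept remote clients would bind to INADDR_ANY and a well-known port instead of the loopback address used here.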
struct sockaddr_in {
unsigned short sin_family; /* address family (always AF_INET) */
unsigned short sin_port; /* port num in network byte order */
struct in_addr sin_addr; /* IP addr in network byte order */
unsigned char sin_zero[8]; /* pad to sizeof(struct sockaddr) */
};

 connfd accept(sockfd,ptr_to_sockaddr,ptr_to_sockaddr_size)
◦ Blocks till a connection gets established on sockfd and
returns a new file descriptor on which I/O can be
performed with the remote entity
◦ Fills the sockaddr and size parameters with the address
information (and its size respectively) of the connecting
entity
◦ bind and listen are assumed to have been called on sockfd
prior to calling accept
 status connect(sockfd, ptr_to_sockaddr, sockaddr_size)
◦ Initiates a new connection with the entity addressed by
sockaddr in case of a STREAM socket
◦ Sets the default remote address for I/O in case of DGRAM
socket
* Above calls return –1 on error

 SEND: ssize_t send(int sockfd, const void *msg, size_t len, int flags);
◦ msg: message you want to send
◦ len: length of the message
◦ flags := 0
◦ returned: the number of bytes actually sent (may be less than len), –1 on error
 RECEIVE: ssize_t recv(int sockfd, void *buf, size_t len, int flags);
◦ buf: buffer to receive the message
◦ len: length of the buffer (“don’t give me more!”)
◦ flags := 0
◦ returned: the number of bytes received, 0 if the peer closed the connection, –1 on error
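Because send() may accept fewer bytes than requested, callers usually loop until the whole message is out. The helper below is a minimal sketch of that loop; send_all is a hypothetical name, not a library function.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes have been written.
 * Returns 0 on success, -1 on error (errno is set by send). */
int send_all(int sockfd, const void *msg, size_t len)
{
    const char *p = msg;
    while (len > 0) {
        ssize_t n = send(sockfd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* skip what was already sent */
        len -= (size_t)n;
    }
    return 0;
}
```

The same pattern applies to write() on a socket descriptor.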

 SEND (DGRAM-style): ssize_t sendto(int sockfd, const void *msg,
size_t len, int flags, const struct sockaddr *to, socklen_t tolen);
◦ msg: message you want to send
◦ len: length of the message
◦ flags := 0
◦ to: socket address of the remote process
◦ tolen: = sizeof(struct sockaddr)
◦ returned: the number of bytes actually sent, –1 on error
 RECEIVE (DGRAM-style): ssize_t recvfrom(int sockfd, void *buf, size_t
len, int flags, struct sockaddr *from, socklen_t *fromlen);
◦ buf: buffer to receive the message
◦ len: length of the buffer (“don’t give me more!”)
◦ from: socket address of the process that sent the data
◦ fromlen: pointer to a length, set to sizeof(struct sockaddr) before the
call; holds the actual address length on return
◦ flags := 0
◦ returned: the number of bytes received, –1 on error
 CLOSE: close(sockfd);
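The datagram calls can be tried out inside a single process by sending to a socket on the loopback interface. The sketch below assumes a POSIX system with loopback networking available; udp_loopback_once is a hypothetical helper written for this illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram from an unbound "client" socket to a "server" socket
 * bound on 127.0.0.1 and receive it there. Copies the payload into out
 * and returns the number of bytes received, or -1 on error. */
ssize_t udp_loopback_once(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in srv, from;
    socklen_t srv_len = sizeof(srv), from_len = sizeof(from);
    ssize_t n = -1;
    int server = socket(AF_INET, SOCK_DGRAM, 0);
    int client = socket(AF_INET, SOCK_DGRAM, 0);

    if (server < 0 || client < 0)
        goto done;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(0);                     /* kernel picks a port */
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(server, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        getsockname(server, (struct sockaddr *)&srv, &srv_len) < 0)
        goto done;

    /* The destination address travels with every sendto() call. */
    if (sendto(client, msg, strlen(msg), 0,
               (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto done;

    /* recvfrom() fills in the sender's address and its length. */
    n = recvfrom(server, out, outlen, 0,
                 (struct sockaddr *)&from, &from_len);

done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    return n;
}
```

The client socket is never bound explicitly; the kernel assigns it a port at the first sendto().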

Concurrent server
[Figure: TCP call sequence between client and server: SOCKET, BIND, LISTEN, CONNECT, ACCEPT, SEND, RECEIVE, CLOSE, with the TCP three-way handshake between CONNECT and ACCEPT.]

[Figure: UDP call sequence: CREATE, BIND, SEND, RECEIVE, CLOSE.]

 For example: web server
 What does a web server
need to do so that a web
client can connect to it?
[Figure: TCP server. A web server on port 80, layered over TCP, IP, and the Ethernet adapter.]

 Since web traffic uses TCP, the web server must create a
socket of type SOCK_STREAM
int fd; /* socket descriptor */
if((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket");
    exit(1);
}
• socket returns an integer (socket descriptor)
• fd < 0 indicates that an error occurred
• AF_INET associates a socket with the Internet protocol family
• SOCK_STREAM selects the TCP protocol
Socket I/O: socket()

 A socket can be bound to a port
struct sockaddr_in srv; /* used by bind() */
/* 1) create the socket */
memset(&srv, 0, sizeof(srv)); /* clear the structure first */
srv.sin_family = AF_INET; /* use the Internet addr family */
srv.sin_port = htons(80); /* bind socket 'fd' to port 80 */
/* bind: a client may connect to any of my addresses */
srv.sin_addr.s_addr = htonl(INADDR_ANY);
if(bind(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}
• Still not quite ready to communicate with a client...
Socket I/O: bind()

 listen indicates that the server will accept a connection
/* 1) create the socket */
/* 2) bind the socket to a port */
if(listen(fd, 5) < 0) {
    perror("listen");
    exit(1);
}
• Still not quite ready to communicate with a client...

 accept blocks waiting for a connection
struct sockaddr_in cli; /* used by accept() */
int newfd; /* returned by accept() */
socklen_t cli_len = sizeof(cli); /* used by accept() */
/* 3) listen on the socket */
newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if(newfd < 0) {
    perror("accept");
    exit(1);
}
• accept returns a new socket (newfd) with the same properties as the
original socket (fd)
• newfd < 0 indicates that an error occurred

struct sockaddr_in cli; /* used by accept() */
int newfd; /* returned by accept() */
socklen_t cli_len = sizeof(cli); /* used by accept() */
newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if(newfd < 0) {
    perror("accept");
    exit(1);
}
• How does the server know which client it is?
• cli.sin_addr.s_addr contains the client’s IP address
• cli.sin_port contains the client’s port number
• Now the server can exchange data with the client by
using read and write on the descriptor newfd.
• Why does accept need to return a new descriptor?
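One way to see the answer to the last question is to run both ends in one process. The sketch below is illustrative only: accept_demo is a hypothetical function, and it relies on the kernel completing the TCP handshake for a listening socket before accept() is called, so connect() returns first. The listening descriptor stays open for further clients, while the descriptor returned by accept() belongs to this one connection.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on 127.0.0.1, connect from the same process, accept the
 * connection. Writes the client's address as seen by the server into ip.
 * Returns 1 if accept() returned a descriptor different from the
 * listening one, 0 if not, -1 on error. */
int accept_demo(char *ip, size_t iplen)
{
    struct sockaddr_in srv, cli;
    socklen_t srv_len = sizeof(srv), cli_len = sizeof(cli);
    int result = -1, newfd = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);      /* listening socket */
    int client = socket(AF_INET, SOCK_STREAM, 0);  /* stands in for a remote client */

    if (fd < 0 || client < 0 || iplen == 0)
        goto done;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(0);                       /* kernel picks a port */
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&srv, &srv_len) < 0)
        goto done;

    if (connect(client, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto done;

    newfd = accept(fd, (struct sockaddr *)&cli, &cli_len);
    if (newfd < 0)
        goto done;

    /* cli now holds the client's address; fd is still listening. */
    strncpy(ip, inet_ntoa(cli.sin_addr), iplen - 1);
    ip[iplen - 1] = '\0';
    result = (newfd != fd);

done:
    if (newfd >= 0) close(newfd);
    if (client >= 0) close(client);
    if (fd >= 0) close(fd);
    return result;
}
```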

 read can be used with a socket
 read blocks waiting for data from the client but
does not guarantee that sizeof(buf) bytes are read
char buf[512]; /* used by read() */
int nbytes; /* used by read() */
/* 3) listen on the socket */
/* 4) accept the incoming connection */
if((nbytes = read(newfd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}
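When a fixed number of bytes is expected, the usual remedy for short reads is to loop. The helper below is a minimal sketch; read_fully is a hypothetical name chosen for this example.

```c
#include <errno.h>
#include <unistd.h>

/* Read until want bytes have arrived, the peer closes, or an error occurs.
 * Returns the number of bytes actually read, or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t want)
{
    char *p = buf;
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file: peer closed the connection */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A return value smaller than want therefore means the other side closed before sending everything.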

 For example: web client
 How does a web client
connect to a web server?
[Figure: 2 web clients, each layered over TCP, IP, and the Ethernet adapter.]

 IP Addresses are commonly written as strings (“128.2.35.50”),
but programs deal with IP addresses as integers.
Converting strings to numerical address:
struct sockaddr_in srv;
srv.sin_addr.s_addr = inet_addr("128.2.35.50");
if(srv.sin_addr.s_addr == (in_addr_t) -1) {
    fprintf(stderr, "inet_addr failed!\n"); exit(1);
}
Converting a numerical address to a string:
struct sockaddr_in srv;
char *t = inet_ntoa(srv.sin_addr);
if(t == 0) {
    fprintf(stderr, "inet_ntoa failed!\n"); exit(1);
}

 gethostbyname provides an interface to DNS
 Additional useful calls
◦ gethostbyaddr – returns hostent given an address (e.g. the sin_addr of a sockaddr_in)
◦ getservbyname
 Used to get service description (typically port number)
 Returns servent based on name
#include <netdb.h>
struct hostent *hp; /* ptr to host info for remote */
struct sockaddr_in peeraddr;
char *name = "www.cs.cmu.edu";
peeraddr.sin_family = AF_INET;
hp = gethostbyname(name);
if(hp == NULL) { /* lookup failed */
    fprintf(stderr, "gethostbyname failed!\n"); exit(1);
}
peeraddr.sin_addr.s_addr = ((struct in_addr*)(hp->h_addr))->s_addr;
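For comparison, the same lookup can be written with getaddrinfo(), the POSIX interface that supersedes gethostbyname(). This is a substitute technique, not the call used on the slide, and resolve_ipv4 is a hypothetical wrapper written for this sketch.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve host (a name or dotted-decimal string) and service (a port
 * number as a string) into an IPv4 socket address.
 * Returns 0 on success, or a getaddrinfo() error code. */
int resolve_ipv4(const char *host, const char *service,
                 struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to match sockaddr_in */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0)
        return rc;                    /* gai_strerror(rc) describes it */

    memcpy(out, res->ai_addr, sizeof(*out));   /* use the first result */
    freeaddrinfo(res);
    return 0;
}
```

The filled-in structure already has sin_family and sin_port set, so it can be passed straight to connect().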

 connect allows a client to connect to a server...
struct sockaddr_in srv; /* used by connect() */
/* connect: use the Internet address family */
srv.sin_family = AF_INET;
/* connect: socket 'fd' to port 80 */
srv.sin_port = htons(80);
/* connect: connect to IP Address "128.2.35.50" */
srv.sin_addr.s_addr = inet_addr("128.2.35.50");
if(connect(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("connect");
    exit(1);
}

 write can be used with a socket
struct sockaddr_in srv; /* used by connect() */
char buf[512]; /* used by write() */
int nbytes; /* used by write() */
/* 2) connect() to the server */
/* Example: A client could "write" a request to a server */
if((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write");
    exit(1);
}

[Figure: TCP client-server interaction. TCP Server: socket(), bind(), listen(), accept(), read(), write(), read(), close(). TCP Client: socket(), connect(), write(), read(), close(). Arrows between the two: connection establishment, data request, data reply, end-of-file notification.]

Example: C client (TCP)
/* client.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad; /* structure to hold an IP address */
    int clientSocket; /* socket descriptor */
    struct hostent *ptrh; /* pointer to a host table entry */
    char *host; /* server host name */
    int port, n; /* server port number, byte count */
    char Sentence[128];
    char modifiedSentence[128];

    host = argv[1]; port = atoi(argv[2]);
    /* Create client socket, connect to server */
    clientSocket = socket(PF_INET, SOCK_STREAM, 0);
    memset((char *)&sad, 0, sizeof(sad)); /* clear sockaddr structure */
    sad.sin_family = AF_INET; /* set family to Internet */
    sad.sin_port = htons((u_short)port);
    ptrh = gethostbyname(host); /* Convert host name to IP address */
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);
    connect(clientSocket, (struct sockaddr *)&sad, sizeof(sad));

Example: C client (TCP), cont.
    /* Get input from user */
    fgets(Sentence, sizeof(Sentence), stdin);
    /* Send line to server */
    n = write(clientSocket, Sentence, strlen(Sentence)+1);
    /* Read line from server */
    n = read(clientSocket, modifiedSentence, sizeof(modifiedSentence));
    printf("FROM SERVER: %s\n", modifiedSentence);
    /* Close connection */
    close(clientSocket);
}

Example: C server (TCP)
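/* server.c */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad; /* structure to hold server's address */
    struct sockaddr_in cad; /* structure to hold client's address */
    int welcomeSocket, connectionSocket; /* socket descriptors */
    socklen_t alen = sizeof(cad); /* length of client's address */
    int port, n;
    char clientSentence[128];
    char capitalizedSentence[128];

    port = atoi(argv[1]);
    /* Create welcoming socket at port and bind a local address */
    welcomeSocket = socket(PF_INET, SOCK_STREAM, 0);
    memset((char *)&sad, 0, sizeof(sad)); /* clear sockaddr structure */
    sad.sin_family = AF_INET; /* set family to Internet */
    sad.sin_addr.s_addr = INADDR_ANY; /* set the local IP address */
    sad.sin_port = htons((u_short)port); /* set the port number */
    bind(welcomeSocket, (struct sockaddr *)&sad, sizeof(sad));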

Example: C server (TCP), cont
    /* Specify the maximum number of clients that can be queued */
    listen(welcomeSocket, 10);
    while(1) {
        /* Wait, on welcoming socket, for contact by a client */
        connectionSocket = accept(welcomeSocket, (struct sockaddr *)&cad, &alen);
        n = read(connectionSocket, clientSentence, sizeof(clientSentence));
        /* capitalize Sentence and store the result in capitalizedSentence */
        /* Write out the result to socket */
        n = write(connectionSocket, capitalizedSentence, strlen(capitalizedSentence)+1);
        close(connectionSocket);
    } /* End of while loop, loop back and wait for another client connection */
}
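The server code leaves the capitalization step as a comment. One possible way to fill it in is sketched below; capitalize is a hypothetical helper, not part of the original example, and it bounds the copy so the 128-byte buffers cannot overflow.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy src into dst in upper case, writing at most dstlen bytes
 * including the terminating '\0'. Returns the length of the result. */
size_t capitalize(char *dst, size_t dstlen, const char *src)
{
    size_t i = 0;
    if (dstlen == 0)
        return 0;
    for (; src[i] != '\0' && i + 1 < dstlen; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[i] = '\0';
    return i;
}
```

In the server loop this could be called as capitalize(capitalizedSentence, sizeof(capitalizedSentence), clientSentence) between the read and the write.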

 Outline for typical concurrent server

 Status transition
[Figure: descriptor status at three points: after return from accept, after fork() returns, and after socket close().]

Socket programming with UDP
UDP: no “connection” between
client and server
• no handshaking
• sender explicitly attaches IP
address and port of
destination to each packet
• server must extract IP
address, port of sender from
received packet
UDP: transmitted data may be
received out of order, or lost
UDP provides unreliable transfer
of groups of bytes (“datagrams”)

 For example: NTP
daemon
 What does a UDP server
need to do so that a UDP
client can connect to it?
[Figure: UDP server. An NTP daemon on port 123, layered over UDP, IP, and the Ethernet adapter.]

 The UDP server must create a datagram socket…
if((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    exit(1);
}
• socket returns an integer (socket descriptor)
• fd < 0 indicates that an error occurred
• AF_INET: associates a socket with the Internet protocol family
• SOCK_DGRAM: selects the UDP protocol

 A socket can be bound to a port
struct sockaddr_in srv; /* used by bind() */
/* bind: use the Internet address family */
srv.sin_family = AF_INET;
/* bind: socket 'fd' to port 80 */
srv.sin_port = htons(80);
/* bind: a client may connect to any of my addresses */
srv.sin_addr.s_addr = htonl(INADDR_ANY);
if(bind(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}
• Now the UDP server is ready to accept packets…

 read does not provide the client’s address to the UDP server
struct sockaddr_in cli; /* used by recvfrom() */
char buf[512]; /* used by recvfrom() */
socklen_t cli_len = sizeof(cli); /* used by recvfrom() */
int nbytes; /* used by recvfrom() */
/* 2) bind the socket to a port */
nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */,
                  (struct sockaddr*) &cli, &cli_len);
if(nbytes < 0) {
    perror("recvfrom");
    exit(1);
}

nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */,
                  (struct sockaddr*) &cli, &cli_len);
• The actions performed by recvfrom
• returns the number of bytes read (nbytes)
• copies nbytes of data into buf
• returns the address of the client (cli)
• returns the length of cli (cli_len)
• don’t worry about flags

 How does a UDP client
communicate with a UDP
server?
[Figure: 2 UDP clients and their ports over the protocol stack (labelled TCP, IP, Ethernet Adapter in the original figure).]

 write is not allowed on an unconnected UDP socket, because it carries no destination address; use sendto
 Notice that the UDP client does not bind a port number
◦ a port number is dynamically assigned when the first sendto is called
struct sockaddr_in srv; /* used by sendto() */
/* sendto: send data to IP Address "128.2.35.50" port 80 */
nbytes = sendto(fd, buf, sizeof(buf), 0 /* flags */,
                (struct sockaddr*) &srv, sizeof(srv));
if(nbytes < 0) {
    perror("sendto");
    exit(1);
}

[Figure: UDP client-server interaction. UDP Server: socket(), bind(), recvfrom() (blocks until datagram received from a client), sendto(). UDP Client: socket(), sendto(), recvfrom(), close(). Arrows between the two: data request, data reply.]

Example: C client (UDP)
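/* client.c */
int main(int argc, char *argv[])
{
    struct sockaddr_in sad; /* structure to hold the server's address */
    int clientSocket; /* socket descriptor */
    struct hostent *ptrh; /* pointer to a host table entry */
    char *host; /* server host name */
    int port, n; /* server port number, byte count */
    socklen_t addr_len; /* length of the server's address */
    char Sentence[128];
    char modifiedSentence[128];

    host = argv[1]; port = atoi(argv[2]);
    /* Create client socket, NO connection to server */
    clientSocket = socket(PF_INET, SOCK_DGRAM, 0);
    /* determine the server's address */
    memset((char *)&sad, 0, sizeof(sad)); /* clear sockaddr structure */
    sad.sin_family = AF_INET; /* set family to Internet */
    sad.sin_port = htons((u_short)port);
    ptrh = gethostbyname(host); /* Convert host name to IP address */
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);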

Example: C client (UDP), cont.
    /* Get input from user */
    fgets(Sentence, sizeof(Sentence), stdin);
    addr_len = sizeof(struct sockaddr);
    /* Send line to server */
    n = sendto(clientSocket, Sentence, strlen(Sentence)+1, 0,
               (struct sockaddr *) &sad, addr_len);
    /* Read line from server */
    n = recvfrom(clientSocket, modifiedSentence, sizeof(modifiedSentence), 0,
                 (struct sockaddr *) &sad, &addr_len);
    printf("FROM SERVER: %s\n", modifiedSentence);
    /* Close the socket */
    close(clientSocket);
}

Example: C server (UDP)
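/* server.c */
int main(int argc, char *argv[])
{
    struct sockaddr_in sad; /* structure to hold server's address */
    struct sockaddr_in cad; /* structure to hold client's address */
    int serverSocket; /* socket descriptor */
    int port;
    char clientSentence[128];
    char capitalizedSentence[128];

    port = atoi(argv[1]);
    /* Create socket at port and bind a local address */
    serverSocket = socket(PF_INET, SOCK_DGRAM, 0);
    memset((char *)&sad, 0, sizeof(sad)); /* clear sockaddr structure */
    sad.sin_family = AF_INET; /* set family to Internet */
    sad.sin_addr.s_addr = INADDR_ANY; /* set the local IP address */
    sad.sin_port = htons((u_short)port); /* set the port number */
    bind(serverSocket, (struct sockaddr *)&sad, sizeof(sad));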

 How can the UDP server
service multiple ports
simultaneously?
[Figure: UDP server listening on Port 2000 and Port 3000, layered over UDP, IP, and the Ethernet adapter.]

 What problems does this code have?
int s1; /* socket descriptor 1 */
int s2; /* socket descriptor 2 */
/* 1) create socket s1 */
/* 2) create socket s2 */
/* 3) bind s1 to port 2000 */
/* 4) bind s2 to port 3000 */
while(1) {
recvfrom(s1, buf, sizeof(buf), ...);
/* process buf */
recvfrom(s2, buf, sizeof(buf), ...);
/* process buf */
}

Server Flaw
[Figure: timeline for client 1, server, and client 2. Client 1 calls connect and the server calls accept; both return. The server calls read and blocks waiting for data from client 1. Client 1 calls fgets and blocks waiting for the user to type in data; the user goes out to lunch. Client 2 calls connect and blocks waiting to complete its connection request until after lunch!]

Concurrent Servers
[Figure: timeline for client 1, server, and client 2. Client 1 calls connect, the server's accept returns, and client 1 calls fgets and blocks waiting for the user to type in data (the user goes out to lunch). The server's read for client 1 does not block the server: it calls accept again, client 2's connect returns, client 2 calls fgets and writes, the server reads the data and writes a reply, and both sides close.]

while (1) {
    newsock = (int *)malloc(sizeof (int));
    *newsock = accept(sock, (struct sockaddr *)&from, &fromlen);
    if (*newsock < 0) error("Accepting");
    printf("A connection has been accepted from %s\n",
           inet_ntoa((struct in_addr)from.sin_addr));
    retval = pthread_create(&tid, NULL, ConnectionThread, (void *)newsock);
    if (retval != 0) {
        error("Error, could not create thread");
    }
}

/****** ConnectionThread **********/
void *ConnectionThread(void *arg)
{
    int sock, n, len;
    char buffer[BUFSIZE];
    char *msg = "Got your message";

    sock = *(int *)arg;
    free(arg); /* allocated by the accept loop */
    len = strlen(msg);
    n = read(sock,buffer,BUFSIZE-1);
    while (n > 0) {
        buffer[n]='\0'; /* terminate the received string */
        printf("Message is %s\n",buffer);
        n = write(sock,msg,len);
        if (n < len) error("Error writing");
        n = read(sock,buffer,BUFSIZE-1);
        if (n < 0) error("Error reading");
    }
    if (close(sock) < 0) error("closing");
    pthread_exit(NULL);
    return NULL;
}

Concurrency
• Threading
– Easier to understand
– Race conditions increase complexity
• Select()
– Explicit control flows, no race conditions
– Explicit control more complicated
• There is no clear winner, but you
MUST use select()…

What is select()?
• Monitor multiple descriptors
• How does it work?
– Setup sets of sockets to monitor
– select(): blocking until something
happens
– “Something” could be
• Incoming connection: accept()
• Clients sending data: read()
• Pending data to send: write()
• Timeout

Concurrency – Step 1
• Allowing address reuse
• Then we set the sockets to be non-
blocking
int sock, opts=1;
sock = socket(...); // To give you an idea of where the new code goes
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opts, sizeof(opts));
if((opts = fcntl(sock, F_GETFL)) < 0) { // Get current options
    printf("Error...\n");
    ...
}
opts = (opts | O_NONBLOCK); // Don't clobber your old settings
if(fcntl(sock, F_SETFL, opts) < 0) {
    printf("Error...\n");
    ...
}
bind(...); // To again give you an idea where the new code goes

Concurrency – Step 2
• Monitor sockets with select()
– int select(int maxfd, fd_set *readfds, fd_set
*writefds, fd_set *exceptfds, struct timeval
*timeout);
• maxfd
– max file descriptor + 1
• fd_set: bit vector with FD_SETSIZE bits
– readfds: bit vector of read descriptors to monitor
– writefds: bit vector of write descriptors to
monitor
– exceptfds: set to NULL
• timeout
– how long to wait without activity before
returning

What about bit vectors?
• void FD_ZERO(fd_set *fdset);
– clear out all bits
• void FD_SET(int fd, fd_set *fdset);
– set one bit
• void FD_CLR(int fd, fd_set *fdset);
– clear one bit
• int FD_ISSET(int fd, fd_set *fdset);
– test whether fd bit is set

The Server
// socket() call and non-blocking code is above this point
if(bind(sockfd, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) { // bind!
    printf("Error binding\n");
    ...
}
if(listen(sockfd, 5) < 0) { // listen for incoming connections
    printf("Error listening\n");
    ...
}
clen=sizeof(caddr);
// Setup pool.read_set with an FD_ZERO() and FD_SET() for
// your server socket file descriptor. (whatever socket() returned)
while(1) {
pool.ready_set = pool.read_set; // Save the current state
pool.nready = select(pool.maxfd+1, &pool.ready_set, &pool.write_set, NULL, NULL);
if(FD_ISSET(sockfd, &pool.ready_set)) { // Check if there is an incoming conn
isock=accept(sockfd, (struct sockaddr *) &caddr, &clen); // accept it
add_client(isock, &pool); // add the client by the incoming socket fd
}
check_clients(&pool); // check if any data needs to be sent/received from clients
}
...
close(sockfd);

What is pool?
typedef struct { /* represents a pool of connected descriptors */
int maxfd; /* largest descriptor in read_set */
fd_set read_set; /* set of all active read descriptors */
fd_set write_set; /* set of all active write descriptors */
fd_set ready_set; /* subset of descriptors ready for reading */
int nready; /* number of ready descriptors from select */
int maxi; /* highwater index into client array */
int clientfd[FD_SETSIZE]; /* set of active descriptors */
rio_t clientrio[FD_SETSIZE]; /* set of active read buffers */
... // ADD WHAT WOULD BE HELPFUL FOR PROJECT1
} pool;

What about checking clients?
• The main loop only tests for incoming
connections
– There are other reasons the server wakes up
– Clients are sending data, pending data to
write to buffer, clients closing connections,
etc.
• Store all client file descriptors
– in pool
• Keep the while(1) loop thin
– Delegate to functions
• Come up with your own design

 maxfds: number of descriptors to be tested
◦ descriptors (0, 1, ... maxfds-1) will be tested
 readfds: a set of fds we want to check if data is available
◦ returns a set of fds ready to read
◦ if input argument is NULL, not interested in that condition
 writefds: returns a set of fds ready to write
 exceptfds: returns a set of fds with exception conditions
int select(int maxfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
FD_CLR(int fd, fd_set *fds); /* clear the bit for fd in fds */
FD_ISSET(int fd, fd_set *fds); /* is the bit for fd in fds? */
FD_SET(int fd, fd_set *fds); /* turn on the bit for fd in fds */
FD_ZERO(fd_set *fds); /* clear all bits in fds */
