SIP (Session Initiation Protocol)
RFC 3261 https://www.ietf.org/rfc/rfc3261.txt
Alexey KOVRIZHNYKH
Traditional telephony challenges
• Capacity: 1 x E1 = 2 Mbps = 30-31 calls (depending on signaling)
• Lower diversity: dedicated end-to-end links are needed, and end users have low portability
• Cost
[Figure: E1 links]
What is IP telephony?
Transfer of audio and/or video over IP from one endpoint to another
• Capacity: 1 x E1 = 2 Mbps = 124 calls (depending on settings)
• Full diversity of IP networks
• Lower cost
SIP architecture
• SIP signaling
• RTP (Real-time Transport Protocol): transports the voice and video over IP networks
• RTCP (RTP Control Protocol): monitors transmission statistics and quality of service (QoS); its traffic is typically kept to about 5% of the session bandwidth
• SDP (Session Description Protocol): describes the media parameters negotiated for RTP
SIP (session initiation protocol)
• Uses UDP port 5060 by default
• Carries signaling messages
  • Requests: REGISTER, INVITE, ACK, BYE, etc.
  • Responses: 100 Trying, 180 Ringing, 183 Session Progress, 200 OK, etc.
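To make the message types above concrete, here is a minimal sketch of how an INVITE request can be assembled as text and how a response status line can be split into its code and reason phrase. The helper names (`build_invite`, `parse_status_line`, `is_final`), the fixed `tag` value and the sample addresses are illustrative assumptions, not part of the slides; a real user agent would generate random tags and branch values and would normally carry an SDP body.

```python
def build_invite(caller: str, callee: str, call_id: str, branch: str) -> str:
    """Return a minimal SIP INVITE request as text, using CRLF line endings.

    `caller` and `callee` are 'user@host' strings. The tag value below is a
    fixed placeholder (an assumption for illustration only).
    """
    caller_host = caller.split("@")[1]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_host}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1234",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # An empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(line: str) -> tuple[int, str]:
    """Split a response status line such as 'SIP/2.0 180 Ringing'."""
    version, code, reason = line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unexpected SIP version: {version}")
    return int(code), reason


def is_final(code: int) -> bool:
    """1xx responses are provisional; 200 and above are final."""
    return code >= 200


if __name__ == "__main__":
    print(build_invite("111@10.0.111.2", "222@10.0.222.2",
                       "abc123@10.0.111.2", "z9hG4bK776asdhds"))
    print(parse_status_line("SIP/2.0 183 Session Progress"))
```

In a typical call the caller sends INVITE, receives provisional responses (100, 180 or 183), then a final 200 OK, and confirms it with ACK; BYE ends the session.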
RTP (real-time transport protocol)
• Audio and video run as separate RTP streams
• Mostly uses unprivileged UDP ports (1024 to 65535)
• Audio payload formats include G.711, G.723, G.726, G.729, GSM and other codecs
• Video payload formats include H.261, H.263, H.264, MPEG-4, etc.
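Every RTP packet starts with a 12-byte fixed header, which is the RTP(12) term in the overhead calculation later in the deck. The following sketch unpacks that header with Python's `struct` module. The `RtpHeader` dataclass and `parse_rtp_header` function are names chosen here for illustration, and the sketch ignores CSRC entries and header extensions.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header at the start of `packet`."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # A synthetic packet: version 2, payload type 0 (PCMU), 160 payload bytes.
    sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + bytes(160)
    print(parse_rtp_header(sample))
```

The payload type in the header is the same number that SDP lists on its `m=` line, for example 0 for PCMU and 8 for PCMA.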
SDP (session description protocol)
SDP is a format that describes the media of a session. Fields marked with * are optional.
Session description
v= (protocol version number, currently only 0)
o= (originator and session identifier: username, session id, version number, network address)
s= (session name: mandatory, with at least one UTF-8-encoded character)
c=* (connection information; not required if included in all media descriptions)
b=* (zero or more bandwidth information lines)
Media description (if present)
m= (media name, transport address, and set of codecs)
i=* (media title or information field)
c=* (connection information; optional if included at session level)
b=* (zero or more bandwidth information lines)
k=* (encryption key)
a=* (zero or more media attribute lines, overriding the session attribute lines; rtpmap attributes give the codec name and clock rate)
[Request] (RTP/AVP = Real-time Transport Protocol with the Audio/Video Profile)
v=0
o=Makara 2890844526 2890844526 IN IP4 10.120.42.3
s=KHNOG
c=IN IP4 10.120.42.3
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 51372 RTP/AVP 31 32
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
[Answer]
v=0
o=Vibol 2808844564 2808844564 IN IP4 10.120.32.12
s=KHNOG
c=IN IP4 10.120.32.12
m=audio 49174 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49170 RTP/AVP 32
a=rtpmap:32 MPV/90000
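The request and answer above can be compared programmatically. The sketch below parses the `m=` and `a=rtpmap` lines of both descriptions and reports which offered formats the answer accepted. The function names `parse_media` and `accepted_formats` are assumptions made for this example, and the parser is deliberately simplified: it keeps only one media section per media type and ignores every other SDP line.

```python
OFFER = """\
v=0
o=Makara 2890844526 2890844526 IN IP4 10.120.42.3
s=KHNOG
c=IN IP4 10.120.42.3
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 51372 RTP/AVP 31 32
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
"""

ANSWER = """\
v=0
o=Vibol 2808844564 2808844564 IN IP4 10.120.32.12
s=KHNOG
c=IN IP4 10.120.32.12
m=audio 49174 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49170 RTP/AVP 32
a=rtpmap:32 MPV/90000
"""


def parse_media(sdp: str) -> dict[str, dict]:
    """Map each media type to its port, payload types and rtpmap entries."""
    media: dict[str, dict] = {}
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            kind, port, _proto, *formats = line[2:].split()
            current = {"port": int(port),
                       "formats": [int(f) for f in formats],
                       "rtpmap": {}}
            media[kind] = current
        elif line.startswith("a=rtpmap:") and current is not None:
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            current["rtpmap"][int(pt)] = encoding
    return media


def accepted_formats(offer: str, answer: str) -> dict[str, list[str]]:
    """List, per media type, the answered formats that were also offered."""
    offered = parse_media(offer)
    answered = parse_media(answer)
    result: dict[str, list[str]] = {}
    for kind, ans in answered.items():
        offered_pts = offered.get(kind, {"formats": []})["formats"]
        result[kind] = [ans["rtpmap"].get(pt, str(pt))
                        for pt in ans["formats"] if pt in offered_pts]
    return result


if __name__ == "__main__":
    print(accepted_formats(OFFER, ANSWER))
    # {'audio': ['PCMU/8000'], 'video': ['MPV/90000']}
```

Here the answerer keeps only PCMU for audio and MPV for video, so RTP will flow to port 49174 (audio) and port 49170 (video) on 10.120.32.12.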
Packetization period
[Figure: one second of voice split into 50 packets of 20 ms each]
50 packets x 20 ms/packet = 1000 ms = 1 s
Voice codecs
Header overhead = Eth(14) + IP(20) + UDP(8) + RTP(12) = 54 bytes
Eth = src_mac(6) + dst_mac(6) + ether_type(2) = 14 bytes

Codec                                     G.711 (PCM)  G.711 (PCM)  G.729r8      G.729r8      G.711 (PCM)  G.729r8
Codec bandwidth, kbit/s                   64           64           8            8            64           8
Codec bandwidth, bit/s                    64000        64000        8000         8000         64000        8000
Codec bandwidth, bytes/s                  8000         8000         1000         1000         8000         1000
Packetization, ms                         20           30           20           40           500          500
Packets per second (PPS)                  50           33           50           25           2            2
Payload per packet, bytes                 160          240          20           40           4000         500
Header overhead, bytes                    54           54           54           54           54           54
Total packet size, bytes                  214          294          74           94           4054         554
Bandwidth on the line, kbit/s             85.6         78.4         29.6         18.8         64.864       8.864
Calls per 1 x E1 (2048 kbit/s), rounded   24           26           69           109          32           231

PCM = pulse code modulation, inherited from the traditional phone system.
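The arithmetic behind the table can be reproduced with a few lines of code. The sketch below computes the bandwidth on the line for a given codec rate and packetization period, using the same 54-byte header overhead, and then the number of calls that fit into a link. The function names are assumptions made for this example. The table's last row appears to round to the nearest integer; this sketch rounds down, so some columns come out one call lower (for example 23 rather than 24 for G.711 at 20 ms).

```python
HEADER_BYTES = 14 + 20 + 8 + 12  # Ethernet + IP + UDP + RTP = 54 bytes


def call_bandwidth_kbps(codec_kbps: float, packetization_ms: float) -> float:
    """Bandwidth of one call direction on the line, headers included."""
    # Bytes of encoded voice carried in each packet.
    payload_bytes = codec_kbps * 1000 / 8 * packetization_ms / 1000
    packets_per_second = 1000 / packetization_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


def calls_per_link(link_kbps: float, codec_kbps: float,
                   packetization_ms: float) -> int:
    """Whole number of calls that fit into the link (rounded down)."""
    return int(link_kbps // call_bandwidth_kbps(codec_kbps, packetization_ms))


if __name__ == "__main__":
    for name, rate, period in [("G.711", 64, 20), ("G.711", 64, 30),
                               ("G.729r8", 8, 20), ("G.729r8", 8, 40)]:
        bw = call_bandwidth_kbps(rate, period)
        print(f"{name} @ {period} ms: {bw:.1f} kbit/s, "
              f"{calls_per_link(2048, rate, period)} calls per E1")
```

Longer packetization periods amortize the fixed 54-byte overhead over more payload, which is why G.729 at 40 ms needs 18.8 kbit/s instead of 29.6 kbit/s at 20 ms, at the price of extra delay.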
MOS (mean opinion score)
MOS = 5 corresponds to a face-to-face conversation:
• zero loss
• zero delay
• zero compression
• zero processing delay
Signaling sample
VoIP related features
• VAD (voice activity detection): stops sending packets when there is no voice
• CNG (comfort noise generation): plays background noise at the listener's phone during silence
• ALG (application layer gateway): rewrites signaling so that calls can traverse NAT
• RTP header compression
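As a rough illustration of the VAD idea, the sketch below splits 16-bit PCM samples into 20 ms frames (160 samples at 8000 Hz) and marks a frame as speech when its RMS level reaches a threshold. This is a simplified energy detector written for this example, not the algorithm of any particular codec or phone; the function names and the threshold of 500 are arbitrary assumptions.

```python
import math


def frame_rms(samples: list[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def frames_to_send(pcm: list[int], frame_len: int = 160,
                   threshold: float = 500.0) -> list[bool]:
    """Return True for frames treated as speech, False for silence.

    frame_len=160 is 20 ms at 8000 samples/s; the threshold is an
    illustrative value and would need tuning for real audio.
    """
    decisions = []
    for start in range(0, len(pcm), frame_len):
        frame = pcm[start:start + frame_len]
        decisions.append(frame_rms(frame) >= threshold)
    return decisions


if __name__ == "__main__":
    silence = [0] * 160
    speech = [3000, -3000] * 80
    print(frames_to_send(silence + speech + silence))
    # [False, True, False]
```

Frames marked False would not be sent as RTP packets; the receiving side fills the gap with comfort noise (CNG).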
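Sample call
[Figure: sample call topology]
• Phone #111: 10.0.111.2/30 (far end of the link: 10.0.111.1/30)
• Phone #222: 10.0.222.2/30 (far end of the link: 10.0.222.1/30)
• Wireshark sniffer attached to a SPAN port; console access
• Signaling: SIP + SDP over UDP 5060
• Voice: RTP over a random UDP port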
