SIP is a signaling protocol used to create, manage, and terminate multimedia sessions over IP networks. It allows for establishing the location of users, negotiating features between participants in a session, and managing calls by adding, dropping or transferring participants. SIP is responsible for setting up sessions but not for transmitting media or controlling quality of service - those functions are handled by other protocols. A typical SIP call involves a SIP client sending an INVITE request which is forwarded through SIP proxies to the destination, where if accepted, a 200 OK response confirms call setup along with an ACK from the initiator.
SIP (Session Initiation Protocol)
Introduction
SIP (Session Initiation Protocol) is a signaling protocol used to create, manage and
terminate sessions in an IP based network. A session could be a simple two-way
telephone call or it could be a collaborative multi-media conference session. This makes it
possible to implement services like voice-enriched e-commerce, web page click-to-dial or
Instant Messaging with buddy lists in an IP based environment. Don't worry if you don't
know about these services. You don't need to know them before you learn about SIP.
SIP has been the choice for services related to Voice over IP (VoIP) in the recent past. It
is a standard (RFC 3261) put forward by Internet Engineering Task Force (IETF). SIP is
still growing and being modified to take into account all relevant features as the
technology expands and evolves. But it should be noted that the job of SIP is limited to
only the setup and control of sessions. The details of the data exchange within a session
e.g. the encoding or codec related to an audio/video media is not controlled by SIP and is
taken care of by other protocols. The major SIP functions are summarized in the Functions of SIP section.
This introduction is meant for beginners. This beginner-friendly tutorial gives a
brief introduction to SIP before one ventures into the long RFC documents. However, if
you are a veteran, please go through this short tutorial and suggest modifications.
The aim of this site is not to make you an expert in SIP-based applications. I doubt
whether any site can do that. You need hands-on experience to master the aspects
related to Internet multimedia or IP telephony. I am proposing nothing new here. The
whole job is to introduce a newcomer to the facets of the Session Initiation Protocol (SIP)
so that an RFC document of well over 200 pages does not intimidate you. However, I strongly
recommend that you go through RFC 3261 once you have completed
this tutorial.
If you need a book that you can use to start with SIP, SIP Demystified is a good
option. It starts with standard telephony systems and gradually guides you into the Session
Initiation Protocol.
We shall start with a little background history of SIP. If you are in a hurry, you can skip
to the functions of SIP.
After going through the online tutorial, I recommend that you read some of the
books that match your needs and interests. You can visit the books section or directly check
those available on amazon.com.
A Brief History of SIP
Initially, the traditional switch-based telephone system was the main medium for
transmitting messages. However, with the advent of the Internet, the need was felt to
build a system that connects people over IP-based networks. Different
communities put forward different solutions, but the solution presented by the IETF was
finally accepted as the most general one. However, the development of SIP in the IETF was
not a one-step process.
February 1996
Initial Internet drafts were produced in the form of -
Session Invitation Protocol (SIP) – M.Handley, E.Schooler
Simple Conference Invitation Protocol (SCIP) – H.Schulzrinne
SIP was originally intended to create a mechanism for inviting people to large-scale
multipoint conferences on the Internet Multicast Backbone (Mbone). At this stage, IP
telephony didn't really exist. The first draft was known as "draft-ietf-mmusic-sip-00". It
included only one request type, which was a call setup request. (Wondering what music is
doing in SIP??? Well, it is an acronym for Multiparty Multimedia Session Control. IETF
people are not that music crazy after all.)
December 1996
A newer version, "draft-ietf-mmusic-sip-01", was proposed as a modification to the first draft.
It had yet to take the shape of SIP as we know it now.
January 1999
The IETF published the draft called "draft-ietf-mmusic-sip-12". It contained the six
requests that SIP has today.
March 1999
The IETF published SIP as a standard in RFC 2543.
It was later revised to produce the more modern version, RFC 3261.
Let's leave the history to get older and concentrate on perhaps the most important part of
this tutorial. Let's learn about the functions of SIP.
Functions of SIP
SIP is limited to the setup, modification and termination of sessions. It serves four
major purposes:
• SIP allows for the establishment of user location (i.e. translating from a user's
name to their current network address).
• SIP provides for feature negotiation so that all of the participants in a session can
agree on the features to be supported among them.
• SIP is a mechanism for call management - for example adding, dropping, or
transferring participants.
• SIP allows for changing features of a session while it is in progress.
All of the other key functions are done with other protocols.
Yes! This does indeed mean that SIP is not a session description protocol, and that SIP
does not do conference control. SIP is not a resource reservation protocol and it has
nothing to do with quality of service (QoS). SIP can work in a framework with other
protocols to make sure these roles are played out, but SIP does not do them. SIP can
function with SOAP, HTTP, XML, VXML, WSDL, UDDI, SDP and others. Everyone
has a role to play!
With all that said, SIP is still one of the most important protocols. Better learn about the
SIP components.
Components of SIP
Entities interacting in a SIP scenario are called User Agents (UA).
User Agents may operate in two fashions -
• User Agent Client (UAC) : It generates requests and sends them to servers.
• User Agent Server (UAS) : It receives requests, processes them and generates
responses.
Note: A single UA may function as both.
Clients:
In general we associate the notion of clients to the end users i.e. the applications running
on the systems used by people. It may be a softphone application running on your PC or a
messaging device in your IP phone. It generates a request when you try to call another
person over the network and sends the request to a server (generally a proxy server). We
will go through the format of requests and proxy servers in more detail later.
Servers:
Servers are in general part of the network. They possess a predefined set of rules to
handle the requests sent by clients.
Servers can be of several types -
• Proxy Server: This is the most common type of server in a SIP environment.
When a request is generated, the exact address of the recipient is not known in
advance. So the client sends the request to a proxy server. The server, acting on behalf of
the client (as if giving a proxy for it), forwards the request to another proxy server
or to the recipient itself.
• Redirect Server: A redirect server redirects the request back to the client
indicating that the client needs to try a different route to get to the recipient. It
generally happens when a recipient has moved from its original position either
temporarily or permanently.
• Registrar: As you might have guessed already, one of the prime jobs of the
servers is to determine the location of a user in a network. How do they know the
location? If you are thinking that users have to register their locations with a
Registrar server, you are absolutely right. Users refresh their locations from time to time
by registering (sending a special type of message) with a Registrar server.
• Location Server: The addresses registered to a Registrar are stored in a Location
Server.
Now that the components are ready, we need the SIP commands to make them work.
Commands of SIP
• INVITE : Invites a user to a call.
• ACK : Acknowledgement, used to facilitate reliable message exchange for
INVITEs.
• BYE : Terminates a connection between users.
• CANCEL : Terminates a request, or search, for a user. It is used if a client sends an
INVITE and then changes its decision to call the recipient.
• OPTIONS : Solicits information about a server's capabilities.
• REGISTER : Registers a user's current location.
• INFO : Used for mid-session signaling (an extension method defined in RFC 2976, not in RFC 3261 itself).
If you don't yet see exactly how the commands work, don't worry. We will discuss the
format of some of the above SIP commands in more detail shortly.
It's time to go through a typical SIP session so that you can appreciate what we have
learnt so far and what follows in our journey through SIP.
A Typical Example of SIP session
SIP signaling follows the client-server paradigm used widely on the Internet by
protocols like HTTP or SMTP. The following picture presents a typical exchange of
requests and responses. Please note that it is only a typical case and doesn't include all
possible cases.
If you are unfamiliar with terms like SIP phone or softphone, read about VoIP phones first.
Before looking at the methods, you should first understand the diagram.
User1 uses his softphone to reach the SIP phone of user2. Server1 and server2 help to
set up the session on behalf of the users. This common arrangement of the proxies and the
end-users is called the "SIP Trapezoid", as depicted by the dotted line. The messages are listed
vertically in the order they are sent, i.e. the message on top (INVITE M1) comes first,
followed by the others. The direction of each arrow shows the sender and recipient of the
message. Each message is labeled with 'M' and a serial number. Each response carries a
3-digit number followed by a name. The 3-digit number is the numerical code of the
response, which is easy for machines to interpret. Human users use the name to
identify the message.
Figure: SIP session example with SIP trapezoid
The transaction starts with user1 making an INVITE request for user2. But user1 doesn't
know the exact location of user2 in the IP network. So it passes the request to server1.
Server1 on behalf of user1 forwards an INVITE request for user2 to server2. It sends a
TRYING response to user1 informing that it is trying to reach user2. The response could
have been different but we will discuss the other types of responses later. If you are
wondering how server1 knows that it has to forward the request to server2, just hold on
for a moment. We will discuss that while going through the registration process of SIP.
Receiving INVITE M2 from server1, server2 works in a similar fashion as server1. It
forwards an INVITE request to user2 (note: Here server2 knows the location of user2. If
it didn't know the location, it would have forwarded it to another proxy server. So an
INVITE request may travel through several proxies before reaching the recipient). After
forwarding INVITE M3 server2 issues a TRYING response to server1.
The SIP phone, on receiving the INVITE request, starts ringing to inform user2 that a
call request has arrived. It sends a RINGING response back to server2, which reaches user1
through server1. So user1 gets feedback that user2 has received the INVITE request.
User2 at this point has a choice to accept or decline the call. Let's assume that he decides
to accept it. As soon as he accepts the call, a 200 OK response is sent by the phone to
server2. Retracing the route of INVITE, it reaches user1. The softphone of user1 sends an
ACK message to confirm the setup of the call. This 3-way handshake
(INVITE+OK+ACK) is used for reliable call setup. Note that the ACK message does not
use the proxies to reach user2, as by now user1 knows the exact location of user2.
Once the connection has been set up, media flows between the two endpoints. Media flow
is controlled using protocols other than SIP, e.g. RTP.
When one party in the session decides to disconnect, it (user2 in this case) sends a BYE
message to the other party. The other party sends a 200 OK message to confirm the
termination of the session.
Was that a bit long? Need a break? Go, get it! You deserve a break after going through
such a long SIP session -:) When you get back, we will dive inside a SIP request
message.
Request Message Format of SIP
Back already! Well, let's continue.
In the previous SIP session example we have seen that requests are sent by clients to
servers. We will now discuss what that request actually contains. The following is the
format of INVITE request as sent by user1.
INVITE sip:user2@server2.com SIP/2.0
Via: SIP/2.0/UDP pc33.server1.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: user2 <sip:user2@server2.com>
From: user1 <sip:user1@server1.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.server1.com
CSeq: 314159 INVITE
Contact: <sip:user1@pc33.server1.com>
Content-Type: application/sdp
Content-Length: 142
---- User1 Message Body Not Shown ----
The first line of the text-encoded message is called Request-Line. It identifies that the
message is a request.
Request-Line
Method SP Request-URI SP SIP-Version CRLF
[SP = single-space & CRLF=Carriage Return + Line Feed (i.e. the character inserted when you press the
"Enter" or "Return" key of your computer)]
Here the method is INVITE, the Request-URI is "sip:user2@server2.com" and the SIP version is 2.0.
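To make the Request-Line layout concrete, here is a minimal sketch that splits it into its three parts. The function name parse_request_line is my own choice for illustration and is not part of any SIP library; a real parser would validate much more.

```python
def parse_request_line(line: str) -> dict:
    """Split 'Method SP Request-URI SP SIP-Version CRLF' into its parts."""
    # Drop the trailing CRLF, then split on the two single spaces.
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3 or not parts[2].startswith("SIP/"):
        raise ValueError(f"not a SIP Request-Line: {line!r}")
    method, request_uri, sip_version = parts
    return {"method": method, "request_uri": request_uri, "version": sip_version}


# The Request-Line of the INVITE sent by user1.
request = parse_request_line("INVITE sip:user2@server2.com SIP/2.0\r\n")
```

Note that the Request-URI keeps its "sip:" scheme; it is part of the URI, not a separate token.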
The following lines are a set of header fields.
• Via:
It contains the local address of user1, i.e. pc33.server1.com, where it expects
the responses to arrive.
• Max-Forwards:
It is used to limit the number of hops that this request may take before reaching
the recipient. It is decreased by one at each hop. It is necessary to prevent the
request from traveling forever in case it is trapped in a loop.
• To:
It contains a display name "user2" and a SIP or SIPS URI <sip:user2@server2.com>.
• From:
It also contains a display name "user1" and a SIP or SIPS URI
<sip:user1@server1.com>. It also contains a tag, which is a pseudo-random sequence
inserted by the SIP application. It works as an identifier of the caller in the dialog.
• Call-ID:
It is a globally unique identifier of the call generated as the combination of a
pseudo-random string and the softphone's host name or IP address.
The Call-ID is unique for a call. A call may contain several dialogs. Each dialog is uniquely
identified by the combination of the From tag, the To tag and the Call-ID. The section on the
relation among call, dialog, transaction and message explains this further.
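• CSeq:
It contains an integer and a method name. The first request in a dialog is given an
arbitrary CSeq number. After that, the number is incremented by one with each
new request in the dialog (ACK and CANCEL reuse the number of the INVITE they
refer to), and a response carries the same CSeq as the request it answers. It is used to
order requests and to detect non-delivery or out-of-order delivery of messages.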
• Contact:
It contains a SIP or SIPS URI that is a direct route to user1. It contains a username
and a fully qualified domain name (FQDN). It may use an IP address instead of the FQDN.
Via field is used to send the response to the request. Contact field is used to send future
requests. That is why the 200 OK response from user2 goes to user1 through proxies. But when
user2 generates a BYE request (a new request and not a response to INVITE), it goes directly to
user1 bypassing the proxies.
• Content-Type:
It contains a description of the message body (not shown).
• Content-Length:
It is an octet (byte) count of the message body.
The header may contain other header fields also. However those fields are optional.
Please note that the body of the message is not shown here. The body is used to convey
information about the media session written in Session Description Protocol (SDP). You
may continue your journey through SIP without worrying about SDP right now. However
it doesn't hurt to take a peep.
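If you like to see things in code, the following is a small sketch of how the header fields of the INVITE request above could be read, and how a hop might decrement Max-Forwards. The names INVITE_TEXT, parse_headers and decrement_max_forwards are made up for this illustration, and the sketch assumes one header per line (it ignores header folding and compact header names).

```python
INVITE_TEXT = (
    "INVITE sip:user2@server2.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.server1.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: user2 <sip:user2@server2.com>\r\n"
    "From: user1 <sip:user1@server1.com>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc33.server1.com\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:user1@pc33.server1.com>\r\n"
    "Content-Type: application/sdp\r\n"
    "Content-Length: 142\r\n"
    "\r\n"
)


def parse_headers(message: str) -> list:
    """Return the header fields as (name, value) pairs, in order."""
    # The header section ends at the first empty line.
    head, _, _body = message.partition("\r\n\r\n")
    headers = []
    for line in head.split("\r\n")[1:]:  # skip the Request-Line
        name, _, value = line.partition(":")  # split on the first colon only
        headers.append((name.strip(), value.strip()))
    return headers


def decrement_max_forwards(headers: list) -> list:
    """Return a copy of the headers with Max-Forwards reduced by one hop."""
    result = []
    for name, value in headers:
        if name == "Max-Forwards":
            hops = int(value)
            if hops == 0:
                # The request has used up its hops and must not be forwarded.
                raise ValueError("Max-Forwards exhausted")
            value = str(hops - 1)
        result.append((name, value))
    return result


headers = parse_headers(INVITE_TEXT)
forwarded = decrement_max_forwards(headers)
```

A list of pairs is used rather than a dictionary because a SIP message may carry the same header name more than once, as the Via fields of a response show.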
Your SIP request is waiting to get a SIP response message.
Response Message Format of SIP
Here is what the SIP response of user2 will look like.
SIP/2.0 200 OK
Via: SIP/2.0/UDP site4.server2.com;branch=z9hG4bKnashds8;received=192.0.2.3
Via: SIP/2.0/UDP site3.server1.com;branch=z9hG4bK77ef4c2312983.1;received=192.0.2.2
Via: SIP/2.0/UDP pc33.server1.com;branch=z9hG4bK776asdhds;received=192.0.2.1
To: user2 <sip:user2@server2.com>;tag=a6c85cf
From: user1 <sip:user1@server1.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.server1.com
CSeq: 314159 INVITE
Contact: <sip:user2@192.0.2.4>
Content-Type: application/sdp
Content-Length: 131
---- User2 Message Body Not Shown ----
Status Line
The first line in a response is called Status line.
SIP-Version SP Status-Code SP Reason-Phrase CRLF
[SP = single-space & CRLF=Carriage Return + Line Feed (i.e. the character inserted when you press the
"Enter" or "Return" key of your computer)]
Here the SIP version is 2.0, the Status-Code is 200 and the Reason-Phrase is OK.
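The Status-Line can be taken apart in the same spirit. This is only a sketch; parse_status_line is a name I picked for illustration. The one detail worth noticing is that the Reason-Phrase may itself contain spaces, so we split at most twice.

```python
def parse_status_line(line: str) -> tuple:
    """Split 'SIP-Version SP Status-Code SP Reason-Phrase CRLF'."""
    # maxsplit=2 keeps a multi-word Reason-Phrase in one piece.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) != 3 or not parts[0].startswith("SIP/"):
        raise ValueError(f"not a SIP Status-Line: {line!r}")
    version, code, reason = parts
    return version, int(code), reason


ok = parse_status_line("SIP/2.0 200 OK\r\n")
progress = parse_status_line("SIP/2.0 183 Session Progress\r\n")
```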
The header fields that follow the status line are similar to those in a request. I will just
mention the differences.
• Via:
There is more than one Via field. This is because each element through which the
INVITE request has passed has added its identity in a Via field. The three Via fields
were added by the softphone of user1, by server1 (the first proxy) and by server2 (the second
proxy). The response retraces the path of the INVITE using the Via fields. On its way
back, each element removes its own Via field before forwarding the response
towards the caller.
• To:
Note that the To field now contains a tag. This tag is used to represent the callee
in a dialog.
• Contact:
It contains the exact address of user2. So user1 doesn't need to use the proxy
servers to find user2 in the future.
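The way the Via fields build up on the request and shrink on the response can be sketched with a plain list used as a stack. This is a simplified model under my own assumptions: only the host names are tracked (no branch or received parameters), and the helper names add_via and strip_own_via are hypothetical.

```python
def add_via(via_stack: list, host: str) -> list:
    """An element puts its own Via on top before sending the request on."""
    return [host] + via_stack


def strip_own_via(via_stack: list, host: str) -> list:
    """On the way back, an element removes its own (topmost) Via."""
    if not via_stack or via_stack[0] != host:
        raise ValueError(f"top Via does not belong to {host}")
    return via_stack[1:]


# Request path: user1's softphone -> server1 -> server2 -> user2.
vias = add_via([], "pc33.server1.com")
vias = add_via(vias, "site3.server1.com")
vias = add_via(vias, "site4.server2.com")
vias_seen_by_user2 = list(vias)

# Response path: user2 copies the Via fields into the 200 OK, then each
# proxy strips its own entry before forwarding towards the caller.
vias = strip_own_via(vias, "site4.server2.com")
vias = strip_own_via(vias, "site3.server1.com")
vias_seen_by_user1 = list(vias)
```

The order seen by user2 matches the order of the three Via lines in the 200 OK shown earlier, with the most recent hop on top.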
It is a 2xx response. However, responses can be different depending on the particular
situation. The next section covers the different types of SIP responses.
Response Types of SIP
The first digit of a Status-Code defines the category of response. So any response
between 100 and 199 is termed as a "1xx" response and so is done for any other type.
SIP/2.0 allows six types of response. They are similar to those of HTTP.
• 1xx: Provisional -- request received, continuing to process the request;
• 2xx: Success -- the action was successfully received, understood, and accepted;
• 3xx: Redirection -- further action needs to be taken in order to complete the
request;
• 4xx: Client Error -- the request contains bad syntax or cannot be fulfilled at this
server;
• 5xx: Server Error -- the server failed to fulfill an apparently valid request;
• 6xx: Global Failure -- the request cannot be fulfilled at any server.
If a response is received with a Status-Code of the form yxx that is not understood
by the receiving party, it treats the response as a y00 response, i.e. if a client receives an
unknown response 345, it treats that as a 300 response. An unknown 1xx is treated as 183
(Session Progress). So each UA must know how to react to 100, 183, 200, 300, 400, 500
and 600.
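The fallback rule for unknown Status-Codes is easy to express in code. The sketch below assumes a UA that understands only the seven codes listed above; the names KNOWN_CODES, effective_status and response_class are mine, chosen for illustration.

```python
# Codes this hypothetical UA understands (the minimum set listed above).
KNOWN_CODES = {100, 183, 200, 300, 400, 500, 600}

CLASS_NAMES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(code: int) -> str:
    """Name the category given by the first digit of the Status-Code."""
    if not 100 <= code <= 699:
        raise ValueError(f"invalid SIP Status-Code: {code}")
    return CLASS_NAMES[code // 100]


def effective_status(code: int, known=KNOWN_CODES) -> int:
    """Map an unrecognized code to the one the UA should act upon."""
    response_class(code)  # validates the range
    if code in known:
        return code
    if code // 100 == 1:
        return 183  # unknown provisional -> 183 (Session Progress)
    return (code // 100) * 100  # unknown yxx -> y00
```

For example, effective_status(345) gives 300 and effective_status(199) gives 183, matching the rule described above.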
In SIP we talk about calls, dialogs, transactions and messages. Frankly, I was pretty
confused initially about how they are related. The next page clarifies their inter-relation.
Relation among Call, Dialog, Transaction & Message
If you are confused with the relation among Call, Dialog, Transaction & Message, you
are not alone. I think quite a good number of people get confused regarding the relation
in the beginning.
Messages are the individual textual bodies exchanged between a server and a client.
There can be two types of messages. Bingo! You already know them ... Requests and
Responses.
Transaction occurs between a client and a server and comprises all messages from the
first request sent from the client to the server up to a final (non-1xx) response sent from
the server to the client. If the request is INVITE and the final response is a non-2xx, the
transaction also includes an ACK to the response. The ACK for a 2xx response to an
INVITE request is a separate transaction.
Dialog is a peer-to-peer SIP relationship between two UAs that persists for some time. A
dialog is identified by a Call-ID, a local tag and a remote tag. A dialog used to be referred
to as a 'call leg'.
The call of a participant comprises all the dialogs it is involved in. I think a call is the same as a
session.
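Here is a small sketch of the dialog identifier, using the tags and Call-ID from the example messages earlier. The point it tries to show is that "local" and "remote" depend on which side you are: the caller's local tag is the From tag, while the callee's local tag is the To tag. The type DialogId and the two helper functions are my own illustrative names.

```python
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def dialog_id_for_caller(call_id: str, from_tag: str, to_tag: str) -> DialogId:
    # The caller put its own tag in the From field of the INVITE.
    return DialogId(call_id, local_tag=from_tag, remote_tag=to_tag)


def dialog_id_for_callee(call_id: str, from_tag: str, to_tag: str) -> DialogId:
    # The callee added its own tag to the To field of the response.
    return DialogId(call_id, local_tag=to_tag, remote_tag=from_tag)


CALL_ID = "a84b4c76e66710@pc33.server1.com"
at_user1 = dialog_id_for_caller(CALL_ID, from_tag="1928301774", to_tag="a6c85cf")
at_user2 = dialog_id_for_callee(CALL_ID, from_tag="1928301774", to_tag="a6c85cf")
```

Both sides describe the same dialog; they simply see the two tags in opposite roles.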
The following figure will make the relation clearer.
(RINGING is a 1xx response and OK is a 2xx response.)
A caller may have connections to a number of callees at a time forming a number of
dialogs. All these dialogs make a single call.
Well, time to reveal an old secret! If you want to know how server1 knew the location of
user2 during the call setup, the page about SIP registration will help you.
Registration in SIP
While going through a typical SIP session you have already seen that the caller doesn't
know the address of the callee initially. The proxy servers do the job of finding out the
exact location of the recipient. What actually happens is that every user registers its
current location to a REGISTRAR server. The application sends a message called
REGISTER informing the server of its present location. The Registrar stores this
binding (between the user and its present address) in a location server which is used by
other proxies to locate the user.
User yy uses the IP address 195.31.65.152 as its current location and registers it with the server.
This actually helps in user mobility. Say there is a messaging application. You can log in
from different computers. As soon as you log in using your username, the application
REGISTERs the username with the IP address of that computer. The 'Expires' field gives the
duration for which this registration will be valid. So the user has to refresh its registration
from time to time.
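The registration idea can be sketched as a tiny in-memory location service. This is a toy model under my own assumptions, not how any particular registrar is implemented: the class LocationService and its methods are hypothetical, a user has a single binding, and the current time is passed in explicitly (in seconds) so that the behaviour is easy to follow.

```python
class LocationService:
    """Toy store of user -> (contact address, expiry time) bindings."""

    def __init__(self):
        self._bindings = {}

    def register(self, user: str, contact: str, expires: int, now: float) -> None:
        # A REGISTER creates or refreshes the binding for 'expires' seconds.
        self._bindings[user] = (contact, now + expires)

    def lookup(self, user: str, now: float):
        """Return the current contact of the user, or None if unknown/expired."""
        entry = self._bindings.get(user)
        if entry is None:
            return None
        contact, expiry = entry
        if now >= expiry:
            # The user did not refresh in time; forget the stale binding.
            del self._bindings[user]
            return None
        return contact


service = LocationService()
service.register("yy", "195.31.65.152", expires=3600, now=0)
found_early = service.lookup("yy", now=1800)       # still valid
found_late = service.lookup("yy", now=3600)        # expired
service.register("yy", "195.31.65.152", expires=3600, now=4000)  # refresh
found_after_refresh = service.lookup("yy", now=5000)
```

A proxy that needs to route an INVITE for the user would call something like lookup to find where to forward it.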
Please note that the difference between a proxy server and a registration or a location
server is often only logical. Physically they may be situated on the same machine.
Wow!! You have completed the whole of the SIP tutorial. Congratulations! I insist that
you go through the conclusion. It has important information to move forward in your SIP
education.
Conclusion
I hope by now you have got a basic idea of what SIP is and what it does. You should be
able to recognize the major components in a SIP scenario and how different messages are
exchanged to establish and terminate sessions. But you must remember that it is just the
beginning. You should go through RFC 3261. If you are serious about
your learning, get your hands on a book as recommended in the books section.
You should go through the other sections of the site -
• Introduction to RTP : RTP manages the real-time transmission of audio/video data
in a session.
• Introduction to SDP : SDP describes the session parameters needed to establish
and sustain a session.
• VoIP : VoIP is the technology to transmit voice over an IP network. It's an
emerging area you would like to know about.
I encourage you to go through the resources listed on the Internet multimedia resources page.
I intend to include some more pages regarding header fields and proxy servers in the near
future. So keep coming back. If you have any query or suggestion and, more importantly,
if you have found any mistakes in the tutorial, please feel free to email me at
intro_to_multimedia@yahoo.co.in.
RTP Introduction
This introduction is meant for beginners. This beginner-friendly tutorial gives a
brief introduction to RTP before one ventures into the long RFC documents. However, if
you are a veteran, please go through this short tutorial and suggest modifications.
What is RTP?
Real-time Transport Protocol (RTP) provides end-to-end delivery services for data (such
as interactive audio and video) with real-time characteristics.
It was primarily designed to support multiparty multimedia conferences. However it is
used for different types of applications which we will go through shortly.
RTP was originally specified in RFC 1889. It has since been revised as RFC 3550, with the
companion audio/video profile in RFC 3551. For an introduction like this we will stick to RFC 1889.
Real Time aspect of RTP
What is meant by real-time?
The class of methods whose correctness depends not only on whether the result is the
correct one, but also on the time at which the result is delivered.
To make things simpler, let's take an example. Say you want to listen to a song. When you
are downloading it from a site, you don't care whether it is downloaded at a steady rate
or not. You just need a reliable download (preferably fast -:)). But what if you want to listen
to the song without downloading it first? Then you care not only about getting the whole
data but also about the rate at which you receive it, otherwise the song loses its charm. Here you
need a real-time transmission.
Note that the example is given only to show how the time-factor is important in transmission of data. Real-
time transmission is more important in multimedia conferences.
RTP gives No Guarantee for timely Delivery
Confused?? I bet you are!
Well, the point is that RTP itself does not provide any mechanism to ensure timely
delivery or provide other quality-of-service guarantees. It relies on lower-layer services
(e.g. UDP, TCP) to do so. The dependence will be clearer when we discuss the RTP
packet structure.
So why is it called a real-time protocol?
RTP provides suitable functionality for carrying real-time content, e.g., a timestamp and
control mechanisms for synchronizing different streams with timing properties. We will
discuss those in more detail soon.
Components of RTP
Before going into the detailed structure of RTP, you should know that RTP is basically a
combination of two parts -
• Real-time Transport Protocol (RTP): It carries real-time data.
• RTP Control Protocol (RTCP): It monitors the quality of service and
conveys information about the participants.
We will go through RTP first and then discuss RTCP. Both play important roles in the
transmission. Here you should note that the data and control messages are separated in
the forms of RTP and RTCP.
Applications of RTP
The applications in which RTP plays an important role can be classified as follows -
Simple Multicast Audio Conference
Initially the owner of the conference (say the leader of a group) obtains, through some allocation
mechanism, a multicast group address and a pair of ports. One port is used for audio
data, and the other is used for control (RTCP) packets. This address and port information
is distributed to the intended participants. If privacy is desired, the data and control
packets may be encrypted, in which case an encryption key must also be generated and
distributed.
Each participant sends the audio data in small chunks (say 20 ms) or packets. The
structure of the packets will be discussed later.
Each instance of the audio application (i.e. each participant) in the conference
periodically multicasts a reception report plus the name of its user on the RTCP (control)
port. This helps to monitor quality of transmission and also determine who the present
participants are.
Audio and Video Conference
If both audio and video media are used in a conference, they are transmitted as separate
RTP sessions, and RTCP packets are transmitted for each medium using two different UDP
port pairs and/or multicast addresses. The canonical name, or CNAME, of each
participant is used to match the audio and video sessions. We will cover CNAME when we
discuss the functions of RTCP.
The sessions are kept separate so that a participant may choose only one of them. If there is
a lecture going on, you can just listen to the professor without having to see his face -:).
Mixers and Translators
So far, we have assumed that all sites want to receive media data in the same format.
However, this may not always be appropriate. Users with connections of different
bandwidth, or those working behind a firewall which won't allow the IP packets to pass,
need some extra processing. This is done by mixers and translators. We will
discuss them briefly in the next two sections.
Mixer in RTP
It may happen that not all participants in a conference have connections of the same
bandwidth. So how do they take part simultaneously?
One solution is that all of them use a lower bandwidth. But this leads to reduced-quality
audio encoding.
A smarter solution is to use an RTP-level relay called a mixer. A mixer may be
placed near the low-bandwidth area. This mixer resynchronizes incoming audio packets
to reconstruct the constant 20 ms spacing generated by the sender, mixes these
reconstructed audio streams into a single stream, translates the audio encoding to a lower-
bandwidth one and forwards the lower-bandwidth packet stream across the low-speed
link. The following figure gives a graphical representation -
The mixer puts its own identification as the source (SSRC) of the packet and puts the
contributing sources in CSRC fields. If you don't know about SSRC and CSRC, come
back to this paragraph after going through the RTP header structure.
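As a rough sketch of what the mixer does with the source identifiers, the function below builds the SSRC and CSRC values for one outgoing mixed packet. It deals only with the identifiers, not with the audio itself, and the name mixed_packet_sources is my own. The limit of 15 comes from the 4-bit CSRC count field in the RTP header.

```python
MAX_CSRC = 15  # the CSRC count field is 4 bits wide


def mixed_packet_sources(mixer_ssrc: int, incoming_ssrcs: list) -> dict:
    """Return the SSRC and CSRC list a mixer would put on a mixed packet."""
    csrc = []
    for ssrc in incoming_ssrcs:
        # Each contributing source is listed once, by its original SSRC.
        if ssrc not in csrc:
            csrc.append(ssrc)
    # The mixer itself becomes the synchronization source of the new stream.
    return {"ssrc": mixer_ssrc, "csrc": csrc[:MAX_CSRC]}


mixed = mixed_packet_sources(0x0A0B0C0D, [0x11111111, 0x22222222, 0x11111111])
```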
Mixers have other uses too. An example is a video mixer that scales the images of
individual people in separate video streams and composites them into one video stream to
simulate a group scene.
Translator in RTP
A problem occurs if one or more participants of a conference are behind a firewall which
won't allow an IP packet containing the RTP message to pass. For this situation
translators are used.
Two translators are installed, one on either side of the firewall, with the outside one
funneling all multicast packets received through a secure connection to the translator
inside the firewall. The translator inside the firewall sends them again as multicast
packets to a multicast group restricted to the site's internal network. The following picture
illustrates it -
Translators do not change the SSRC or CSRC fields, unlike mixers. If you don't know about
SSRC and CSRC, come back to this paragraph after going through the RTP header
structure.
Translators can be used for other purposes too, e.g. to connect a group of hosts
speaking only IP/UDP to a group of hosts that understand only ST-II.
Packet Structure of RTP
The structure of an RTP packet is shown below.
The real-time media that is being transferred forms the 'RTP Payload'. The RTP header
contains information related to the payload, e.g. the source, size, encoding type etc. We
will go through the header structure in the next section.
However, the RTP packet can't be transferred as it is over the network. To transfer it,
we use a transport protocol called User Datagram Protocol (UDP). We won't discuss the UDP
header.
To transfer the UDP packet over the IP network, we need to encapsulate it in an IP
packet. We won't discuss the IP header either. To travel over the physical
network, even the IP packet is sent within other packets. Those are not shown here.
Header Structure of RTP
The following figure shows the RTP header structure -
• version (V): 2 bits
This field identifies the version of RTP. The version is 2 as of RFC 1889.
• padding (P): 1 bit
If the padding bit is set, the packet contains one or more additional padding octets
at the end which are not part of the payload. The last octet of the padding contains
a count of how many padding octets should be ignored. Padding may be needed
by some encryption algorithms with fixed block sizes or for carrying several RTP
packets in a lower-layer protocol data unit.
• extension (X): 1 bit
If the extension bit is set, the fixed header is followed by exactly one header
extension.
• CSRC count (CC): 4 bits
The CSRC count contains the number of CSRC identifiers that follow the fixed
header.
• marker (M): 1 bit
Marker bit is used by specific applications to serve a purpose of its own. We will
discuss this in more detail when we study Application Level Framing.
• payload type (PT): 7 bits
This field identifies the format (e.g. encoding) of the RTP payload and determines
its interpretation by the application. This field is not intended for multiplexing
separate media.
• sequence number: 16 bits
The sequence number increments by one for each RTP data packet sent, and may
be used by the receiver to detect packet loss and to restore packet sequence. The
initial value of the sequence number is random (unpredictable).
• timestamp: 32 bits
The timestamp reflects the sampling instant of the first octet in the RTP data
packet. The sampling instant must be derived from a clock that increments
monotonically and linearly in time to allow synchronization and jitter
calculations.
• SSRC: 32 bits
The SSRC field identifies the synchronization source. This identifier is chosen
randomly, with the intent that no two synchronization sources within the same
RTP session will have the same SSRC identifier.
• CSRC list: 0 to 15 items, 32 bits each
The CSRC list identifies the contributing sources for the payload contained in this
packet. The number of identifiers is given by the CC field. If there are more than
15 contributing sources, only 15 may be identified. CSRC identifiers are inserted
by mixers, using the SSRC identifiers of contributing sources.
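The field list above can be turned into a small parser. This is only a sketch built from the bit widths given in the list: it reads the 12-byte fixed header plus the CSRC list and ignores any header extension or padding. The names RtpHeader and parse_rtp_header are made up for this example.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: List[int]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and the CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    # Byte 0: V(2) P(1) X(1) CC(4); byte 1: M(1) PT(7);
    # then sequence number (16), timestamp (32), SSRC (32).
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:end]))
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )


# A hand-built header: version 2, no padding/extension/CSRC, PT 0.
sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678)
print(parse_rtp_header(sample))
```

The payload starts right after the CSRC list only when the extension bit is clear. A real receiver would also have to skip the header extension and trim the padding octets.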
Synchronization in RTP
The receiver needs three key pieces of information for synchronization: the
synchronization source, the order of the packets and the sampling instant of each packet.
It gets these from three header fields, so you must know about the header fields first.
Synchronization Source (SSRC)
The receiver may be receiving data from several sources. To arrange the data properly it
needs to identify the source of each packet, which the SSRC field makes possible.
Sequence Number
It is not enough to identify the source; the order is important too. The sequence number
increments by one for each RTP data packet sent, and may be used by the receiver to
detect packet loss and to restore packet sequence. Loss or out-of-order delivery occurs
due to network problems.
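A receiver can compare each sequence number with the previous one to spot gaps. The sketch below assumes only what the header description says (a 16-bit counter that increments by one), so it also has to cope with the counter wrapping from 65535 back to 0. The function names are made up for illustration.

```python
def seq_delta(prev: int, cur: int) -> int:
    """Signed distance from prev to cur in 16-bit sequence space."""
    d = (cur - prev) & 0xFFFF
    # Distances of half the range or more are treated as "behind".
    return d - 0x10000 if d >= 0x8000 else d


def classify(prev: int, cur: int) -> str:
    """Describe how packet `cur` relates to the last packet `prev`."""
    d = seq_delta(prev, cur)
    if d == 1:
        return "in order"
    if d > 1:
        return "%d packet(s) missing before this one" % (d - 1)
    if d == 0:
        return "duplicate"
    return "late (out of order)"


for prev, cur in [(10, 11), (10, 13), (65535, 0), (5, 3)]:
    print(prev, cur, classify(prev, cur))
```

A missing packet may still arrive late, so a real receiver usually waits a short while before declaring it lost.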
Timestamp
For media delivery, not just the order of the packets but also the sampling instant of
each packet is important. Please go through the following paragraph carefully.
Several consecutive RTP packets may have equal timestamps if they are (logically)
generated at once, e.g., belong to the same video frame. Consecutive RTP packets may
contain timestamps that are not monotonic if the data is not transmitted in the order it was
sampled, as in the case of MPEG interpolated video frames. (The sequence numbers of
the packets as transmitted will still be monotonic.) So the sequence number is not enough
for synchronization.
You already know that in an audio/video session audio and video data are transmitted
using separate channels (if you don't know this, please go through applications of RTP).
The receiver matches the video data with the corresponding audio data using the timestamp.
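The timestamp counts ticks of the media clock, so a receiver has to divide by the clock rate to get seconds. The sketch below assumes a clock rate of 8000 Hz for the audio stream and 90000 Hz for the video stream, and assumes the receiver already knows which timestamp in each stream corresponds to the same starting instant. All names and values are made up for illustration.

```python
def media_seconds(first_ts: int, ts: int, clock_rate: int) -> float:
    """Seconds of media time between two 32-bit RTP timestamps."""
    ticks = (ts - first_ts) & 0xFFFFFFFF   # tolerate 32-bit wrap-around
    return ticks / clock_rate


AUDIO_RATE = 8000    # assumed audio clock rate in Hz
VIDEO_RATE = 90000   # assumed video clock rate in Hz

# Hypothetical timestamps: both streams started at their own random value.
audio_start, audio_now = 5000, 5000 + 16000
video_start, video_now = 777000, 777000 + 180000

audio_t = media_seconds(audio_start, audio_now, AUDIO_RATE)
video_t = media_seconds(video_start, video_now, VIDEO_RATE)
print("audio position (s):", audio_t)
print("video position (s):", video_t)
print("belong together:", abs(audio_t - video_t) < 0.02)
```

The raw timestamp values of the two streams are unrelated numbers. Only after conversion to a common time scale can the audio and video packets be lined up.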
Application Level Framing in RTP
RTP is a protocol framework that is deliberately not complete. Certain structures are
not fixed and can be modified to suit a specific application. RTP is
intended to be malleable to provide adequate functionality. This characteristic is known
as Application Level Framing and is an important aspect of RTP.
So a profile specification document is needed for each application to specify the way
RTP is used, e.g. to define extensions or modifications to RTP that are specific to a
particular class of applications. Participants in an RTP session should agree on a common
format. Several header fields can be adapted to a specific application.
The extension bit may be set to indicate that the fixed header is followed by exactly one
header extension. The extra fields may carry additional information useful to the
application.
The interpretation of the marker bit is defined by a profile. It is intended to allow
significant events such as frame boundaries to be marked in the packet stream. A profile
may define additional marker bits or specify that there is no marker bit by changing the
number of bits in the payload type field.
A profile also specifies a default static mapping of payload type codes to payload
formats.
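A static mapping of this kind is just a lookup table from the 7-bit payload type code to a payload format. The sketch below lists a few entries as I recall them from the commonly used audio/video profile, with the range 96 to 127 left for dynamically assigned types. Treat the table as an illustration and check the profile document before relying on it. The function name is made up.

```python
# A few static payload types (name, clock rate in Hz); illustrative subset.
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000),
    8: ("PCMA", 8000),
    26: ("JPEG", 90000),
    31: ("H261", 90000),
}


def describe_payload_type(pt: int) -> str:
    """Return a readable description of a 7-bit payload type code."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if pt in STATIC_PAYLOAD_TYPES:
        name, rate = STATIC_PAYLOAD_TYPES[pt]
        return "%s/%d" % (name, rate)
    if 96 <= pt <= 127:
        return "dynamic (agreed during session setup)"
    return "not listed in this sketch"


for pt in (0, 8, 31, 96):
    print(pt, describe_payload_type(pt))
```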
RTCP
What is RTCP?
The RTP control protocol (RTCP) is based on the periodic transmission of control
packets to all participants in the session, using the same distribution mechanism as the
data packets.
Functions of RTCP
• It provides feedback on the quality of the data distribution. Different types of
packets are used. We will discuss those on the next page.
• It carries a persistent transport-level identifier for an RTP source called the
canonical name or CNAME. The SSRC may change from time to time but the CNAME
remains the same. It is used to identify a participant during the session. RTCP
may also carry extra information about the participants, such as an email address.
• By having each participant send its control packets to all the others, each can
independently observe the number of participants. This number is used to
calculate the rate at which the packets are sent. More users in a session means an
individual source may send packets less frequently.
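The last point can be sketched with a little arithmetic. This is a much simplified version of the idea, not the full rule from the RFC: it assumes that control traffic is limited to 5% of the session bandwidth, that the interval is never shorter than 5 seconds, and that every participant shares the budget equally. The function name and the 100-byte average packet size are made up for illustration.

```python
def rtcp_interval(participants: int,
                  session_bw_bps: float,
                  avg_rtcp_packet_bytes: float = 100.0,
                  rtcp_fraction: float = 0.05,
                  min_interval: float = 5.0) -> float:
    """Approximate seconds between control packets from one participant."""
    # Bytes per second that all control traffic together may use.
    rtcp_bw_bytes = session_bw_bps * rtcp_fraction / 8
    # Sharing that budget among more participants stretches the interval.
    interval = participants * avg_rtcp_packet_bytes / rtcp_bw_bytes
    return max(min_interval, interval)


for n in (4, 100, 1000):
    print(n, "participants ->", rtcp_interval(n, 64000), "seconds")
```

With these assumed numbers a small session stays at the minimum interval, while a large one makes each source wait longer, so the total control traffic stays roughly constant.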
Types of RTCP packets
• SR: Sender report, for transmission and reception statistics from participants that
are active senders
• RR: Receiver report, for reception statistics from participants that are not active
senders
• SDES: Source description items, including CNAME
• BYE: Indicates end of participation
• APP: Application specific functions
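Each of these packet types is identified by a number in the first four bytes of the RTCP packet. The sketch below assumes the usual layout of those bytes (version, padding bit, a 5-bit count, an 8-bit packet type and a 16-bit length counted in 32-bit words minus one) and the usual type codes 200 to 204. The function name is made up for illustration.

```python
import struct

# Packet type codes for the five RTCP packet types listed above.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def parse_rtcp_common_header(data: bytes) -> dict:
    """Parse the first 4 bytes shared by all RTCP packet types."""
    if len(data) < 4:
        raise ValueError("RTCP packet shorter than its 4-byte header")
    b0, pt, length_words = struct.unpack("!BBH", data[:4])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "count": b0 & 0x1F,                 # e.g. number of report blocks
        "type": RTCP_TYPES.get(pt, "unknown"),
        "length_bytes": (length_words + 1) * 4,
    }


# A hand-built BYE packet naming one source (hypothetical SSRC value).
bye = struct.pack("!BBHI", 0x81, 203, 1, 0x12345678)
print(parse_rtcp_common_header(bye))
```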
Conclusion of RTP
You should understand that this is only the tip of the iceberg. If you just needed an
introduction, it is OK to stop here. But for bigger things you must go through RFC 1889,
and even that is not enough. You have to work at it yourself to master applications
employing RTP.
RFC 1889 has been superseded by RFC 3550. Thanks to John York for pointing it out.
At this point, I strongly recommend that, if you are serious about the subject, you
go through some of the books listed in the books section.
If you have any suggestion, correction, query just mail to