An Introduction to the Real-time
Transport Protocol (RTP)
• Application Support
– Reliability control: loss recovery, in-sequence delivery, etc.
• Network Control
– Congestion control, rate allocation, etc
• The distinction between the two is not sharp.
– Rate allocation and scheduling can be viewed as part of
either one above.
• This dual view arises when we contemplate
traditional transports: TCP and UDP
Violation of the Old View Leads
to New Ideas
Fine-grained Application Support
• In a monolithic transport, application support
functions need to be general. Why?
– Transport sits in the kernel. Hard to modify.
– API needs to be stable.
– The philosophy of some transport designers: transport
should have sufficient generality.
• How to accommodate a specific application’s needs?
– Build complex logic into the (monolithic) transport. But
should not be overly ambitious.
WebTP - Current
• WebTP is still monolithic
• Some trade-off of programmability with efficiency, but
may be justifiable.
– The key is to make the user-IP path fast.
Overview of RTP
• Provides end-to-end delivery services for real-time
traffic: interactive audio and video
– Payload identification, sequence numbering,
timestamping and delivery monitoring
• Runs on top of UDP, and less often, TCP.
– RTP does not guarantee delivery or prevent out-of-order delivery.
• Primarily designed to support multiparty
multimedia conferences; typically assumes IP multicast.
Overview – Cont.
• The protocol has two parts.
– RTP: carry real-time data
– RTP control protocol (RTCP): monitors the quality of
service and conveys information about the participants in an ongoing session
• Principles of application level framing and
integrated layer processing.
– Is malleable to provide application specific info.
– Is typically integrated into the application processing.
– Protocol is deliberately not complete. It only contains
the common functions.
– A complete specification for an application also
includes a profile and a payload format document.
Example- Multicast audio
• Need a multicast address and a pair of ports: one
for data and one for (RTCP) control.
• RTP header contains the type of audio encoding (such
as PCM). Senders can change the encoding during the conference.
• RTP header contains timing information. Audio
data can be played out as they are produced by the source.
• Senders and receivers multicast reports through
RTCP. Packet loss ratio, delay jitter, and other
status info can be monitored.
Example – Audio and Video
• Audio and video are transmitted using
separate RTP sessions. (with different UDP
ports and/or multicast addresses.)
• A participant in both sessions can be
identified by the same canonical name (CNAME) in the RTCP packets of each.
• The decoupling of the two sessions allows
some participants to join only one session.
Example – Mixers and Translators
• Mixer: an RTP-level entity that receives streams of RTP
data packets from one or more sources and combines them
into a single stream.
• A translator forwards RTP packets from different sources with their SSRC identifiers intact.
• Mixer is like a new RTP-level source to the receivers.
• Translator is more transparent. Receivers can identify
individual sources even though packets pass through the
same translator and carry the translator’s network source address.
• A mixer can resynchronize the incoming streams and
generate its own timing information.
Translators and Mixers
• The real distinction between mixers and translators: SSRC
identifier is not changed at a translator, but is changed at a mixer.
• They both use a different transport address (network
address + port) at the output side.
• Multiple data packets can be combined into one.
• Uses of translators and mixers: traversing firewalls;
transcoding for low-bandwidth links; adding or removing
encryption; emulating a multicast address with one or more unicast addresses.
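The SSRC distinction between translators and mixers can be sketched in a few lines. This is only an illustration under stated assumptions: the `Packet` class and the `translate` and `mix` helpers are hypothetical names, payload processing is left to the caller, and the contributing sources are recorded in the CSRC list (at most 15 entries, since the CSRC count field is 4 bits wide).

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of an RTP packet."""
    ssrc: int
    csrc: Tuple[int, ...]
    payload: bytes


def translate(pkt: Packet, new_payload: bytes) -> Packet:
    """Translator: the payload may change (e.g. transcoding), the SSRC does not."""
    return replace(pkt, payload=new_payload)


def mix(pkts: List[Packet], mixer_ssrc: int, mixed_payload: bytes) -> Packet:
    """Mixer: emits one packet under its own SSRC and lists the
    original sources as contributing sources (CSRC)."""
    contributors = tuple(p.ssrc for p in pkts)[:15]  # CC is a 4-bit field
    return Packet(ssrc=mixer_ssrc, csrc=contributors, payload=mixed_payload)
```

Receivers behind the translator still see the original SSRC of each source, while receivers behind the mixer see only the mixer's SSRC plus the CSRC list.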
Example: Translator at Firewall
[Figure: a translator at the firewall. One side uses address a, ports p and p+1; the other side uses address b, ports q and q+1.]
Note that the UDP or TCP connections terminate at the firewall.
Some RTP Definitions
• Transport address: network address + port
• RTP session: communications on a pair of transport
addresses (data + control)
• Synchronization source (SSRC): the source of a stream of RTP packets.
– Identified by 32-bit SSRC identifier.
– All packets from the same SSRC form a single timing and
sequencing space. Receivers group packets by SSRC for playback.
– Not dependent on network address.
– Examples: all packets from a camera; all packets from a mixer; for a layered
encoding transmitted on separate RTP sessions, a single SSRC is
used for all layers.
– A participant need not use the same SSRC for all RTP sessions in a multimedia session.
RTP Fixed Header
X: Header Extension
CC: CSRC count
M: Marker of record
PT: Payload type; the mapping to an encoding can be
specified by the profile of the application.
Sequence number: increments by one for each packet;
can be used by the receiver to
detect loss or restore packet sequence.
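As a rough illustration of loss detection with the sequence number, the sketch below counts the packets missing between two consecutively received packets. It assumes the 16-bit sequence number field of RTP, so the arithmetic wraps at 65536. The function name `count_lost` and the half-range rule for ignoring late or duplicate packets are assumptions made for this example, not something the slides prescribe.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def count_lost(prev_seq: int, curr_seq: int) -> int:
    """Number of packets missing between two consecutively received packets."""
    gap = (curr_seq - prev_seq) % SEQ_MOD  # forward distance with wraparound
    if gap == 0 or gap >= SEQ_MOD // 2:
        # Duplicate, or a late/reordered packet: nothing newly lost.
        return 0
    return gap - 1


print(count_lost(10, 13))    # 2  (packets 11 and 12 are missing)
print(count_lost(65534, 1))  # 2  (packets 65535 and 0 are missing)
```

A real receiver would also keep per-source state (for example the highest sequence number seen so far) rather than comparing only adjacent arrivals.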
RTP Fixed Header – Cont.
• Timestamp
– Reflects the sampling instant of the first byte of data
– Clock frequency can be specified by the profile or payload
format documents for the application.
– Example: for fixed-rate audio, clock may increment by
one for each sampling period.
• SSRC: chosen randomly for each synchronization
source; with the intent that no two synchronization
sources in the same session have the same SSRC.
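The fields listed on the two header slides can be pulled out of a packet with a short parser. This is a minimal sketch, assuming the standard 12-byte fixed header layout (version, padding, X, CC, M, PT, sequence number, timestamp, SSRC) followed by CC 32-bit CSRC entries. The names `RtpHeader` and `parse_rtp_header` are made up for the example, and header extensions are not decoded.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool        # X bit
    csrc_count: int        # CC
    marker: bool           # M bit
    payload_type: int      # PT
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    # Network byte order: two flag bytes, 16-bit seq, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    csrc = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

For example, `parse_rtp_header(bytes([0x80, 0x00]) + struct.pack("!HII", 1, 160, 0x12345678))` yields version 2, payload type 0, sequence number 1 and timestamp 160.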
Profile-Specific Modifications to
the RTP Header
• Marker bit and payload type are interpreted
according to the application’s profile.
• Moreover, the byte containing them can be
redefined by the profile.
• If a particular class of application needs additional
functionality, the profile should define additional
fixed fields following SSRC.
• If X bit is 1, exactly one header extension follows
CSRC list (if present).
– Variable length
– Used for experimental purposes
• RTCP’s primary function is to provide feedback on the quality
of data distribution.
– Through sender and receiver reports;
– For adaptive encoding (adaptive to network congestion);
– Can be used to diagnose faults
• RTCP carries a persistent transport-level identifier for
an RTP source, called canonical name, CNAME.
– Receivers use CNAME to keep track of each participant
– And to synchronize related media streams (with the help of the NTP and RTP timestamps in sender reports)
• Passes participant’s identification for display.
• SR: sender reports; sending and reception statistics
• RR: receiver reports; for reception statistics
from multiple sources.
• SDES: source description items, including CNAME
• BYE: indicates end of participation
• APP: application specific functions
Compound RTCP Packets
• A compound RTCP packet contains multiple
RTCP packets of the previous types.
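A compound packet can be split back into its individual RTCP packets by walking the length fields. The sketch below assumes the common RTCP header (one flag byte, one packet-type byte, and a 16-bit length counted in 32-bit words minus one) and the packet type values 200 to 204 for SR, RR, SDES, BYE and APP. The function name `split_compound` is hypothetical, and the bodies of the packets are not decoded.

```python
import struct
from typing import List, Tuple

RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def split_compound(data: bytes) -> List[Tuple[str, bytes]]:
    """Return (type name, raw packet) for each RTCP packet in a compound packet."""
    packets = []
    offset = 0
    while offset + 4 <= len(data):
        _flags, ptype, length = struct.unpack("!BBH", data[offset:offset + 4])
        size = 4 * (length + 1)  # length field is in 32-bit words, minus one
        if offset + size > len(data):
            raise ValueError("truncated RTCP packet")
        packets.append((RTCP_TYPES.get(ptype, "unknown"), data[offset:offset + size]))
        offset += size
    return packets
```

Because each packet carries its own length, a receiver can skip packet types it does not understand and still process the rest of the compound packet.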
SR Packet – Cont.
• RC: receiver report count
• Length: packet length in 32-bit words, minus one
• NTP ts: wallclock time, used to calculate RTT
• RTP ts: in the same units and with the same offset as the RTP timestamps in data packets. Can be used
with NTP ts for inter-media synchronization.
• Fraction lost: fraction of packets lost since the last RR or SR packet was sent. This is a short-term loss measure.
• LSR: last SR timestamp; middle 32 bits of the NTP timestamp.
• DLSR: delay since last SR; the delay, in units of 1/65536 seconds, between
receiving the last SR packet from SSRC_n and sending this report.
Source SSRC_n can compute RTT using DLSR, LSR and the
reception time of the report, A.
RTT = A – LSR – DLSR
• An application’s profile can define extensions to SR or RR packets
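The round-trip computation RTT = A – LSR – DLSR is plain fixed-point arithmetic. The sketch below assumes A and LSR are both expressed as the middle 32 bits of an NTP timestamp (16 bits of seconds, 16 bits of fraction) and DLSR is in units of 1/65536 seconds, so all three share the same unit. The function name `rtt_seconds` is made up here; the sample values are the ones used in the worked example of the RTP specification.

```python
def rtt_seconds(a: int, lsr: int, dlsr: int) -> float:
    """RTT = A - LSR - DLSR, with all inputs in units of 1/65536 second."""
    rtt_units = (a - lsr - dlsr) % (1 << 32)  # tolerate 32-bit wraparound
    return rtt_units / 65536.0


# A = 0xb710:8000 (46864.500 s), LSR = 0xb705:2000 (46853.125 s),
# DLSR = 0x0005:4000 (5.250 s)
print(rtt_seconds(0xB7108000, 0xB7052000, 0x00054000))  # 6.125
```

Since the receiver reports its own holding time in DLSR, the sender needs no clock synchronization with the receiver to obtain the RTT.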
CNAME Item in SDES Packet
• Provides a persistent identifier for a source.
• Provides a binding across multiple media used by one
participant in a set of related RTP sessions. CNAME should
be fixed for that participant.
• SSRC is bound to CNAME
• Example: email@example.com; firstname.lastname@example.org, etc.
Other Items in SDES Packet
• NAME: user name
• LOC: location
• TOOL: application or tool name
• NOTE: notice/status
BYE: Goodbye RTCP Packet
• Mixers should forward the BYE packet with the SSRC/CSRC identifiers unchanged.
• Reason for leaving: string field; e.g., “camera malfunction”
APP RTCP Packet
• Subtype: allows a set of APP packets to be
defined under one unique name.
• Name: a name chosen to be unique among the APP packets this application might receive
Conclusions - I
• RTP defines transport support for common
functions of real-time applications.
– Timing information: sampling period and NTP timestamps
– Synchronization source for playback
– Payload types (encoding)
– Quality reports: short-term and long-term packet loss, and jitter.
– Participant identification: CNAME, NAME, EMAIL, etc.
– Multicast distribution support
– Conversion: mixers and translators
• Protocol is extensible through profile and payload format documents
• Customizable to application or application classes.
Necessity of this feature is not clear.
Conclusion – II
• Separation of control and data stream
(analogous to out-band signaling)
– Data header overhead is small.
– Can accomplish complex control features.
– Complexity of the protocol/algorithm is not so
bad, because there is little hard guarantee (it
relies on TCP or the application for hard guarantees).
Conclusions – III
• Congestion control is not defined in the baseline
document, but may be defined by the application’s profile.
– Leads to application-specific congestion control or
• RTP can be considered a user-space transport
entity, but it does not run as a stand-alone process.
• Mixers and translators are stand-alone processes.
They terminate TCP or UDP connections.
A View of Future Network
Layer 3 Systems
RTP Algorithms - I
• RTCP packet generation: need to limit the control traffic
– Control traffic takes 5% of data traffic bandwidth (not
– ¼ of the RTCP bandwidth is used by senders
– Interval between RTCP packets scales linearly with the
number of members in the group.
– Each compound RTCP packet must include a report
packet and an SDES packet for timely feedback.
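The bandwidth rules above can be turned into a rough interval calculation. This is a simplified sketch, not the full algorithm: the 5% and ¼ shares come from the slide, while the 5-second minimum and the rule that senders get their own share only when they make up at most a quarter of the members are assumptions taken from the sample algorithm in the RTP specification. The random jitter that the specification applies to the interval is omitted, and the function name `rtcp_interval` is hypothetical.

```python
def rtcp_interval(members: int, senders: int, session_bw: float,
                  avg_rtcp_size: float, we_sent: bool,
                  min_interval: float = 5.0) -> float:
    """Deterministic RTCP reporting interval in seconds.

    session_bw is in bytes per second, avg_rtcp_size in bytes.
    """
    rtcp_bw = 0.05 * session_bw       # control traffic: 5% of data bandwidth
    n = members
    if 0 < senders <= 0.25 * members:
        if we_sent:
            rtcp_bw *= 0.25           # senders share 1/4 of the RTCP bandwidth
            n = senders
        else:
            rtcp_bw *= 0.75           # receivers share the remaining 3/4
            n = members - senders
    # The interval grows linearly with the number of members sharing the bandwidth.
    return max(avg_rtcp_size * n / rtcp_bw, min_interval)
```

With 1000 members, 10 senders, 64000 bytes/s of session bandwidth and 100-byte RTCP packets, a receiver would report roughly every 41 seconds, and doubling the group size doubles that interval.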
RTP Algorithms - II
• SSRCs are chosen randomly and locally and can collide.
• Loops introduced by mixers and translators
– A translator may incorrectly forward a packet to the
same multicast group from which it has received the packet.
– Parallel translators.
• Collision avoidance of SSRC and loop detection
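A minimal sketch of local SSRC selection with collision avoidance follows. It assumes the participant keeps a set of SSRC identifiers it has already seen in the session; the name `choose_ssrc` and the `in_use` parameter are hypothetical. Loop detection and the sending of a BYE for an abandoned identifier are not shown.

```python
import random


def choose_ssrc(in_use, rng=random):
    """Pick a random 32-bit SSRC that no known participant is using."""
    while True:
        candidate = rng.getrandbits(32)
        if candidate not in in_use:
            return candidate


# On detecting that another source uses our SSRC, pick a fresh one.
known = {0x12345678, 0x9ABCDEF0}
my_ssrc = choose_ssrc(known)
```

Because the identifier space has 2^32 values, the retry loop almost never runs more than once for realistic session sizes.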
Example of A Profile Document
• RTP data header:
– use one marker bit
– No additional fixed fields
– No RTP header extensions are defined.
– No additional RTCP packet types.
– No SR/RR extensions are defined
– SDES use: CNAME is sent every reporting interval,
other items should be sent only every fifth reporting interval.
RFC1890: RTP Profile for Audio and Video Conferences with Minimal Control.