Become familiar with transport technologies and protocols used to transport media over IP networks
RTP / RTCP
SIGNALING AND MEDIA PROTOCOLS
VoIP signaling protocols define how to set up and tear down calls, and how to carry the information required to locate users and negotiate capabilities.
VoIP media protocols define how to transport audio and video as packets over the IP network.
The transport technologies and protocols are fairly independent of the signaling protocols.
[Figure: call flow across the CMS network. Signaling protocol (SIP): INVITE, TRYING, RINGING, OK, ACK. Media protocol (RTP): 2-way media path.]
In a VoIP-enabled network, the voice signal is digitized, compressed, converted to IP packets and then transmitted over the IP network.
The Real-time Transport Protocol (RTP) defines the mechanism that provides end-to-end delivery services for real-time data such as audio, video or simulation data.
What RTP does:
payload type identification,
What RTP doesn't do:
ensure timely delivery
A new set of rules was introduced to address quality of service for real-time data.
The Real Time Transport Control Protocol (RTP control protocol or RTCP) is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets.
What RTCP does:
RTCP provides feedback on the quality of the data distribution. This feedback function is performed by the RTCP sender and receiver reports.
RTCP carries a persistent transport-level identifier for an RTP source called the canonical name or CNAME. The CNAME is used to keep track of each participant, for example in a multicast environment.
RTCP allows a large number of participants in a conference. Because every participant sends RTCP packets to all the others, each party can observe the number of participants, and this number is used to calculate the rate at which RTCP packets are sent.
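The rate calculation mentioned above can be illustrated with a simplified sketch. It assumes the commonly cited RTP guidelines that RTCP traffic is held to about 5% of the session bandwidth and that reports are sent no more often than every 5 seconds. The function name and parameters are hypothetical, and the sketch leaves out the sender/receiver bandwidth split and the randomization that real implementations apply.

```python
def rtcp_interval(members, session_bw_bps, avg_rtcp_size_bytes,
                  rtcp_percent=5, minimum_s=5.0):
    """Simplified, deterministic RTCP report interval in seconds (illustrative only)."""
    # Share of the session bandwidth reserved for RTCP, in bytes per second.
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_percent / 100 / 8
    # Every member sends one report per interval, so the interval grows with group size.
    interval = members * avg_rtcp_size_bytes / rtcp_bw_bytes_per_s
    return max(minimum_s, interval)


print(rtcp_interval(4, 64000, 100))     # 5.0 (the minimum applies)
print(rtcp_interval(1000, 64000, 100))  # 250.0
```

With more participants each one reports less often, so the total control traffic stays roughly constant.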
What RTCP doesn't do:
ensure timely delivery
How It works - RTP
Using a signaling protocol such as SIP, two or more end-points negotiate addresses and a pair of ports. One port is used for audio data (RTP) and the other for control (RTCP) packets.
Each end-point sends audio data in small chunks of, say, 20 ms duration.
Each chunk of audio data is preceded by an RTP header; RTP header and data are in turn contained in a UDP packet.
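As a rough worked example of the packetization described above, the sketch below computes the size of one packet and the resulting bit rate on the IP network. It assumes 8-bit samples at 8000 Hz (as with PCM telephony audio), an IPv4 header without options and an RTP header without CSRC entries. The function and constant names are illustrative.

```python
# Header sizes in bytes (IPv4 without options, UDP, RTP without CSRC entries).
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def packet_budget(sample_rate_hz=8000, bits_per_sample=8, chunk_ms=20):
    """Return (payload_bytes, packet_bytes, packets_per_second, ip_bitrate_bps)."""
    samples = sample_rate_hz * chunk_ms // 1000
    payload = samples * bits_per_sample // 8
    packet = IPV4_HEADER + UDP_HEADER + RTP_HEADER + payload
    packets_per_second = 1000 // chunk_ms
    return payload, packet, packets_per_second, packet * 8 * packets_per_second


print(packet_budget())  # (160, 200, 50, 80000)
```

Under these assumptions a 64 kbit/s audio stream occupies 80 kbit/s at the IP layer, because 40 bytes of headers accompany every 160 bytes of audio.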
How It Works - RTP
The RTP header indicates the type of audio encoding (such as PCM, ADPCM or LPC) and carries timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source, so that, in this example, chunks of audio are played out of the speaker contiguously every 20 ms.
The RTP payload is the data transported by RTP in a packet, for example audio samples or compressed video data.
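The header fields described above can be made concrete with a minimal sketch that packs and parses the 12-byte fixed RTP header (version, marker, payload type, sequence number, timestamp and SSRC). The function names are hypothetical, and the parser ignores padding and header extensions, so it is not a complete implementation.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a fixed RTP header (no padding, no extension, no CSRC list)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_packet(packet):
    """Split a packet into header fields and payload (extensions are not handled)."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = byte0 & 0x0F
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12 + 4 * csrc_count:],
    }


# One 20 ms chunk of 8 kHz PCM mu-law audio (static payload type 0), 160 bytes.
packet = build_rtp_packet(0, seq=1, timestamp=160, ssrc=0x1234, payload=b"\xff" * 160)
print(len(packet))  # 172
```

A sender would increase the sequence number by 1 and the timestamp by 160 for each successive 20 ms chunk, which is what lets the receiver detect loss and restore timing.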
To monitor the quality of service each end-point periodically multicasts a reception report plus the name of its user on the RTCP (control) port. The reception report indicates how well the current speaker is being received and may be used to control adaptive encodings.
How It works - RTCP
The RTCP header contains information on how to interpret the payload.
The RTCP payload may contain:
SR: Sender report, for transmission and reception statistics from participants that are active senders
RR: Receiver report, for reception statistics from participants that are not active senders
SDES: Source description items, including CNAME.
[Figure: RTCP packet layout: IP header | UDP header | RTCP header | RTCP payload data]
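To illustrate how the RTCP header tells a receiver how to interpret the payload, the sketch below walks a compound RTCP datagram and reads each packet's common 4-byte header: version and count bits, packet type, and length in 32-bit words minus one. The packet type numbers (200 for SR, 201 for RR, 202 for SDES) are the standard assignments. The function name and the sample data are illustrative.

```python
import struct

RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def parse_rtcp_compound(data):
    """Return a list of (type_name, count, size_in_bytes) for each RTCP packet."""
    packets = []
    offset = 0
    while offset + 4 <= len(data):
        byte0, ptype, length_words = struct.unpack_from("!BBH", data, offset)
        size = (length_words + 1) * 4  # the length field counts 32-bit words minus one
        packets.append((RTCP_TYPES.get(ptype, str(ptype)), byte0 & 0x1F, size))
        offset += size
    return packets


# An empty receiver report followed by an SDES packet carrying one CNAME item.
rr = struct.pack("!BBHI", 0x80, 201, 1, 0x1234)
sdes = struct.pack("!BBHI", 0x81, 202, 4, 0x1234) + bytes([1, 9]) + b"user@host" + b"\x00"
print(parse_rtcp_compound(rr + sdes))  # [('RR', 0, 8), ('SDES', 1, 20)]
```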
Not all end-points are equal
So far, we have assumed that all sites want to receive media data in the same format. However, this may not always be appropriate. Users with connections of different bandwidth, or users behind a firewall that will not let the IP packets through, need some extra processing.
The mechanism that defines how to connect two or more transport-level "clouds" is called an RTP translator or mixer.
Typically, each cloud is defined by a common network and transport protocol (e.g., IP/UDP) plus a multicast address and transport level destination port or a pair of unicast addresses and ports.
Mixer in RTP
Not all participants in a conference have a connection with the same bandwidth. So how do they take part simultaneously?
A mixer may be placed near the low-bandwidth area. This mixer resynchronizes incoming audio packets to reconstruct the constant 20 ms spacing generated by the sender, mixes these reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one and forwards the lower-bandwidth packet stream across the low-speed link.
Mixer in RTP
The mechanism that receives streams of RTP data packets from one or more sources, possibly changes the data format, combines the streams in some manner and then forwards the combined stream is called a mixer.
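The "combines the streams" step can be sketched for audio as sample-by-sample addition with clipping. This is only one simple way to mix and is not taken from the source. It assumes the incoming streams have already been decoded to signed 16-bit linear PCM and aligned into equal-length frames, and the function name is hypothetical.

```python
def mix_frames(frames):
    """Mix equal-length frames of signed 16-bit PCM samples by saturating addition."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must contain the same number of samples")
    # Add the samples that share a time slot and clip to the 16-bit range.
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*frames)]


print(mix_frames([[1000, -2000, 30000], [500, -500, 10000]]))  # [1500, -2500, 32767]
```

A real mixer would then re-encode the mixed frame, possibly with a lower-bandwidth codec, and send it in RTP packets of its own.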
A problem occurs if one or more participants of a conference are behind a firewall that will not allow IP packets containing RTP messages to pass. In this situation translators are used.
Two translators are installed, one on either side of the firewall, with the outside one funneling all multicast packets received through a secure connection to the translator inside the firewall. The translator inside the firewall sends them again as multicast packets to a multicast group restricted to the site's internal network.
Translator in RTP
The mechanism that receives streams of RTP data packets from one or more sources and forwards RTP packets with their SSRC identifier intact is called a translator.
[Figure: source and receiver separated by a firewall, with one translator on each side of the firewall]
What is a Codec
The word codec is a blend of two words, 'compressor-decompressor' or, more accurately, 'coder-decoder'.
A codec is an algorithm (a program), usually installed as software on a server or embedded in a piece of hardware (ATA, IP phone, etc.), that converts voice or video signals into digital data for transmission over the Internet or any other network during a VoIP call.
PCM (Pulse Code Modulation) is a simple technique that samples the sound signal at a fixed rate (8000 times per second) and generates a number corresponding to each sample. It assumes no specific property of the signal, so it works reasonably well with all types of sounds.
LPC (Linear Predictive Coding) assumes specific properties of the human voice and uses a more complex algorithm to digitize and compress voice data. It works well for human speech and offers a low data rate, but it is not suitable for transmitting music or fax.
SBC (Sub Band Coder) uses a different approach of representing sounds in terms of frequencies rather than sampling at regular intervals.
G.7xx, including G.711, G.721, G.722, G.726, G.727, G.728 and G.729, is a suite of ITU-T standards for audio compression and decompression.
In telephony, the G.711 standard defines two main algorithms: the mu-law algorithm (used in North America and Japan) and the A-law algorithm (used in Europe and the rest of the world). Both are logarithmic, but the latter, A-law, was specifically designed to be simpler for a computer to process.
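The logarithmic behaviour of the two algorithms can be sketched with their continuous companding curves, using the usual constants mu = 255 and A = 87.6. This is an illustration of the curves only. Actual G.711 codecs use piecewise-linear approximations that produce 8-bit codes, which this sketch does not attempt. Input samples are assumed to be normalized to the range -1.0 to 1.0.

```python
import math

MU = 255.0
A = 87.6


def mu_law_compress(x):
    """Continuous mu-law curve for a sample x in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def a_law_compress(x):
    """Continuous A-law curve for a sample x in [-1.0, 1.0]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))  # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


for sample in (0.01, 0.1, 1.0):
    print(sample, round(mu_law_compress(sample), 3), round(a_law_compress(sample), 3))
```

Both curves give quiet samples a larger share of the output range than loud ones, which is why 8 bits per sample are enough for telephone-quality speech.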
H.261 is a video coding standard defined by the ITU. It was designed for data rates that are multiples of 64 kbit/s, and is sometimes called p x 64 kbit/s (where p is in the range 1-30).
The coding algorithm is a hybrid of inter-picture prediction, transform coding, and motion compensation.
H.263, defined by the ITU, supports video compression (coding) for video-conferencing and video-telephony applications.
The coding algorithm of H.263 is similar to that used by H.261, with changes that improve performance and error recovery.
H.264, also known as MPEG-4 Part 10 or Advanced Video Coding (AVC), was jointly developed by the ITU and ISO. H.264/MPEG-4 AVC supports video compression (coding) for video-conferencing and video-telephony applications.
Dual-tone multi-frequency (DTMF) signaling is used for telephone signaling over the line in the voice-frequency band, either to instruct a telephone switching system of the telephone number to be dialed or to issue commands to switching systems or related telephony equipment.
[Figures: DTMF keypad frequencies; DTMF event frequencies]
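As an illustration of the dual-tone idea, the sketch below maps a key to its pair of frequencies using the standard DTMF grid (rows 697, 770, 852 and 941 Hz; columns 1209, 1336, 1477 and 1633 Hz) and generates the summed sine waves. The function names, the 100 ms default duration and the 8000 Hz sample rate are assumptions made for the example.

```python
import math

ROW_HZ = (697, 770, 852, 941)
COLUMN_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair in Hz for a single DTMF key."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for row_index, row in enumerate(KEYPAD):
        column_index = row.find(key)
        if column_index != -1:
            return ROW_HZ[row_index], COLUMN_HZ[column_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_ms=100, sample_rate=8000):
    """Generate the tone for a key as floats in [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    count = sample_rate * duration_ms // 1000
    return [0.5 * (math.sin(2 * math.pi * low * n / sample_rate)
                   + math.sin(2 * math.pi * high * n / sample_rate))
            for n in range(count)]


print(dtmf_frequencies("5"))   # (770, 1336)
print(len(dtmf_samples("5")))  # 800
```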
DTMF over VoIP
In-band means that DTMF tones are transmitted as audio packets along with voice data. Therefore only uncompressed codecs such as G.711 A-law or mu-law can carry in-band DTMF reliably.
RFC 2833: DTMF tones are detected at the gateway level and transmitted as special RTP packets (with a different payload type) along with voice data.
Out-of-band means that DTMF is transmitted outside of the audio stream of the phone conversation, usually with the signaling protocol.
DTMF schemes with VoIP protocols:
H.323: in-band, RFC 2833, out-of-band (H.245)
SIP: in-band, RFC 2833, out-of-band (SIP INFO)
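The "special RTP packets" of the RFC 2833 scheme carry a 4-byte telephone-event payload: an event code, an end bit, a reserved bit, a 6-bit volume and a 16-bit duration in timestamp units. A minimal sketch of packing and parsing that payload follows. The function names and default values are hypothetical, and the RTP header that would precede the payload is omitted.

```python
import struct

# Event codes for the 16 DTMF keys: digits 0-9, then *, #, A, B, C, D.
EVENT_CODES = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def build_telephone_event(key, duration, volume=10, end=False):
    """Pack a telephone-event payload; duration is in RTP timestamp units."""
    flags = (int(end) << 7) | (volume & 0x3F)  # E bit, reserved bit 0, 6-bit volume
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration & 0xFFFF)


def parse_telephone_event(payload):
    """Unpack the four payload bytes into their fields."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {"event": event, "end": bool(flags & 0x80),
            "volume": flags & 0x3F, "duration": duration}


# Key 5 held for 100 ms at an 8000 Hz clock (800 timestamp units), final packet.
payload = build_telephone_event("5", duration=800, end=True)
print(payload.hex())  # 058a0320
```

Because the digit travels as a number rather than as audio, it survives low-bit-rate codecs that would distort the tone itself.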
DTMF Out-of-Band SIP INFO
The SIP INFO method can be used by SIP network elements to transmit DTMF tones out-of-band as telephone-events in a reliable manner independent of the media stream.
In the DTMF relay method, the body of the SIP INFO message carries the signaling information and uses the content type application/dtmf-relay.
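A sketch of what such a SIP INFO request might look like is shown below. The Signal= and Duration= body lines follow a convention used by many SIP endpoints for application/dtmf-relay, but treat that body layout as an assumption rather than something stated in this material. All addresses, tags and the branch value are hypothetical placeholders.

```python
def build_dtmf_info(target_uri, from_uri, to_uri, call_id, cseq, digit, duration_ms=160):
    """Build the text of a SIP INFO request that relays one DTMF digit (illustrative)."""
    # Assumed body layout for application/dtmf-relay: one "Name=value" pair per line.
    body = f"Signal={digit}\r\nDuration={duration_ms}\r\n"
    headers = [
        f"INFO {target_uri} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK-hypothetical",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag=1234",
        f"To: <{to_uri}>;tag=5678",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/dtmf-relay",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # A blank line separates the SIP headers from the message body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


print(build_dtmf_info("sip:bob@example.com", "sip:alice@example.com",
                      "sip:bob@example.com", "abc@client.example.com", 2, "5"))
```

Since the request travels on the signaling path, it does not depend on the codec or on the RTP stream at all.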