VoIP Terminology and Concepts
A
• Anti-tromboning
B
• Back-to-back user agent
C
• Call origination
• Chatter bug
D
• Downstream QoS
I
• IP Multimedia Subsystem
• Internet telephony service
L
• Lawful interception
M
• Media Gateway Control
P
• Packet Loss Concealment
• PacketCable
• Purple minutes
R
• Real-time Transport Protocol
S
• Session Border
• Session Initiation
• Signaling gateway
• Soft phone
T
• Traversal Using Relay NAT
V
• Vamming
• Vishing
• VoIP VPN
• VoIP phone
• VoIP spam
• Voice chat
• Voice over IP
• Voice
Retrieved from "http://en.wikipedia.org/wiki/Category:VoIP_terminology_
DHN 9-18-07 1
Anti-tromboning (also referred to as anti-hairpinning or media release) is a feature
employed in Voice over IP networks that optimizes the use of the access network. A
Session Border Controller handling calls as they pass from the Access Network to the
Core Network can examine the IP addresses of both the calling and called parties. If they
reside in the same part of the network, the media path can be “released”, allowing media to
flow directly between the two parties without entering the core network. The benefits
of this action are twofold: 1) the caller is not paying for any bandwidth usage on the
carrier network, and 2) the carrier's network is less congested.
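The release decision described above can be sketched in a few lines of Python. This is only an illustration: the access segments, the addresses and the function names are assumptions made for the example, and a real Session Border Controller would apply its own, vendor-specific criteria for "the same part of the network".

```python
import ipaddress

# Hypothetical access-network segments known to the SBC (assumed values).
ACCESS_SEGMENTS = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("10.2.0.0/16"),
]


def segment_of(address):
    """Return the access segment that contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for network in ACCESS_SEGMENTS:
        if ip in network:
            return network
    return None


def can_release_media(caller_ip, callee_ip):
    """True when both parties sit in the same known access segment."""
    caller_segment = segment_of(caller_ip)
    return caller_segment is not None and caller_segment == segment_of(callee_ip)
```

With these assumed segments, can_release_media("10.1.0.5", "10.1.200.9") is true, so media could flow directly between the two parties, while a call between 10.1.0.5 and 10.2.0.5 would keep its media path through the controller.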
The Back-to-Back User Agent (B2BUA) acts as a user agent to both ends of a Session
Initiation Protocol (SIP) call. The B2BUA is responsible for handling all SIP signalling
between both ends of the call, from call establishment to termination. Each call is tracked
from beginning to end, allowing the operators of the B2BUA to offer value-added
features to the call.
To SIP clients, the B2BUA acts as a User Agent server on one side and as a User Agent
client on the other (back-to-back) side. The basic implementation of a B2BUA is defined
in RFC 3261. The B2BUA may provide the following functionalities:
• call management (billing, automatic call disconnection, call transfer, etc.)
• network interworking (perhaps with protocol adaptation)
• hiding of network internals (private addresses, network topology, etc.)
• codec translation between two call legs
Because it maintains call state for all SIP calls it handles, failure of a B2BUA affects all
these calls. Often, B2BUAs also terminate and bridge the media streams to have full
control over the whole session.
A Signaling gateway, part of a Session Border Controller, or Asterisk PBX are good
examples of a B2BUA.
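The back-to-back arrangement can be illustrated with a small Python sketch. It models only the bookkeeping (two call legs per call, a fresh Call-ID on the outgoing leg, and the B2BUA's own contact address shown in place of the caller's); the class names, field names and addresses are hypothetical, and no real SIP stack is involved.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CallLeg:
    call_id: str
    remote: str
    state: str = "active"


@dataclass
class B2BUA:
    public_contact: str                      # address shown to both sides (assumed)
    calls: dict = field(default_factory=dict)

    def on_invite(self, caller_call_id, caller, callee):
        """Answer the caller as a UA server and open a new leg as a UA client."""
        leg_a = CallLeg(caller_call_id, caller)
        leg_b = CallLeg(uuid.uuid4().hex, callee)   # fresh Call-ID for the second leg
        self.calls[caller_call_id] = (leg_a, leg_b)
        # The outgoing request carries the B2BUA's contact, not the caller's address.
        return {"Call-ID": leg_b.call_id, "To": callee, "Contact": self.public_contact}

    def on_bye(self, caller_call_id):
        """Tear down both legs and forget the call state."""
        leg_a, leg_b = self.calls.pop(caller_call_id)
        leg_a.state = leg_b.state = "terminated"
        return leg_b.call_id
```

Because every call lives in the calls dictionary for its whole duration, the sketch also shows why a B2BUA failure affects all the calls it is handling.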
Call Origination, also known as voice origination, refers to the collecting of the calls
initiated by a calling party on a telephone exchange of the PSTN, and handing off the calls to
a VoIP endpoint or to another exchange or telephone company for completion to a called
party.
In the VoIP world, the opposite of call origination is call termination, where a call
initiated as a VoIP call is terminated to the PSTN.
The term is often used in referring to a VoIP trunking service.
Chatter Bug is a service and hardware device used to route long distance voice calls
over an IP network.
The device is small and plugs in between the telephone and the phone line. It detects
long distance calls and automatically routes them to the Chatter Bug VoIP service without
the need for a separate or high-speed Internet provider.
The service routes calls over a VoIP network and terminates them to normal telephones.
Downstream QoS (see QoS) is a technology innovation that enhances VoIP calls by
improving the clarity of incoming voice. When data floods the caller's downstream
Internet-access line, Downstream QoS can throttle incoming data to ensure time-sensitive
voice traffic gets through promptly to the listener. One notable version of the technology
was pioneered by Patton Electronics Co. for their SmartNode brand of VoIP equipment.
Other versions can be found in VoIP equipment from Cisco Systems and application
software from Packeteer.
Standard Upstream QoS
Upstream QoS improves voice quality for the remote side of a
VoIP call by transmitting upstream voice traffic at a higher priority than data. Applying
upstream QoS at both ends of the call improves voice quality for both users. Yet
upstream QoS still leaves the caller vulnerable to downstream data surges. During large
file downloads voice quality may degrade.
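The upstream prioritization described above can be sketched as a strict-priority queue. This is a minimal illustration, not how any particular product implements it; the two traffic classes and the class name are assumptions made for the example.

```python
import heapq
import itertools

VOICE, DATA = 0, 1   # lower number = higher priority (assumed classes)


class UpstreamScheduler:
    """Strict-priority queue: voice packets always leave before data packets."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()   # preserves arrival order within a class

    def enqueue(self, traffic_class, packet):
        heapq.heappush(self._queue, (traffic_class, next(self._order), packet))

    def dequeue(self):
        return heapq.heappop(self._queue)[2]

    def __len__(self):
        return len(self._queue)
```

Note that a scheduler like this only controls what the local device sends. It cannot reorder packets that have already queued up on the provider's side of the access line, which is the gap Downstream QoS is meant to address.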
Some Internet providers may offer services similar to Downstream QoS as a
higher-priced, premium service. VoIP systems that combine upstream and downstream QoS
mechanisms reduce the user's dependence on the service provider while providing local
control over total voice quality.
Retrieved from "http://en.wikipedia.org/wiki/Downstream_QoS"
The IP Multimedia Subsystem (IMS) is an architectural framework for delivering
internet protocol (IP) multimedia to mobile users. It was originally designed by the
wireless standards body 3rd Generation Partnership Project (3GPP), and is part of the
vision for evolving mobile networks beyond GSM. Its original formulation (3GPP R5)
represented an approach to delivering "Internet services" over GPRS. This vision was
later updated by 3GPP, 3GPP2 and TISPAN by requiring support of networks other than
GPRS, such as Wireless LAN, CDMA2000 and fixed line.
To ease the integration with the Internet, IMS as far as possible uses IETF (i.e. Internet)
protocols such as Session Initiation Protocol (SIP). According to the 3GPP, IMS is not
intended to standardize applications itself but to aid the access of multimedia and voice
applications across wireless and wireline terminals, i.e. aid a form of fixed mobile
convergence (FMC). This is done by having a horizontal control layer that isolates the
access network from the service layer. Services need not have their own control
functions, as the control layer is a common horizontal layer.
Alternative and overlapping technologies for access and provision of services across
wired and wireless networks depend on the actual requirements, and include
combinations of Generic Access Network, soft switches and "naked" SIP. This makes the
business use of IMS less appealing. It is easier to sell services than to sell the virtues of
"integrated services". However, services for IMS have not been prolific.
Since IMS was conceived years ago, it has become increasingly easy to access content
and contacts using mechanisms outside the control of traditional wireless/fixed operators,
and so those operators are likely to reconsider their strategies. Although it is expected
that eventually IP will be available on all mobile phones and operators, it is not clear how
much of the 3GPP/3GPP2/TISPAN IMS as it exists today will be deployed. "Early IMS"
might be used in IMS implementations that do not yet support all "Full IMS"
requirements, although it's not clearly defined what differences there might be (IPv4
support instead of IPv6 is often mentioned).
• 1 History
• 2 Architecture
o 2.1 Access network
o 2.2 Core network
2.2.1 Home subscriber server
2.2.1.1 User identities
2.2.2 Call/session control
2.2.3 Application servers
2.2.4 Media Servers
2.2.5 Breakout Gateway
2.2.6 PSTN Gateways
2.2.7 Media Resources
o 2.3 Charging
o 2.4 Interfaces description
• 3 Security aspects of early IMS systems
• 4 Specifications
o 4.1 3GPP Specs
o 4.2 IETF Specs
• 5 See also
• 6 References
• 7 External links
• 8 Books
• IMS was originally defined by an industry forum called 3G.IP, formed in 1999.
3G.IP developed the initial IMS architecture, which was brought to the 3rd
Generation Partnership Project (3GPP), as part of their standardization work for
3G mobile phone systems in UMTS networks. It first appeared in release 5
(evolution from 2G to 3G networks), when SIP-based multimedia was added.
Support for the older GSM and GPRS networks was also provided.
• 3GPP2 (a different organization) based their CDMA2000 Multimedia Domain
(MMD) on 3GPP IMS, adding support for CDMA2000.
• 3GPP release 6 added interworking with WLAN.
• 3GPP release 7 added support for fixed networks, by working together with
TISPAN release R1.1.
3GPP / TISPAN IMS Architectural Overview
The IP Multimedia Core Network Subsystem is a collection of different functions, linked
by standardized interfaces, which grouped form one IMS administrative network. A
function is not a node (hardware box): an implementer is free to combine 2 functions in 1
node, or to split a single function into 2 or more nodes. Each node can also be present
multiple times in a single network, for load balancing or organizational issues.
The user can connect to an IMS network in various ways, all of which use the standard
Internet Protocol (IP). Direct IMS terminals (such as mobile phones, personal digital
assistants (PDAs) and computers) can register directly on an IMS network, even when
they are roaming in another network or country (the visited network). The only
requirement is that they can use IPv6 (also IPv4 in early IMS) and run Session Initiation
Protocol (SIP) user agents. Fixed access (e.g., Digital Subscriber Line (DSL), cable
modems, Ethernet), mobile access (e.g. W-CDMA, CDMA2000, GSM, GPRS) and
wireless access (e.g. WLAN, WiMAX) are all supported. Other phone systems like plain
old telephone service (POTS -- the old analogue telephones), H.323 and non IMS-
compatible VoIP systems, are supported through gateways.
Home subscriber server
The Home Subscriber Server (HSS), or User Profile Server Function (UPSF), is a master
user database that supports the IMS network entities that actually handle calls. It contains
the subscription-related information (user profiles), performs authentication and
authorization of the user, and can provide information about the user's physical location.
It is similar to the GSM Home Location Register (HLR) and Authentication Centre.
An SLF (Subscriber Location Function) is needed to map user addresses when multiple
HSSs are used. Both the HSS and the SLF communicate through the DIAMETER
protocol.
Normal 3GPP networks use the following identities:
• International Mobile Subscriber Identity (IMSI)
• Temporary Mobile Subscriber Identity (TMSI)
• International Mobile Equipment Identity (IMEI)
• Mobile Subscriber ISDN Number (MSISDN)
IMSI is a unique phone identity that is stored in the SIM. To improve privacy, a TMSI is
generated per geographical location. While IMSI/TMSI are used for user identification,
the IMEI is a unique device identity and is phone specific. The MSISDN is the telephone
number of a user.
IMS also requires an IP Multimedia Private Identity (IMPI) and an IP Multimedia Public
Identity (IMPU). Neither is a bare phone number or other series of digits; both are Uniform
Resource Identifiers (URIs), which can be digits (a tel-uri, like tel:+1-555-123-4567) or
alphanumeric identifiers (a sip-uri, like sip:firstname.lastname@example.org). There can be
multiple IMPUs per IMPI (often a tel-uri and a sip-uri). An IMPU can also be shared with
another phone, so that both can be reached with the same identity (for example, a single
phone number for an entire family).
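The relationship between private and public identities can be pictured with a small data model. The sketch below is illustrative only: the class, the helper names and the private-identity strings are placeholders invented for the example, not values or structures taken from the 3GPP specifications.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    impi: str                                   # private identity (placeholder value)
    impus: list = field(default_factory=list)   # public identities for this IMPI


def is_tel_uri(uri):
    return uri.startswith("tel:")


def is_sip_uri(uri):
    return uri.startswith("sip:")


def reachable_via(subscriptions, impu):
    """Return the private identities that share a given public identity."""
    return [s.impi for s in subscriptions if impu in s.impus]


FAMILY_NUMBER = "tel:+1-555-123-4567"
subscriptions = [
    Subscription("private-id-1", [FAMILY_NUMBER, "sip:firstname.lastname@example.org"]),
    Subscription("private-id-2", [FAMILY_NUMBER]),
]
```

Here reachable_via(subscriptions, FAMILY_NUMBER) returns both private identities, which corresponds to the shared family number mentioned above, while the sip-uri belongs to the first subscription only.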
The HSS user database contains the IMPU, IMPI, IMSI, and MSISDN and other
Several roles of Session Initiation Protocol (SIP) servers or proxies, collectively called
Call Session Control Function (CSCF), are used to process SIP signalling packets in the
IMS.
• A Proxy-CSCF (P-CSCF) is a SIP proxy that is the first point of contact for the
IMS terminal. It can be located either in the visited network (in full IMS
networks) or in the home network (when the visited network isn't IMS compliant
yet). Some networks may use a Session Border Controller for this function. The
terminal discovers its P-CSCF either through DHCP or by assignment in the PDP
Context (in General Packet Radio Service (GPRS)).
o it is assigned to an IMS terminal during registration, and does not change
for the duration of the registration
o it sits on the path of all signalling messages, and can inspect every
message
o it authenticates the user and establishes an IPsec security association with
the IMS terminal. This prevents spoofing attacks and replay attacks and
protects the privacy of the user. Other nodes trust the P-CSCF, and do not
have to authenticate the user again.
o it can also compress and decompress SIP messages using SigComp, which
reduces the round-trip over slow radio links
o it may include a Policy Decision Function (PDF), which authorizes media
plane resources e.g. quality of service (QoS) over the media plane. It's
used for policy control, bandwidth management, etc. The PDF can also be
a separate function.
o it also generates charging records
• A Serving-CSCF (S-CSCF) is the central node of the signalling plane. It is a SIP
server, but performs session control too. It is always located in the home network.
It uses DIAMETER Cx and Dx interfaces to the HSS to download and upload
user profiles — it has no local storage of the user. All necessary information is
loaded from the HSS.
o it handles SIP registrations, which allows it to bind the user location (e.g.
the IP address of the terminal) and the SIP address
o it sits on the path of all signaling messages, and can inspect every message
o it decides to which application server(s) the SIP message will be
forwarded, in order to provide their services
o it provides routing services, typically using Electronic Numbering
o it enforces the policy of the network operator
o there can be multiple S-CSCFs in the network for load distribution and
high availability reasons. It's the HSS that assigns the S-CSCF to a user,
when it's queried by the I-CSCF.
• An I-CSCF (Interrogating-CSCF) is another SIP function located at the edge of
an administrative domain. Its IP address is published in the Domain Name System
(DNS) of the domain (using NAPTR and SRV type of DNS records), so that
remote servers can find it, and use it as a forwarding point (e.g. registering) for
SIP packets to this domain. The I-CSCF queries the HSS using the DIAMETER
Cx interface to retrieve the user location (Dx interface is used from I-CSCF to
SLF to locate the needed HSS only), and then routes the SIP request to its
assigned S-CSCF. Up to Release 6 it can also be used to hide the internal network
from the outside world (encrypting part of the SIP message), in which case it's
called a THIG (Topology Hiding Inter-network Gateway). From Release 7
onwards this "entry point" function is removed from the I-CSCF and is now part
of the IBCF (Interconnection Border Control Function). The IBCF is used as a
gateway to external networks, and provides NAT and firewall functions.
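The lookups performed by the I-CSCF can be sketched as two dictionary queries followed by a forwarding decision. The in-memory tables below merely stand in for the SLF and HSS databases, and all host names and user addresses are hypothetical; real deployments carry these queries over the DIAMETER Dx and Cx interfaces.

```python
# Hypothetical in-memory stand-ins for the SLF and HSS databases.
SLF = {
    "sip:alice@example.org": "hss-1",
    "sip:bob@example.org": "hss-2",
}
HSS = {
    "hss-1": {"sip:alice@example.org": "scscf-a.example.org"},
    "hss-2": {"sip:bob@example.org": "scscf-b.example.org"},
}


def locate_hss(impu):
    """Dx-style lookup: which HSS holds this user's data?"""
    return SLF[impu]


def assigned_scscf(hss_name, impu):
    """Cx-style lookup: which S-CSCF is assigned to this user?"""
    return HSS[hss_name][impu]


def route_request(request):
    """Return a copy of the request annotated with the S-CSCF to forward it to."""
    impu = request["to"]
    next_hop = assigned_scscf(locate_hss(impu), impu)
    return {**request, "next_hop": next_hop}
```

For a request addressed to sip:bob@example.org, route_request first resolves hss-2 through the SLF table and then returns scscf-b.example.org as the next hop.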
Application servers (AS) host and execute services, and interface with the S-CSCF using
Session Initiation Protocol (SIP). An example of an application server that is being
developed in 3GPP is the Voice call continuity Function (VCC Server). Depending on the
actual service, the AS can operate in SIP proxy mode, SIP UA (user agent) mode or SIP
B2BUA (back-to-back user agent) mode. An AS can be located in the home network or
in an external third-party network. If located in the home network, it can query the HSS
with the DIAMETER Sh interface (for a SIP-AS) or the Mobile Application Part (MAP)
interface (for IM-SSF).
• SIP AS: native IMS application server
• IM-SSF: an IP Multimedia Service Switching Function interfaces with
Customized Applications for Mobile networks Enhanced Logic (CAMEL)
Application Servers using Camel Application Part (CAP)
The MRF (Media Resource Function) provides media related functions such as media
manipulation (e.g. voice stream mixing) and playing of tones and announcements.
Each MRF is further divided into a Media Resource Function Controller (MRFC) and a
Media Resource Function Processor (MRFP).
• The MRFC is a signalling plane node that acts as a SIP User Agent to the S-
CSCF, and which controls the MRFP with a H.248 interface
• The MRFP is a media plane node that implements all media-related functions.
A BGCF (Breakout Gateway Control Function) is a SIP server that includes routing
functionality based on telephone numbers. It is only used when calling from the IMS to a
phone in a circuit switched network, such as the Public Switched Telephone Network
(PSTN) or the Public land mobile network (PLMN).
A PSTN/CS gateway interfaces with PSTN circuit switched (CS) networks. For
signalling, CS networks use ISDN User Part (ISUP) (or BICC) over Message Transfer
Part (MTP), while IMS uses Session Initiation Protocol (SIP) over IP. For media, CS
networks use Pulse-code modulation (PCM), while IMS uses Real-time Transport
Protocol (RTP).
• A Signalling Gateway (SGW) interfaces with the signalling plane of the CS network. It
transforms lower layer protocols such as Stream Control Transmission Protocol (SCTP,
an Internet Protocol (IP) protocol) into Message Transfer Part (MTP, a
Signalling System 7 (SS7) protocol), to pass ISDN User Part (ISUP) from the
MGCF to the CS network.
• A Media Gateway Controller Function (MGCF) does call control protocol
conversion between SIP and ISUP and interfaces with the SGW over SCTP. It
also controls the resources in an MGW with an H.248 interface.
• A Media Gateway (MGW) interfaces with the media plane of the CS network, by
converting between RTP and PCM. It can also transcode when the codecs don't
match (e.g. IMS might use AMR, PSTN might use G.711).
Media Resources are those components that operate on the media plane and are under the
control of IMS Core functions. Specifically, Media Server (MS) and Media gateway
Offline charging is applied to users who pay for their services periodically (e.g., at the
end of the month). Online charging, also known as credit-based charging, is used for
prepaid services, or real-time credit control of postpaid services. Both may be applied to
the same session.
• Offline Charging : All the SIP network entities (P-CSCF, I-CSCF, S-CSCF,
BGCF, MRFC, MGCF, AS) involved in the session use the DIAMETER Rf
interface to send accounting information to a CCF (Charging Collector Function)
located in the same domain. The CCF will collect all this information, and build a
CDR (Call Detail Record), which is sent to the billing system (BS) of the domain.
Each session carries an ICID (IMS Charging Identifier) as a unique identifier. IOI
(Inter Operator Identifier) parameters define the originating and terminating
Each domain has its own charging network. Billing systems in different domains
will also exchange information, so that roaming charges can be applied.
• Online charging: The S-CSCF talks to an SCF (Session Charging Function)
which looks like a regular SIP application server. The SCF can signal the S-CSCF
to terminate the session when the user runs out of credits during a session. The AS
and MRFC use the DIAMETER Ro interface towards an ECF (Event Charging
Function).
o When IEC (Immediate Event Charging) is used, a number of credit units
are immediately deducted from the user's account by the ECF and the
MRFC or AS is then authorized to provide the service. The service is not
authorized when not enough credit units are available.
o When ECUR (Event Charging with Unit Reservation) is used, the ECF
first reserves a number of credit units in the user's account and then
authorizes the MRFC or the AS. After the service is over, the number of
spent credit units is reported and deducted from the account; the reserved
credit units are then cleared.
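The difference between the two online charging modes can be sketched with a simple credit account. This is a minimal illustration under assumed names (CreditAccount, immediate_event_charge, reserve_units and settle are invented for the example); it ignores the DIAMETER Ro messaging and assumes that the units spent never exceed the units reserved.

```python
class CreditAccount:
    def __init__(self, units):
        self.units = units      # credit units currently on the account
        self.reserved = 0       # units set aside by pending reservations

    def available(self):
        return self.units - self.reserved


def immediate_event_charge(account, cost):
    """IEC: deduct the units first; the service is authorized only on success."""
    if account.available() < cost:
        return False
    account.units -= cost
    return True


def reserve_units(account, amount):
    """ECUR, step 1: reserve units before authorizing the service."""
    if account.available() < amount:
        return False
    account.reserved += amount
    return True


def settle(account, reserved, spent):
    """ECUR, step 2: deduct what was actually spent and clear the reservation."""
    account.units -= spent
    account.reserved -= reserved
```

Starting from 10 units, an immediate charge of 4 leaves 6. Reserving 5 then leaves 1 unit available to other services, and settling with 3 units spent leaves 3 units on the account with nothing reserved.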
Interface | IMS entities | Description | Protocol
Cr | MRFC, AS | Used by MRFC to fetch documents (scripts and other resources) from an AS | dedicated
Cx | I-CSCF, S-CSCF, HSS | Used to communicate between I-CSCF/S-CSCF and HSS | DIAMETER
Dh | SIP AS, OSA, SCF, IM-SSF | Used by AS to find a correct HSS in a multi-HSS environment | DIAMETER
Dx | I-CSCF, S-CSCF, SLF | Used by I-CSCF/S-CSCF to find a correct HSS in a multi-HSS environment | DIAMETER
Gm | UE, P-CSCF | Used to exchange messages between UE and P-CSCF | SIP
Go | PDF, GGSN | Allows operators to control QoS in a user plane and exchange charging correlation information between IMS and GPRS network | DIAMETER
Gq | P-CSCF, PDF | Used to exchange policy decisions-related information between P-CSCF and PDF | DIAMETER
ISC | S-CSCF, I-CSCF, AS | Used to exchange messages between CSCF and AS | SIP
Ma | I-CSCF -> AS | Used to directly forward SIP requests which are destined to a Public Service Identity hosted by the AS | SIP
Mg | MGCF -> I-CSCF | MGCF converts ISUP signalling to SIP signalling and forwards SIP signalling to I-CSCF | SIP
Mi | S-CSCF -> BGCF | Used to exchange messages between S-CSCF and BGCF | SIP
Mj | BGCF -> MGCF | Used to exchange messages between BGCF and MGCF in the same IMS network | SIP
Mk | BGCF -> BGCF | Used to exchange messages between BGCFs in different IMS networks | SIP
Mm | CSCF, external IP network | Used for exchanging messages between IMS and external IP networks | Not specified
Mn | MGCF, MGW | Allows control of user-plane resources | H.248
Mp | MRFC, MRFP | Used to exchange messages between MRFC and MRFP | H.248
Mr | S-CSCF, MRFC | Used to exchange messages between S-CSCF and MRFC | SIP
Mw | P-CSCF, I-CSCF, S-CSCF | Used to exchange messages between CSCFs | SIP
Sh | SIP AS, OSA SCS, HSS | Used to exchange information between SIP AS/OSA SCS and HSS | DIAMETER
Si | IM-SSF, HSS | Used to exchange information between IM-SSF and HSS | MAP
Sr | MRFC, AS | Used by MRFC to fetch documents (scripts and other resources) from an AS | HTTP
Ut | UE, AS (SIP AS, OSA SCS, IM-SSF) | Enables UE to manage information related to the user's services | HTTP(s)
Security aspects of early IMS systems
It is envisaged that the security defined in TS 33.203 may not be available for a while,
especially because of the lack of USIM/ISIM interfaces and the prevalence of devices that
support IPv4. For this situation, to provide some protection against the most significant
threats, 3GPP defines some security mechanisms, which are informally known as "early
IMS security", in TR 33.978.
The 3GPP specifications can be downloaded from http://www.3gpp.org/specs/numbering.htm.
The list below is a small selection.
• TR 21.905 Vocabulary for 3GPP Specifications
• TS 22.066 Support of Mobile Number Portability (MNP); Stage 1
• TS 22.101 Service Aspects; Service Principles
• TS 22.141 Presence Service; Stage 1
• TS 22.228 Service requirements for the IP multimedia core network subsystem;
• TS 22.250 IMS Group Management; Stage 1
• TS 22.340 IMS Messaging; Stage 1
• TR 22.800 IMS Subscription and access scenarios
• TS 23.002 Network Architecture
• TS 23.003 Numbering, Addressing and Identification
• TS 23.008 Organization of Subscriber Data
• TS 23.107 Quality of Service (QoS) principles
• TS 23.125 Overall high level functionality and architecture impacts of flow based
charging; Stage 2
• TS 23.141 Presence Service; Architecture and functional description; Stage 2
• TS 23.167 IMS emergency sessions
• TS 23.207 End-to-end QoS concept and architecture
• TS 23.218 IMS session handling; IM call model; Stage 2
• TS 23.221 Architectural Requirements
• TS 23.228 IMS stage 2
• TS 23.234 WLAN interworking
• TS 23.271 Location Services (LCS); Functional description; Stage 2
• TS 23.278 Customized Applications for Mobile network Enhanced Logic
(CAMEL) - IMS interworking; Stage 2
• TR 23.864 Commonality and interoperability between IMS core networks
• TR 23.867 IMS emergency sessions
• TR 23.917 Dynamic policy control enhancements for end-to-end QoS, Feasibility
• TR 23.979 3GPP enablers for Push-to-Talk over Cellular (PoC) services; Stage 2
• TR 23.981 Interworking aspects and migration scenarios for IPv4-based IMS
implementations (early IMS)
• TS 24.141 Presence Service using the IMS Core Network subsystem; Stage 3
• TS 24.147 Conferencing using the IMS Core Network subsystem
• TS 24.228 Signalling flows for the IMS call control based on SIP and SDP; Stage
• TS 24.229 IMS call control protocol based on SIP and SDP; Stage 3
• TS 24.247 Messaging using the IMS Core Network subsystem; Stage 3
• TS 26.235 Packet switched conversational multimedia applications; Default
• TS 26.236 Packet switched conversational multimedia applications; Transport
• TS 29.162 Interworking between the IMS and IP networks
• TS 29.163 Interworking between the IMS and Circuit Switched (CS) networks
• TS 29.198 Open Service Architecture (OSA)
• TS 29.207 Policy control over Go interface
• TS 29.208 End-to-end QoS signalling flows
• TS 29.209 Policy control over Gq interface
• TS 29.228 IMS Cx and Dx interfaces : signalling flows and message contents
• TS 29.229 IMS Cx and Dx interfaces based on the Diameter protocol; Protocol
• TS 29.278 CAMEL Application Part (CAP) specification for IMS
• TS 29.328 IMS Sh interface : signalling flows and message content
• TS 29.329 IMS Sh interface based on the Diameter protocol; Protocol details
• TR 29.962 Signalling interworking between the 3GPP SIP profile and non-3GPP
• TS 31.103 Characteristics of the IMS Identity Module (ISIM) application
• TS 32.240 Telecommunication management; Charging management; Charging
architecture and Principles
• TS 32.260 Telecommunication management; Charging management; IMS
• TS 32.299 Telecommunication management; Charging management; Diameter
• TS 32.421 Telecommunication management; Subscriber and equipment trace:
Trace concepts and requirements
• TS 33.102 3G security; Security architecture
• TS 33.108 3G security; Handover interface for Lawful Interception (LI)
• TS 33.141 Presence service; security
• TS 33.203 3G security; Access security for IP-based services
• TS 33.210 3G security; Network Domain Security (NDS); IP network layer
• TR 33.978 Security aspects of early IP Multimedia Subsystem (IMS)
• RFC 2327 Session Description Protocol (SDP)
• RFC 2748 Common Open Policy Server protocol (COPS)
• RFC 2782 a DNS RR for specifying the location of services (SRV)
• RFC 2806 URLs for telephone calls (TEL)
• RFC 2915 the naming authority pointer DNS resource record (NAPTR)
• RFC 2916 E.164 number and DNS
• RFC 3087 Control of Service Context using SIP Request-URI
• RFC 3261 Session Initiation Protocol (SIP)
• RFC 3262 reliability of provisional responses (PRACK)
• RFC 3263 locating SIP servers
• RFC 3264 an offer/answer model with the Session Description Protocol
• RFC 3265 SIP-Specific Event Notification
• RFC 3310 HTTP Digest Authentication using Authentication and Key Agreement
• RFC 3311 update method
• RFC 3312 integration of resource management and SIP
• RFC 3319 DHCPv6 options for SIP servers
• RFC 3320 signalling compression (SigComp)
• RFC 3323 a privacy mechanism for SIP
• RFC 3324 short term requirements for network asserted identity
• RFC 3325 private extensions to SIP for asserted identity within trusted networks
• RFC 3326 the reason header field
• RFC 3327 extension header field for registering non-adjacent contacts (path
• RFC 3329 security mechanism agreement
• RFC 3420 Internet Media Type message/sipfrag
• RFC 3428 SIP Extension for Instant Messaging
• RFC 3455 private header extensions to SIP for 3GPP
• RFC 3485 SIP and SDP static dictionary for signaling compression
• RFC 3515 the SIP REFER method
• RFC 3550 Real-time Transport Protocol (RTP)
• RFC 3574 Transition Scenarios for 3GPP Networks
• RFC 3588 DIAMETER base protocol
• RFC 3589 DIAMETER command codes for 3GPP release 5 (informational)
• RFC 3608 extension header field for service route discovery during registration
• RFC 3665 SIP Basic Call Flow Examples
• RFC 3680 SIP event package for registrations
• RFC 3725 best current practices for Third Party Call Control (3pcc) in SIP
• RFC 3824 using E164 numbers with SIP
• RFC 3840 indicating user Agent Capabilities in SIP
• RFC 3841 caller preferences for SIP
• RFC 3842 SIP event package for message waiting indication and summary
• RFC 3856 SIP event package for presence
• RFC 3857 SIP event template-package for watcher info
• RFC 3858 XML based format for watcher information
• RFC 3891 the SIP Replaces Header
• RFC 3903 SIP Extension for Event State Publication
• RFC 3911 the SIP Join Header
• RFC 4028 session timers in SIP
• RFC 4235 an INVITE-Initiated dialog event package for SIP
• RFC 4475 Session Initiation Protocol (SIP) Torture Test Messages
An ITSP (Internet Telephony Service Provider) offers an Internet data service for making
telephone calls using VoIP (Voice over IP) technology. Most ITSPs use SIP, H.323, or
IAX (although H.323 use is declining) for transmitting telephone calls as IP data packets.
Customers may use traditional telephones with an analog telephony adapter (ATA)
providing RJ11 to Ethernet connection.
In the United States, net2Phone began offering consumer VoIP service in 1995.
Before 2003, many VoIP services required customers to make and receive phone calls
through a personal computer on a LAN.
ITSPs are also known as VSP (Voice Service Provider) or simply VoIP Providers.
Lawful interception (also known as wiretapping) is the interception of telecommunications by law
enforcement authorities (LEAs) and intelligence services, in accordance with local law
and after following due process and receiving proper authorization from competent
authorities.
With the existing Public Switched Telephone Network (PSTN), Wireless, and Cable
Systems, Lawful Interception (LI) is generally performed by accessing the digital
switches supporting the target's calls in response to a warrant from a Law Enforcement
Agency (LEA). However, mobile phone and Voice over IP (VoIP) technologies have
enabled the mobility of the end-user, which has introduced new challenges.
Whilst the detailed requirements for LI differ from one jurisdiction to another, the general
requirements are the same. The LI system must provide transparent interception of
specified traffic only, and the subject must not be aware of the interception. The service
provided to other users must not be affected during interception.
• 1 Technical description
• 2 Laws
o 2.1 United States of America
o 2.2 Europe
o 2.3 Elsewhere
• 3 Illegal Use
• 4 References
• 5 See also
• 6 External links
Almost all countries have LI requirements and have adopted global LI requirements and
standards developed by the European Telecommunications Standards Institute (ETSI)
organization. In the USA, the requirements are governed by the Communications
Assistance for Law Enforcement Act (CALEA). For an overview of laws and standards,
see the Global LI Industry Forum site.
In order to prevent investigations being compromised, LI systems may be designed in a
manner that hides the interception from the telecommunications operator concerned. This
is a requirement in some jurisdictions.
To ensure systematic procedures for carrying out interception, while also lowering the
costs of interception solutions, industry groups and government agencies worldwide have
attempted to standardize the technical processes behind lawful interception. One
organization, ETSI, has been a major driver in lawful interception standards not only for
Europe, but worldwide. The following figure provides a generalized view of the lawful
interception architecture as proposed by ETSI:
This architecture attempts to define a systematic and extensible means by which network
operators and law enforcement agencies (LEAs) can interact, especially as networks grow
in sophistication and scope of services. Note that this architecture applies not only to
“traditional” wireline and wireless voice calls, but also to IP-based services such as Voice
over IP, email, and instant messaging. The architecture is now applied worldwide (in
some cases with slight variations in terminology), including in the United States in the
context of CALEA conformance. Three stages are called for in the architecture: 1)
collection, where target-related “call” data and content are extracted from the network; 2)
mediation, where the data is formatted to conform to specific standards; and 3) delivery of
the data and content to the law enforcement agency (LEA).
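The three stages can be pictured as a simple pipeline. The sketch below is a conceptual illustration only: the record fields, the output layout and the function names are assumptions made for the example and do not follow the ETSI handover formats.

```python
def collect(packets, target):
    """Stage 1: keep only the traffic sent to or from the target."""
    return [p for p in packets if target in (p["src"], p["dst"])]


def mediate(records):
    """Stage 2: separate call data from content in a hypothetical common format."""
    return [
        {
            "call_data": {"from": r["src"], "to": r["dst"], "time": r["time"]},
            "content": r["payload"],
        }
        for r in records
    ]


def deliver(formatted, send):
    """Stage 3: hand every record to the agency through the supplied callable."""
    for record in formatted:
        send(record)
    return len(formatted)
```

Keeping the stages as separate functions mirrors the architecture: collection depends on the network, mediation depends on the standard in force, and delivery depends on the link to the agency.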
The call data (known as Intercept Related Information or IRI in Europe and Call Data or
CD in the US) consists of information about the targeted communications, including
destination of a voice call (e.g., called party’s telephone number), source of a call
(caller’s phone number), time of the call, duration, etc. Call content is the stream
of data carrying the call. Included in the architecture is the lawful interception
management function, which covers interception session set-up and tear down,
scheduling, target identification, etc. Communications between the network operator and
LEA are via the Handover Interfaces (designated HI). Communications data and content
are typically delivered from the network operator to the LEA in an encrypted format over
an IP-based VPN. The interception of traditional voice calls still often relies on the
establishment of an ISDN channel that is set up at the time of the interception.
As stated above, the ETSI architecture is equally applicable to IP-based services where
IRI (or CD) is dependent on parameters associated with the traffic from a given
application to be intercepted. For example, in the case of email IRI would be similar to
the header information on an email message (e.g., destination email address, source email
address, time email was transmitted) as well as pertinent header information within the IP
packets conveying the message (e.g., source IP address of email server originating the
email message). Of course, more in-depth information would be obtained by the
interception system so as to avoid the usual email address spoofing that often takes place
(e.g., spoofing of source address). Voice-over-IP likewise has its own IRI, including data
derived from Session Initiation Protocol (SIP) messages that are used to set up and tear
down a VoIP call.
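As a rough illustration of how IRI-style call data could be derived from SIP signaling, the following Python sketch pulls a few header fields out of a SIP INVITE. It is not taken from any LI standard; the function name, the output field names, and the sample message are hypothetical, and a real mediation function would follow the relevant handover specification.

```python
def extract_call_data(raw_message: str) -> dict:
    """Pull a few IRI-style fields out of a raw SIP request (illustrative only)."""
    lines = raw_message.replace("\r\n", "\n").split("\n")
    request_line = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the SIP header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": request_line[0],        # e.g. INVITE (set-up) or BYE (tear-down)
        "request_uri": request_line[1],   # destination of the call
        "from": headers.get("from"),      # source of the call
        "to": headers.get("to"),
        "call_id": headers.get("call-id"),
    }


# Hypothetical sample message, for illustration only.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: <sip:alice@example.org>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "\r\n"
)

call_data = extract_call_data(SAMPLE_INVITE)
```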
USA interception standards that help network operators and service providers conform to
CALEA are mainly those specified by CableLabs, the Alliance for
Telecommunications Industry Solutions (ATIS), and the TIA. TIA's standards include
J-STD-025B, which updates the earlier J-STD-025A to include packetized voice and
CDMA wireless interception, although it has recently been challenged as "deficient" by
the U.S. Dept of Justice. Generic global standards have also been developed by the
Internet Engineering Task Force (IETF) that provide a front-end
means of supporting most LI handover standards. Although the terms are different, the
concepts behind the interception architecture resemble those formulated under ETSI.
More recent standards address packetized voice and data (e.g., ETSI TS102232 et seq,
ATIS T1.678, T1.IAS) and interception for PacketCable. Interception standardization
efforts for wireless networks are primarily overseen by the Third Generation Partnership
Project (3GPP).
Various countries have different rules with regard to lawful interception. In the United
Kingdom the law is known as RIPA (Regulation of Investigatory Powers Act), in the United
States there is an array of federal and state criminal law, and in Commonwealth of
Independent States countries it is known as SORM. A subset of LI law deals with the ability of
communication providers to support interception handovers.
United States of America
In the United States, two laws cover most of the governance of lawful interception. The
1968 Omnibus Crime Control and Safe Streets Act, Title III pertains mainly to lawful
interception for criminal investigations. The second law, the 1978 Foreign Intelligence
Surveillance Act, or FISA, governs wiretapping for intelligence purposes where the
subject of the investigation must be a foreign (non-US) national or a person working as
an agent on behalf of a foreign country. Most of the congressionally mandated wiretap
records indicate that the cases are related to illegal drug distribution, with cell phones as
the dominant form of intercepted communication.
During the 1990s, to help law enforcement and the FBI more effectively carry out
wiretap operations, especially in view of the emerging digital voice and wireless
networks at the time, the US Congress passed CALEA in 1994. This act provides broad
guidelines to network operators on how to assist the LEAs in setting up interceptions and
the types of data to be delivered. CALEA does not, as many believe, provide specific
implementation directives on interception. More recently, the US Federal
Communications Commission (FCC) mandated that CALEA be extended to include
interception of publicly-available broadband networks and Voice over IP services that are
interconnected to the Public Switched Telephone Network (PSTN).
As a response to the terrorist events of 9/11, the US Congress incorporated various
provisions related to enhanced electronic surveillance in the “Uniting and Strengthening
America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism”
Act (USA Patriot Act). These wiretap provisions are mainly updates to those expressed
under the FISA law.
In the European Union, the European Council Resolution of 17 January 1995 on the
Lawful Interception of Telecommunications (Official Journal C 329) mandated similar
measures to CALEA on a pan-European basis. Although some EU member countries
reluctantly accepted this resolution out of privacy concerns (which are more pronounced
in Europe than the US), there appears now to be general agreement with the resolution.
Interestingly enough, interception mandates in Europe are generally more rigorous than
those of the US; for example, both voice and ISP public network operators in the
Netherlands have been required to support interception capabilities for years.
Most countries worldwide maintain LI requirements similar to those in the US and
Europe, and have moved to the ETSI handover standards. There is, for example, collaboration
through the numerous ISS World forums.
As with many law enforcement tools, LI systems may be subverted for illicit purposes.
This occurred in Greece during the 2004 Olympics; the telephone operator concerned was
fined US$1,000,000 in 2006 for failing to secure its systems against hacking.
1. http://www.askcalea.com
2. http://europa.eu.int/eur-lex/lex/LexUriServ/LexUriServ.do?
3. http://news.bbc.co.uk/1/hi/business/6182647.stm
• Handover Interface for the Lawful Interception of Telecommunications Traffic,
ETSI ES-201-671, under Lawful Interception, Telecommunications Security,
version 3.1.1, May 2007.
• Handover Specification for IP delivery, ETSI TS-102-232-1, under Lawful
Interception, Telecommunications Security, version 2.1.1, December 2006.
• Lawfully Authorized Electronic Surveillance, T1P1/T1S1 joint standard,
document number J-STD-025B, December 2003.
• 3rd Generation Partnership Project, Technical Specification 3GPP TS 33.106
V5.1.0 (2002-09), “Lawful Interception Requirements (Release 5),” September
• 3rd Generation Partnership Project, Technical Specification 3GPP TS 33.107
V6.0.0 (2003-09), “Lawful interception architecture and functions (Release 6),”
• 3rd Generation Partnership Project, Technical Specification 3GPP TS 33.108
V6.3.0 (2003-09), “Handover interface for Lawful Interception (Release 6),”
• PacketCable Electronic Surveillance Specification, PKT-SP-ESP-I03-040113,
Cable Television Laboratories Inc., 13 January 2004.
• T1.678, Lawfully Authorized Electronic Surveillance (LAES) for Voice over
Packet Technologies in Wireline Telecommunications Networks.
In computing, Media Gateway Control Protocol (MGCP) is a protocol used within a
distributed Voice over IP system.
MGCP is defined in RFC 3435, which obsoletes an earlier definition in RFC 2705. It
superseded the Simple Gateway Control Protocol (SGCP).
Another protocol for the same purpose is Megaco, a co-production of IETF (RFC 3525)
and ITU (Recommendation H.248-1). Both protocols follow the guidelines of the API
Media Gateway Control Protocol Architecture and Requirements at RFC 2805.
• 1 Architecture
• 2 Protocol Overview
• 3 Implementations
• 4 RFCs
• 5 See also
The distributed system is composed of a Call Agent (or Media Gateway Controller), at
least one Media Gateway (MG) that performs the conversion of media signals between
circuits and packets, and at least one Signaling Gateway (SG) when connected to the
PSTN.
The Call Agent uses MGCP to tell the Media Gateway:
• what events should be reported to the Call Agent
• how endpoints should be connected together
• what signals should be played on endpoints.
MGCP also allows the Call Agent to audit the current state of endpoints on a Media
Gateway.
The Media Gateway uses MGCP to report events (such as off-hook, or dialed digits) to
the Call Agent.
(While any Signaling Gateway is usually on the same physical switch as a Media
Gateway, this needn't be so. The Call Agent does not use MGCP to control the Signaling
Gateway; rather, SIGTRAN protocols are used to backhaul signaling between the
Signaling Gateway and Call Agent).
In MGCP, every command has a transaction ID and receives a response.
Typically, a Media Gateway is configured with a list of Call Agents from which it may
accept programming (where that list normally comprises only one or two Call Agents). In
principle, event notifications may be sent to different Call Agents for each endpoint on
the gateway (as programmed by the Call Agents, by setting the NotifiedEntity
parameter). In practice however, it is usually desirable that at any given moment all
endpoints on a gateway should be controlled by the same Call Agent; other Call Agents
are available only to provide redundancy in the event that the primary Call Agent fails, or
loses contact with the Media Gateway. In the event of such a failure it is the backup Call
Agent's responsibility to reprogram the MG so that the gateway comes under the control
of the backup Call Agent. Care is needed in such cases; two Call Agents may know that
they have lost contact with one another, but this does not guarantee that they are not both
attempting to control the same gateway. The ability to audit the gateway to determine
which Call Agent is currently controlling can be used to resolve such conflicts.
MGCP assumes that the multiple Call Agents will maintain knowledge of device state
among themselves (presumably with an unspecified protocol) or rebuild it if necessary (in
the face of catastrophic failure). Its failover features take into account both planned and
unplanned outages.
MGCP packets are unlike what you find in many other protocols. Usually carried over
UDP port 2427, the MGCP datagrams are formatted with whitespace, much like you
would expect to find in TCP protocols. An MGCP packet is either a command or a
response.
Commands begin with a four-letter verb. Responses begin with a three-digit response
code.
There are eight (8) command verbs:
AUEP, AUCX, CRCX, DLCX, MDCX, NTFY, RQNT, RSIP
Two verbs are used by a Call Agent to query (the state of) a Media Gateway:
AUEP - Audit Endpoint
AUCX - Audit Connection
Three verbs are used by a Call Agent to manage an RTP connection on a Media Gateway
(a Media Gateway can also send a DLCX when it needs to delete a connection for its
own purposes):
CRCX - Create Connection
DLCX - Delete Connection
MDCX - Modify Connection
One verb is used by a Call Agent to request notification of events on the Media Gateway,
and to request a Media Gateway to apply signals:
RQNT - Request for Notification
One verb is used by a Media Gateway to indicate to the Call Agent that it has detected an
event for which the Call Agent had previously requested notification (via the RQNT
command):
NTFY - Notify
One verb is used by a Media Gateway to indicate to the Call Agent that it is in the
process of restarting:
RSIP - Restart In Progress
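The command and response layout described above can be sketched in Python as follows. This is a minimal illustration, not an MGCP implementation: it only classifies the first line of a packet, and the function name, the returned field names, and the sample endpoint name are hypothetical. It assumes a command line of the form "verb transaction-ID endpoint MGCP version" and a response line of the form "code transaction-ID comment".

```python
# The eight command verbs listed above.
MGCP_VERBS = {
    "AUEP": "Audit Endpoint",
    "AUCX": "Audit Connection",
    "CRCX": "Create Connection",
    "DLCX": "Delete Connection",
    "MDCX": "Modify Connection",
    "NTFY": "Notify",
    "RQNT": "Request for Notification",
    "RSIP": "Restart In Progress",
}


def parse_mgcp_first_line(line: str) -> dict:
    """Classify the first line of an MGCP packet as a command or a response."""
    tokens = line.strip().split()
    if len(tokens) < 2:
        raise ValueError("too few fields for an MGCP command or response")
    head = tokens[0].upper()
    if head in MGCP_VERBS:
        # Assumed layout: verb, transaction ID, endpoint name, protocol version.
        if len(tokens) < 4:
            raise ValueError("incomplete MGCP command line")
        return {
            "kind": "command",
            "verb": head,
            "verb_name": MGCP_VERBS[head],
            "transaction_id": int(tokens[1]),
            "endpoint": tokens[2],
            "version": " ".join(tokens[3:]),
        }
    if len(head) == 3 and head.isdigit():
        # Assumed layout: three-digit code, transaction ID, optional comment.
        return {
            "kind": "response",
            "code": int(head),
            "transaction_id": int(tokens[1]),
            "comment": " ".join(tokens[2:]),
        }
    raise ValueError("not an MGCP command or response line")


command = parse_mgcp_first_line("CRCX 1204 aaln/1@gw.example.net MGCP 1.0")
response = parse_mgcp_first_line("200 1204 OK")
```

Because every command carries a transaction ID and receives a response, matching `transaction_id` between the two results is how a sender would pair a response with its command.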
• Vovida MGCP
• RFC 3435 - Media Gateway Control Protocol (MGCP) Version 1.0 (this
supersedes RFC 2705)
• RFC 3660 - Basic Media Gateway Control Protocol (MGCP) Packages
• RFC 3661 - Media Gateway Control Protocol (MGCP) Return Code Usage
• RFC 3064 - MGCP CAS Packages
• RFC 3149 - MGCP Business Phone Packages
• RFC 3991 - Media Gateway Control Protocol (MGCP) Redirect and Reset
Package
• RFC 3992 - Media Gateway Control Protocol (MGCP) Lockstep State Reporting
Mechanism
• RFC 2805 - Media Gateway Control Protocol Architecture and Requirements
MGCP Information Site
MGCP was originally developed to address the need of scaling ingress and egress
gateways in order to meet the demands of service providers. MGCP utilizes SDP for
negotiating the media streams transmitted and received on the packet network, which
significantly reduces the interworking complexity between SIP-based media gateway
controllers (or "call agents") and media gateways. MGCP was published by the IETF as
Informational RFCs (shown on this page) and also standardized by the ITU-T and
adopted for use within cable networks. MGCP is widely deployed around the world.
Core Documents (IETF)
RFC 2705 Media Gateway Control Protocol 1.0 (obsolete)
RFC 3435 Media Gateway Control Protocol 1.0
RFC 3660 Basic Media Gateway Control Protocol Packages
RFC 3661 MGCP Return Code Usage
RFC 2897 Proposal for an MGCP Advanced Audio Package
RFC 3064 MGCP CAS Packages
RFC 3149 MGCP Business Phone Packages
RFC 3441 Asynchronous Transfer Mode (ATM) Package
RFC 3624 Bulk Audit Package
RFC 3991 Redirect and Reset Package
RFC 3992 Lockstep State Reporting Mechanism
Please also refer to these IANA pages:
MGCP Package Registry
MGCP LocalConnectionOptions Sub-registry
1.0 Specs Network-Based Call Signaling Protocol Specification (NCS)
PSTN Gateway Call Signaling Protocol Specification (TGCP)
NCS Signaling MIB Specification
NCS Basic Packages
... Complete List of PacketCable 1.0 Specs ...
1.5 Specs Network-Based Call Signaling Protocol Specification (NCS)
PSTN Gateway Call Signaling Protocol Specification (TGCP)
Audio Server Package
NCS Signaling MIB Specification
... Complete List of PacketCable 1.5 Specs ...
PacketCable Specifications Home Page
Network call signalling protocol for the delivery of time-critical services
over cable television networks using cable modems
J.169 IPCablecom network call signalling (NCS) MIB requirements
J.171 IPcablecom trunking gateway control protocol (TGCP)
J.175 Audio server protocol
SCTE 24-3 Network Call Signaling Protocol for the Delivery of Time-Critical
Services over Cable Television Using Data Modems
SCTE 24-8 IPCablecom Part 8: Network Call Signaling Management Information
Base (MIB) Requirements
IPCablecom Part 12: Trunking Gateway Control Protocol (TGCP)
Packet Loss Concealment (PLC) is a technique to mask the effects of packet loss in
VoIP communications. Because the voice signal is sent as packets on a VoIP network,
the packets may travel different routes to reach their destination. At the receiver a packet might arrive
very late, corrupted or simply might not arrive. One of the situations in which the latter
could happen is where a packet is rejected by a server which has a full buffer and cannot
accept any more data. In a VoIP connection, error control techniques such as ARQ are
not feasible and the receiver should be able to cope with packet loss. Some PLC
techniques include:
• zero insertion: the lost speech frames are replaced with zeros
• waveform substitution: the missing gap is reconstructed by repeating a portion of
already received speech. The simplest form of this would be to repeat the last
received frame. Other techniques account for Fundamental frequency, gap
duration, etc. Waveform substitution methods are popular because of their
simplicity to understand and implement. An example of such an algorithm is
proposed in ITU recommendation G.711 Appendix I.
• model-based methods: an increasing number of algorithms that use
speech models to interpolate and extrapolate across speech gaps are being
introduced and developed.
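The two simplest techniques in the list above, zero insertion and repetition of the last received frame, can be sketched in Python as follows. This is only an illustration under stated assumptions: frames are lists of integer samples, a lost frame is represented by None, and the function name and parameters are hypothetical. It is not the G.711 Appendix I algorithm, which is considerably more elaborate.

```python
def conceal_losses(frames, frame_len, strategy="repeat"):
    """Fill lost frames (None) by zero insertion or by repeating the last good frame."""
    if strategy not in ("zero", "repeat"):
        raise ValueError("strategy must be 'zero' or 'repeat'")
    output = []
    last_good = [0] * frame_len  # silence until the first frame arrives
    for frame in frames:
        if frame is None:
            if strategy == "zero":
                output.append([0] * frame_len)   # zero insertion
            else:
                output.append(list(last_good))   # simplest waveform substitution
        else:
            output.append(list(frame))
            last_good = frame
    return output


received = [[1, 2], None, [5, 6]]
with_zeros = conceal_losses(received, 2, "zero")
with_repeat = conceal_losses(received, 2, "repeat")
```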
PacketCable is a project started by CableLabs. The purpose of the organization is to
define standards for the Cable TV industry.
CableLabs leads this initiative for interoperable interface specifications in order to deliver
real-time multimedia services over two-way cable networks. Built on top of the
industry’s DOCSIS 1.1 (Data Over Cable Service Interface Specifications) cable modem
infrastructure, PacketCable networks use Internet Protocol (IP) to enable a wide range of
multimedia services, such as IP telephony, multimedia conferencing, interactive gaming,
and general multimedia applications. A DOCSIS 1.1 network with PacketCable
extensions enables cable operators to deliver data and voice traffic efficiently using a
single high-speed, quality-of-service (QoS)-enabled broadband (cable) architecture.
The PacketCable effort dates back to 1997 when cable operators identified the need for a
real-time multimedia architecture to support the delivery of advanced multimedia
services over the DOCSIS 1.1 architecture.
• 1 Technical overview
o 1.1 PacketCable interconnects 3 networks
o 1.2 PacketCable Protocols
o 1.3 PacketCable Voice Codecs per PacketCable Codec Specifications
o 1.4 PacketCable 1.0
o 1.5 PacketCable 1.5
o 1.6 PacketCable 2.0
• 2 Deployment
• 3 References
• 4 External links
• 5 Further reading
PacketCable interconnects 3 networks
• Hybrid Fibre Coaxial (HFC) Access Network
• Public Switched Telephone Network (PSTN)
• TCP/IP Managed IP Networks
• DOCSIS (Data Over Cable Service Interface Specification) - the standard for data
over cable, which mostly details the RF band
• Real-time Transport Protocol (RTP) & RTP Control Protocol (RTCP),
required for media transfer
• PSTN Gateway Call Signaling Protocol Specification (TGCP) which is an MGCP
extension for Media Gateways
• Network-Based Call Signaling Protocol Specification (NCS) which is an MGCP
extension for analog residential Media Gateways - the NCS specification, which
is derived from the IETF MGCP RFC 2705, details VoIP signalling.
o Basically the IETF version is a subset of the NCS version. The PacketCable
group has defined more messages and features than the IETF.
• Common Open Policy Service (COPS) for Quality of Service
PacketCable Voice Codecs per PacketCable Codec Specifications
o ITU G.711 (both µ-law and A-law versions) - for V1.0 & 1.5
o iLBC - for V1.5
o BV16 - for V1.5
o ITU G.728
o ITU G.729 Annex E
• PacketCable 1.0 comprises eleven specifications and six technical reports which
define the call signaling, Quality of Service (QoS), Codec, client provisioning,
billing event message collection, PSTN (Public Switched Telephone Network)
interconnection, and security interfaces necessary to implement a single-zone
PacketCable solution for residential Internet Protocol (IP) voice services.
• PacketCable 1.5 contains additional capabilities that do not exist in PacketCable
1.0, and superseded previous versions (1.1, 1.2, and 1.3).
• PacketCable 1.5 comprises 21 specifications and one technical report which
together define the call signaling, Quality of Service (QoS), Codec, client
provisioning, billing event message collection, PSTN (Public Switched Telephone
Network) interconnection, and security interfaces necessary to implement a
single-zone or multi-zone PacketCable solution for residential Internet Protocol
(IP) voice services.
• Version 2.0 of PacketCable will replace MGCP with SIP.
VoIP services based on PacketCable architecture are being widely deployed by operators:
• Videotron - "VoIP services" (Canada: Quebec)
• Time Warner - Digital Phone (System wide)
• Cablevision – Optimum Voice (System wide)
• Comcast - Comcast Digital Voice (System-wide)
• Cox – Cox Digital Telephone (System-wide)
• Charter (St. Louis, Wisconsin)
• Bright House Networks (Florida)
• Liberty Cablevision (Puerto Rico)
• GCI (Alaska)
• Shaw - "Shaw Digital Phone" (Canada: Calgary, Edmonton, Winnipeg and
• "BRAGATEL" / Bragatel (Braga, Portugal)
• "TVCABO" / PT Multimedia (Portugal)
• Rogers - Rogers Home Phone (Canada-wide: major cities and towns
serviceable with Rogers high-speed internet are eligible, still expanding; St John's,
NL to Vancouver, BC serviceable as of July 2007)
• Bresnan Communications - Bresnan Digital Phone (System wide)
• CableOne - "CableONE.net" (System wide)
• Casema - "Casema Telefonie" (The Netherlands)
• PacketCable™ 1.5 Specifications Audio/Video Codecs - PKT-SP-CODEC1.5-
• PacketCable™ 1.5 Specifications Network-Based Call Signaling Protocol - PKT-
SP-NCS1.5-I01-050128 (see external link for MGCP information)
• PSTN Gateway Call Signaling Protocol Specification - PKT-SP-TGCP1.5-
I01-050128 (see external link for MGCP information)
Purple minutes in internet communications refers to IP network traffic that has a value-
added component, e.g. voice, video etc.
The Real-time Transport Protocol (or RTP) defines a standardized packet format for
delivering audio and video over the Internet. It was developed by the Audio-Video
Transport Working Group of the IETF and first published in 1996 as RFC 1889 which
was made obsolete in 2003 by RFC 3550. RTP can also be used in conjunction with
the Resource Reservation Protocol (RSVP), which lets multimedia applications reserve
network resources.
RTP does not have a standard TCP or UDP port on which it communicates. The only
standard that it obeys is that UDP communications are done via an even port and the next
higher odd port is used for RTP Control Protocol (RTCP) communications. Although
there are no standards assigned, RTP is generally configured to use ports 16384-32767.
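The even/odd port convention can be shown with a short Python sketch. The function name is hypothetical, and rounding an odd starting port down to the next lower even number is an assumption based on the usual reading of RFC 3550, not something stated in this text.

```python
def rtp_rtcp_port_pair(port: int) -> tuple:
    """Return (rtp_port, rtcp_port): an even RTP port and the next higher odd port."""
    if not 0 < port < 65535:
        raise ValueError("port out of range for an RTP/RTCP pair")
    rtp_port = port - (port % 2)   # assumption: an odd port is rounded down to even
    return rtp_port, rtp_port + 1  # RTCP uses the next higher odd port


pair_from_even = rtp_rtcp_port_pair(16384)
pair_from_odd = rtp_rtcp_port_pair(16385)
```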
RTP can carry any data with real-time characteristics, such as interactive audio and
video. Call setup and tear-down is usually performed by the SIP protocol. The fact that
RTP uses a dynamic port range makes it difficult for it to traverse firewalls. In order to
get around this problem, it is often necessary to set up a STUN server.
It was originally designed as a multicast protocol, but has since been applied in many
unicast applications. It is frequently used in streaming media systems (in conjunction
with RTSP) as well as videoconferencing and push to talk systems (in conjunction with
H.323 or SIP), making it the technical foundation of the Voice over IP industry. It is used
alongside RTCP and is built on top of the User Datagram Protocol (UDP).
Applications using RTP are less sensitive to packet loss, but typically very sensitive to
delays, so UDP is a better choice than TCP for such applications.
According to RFC 1889, the services provided by RTP include:
• Payload-type identification - Indication of what kind of content is being carried
• Sequence numbering - PDU sequence number
• Time stamping - allow synchronization and jitter calculations
• Delivery monitoring
The protocols themselves do not provide mechanisms to ensure timely delivery. They
also do not give any Quality of Service (QoS) guarantees. These things have to be
provided by some other mechanism.
Also, out of order delivery is still possible, and flow and congestion control are not
supported directly. However, the protocols do deliver the necessary data to the
application to make sure it can put the received packets in the correct order. Also, RTCP
provides information about reception quality which the application can use to make local
adjustments. For example, if congestion is forming, the application could decide to
lower the data rate.
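How an application might use the sequence numbers to restore packet order and spot losses can be sketched in Python as follows. This is an illustrative sketch, not part of RTP itself: the function name and the (sequence number, payload) tuple layout are hypothetical, and it assumes all packets in one batch lie within half the 16-bit sequence space of the first packet so that wraparound can be resolved.

```python
def reorder_packets(packets):
    """Sort (seq, payload) pairs by 16-bit sequence number and list the missing ones."""
    if not packets:
        return [], []
    base = packets[0][0]

    def distance(seq):
        # Signed distance from the first packet, modulo 2**16 (handles wraparound).
        d = (seq - base) & 0xFFFF
        return d - 0x10000 if d >= 0x8000 else d

    ordered = sorted(packets, key=lambda p: distance(p[0]))
    seen = {distance(seq) for seq, _ in ordered}
    missing = [
        (base + d) & 0xFFFF
        for d in range(min(seen), max(seen) + 1)
        if d not in seen
    ]
    return [payload for _, payload in ordered], missing


# Arrival order is scrambled and the sequence number wraps from 65535 to 0.
arrived = [(65534, "a"), (0, "c"), (65535, "b"), (2, "e")]
payloads, lost = reorder_packets(arrived)
```

The list of missing sequence numbers is the kind of information a receiver could pass to a packet loss concealment routine or summarize in RTCP reception reports.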
RTP was also published by the ITU-T as H.225.0, but later removed once the IETF had a
stable standards-track RFC published. It exists as an Internet Standard (STD 64) defined
in RFC 3550 (which obsoletes RFC 1889). RFC 3551 (STD 65) (which obsoletes RFC
1890) defines a specific profile for Audio and Video Conferences with Minimal Control.
RFC 3711 defines the Secure Real-time Transport Protocol (SRTP) profile (actually an
extension to RTP Profile for Audio and Video Conferences) which can be used
(optionally) to provide confidentiality, message authentication, and replay protection for
audio and video streams being delivered.
The position of RTP in the protocol stack is somewhat strange. It was decided to put RTP
in user space and have it (normally) run over UDP. It operates as follows. The
multimedia application consists of multiple audio, video, text, and possibly other streams.
These are fed into the RTP library, which is in user space along with the application. This
library then multiplexes the streams and encodes them in RTP packets, which it then
stuffs into a socket. At the other end of the socket (in the operating system kernel), UDP
packets are generated and embedded in IP packets. If the computer is on an Ethernet, the
IP packets are then put in Ethernet frames for transmission. As a consequence of this
design, it is a little hard to say which layer RTP is in. Since it runs in user space and is
linked to the application program, it certainly looks like an application protocol. On the
other hand, it is a generic, application-independent protocol that just provides transport
facilities, so it also looks like a transport protocol. Probably the best description is that it
is a transport protocol that is implemented in the application layer.
• 1 Packet structure
• 2 Potential further development of RTP & RTCP
• 3 Mathematical background
• 4 Structure of RTP/RTCP applications
• 5 See also
• 6 References
• 7 External links
o 7.1 RFCs
Bit offset   Bits 0-1  2  3  4-7  8  9-15  16-31
0            Ver.      P  X  CC   M  PT    Sequence Number
32           Timestamp
64           SSRC identifier
96           ... CSRC identifiers ...
96+(CC×32)   Extension header (optional)
Ver. (2 bits) indicates the version of the protocol. Current version is 2. P (one bit) is used to indicate if there
are extra padding bytes at the end of the RTP packet. X (one bit) indicates if the extensions to the protocol
are being used in the packet. CC (four bits) contains the number of CSRC identifiers that follow the fixed
header. M (one bit) is used at the application level and is defined by a profile. If it's set, it means that the
current data has some special relevance for the application. PT (7 bits) indicates the format of the payload
and determines its interpretation by the application. SSRC indicates the synchronization source. The
optional (see X) extension header indicates the length of the extension (EHL=extension header length) in
32-bit units, excluding the 32 bits of the extension header itself.
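The fixed header fields described above can be decoded with a few lines of Python. This is a minimal sketch: the function name and the dictionary keys are hypothetical, the layout follows the 12-byte fixed header of RFC 3550 (including the 32-bit timestamp at bit offset 32), and the optional extension header and padding are not processed.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header and any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = byte0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return {
        "version": byte0 >> 6,             # Ver. (2 bits)
        "padding": bool(byte0 & 0x20),     # P (1 bit)
        "extension": bool(byte0 & 0x10),   # X (1 bit)
        "csrc_count": csrc_count,          # CC (4 bits)
        "marker": bool(byte1 & 0x80),      # M (1 bit)
        "payload_type": byte1 & 0x7F,      # PT (7 bits)
        "sequence_number": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        "payload_offset": end,             # where the extension or payload starts
    }


# Hand-built example: version 2, marker set, payload type 8, no CSRCs.
sample = bytes([0x80, 0x88]) + struct.pack("!HII", 1, 160, 0x12345678)
header = parse_rtp_header(sample)
```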
Potential further development of RTP & RTCP
The Real-time Transport Protocol (RTP) and the Real-time Transport Control Protocol
(RTCP) are commonly used together. RTP is used to transmit data (e.g. audio and video)
and RTCP is used to monitor QoS. The monitoring of quality of service is very important
for modern applications. In large scale applications (e.g. IPTV), there is an unacceptable
delay between RTCP reports, which can cause quality of service related problems.
For more information read about problems and potential further development of RTCP
The equations for RTCP protocol are explained in section I. and II.A in the Optimization
of Large-Scale RTCP Feedback Reporting in Fixed and Mobile Networks paper.
Structure of RTP/RTCP applications
RTP/RTCP protocols are commonly used to transport audio or audio/video data. Separate
sessions are used for each media content (e.g. audio and video). The main advantage of
this separation is to make it possible to receive only one part of the transmission,
commonly audio data, which lowers the total bandwidth.
• Real time control protocol
• Real Time Streaming Protocol (RTSP)
• Secure Real-time Transport Protocol
• Stream Control Transmission Protocol
• Henning Schulzrinne and Stephen Casner. RTP: A Transport Protocol for Real-
Time Applications. (1993) Internet Engineering Task Force, Internet Draft,
October 20, 1993. The memo originating RTP; only an early draft, does not
describe the current standard.
• Perkins, Colin (2003). RTP: Audio and Video for the Internet (1st ed.) Addison-
Wesley. ISBN 0-672-32249-8
STUN (Simple Traversal of UDP (User Datagram Protocol) through NATs
(Network Address Translators)) is a network protocol allowing a client behind a NAT
(or multiple NATs) to find out its public address, the type of NAT it is behind and the
internet-side port associated by the NAT with a particular local port. This information is
used to set up UDP communication between two hosts that are both behind NAT routers.
The protocol is defined in RFC 3489.
• 1 Protocol overview
• 2 Algorithm
• 3 See also
• 4 External links
o 4.1 Implementations
STUN is a client-server protocol. A VoIP phone or software package may include a
STUN client, which will send a request to a STUN server. The server then reports back to
the STUN client what the public IP address of the NAT router is, and what port was
opened by the NAT to allow incoming traffic back in to the network.
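The request/response exchange can be sketched in Python without any network I/O. The sketch assumes the RFC 3489 message layout (a 20-byte header made of a 16-bit type, a 16-bit length and a 128-bit transaction ID, with the public address carried in a MAPPED-ADDRESS attribute); the function names are hypothetical and only IPv4 is handled.

```python
import os
import socket
import struct

BINDING_REQUEST = 0x0001
BINDING_RESPONSE = 0x0101
MAPPED_ADDRESS = 0x0001  # attribute type


def build_binding_request(transaction_id=None) -> bytes:
    """Build a Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(16)
    if len(transaction_id) != 16:
        raise ValueError("transaction ID must be 16 bytes")
    return struct.pack("!HH", BINDING_REQUEST, 0) + transaction_id


def parse_mapped_address(response: bytes):
    """Return (ip, port) from the MAPPED-ADDRESS attribute, or None if absent."""
    msg_type, msg_len = struct.unpack("!HH", response[:4])
    if msg_type != BINDING_RESPONSE:
        raise ValueError("not a Binding Response")
    offset, end = 20, 20 + msg_len
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == MAPPED_ADDRESS and len(value) >= 8:
            _unused, family, port = struct.unpack("!BBH", value[:4])
            if family == 0x01:  # IPv4
                return socket.inet_ntoa(value[4:8]), port
        offset += 4 + attr_len
    return None


request = build_binding_request(b"\x00" * 16)

# Hand-built response carrying one MAPPED-ADDRESS attribute (example values).
attribute = struct.pack("!HHBBH", MAPPED_ADDRESS, 8, 0, 0x01, 54321)
attribute += socket.inet_aton("203.0.113.7")
fake_response = struct.pack("!HH", BINDING_RESPONSE, len(attribute))
fake_response += b"\x00" * 16 + attribute
mapped = parse_mapped_address(fake_response)
```

In a real client the request would be sent over a UDP socket to the STUN server and the reply passed to `parse_mapped_address`; that step is left out here to keep the sketch self-contained.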
The response also allows the STUN client to determine what type of NAT is in use, as
different types of NATs handle incoming UDP packets differently. It will work with three
of four main types: Full Cone, Restricted Cone, and Port Restricted Cone. (In the case of
Restricted Cone or Port Restricted Cone NATs, the client must send out a packet to the
endpoint before the NAT will allow packets from the endpoint through to the client.)
STUN will not work with Symmetric NAT (also known as bi-directional NAT) which is
often found in the networks of large companies. With Symmetric NAT, the IP address of
the STUN server is different than that of the endpoint, and therefore the NAT mapping
the STUN server sees is different than the mapping that the endpoint would use to send
packets through to the client. For details on the different types of NAT, see network
address translation.
Once a client has discovered its external address, it can relay it to its peers. If the
NATs are full cone then either side can initiate communication. If they are restricted cone
or port restricted cone both sides must start transmitting together.
Note that using the techniques described in the STUN RFC does not necessarily require
using the STUN protocol; they can be used in the design of any UDP protocol.
Protocols like SIP use UDP packets for the transfer of sound/video/text signaling traffic
over the Internet. Unfortunately as both endpoints are often behind NAT, a connection
cannot be set up in the traditional way. This is where STUN is useful.
The STUN server is contacted on UDP port 3478; however, the server will also hint to clients to
perform tests on an alternate IP address and port number (STUN servers have two IP addresses).
The RFC states that this port and IP are arbitrary.
STUN uses the following algorithm (adapted from RFC 3489) to discover the presence of
NAT gateways and firewalls:
Where the path through the diagram ends in a red box, UDP communication is not
possible. Where the path ends in a yellow or green box, communication is possible.
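The discovery procedure can also be written out as a decision function. The Python sketch below is one reading of the RFC 3489 flow, with the test results supplied as arguments instead of being measured on the network. The function name, parameter names, and result strings are hypothetical. Test I is a plain request, Test II asks the server to reply from its alternate IP address and port, and Test III asks it to reply from the same IP address but a different port.

```python
def classify_nat(test1_mapped, local_addr, test2_ok,
                 test1_alt_mapped=None, test3_ok=False):
    """Classify the NAT/firewall situation from STUN test results.

    test1_mapped:     (ip, port) reported by the server in Test I, or None
    local_addr:       (ip, port) the client actually sent from
    test2_ok:         True if a reply arrived from the alternate IP and port
    test1_alt_mapped: (ip, port) from Test I repeated against the alternate address
    test3_ok:         True if a reply arrived from the same IP but another port
    """
    if test1_mapped is None:
        return "UDP blocked"
    if test1_mapped == local_addr:
        # No address translation between the client and the server.
        return "Open Internet" if test2_ok else "Symmetric UDP firewall"
    if test2_ok:
        return "Full cone NAT"
    if test1_alt_mapped != test1_mapped:
        # The mapping changed with the destination address.
        return "Symmetric NAT"
    return "Restricted cone NAT" if test3_ok else "Port restricted cone NAT"


local = ("192.168.1.10", 5000)
public = ("203.0.113.7", 40000)
result_blocked = classify_nat(None, local, False)
result_open = classify_nat(local, local, True)
result_full = classify_nat(public, local, True)
result_symmetric = classify_nat(public, local, False, ("203.0.113.7", 40001))
result_restricted = classify_nat(public, local, False, public, True)
result_port_restricted = classify_nat(public, local, False, public, False)
```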
A Session Border Controller is a device used in some VoIP networks to exert control
over the signaling and usually also the media streams involved in setting up, conducting,
and tearing down calls.
Within the context of VoIP, the word "Session" in Session Border Controller refers to a
call. Each call consists of one or more call signaling streams that control the call, and one
or more call media streams which carry the call's audio, video, or other data along with
information concerning how that data is flowing across the network. Together, these
streams make up a session, and it is the job of a Session Border Controller to exert
influence over the data streams that make up one or more sessions.
The word "Border" in Session Border Controller refers to a point of demarcation between
one part of a network and another. As a simple example, at the edge of a corporate
network, a firewall demarcates the local network (inside the corporation) from the rest of the
Internet (outside the corporation). A more complex example is that of a large corporation
where different departments have security needs for each location and perhaps for each
kind of data. In this case, filtering routers or other network elements are used to control
the flow of data streams. It is the job of a Session Border Controller to assist policy
administrators in managing the flow of session data across these borders.
The word "Controller" in Session Border Controller refers to the influence that Session
Border Controllers have on the data streams that comprise Sessions, as they traverse
borders between one part of a network and another. Additionally, Session Border
Controllers often provide measurement, access control, and data conversion facilities for
the calls they control.
• 1 Theory of operation
• 2 Controversy
• 3 Lawful Intercept and CALEA
• 4 History and market
• 5 References
• 6 External links
Theory of operation
SBCs are inserted into the signaling and/or media paths between calling and called
parties in a VoIP call, predominantly those using the SIP, H.323, and MGCP call
signaling protocols.
In some cases, the SBC acts as if it were the called VoIP phone and places a second call
to the called party. In SIP terms, the SBC is then acting as a Back-to-Back User Agent
(B2BUA). The effect of this behavior is that not
only the signaling traffic, but also the media traffic (voice, video, etc.) can be controlled by
the SBC. SBCs also make it possible to redirect media traffic to a completely different
element elsewhere in the network, perhaps for recording, generation of music-on-hold, or
other media-related purposes. Without an SBC, the media traffic travels directly between
the VoIP phones, without the in-network call signaling elements having control over its
path.
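One common way a B2BUA brings media under its control is by rewriting the session description it forwards, so that each phone sends its media to the SBC rather than to the other phone. The following is a minimal Python sketch under stated assumptions: an IPv4, audio-only SDP body with a single session-level c= line. The function name `anchor_media` and all addresses are hypothetical; a real SBC would also handle media-level c= lines, several streams, RTCP, and per-call port allocation.

```python
def anchor_media(sdp: str, relay_ip: str, relay_port: int) -> str:
    """Return a copy of an SDP body that points audio at a media relay."""
    rewritten = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Connection line: replace the phone's address with the relay's.
            line = f"c=IN IP4 {relay_ip}"
        elif line.startswith("m=audio "):
            # Media line layout: m=audio <port> <transport> <payload types...>
            fields = line.split(" ")
            fields[1] = str(relay_port)
            line = " ".join(fields)
        rewritten.append(line)
    # SDP lines are terminated with CRLF.
    return "\r\n".join(rewritten) + "\r\n"


OFFER = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 10.0.0.5\r\n"
    "s=-\r\n"
    "c=IN IP4 10.0.0.5\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)

if __name__ == "__main__":
    print(anchor_media(OFFER, "203.0.113.7", 40000))
```

The sketch deliberately leaves the o= (origin) line untouched; whether an SBC also hides the address there is a policy choice that varies between products.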
However, in other cases, the SBC simply modifies the stream of call control (signaling)
data involved in each call, perhaps limiting the kinds of call that can be conducted,
changing the codec choices, and so on. Ultimately, SBCs allow their owners to control
the kinds of calls that can be placed through the networks on which they reside, fix or
change protocols and protocol syntax to achieve interoperability, and also overcome
some of the problems that firewalls and NAT cause for VoIP calls.
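Changing the codec choices can be illustrated by filtering the payload types offered in an SDP body. This is a minimal Python sketch, assuming an audio-only offer; the function name `restrict_codecs` is hypothetical, and the sketch does not handle the case where no offered codec is allowed (a real device would typically reject the call instead).

```python
def restrict_codecs(sdp: str, allowed: set) -> str:
    """Drop every audio payload type that is not in `allowed`.

    `allowed` holds RTP payload type numbers as strings, e.g. {"0", "8"}.
    """
    kept_lines = []
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            fields = line.split(" ")
            # First three fields: "m=audio", port, transport profile.
            head, payloads = fields[:3], fields[3:]
            line = " ".join(head + [pt for pt in payloads if pt in allowed])
        elif line.startswith(("a=rtpmap:", "a=fmtp:")):
            # Attribute layout: a=rtpmap:<payload type> <encoding>/<rate>
            payload_type = line.split(":", 1)[1].split(" ", 1)[0]
            if payload_type not in allowed:
                continue  # remove attributes of a removed codec
        kept_lines.append(line)
    return "\r\n".join(kept_lines) + "\r\n"
```

Applied to an offer listing payload types 0 (PCMU), 8 (PCMA), and 18 (G.729) with `allowed={"0", "8"}`, the result offers only the two G.711 variants.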
SBCs are often used by corporations along with firewalls to enable VoIP calls to and
from a protected enterprise network. VoIP service providers use SBCs to allow the use of
VoIP protocols from private networks with internet connections using NAT, and also to
implement strong security measures that are necessary to maintain a high quality of
service. SBCs also perform the function of application-level gateways.
Additionally, some SBCs can allow VoIP calls to be set up between two phones
using different VoIP signaling protocols (SIP, H.323, Megaco/MGCP, etc.) and can
transcode the media stream when different codecs are in use. Many
SBCs also provide firewall features for VoIP traffic (denial-of-service protection, call
filtering, bandwidth management, etc.).
In contrast to conventional phone systems, the OSI layers of a VoIP-based network need
not be operated by a single company. A VoIP user may purchase their internet access
from one internet service provider and their VoIP service from a second company.
From an IMS architecture perspective, the SBC integrates the P-CSCF and C-BGF
functions on the access side, and the I-BCF, IWF, and I-BGF functions on the
peering side. Some SBCs can be "decomposed", meaning the signaling functions can
run on a different hardware platform from the media relay functions; in other words, the
P-CSCF can be physically separated from the C-BGF, or the I-BCF/IWF from the
I-BGF. A proprietary or standards-based protocol, such as the H.248
Ia profile, can be used by the signaling platform to control the media platform.
The concept of SBC is controversial to proponents of end-to-end systems and peer-to-
peer networking in consideration of the following:
• SBCs can extend the length of the media path (the route media packets take through
the network) significantly. A long media path is undesirable, as it increases the
delay of voice packets (especially if the SBC implements transcoding) and the
probability of packet loss. Both effects deteriorate the voice/video quality.
However, sometimes there are obstacles to communication such as firewalls
between the call parties, and in these cases SBCs can be used to guide media
streams towards an acceptable path between caller and callee, whereas without the
SBC the call media would be blocked. Some SBCs can detect if the ends of the
call are in the same subnetwork and release control of the media, enabling it to
flow directly between the clients; this is anti-tromboning. Also, some SBCs can
create a media path where none would otherwise be allowed to exist (by virtue of
various firewalls and other security apparatus between the two endpoints). Lastly,
for specific VoIP network models where the service provider owns the network,
SBCs can actually shorten the media path by shortcut routing approaches.
• SBCs often restrict the flow of information between call endpoints, restricting
end-to-end transparency. VoIP phones may not be able to use new protocol
features unless they are understood by the SBC. However, some SBCs are more
able than others to cope with previously unseen and unanticipated protocol
features. End-to-end encryption cannot be used if the SBC does not have the key,
although some portions of the information stream in an encrypted call are not
encrypted, and those portions can be used and influenced by the SBC. Some
SBCs are able to offload this encryption function from other elements in the
network by terminating SIP-TLS, IPSec, and/or SRTP. Furthermore, some SBCs
can actually make calls and other SIP scenarios work when they couldn't have
before, by performing specific protocol "normalization" or "fix-up".
• In some cases, far-end or hosted NAT traversal can be done without SBCs if the
VoIP phones support protocols like STUN, TURN, ICE, or Universal Plug and
Play (UPnP). At the time of writing, STUN, TURN, ICE, and others have not seen wide
deployment, and they are complex to implement.
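The anti-tromboning check mentioned in the first point, detecting whether both ends of a call are in the same subnetwork, can be sketched with Python's standard `ipaddress` module. The function name `can_release_media` and the list of access subnets are hypothetical; a real SBC would also take NAT, policy, and lawful-interception requirements into account before releasing media.

```python
import ipaddress
from typing import Iterable


def can_release_media(caller_ip: str, callee_ip: str,
                      access_subnets: Iterable[str]) -> bool:
    """Return True if both parties sit in the same configured subnet.

    When this holds, the media could flow directly between the two
    clients instead of being relayed through the SBC.
    """
    caller = ipaddress.ip_address(caller_ip)
    callee = ipaddress.ip_address(callee_ip)
    for subnet in access_subnets:
        network = ipaddress.ip_network(subnet)
        if caller in network and callee in network:
            return True
    return False
```

With subnets `["10.1.0.0/16", "10.2.0.0/16"]`, two phones at 10.1.2.3 and 10.1.9.9 qualify for media release, while phones at 10.1.2.3 and 10.2.0.1 do not, because they fall in different subnets.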
Most of the controversy surrounding SBCs pertains to whether call control should remain
solely with the two endpoints in a call (in service to their owners), or should rather be
shared with other network elements owned by the organizations managing various
networks involved in connecting the two call endpoints. For example, should call control
remain with Alice and Bob (two callers), or should call control be shared with the
operators of all the IP networks involved in connecting Alice and Bob's VoIP phones
together? The debate on this point is vigorous, almost religious, in nature. Those who want
control in the endpoints only are greatly frustrated by the various realities of today's
networks, such as firewalls, filtering/throttling, and the lack of adoption of a universal
VoIP equivalent to the phone number. Those who want control in the middle, between the
call endpoints, are typically trying to replicate the old-style phone system, where virtually all
control rested with the service provider. So far, these views have not proven to be
reconcilable. Note that a third call control element such as an SBC may need
to be inserted between the two endpoints in order to satisfy local lawful interception
requirements.
Lawful Intercept and CALEA
An SBC may provide session media (normally RTP) and signalling (normally SIP)
wiretap services, which providers can use to comply with requests for the lawful
interception of network sessions. Requirements and standards for such interception
come from CALEA (in the United States) and ETSI, among others.
History and market
The history of SBCs shows that several corporations were involved in creating and
popularizing the SBC market segment for carriers and enterprises. The "big six" of
carrier-oriented SBC companies are (or were, since several have been acquired or are
defunct): Acme Packet (NASDAQ: APKT), Kagoor Networks (acquired in 2005 by
Juniper Networks and later end-of-lifed), Jasomi Networks (acquired in 2005 by Ditech
Communications which is now known as Ditech Networks), Netrake (acquired in 2006
by Audiocodes), NexTone, and Aravox (acquired in 2003 by Alcatel and terminated).
According to Jonathan Rosenberg, a co-author of RFC 3261 (SIP) and numerous other
related RFCs, Dynamicsoft actually developed the first working SBC in conjunction with
Aravox, but the product never truly gained market share. Four companies also played a
major role in delivering enterprise-oriented SBCs: Jasomi Networks with its PeerPoint
product line, Edgewater, Borderware, and Ingate.
During the evolution of SBCs, many other companies undertook software development
programs to create SBCs. However, doing so turned out to be a far greater technical
challenge than most had anticipated, and there were few successes. An even larger group
of companies began to remarket their existing products as SBCs when it became clear
that the SBC market was "hot" with respect to acquisitions and IPOs.
Of these companies, Acme Packet is the market segment leader, and is the only company
of the group to have had a successful IPO. With the field narrowed by acquisition,
NexTone is generally considered to be in second place, although they traditionally target
a different market segment, having started life as a softswitch vendor.
The Session Initiation Protocol (SIP) is an application-layer control (signaling)
protocol for creating, modifying, and terminating sessions with one or more participants.
It can be used to create two-party, multiparty, or multicast sessions that include Internet
telephone calls, multimedia distribution, and multimedia conferences. (cit. RFC 3261).
SIP is designed to be independent of the underlying transport layer; it can run on TCP,
UDP, or SCTP. It was originally designed by Henning Schulzrinne (Columbia
University) and Mark Handley (UCL) starting in 1996. The latest version of the
specification is RFC 3261 from the IETF SIP Working Group. In November 2000, SIP
was accepted as a 3GPP signaling protocol and permanent element of the IMS
architecture. It is widely used as a signaling protocol for Voice over IP, along with H.323.
SIP has the following characteristics:
• Transport-independent, because SIP can be used with UDP, TCP, ATM, and so on.
• Text-based, so humans can read SIP messages.
• 1 Protocol design
• 2 SIP network elements
• 3 Instant messaging (IM) and presence
• 4 Commercial applications
• 5 See also
• 6 External links
SIP clients use TCP or UDP (typically on port 5060) to connect to SIP servers and other
SIP endpoints. SIP is primarily used in setting up and tearing down voice or video calls.
However, it can be used in any application where session initiation is a requirement.
These include event subscription and notification, terminal mobility, and so on. A
large number of SIP-related RFCs define behavior for such applications. The
voice/video media itself is carried over separate protocols, typically RTP.
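Because SIP is text-based, a request can be assembled with ordinary string handling. The following minimal Python sketch builds an OPTIONS request of the kind a client might send to port 5060; the function name, parameter names, and example values are hypothetical, and only a small set of headers is shown. Nothing is sent on the network.

```python
def build_options(target_uri: str, from_uri: str, local_host: str,
                  call_id: str, branch_suffix: str, from_tag: str) -> str:
    """Assemble a minimal SIP OPTIONS request as text."""
    lines = [
        f"OPTIONS {target_uri} SIP/2.0",
        # RFC 3261 branch parameters begin with the cookie "z9hG4bK".
        f"Via: SIP/2.0/UDP {local_host}:5060;branch=z9hG4bK{branch_suffix}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{from_uri}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_options("sip:bob@example.com", "sip:alice@example.org",
                        "192.0.2.10", "a84b4c76e66710", "776asdhds",
                        "1928301774"))
```

The resulting bytes could be handed to a UDP or TCP socket, which reflects the transport independence described above.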
A motivating goal for SIP was to provide a signalling and call setup protocol for IP-based
communications that can support a superset of the call processing functions and features
present in the public switched telephone network (PSTN). SIP by itself does not define
these features; rather, its focus is call-setup and signalling. However, it has been designed
to enable the building of such features in network elements known as Proxy Servers and
User Agents. These are features that permit familiar telephone-like operations: dialing a
number, causing a phone to ring, hearing ringback tones or a busy signal. Implementation
and terminology are different in the SIP world but to the end-user, the behavior is similar.
SIP-enabled telephony networks can also implement many of the more advanced call
processing features present in Signalling System 7 (SS7), though the two protocols
themselves are very different. SS7 is a highly centralized protocol, characterized by a
highly complex central network architecture and dumb endpoints (traditional telephone
handsets). SIP is a peer-to-peer protocol. As such it requires only a very simple (and thus
highly scalable) core network with intelligence distributed to the network edge,
embedded in endpoints (terminating devices built in either hardware or software). SIP
features are implemented in the communicating endpoints (i.e. at the edge of the
network) as opposed to traditional SS7 features, which are implemented in the network.
Although many other VoIP signalling protocols exist, SIP is characterized by its
proponents as having roots in the IP community rather than the telecom industry. SIP has
been standardized and governed primarily by the IETF while the H.323 VoIP protocol
has been traditionally more associated with the ITU. However, the two organizations
have endorsed both protocols in some fashion.
SIP works in concert with several other protocols and is only involved in the signalling
portion of a communication session. SIP acts as a carrier for the Session Description
Protocol (SDP), which describes the media content of the session, e.g. which IP ports to
use and which codec is in use. In typical use, SIP "sessions" are simply packet streams of
the Real-time Transport Protocol (RTP). RTP is the carrier for the actual voice or video
content.
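To make the role of RTP concrete, the fixed 12-byte header that precedes each chunk of voice or video data can be decoded with Python's `struct` module. This is a minimal sketch; the function name and dictionary keys are illustrative, and header extensions and CSRC lists are not parsed.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The payload type field corresponds to the codec numbers negotiated in SDP, which is how the signaling and media sides of a call fit together.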
The first proposed standard version (SIP 2.0) was defined in RFC 2543. The protocol was
further clarified in RFC 3261, although many implementations are still using interim
draft versions. Note that the version number remains 2.0.
SIP is similar to HTTP and shares some of its design principles: it is human-readable and
request-response structured. SIP shares many HTTP status codes, such as the familiar
'404 Not Found'. SIP proponents also claim it to be simpler than H.323. However, some
would counter that while SIP originally had a goal of simplicity, in its current state it has
become as complex as H.323. Others would argue that SIP is a stateless protocol, hence
making it possible to easily implement failover and other features that are difficult in
stateful protocols such as H.323. SIP and H.323 are not limited to voice communication
but can mediate any kind of communication session, from voice to video to future
applications.
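The HTTP-like response structure can be seen by parsing a SIP status line. The following minimal Python sketch splits the line and maps the code to the response classes defined in RFC 3261; the function and constant names are illustrative.

```python
from typing import Tuple

# Response classes by first digit of the status code (RFC 3261).
STATUS_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_status_line(line: str) -> Tuple[int, str]:
    """Split a status line such as 'SIP/2.0 404 Not Found'."""
    version, code, reason = line.strip().split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unexpected protocol version: {version}")
    return int(code), reason


def status_class(code: int) -> str:
    """Return the response class for a three-digit status code."""
    return STATUS_CLASSES[code // 100]
```

Unlike HTTP, SIP adds a 6xx class for failures that apply globally rather than to one server, and uses 1xx responses such as 180 Ringing to report call progress.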
SIP network elements
Hardware endpoints — devices with the look, feel, and shape of a traditional telephone,
but that use SIP and RTP for communication — are commercially available from several
vendors. Some of these can use Electronic Numbering (ENUM) or DUNDi to translate
existing phone numbers to SIP addresses, so calls to other SIP users can bypass the
telephone network, even though the service provider might normally act as a gateway to
the PSTN for traditional phone numbers (and charge for it). Today, software
SIP endpoints are common.
SIP also requires proxy and registrar network elements to work as a practical service.
Although two SIP endpoints can communicate without any intervening SIP
infrastructure, which is why the protocol is described as peer-to-peer, this approach is
impractical for a public service. There are various implementations that can act as proxy
and registrar.
From the RFCs:
"SIP makes use of elements called proxy servers to help route requests to the
user's current location, authenticate and authorize users for services, implement
provider call-routing policies, and provide features to users."
"SIP also provides a registration function that allows users to upload their current
locations for use by proxy servers."
"Since registrations play an important role in SIP, a User Agent Server that
handles a REGISTER is given the special name registrar."
"It is an important concept that the distinction between types of SIP servers is
logical, not physical."
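The relationship between registration and proxy routing described in these quotations can be sketched as a small in-memory location service. This is a simplified Python illustration, not how any particular server is implemented: the class and method names are hypothetical, only one contact per address-of-record is kept (SIP allows several), and the current time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional, Tuple


class Registrar:
    """Maps an address-of-record to the user's current contact URI."""

    def __init__(self) -> None:
        # address-of-record -> (contact URI, absolute expiry time in seconds)
        self._bindings: Dict[str, Tuple[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int, now: float) -> None:
        """Handle a REGISTER: store the binding, or remove it if expires is 0."""
        if expires == 0:
            self._bindings.pop(aor, None)
        else:
            self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor: str, now: float) -> Optional[str]:
        """What a proxy would call to find where to forward a request."""
        binding = self._bindings.get(aor)
        if binding is None:
            return None
        contact, expiry = binding
        if now >= expiry:
            del self._bindings[aor]  # binding has timed out
            return None
        return contact
```

Because the distinction between server types is logical, the same process could expose both `register` (the registrar role) and `lookup` (used by the proxy role).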
Instant messaging (IM) and presence
A standard instant messaging protocol based on SIP, called SIMPLE, has been proposed
and is under development. SIMPLE can also carry presence information, conveying a
person's willingness and ability to engage in communications. Presence information is
most recognizable today as buddy status in IM clients.
Some efforts have been made to integrate SIP-based VoIP with the XMPP specification
used by Jabber. Most notably Google Talk, which extends XMPP to support voice, plans
to integrate SIP. Google's XMPP extension is called Jingle and, like SIP, it acts as a
Session Description Protocol carrier.
SIP itself defines a method of passing instant messages between endpoints, similar to
SMS messages. This is not generally supported by commercial operators.
Firewalls often block UDP, the transport typically used for media packets, though one way around this is
to use TCP tunnelling and relays for media in order to provide NAT and firewall
traversal. One solution involves tunnelling the media packets within TCP or HTTP
packets to a relay. This solution uses additional functionality in conjunction with SIP, and
packages the media packets into a TCP stream which is then sent to the relay. The relay
then extracts the packets and sends them on to the other endpoint. If the other endpoint is
behind a symmetric NAT, or a corporate firewall that does not allow VoIP traffic, the
relay would transfer the packets to another tunnel. One disadvantage of this approach is
that TCP was not designed for real time traffic such as voice, so an optimized form of the
protocol is sometimes used.
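Packaging media packets into a TCP stream requires some way to mark packet boundaries, since TCP delivers a byte stream rather than discrete datagrams. The following minimal Python sketch uses a 16-bit length prefix per packet, similar in spirit to the framing defined in RFC 4571; the function names are hypothetical, and the source does not say which framing any particular tunnelling product uses.

```python
import struct
from typing import List, Tuple


def frame_packet(packet: bytes) -> bytes:
    """Prefix a media packet with its length so it can travel over TCP."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length prefix")
    return struct.pack("!H", len(packet)) + packet


def extract_packets(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split received stream bytes into complete packets.

    Returns the complete packets plus any trailing bytes that belong
    to a packet which has not fully arrived yet.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # wait for more data from the stream
        start = offset + 2
        packets.append(buffer[start:start + length])
        offset = start + length
    return packets, buffer[offset:]
```

The relay would call `extract_packets` on the bytes it reads from the tunnel, forward each recovered packet to the other endpoint, and keep the remainder until more data arrives.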
As envisioned by its originators, SIP's peer-to-peer nature does not enable
network-provided services. For example, the network cannot easily support legal interception of
calls (referred to in the United States by the law governing wiretaps, CALEA).
Emergency calls (calls to E911 in the USA) are difficult to route: it is difficult to identify
the proper Public Safety Answering Point (PSAP) because of the inherent mobility of IP
endpoints and the lack of any network location capability. However, as commercial SIP
services begin to take off, practical solutions to these problems are being proven.
Standards being developed by such organizations as 3GPP and 3GPP2 define
applications of the basic SIP model which facilitate commercialization and enable
support for network-centric capabilities such as CALEA.
Many VoIP phone companies allow customers to bring their own SIP devices, such as
SIP-capable telephone sets or softphones. The new market for consumer SIP devices
continues to expand.
The free software community provides more and more of the SIP technology
required to build endpoints as well as proxy and registrar servers, leading to a
commoditization of the technology, which accelerates global adoption. SIPfoundry has
made available and actively develops a variety of SIP stacks, client applications and
SDKs, in addition to entire IP PBX solutions that compete in the market against mostly
proprietary IP PBX implementations from established vendors.
The National Institute of Standards and Technology (NIST), Advanced Networking
Technologies Division provides a public domain implementation of the Java standard
for SIP, JAIN-SIP, which serves as a reference implementation for the standard. The stack
can work in proxy server or user agent scenarios and has been used in numerous
commercial and research projects. It supports RFC 3261 in full and a number of
extension RFCs, including RFC 3265 (Subscribe / Notify) and RFC 3262 (Provisional
Reliable Responses).
A Signaling Gateway is a network component solely responsible for translating
signaling messages (i.e., information about call establishment and teardown) between one
medium (usually IP) and another (PSTN). For example, a signaling gateway might
translate between ISUP and SIP. A signaling gateway is often part of a softswitch in
modern VoIP deployments.
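The ISUP-to-SIP translation mentioned above can be illustrated with a lookup table. This is a deliberately simplified Python sketch: the mapping covers only the basic call flow, the names `ISUP_TO_SIP` and `translate_isup` are hypothetical, and a real gateway (see RFC 3398 for the standardized mapping) also tracks call state, since for example a release before answer is not signalled with BYE.

```python
from typing import Optional

# Simplified mapping for an established or progressing call.
ISUP_TO_SIP = {
    "IAM": "INVITE",        # Initial Address Message: call setup
    "ACM": "180 Ringing",   # Address Complete Message: called party alerted
    "ANM": "200 OK",        # Answer Message: call answered
    "REL": "BYE",           # Release: hang-up of an answered call
}


def translate_isup(message_type: str) -> Optional[str]:
    """Return the SIP request or response for an ISUP message type.

    Returns None for message types outside this simplified table.
    """
    return ISUP_TO_SIP.get(message_type.upper())
```

A gateway built along these lines would receive an IAM from the PSTN side, emit an INVITE toward the SIP network, and relay the subsequent progress and answer indications in the opposite direction.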