Security Requirements and Constraints Analysis
For Voice Over IP System
Raisul Haque Rajib
063 435 056
North South University
MS in Electronics and Telecommunication Engineering
Course: ETE 605
Course Teacher: Dr. Mashiur Rahman
Voice over IP (VoIP) is a technology by which voice can be transferred from circuit-
switched networks to or over IP networks, and vice versa. This is interesting since the
amount of voice, video and data traffic is rapidly increasing as more and more people get
online. This will result in increased need to expand capacity especially in the trunk
networks. So far the only way to add capacity in the trunk networks has been building a
new, additional trunk network separately with respect to each media: voice, video and
data. The concept of converged networks, i.e. all media can be transported via the very
same network gives an option to build one unified trunk network to take care of increased
capacity needs, therefore providing an elegant and low-cost solution.
However, even though cost savings can currently justify the transition from traditional
telephony to VoIP telephony for multinational corporations, they are not the ultimate driver
that will foster the penetration of VoIP. The killer applications will be value-added
services that do not yet exist and that can be provided relatively easily only via
VoIP. Integration of VoIP into existing Internet-based services such as e-mail and the web
will be an area to watch.
Numerous VoIP products and solutions are already available on the market, but they are
not yet very interoperable. Everyone seems to be interested in getting into this rather new,
lucrative area of business: a number of product development projects are started to
create a new VoIP product, and almost as many are cancelled. The world does not yet
know which of the overlapping VoIP-related standards is going to win. In addition, voice
quality in VoIP suffers from the non-guaranteed QoS, latency and other characteristics
of IP networks.
Security of VoIP has been considered right from the beginning, in parallel with
development in other areas of VoIP. Many security mechanisms are already defined in
the standards, but they have some flaws, and many security problems remain.
1.2 Objectives of the Paper
The objectives of the paper are first to introduce the security characteristics of some
VoIP architectures already in use, then to identify some of the components that make up
VoIP security, and finally to analyze the security constraints imposed by selected VoIP
architectures or protocols.
2. Security Characteristics of Some Existing VoIP Protocols
2.1 ITU-T’s H.323
Addressing in H.323 is based on the transport address, which is composed of a network
address and a TSAP identifier. In TCP/IP networks the transport address consists of an IP
address and a TCP port. H.323 uses three different types of channels, depending on the
type of information to be transferred: the Call Connection Channel, the Call Control
Channel and Media Channels. Media Channels are usually routed directly between the
participants of a call, but both the Call Control Channel and the Call Connection Channel
can be routed either via a gatekeeper or directly between the participants, as chosen by
the gatekeeper. A gatekeeper may or may not be present in different areas of the Internet.
The H.235 recommendation contains provisions that allow entities using H.235 to support a
key recovery technique, but key recovery is not required for operation.
The Call Connection Channel is secured a priori with TLS (TCP port 1300). If no
gatekeeper is present, the Call Connection Channel is the first channel opened during
initial connection setup. If a gatekeeper is present, the first channel to be opened is the
RAS channel with the gatekeeper. Some authentication mechanism should be used on the
RAS channel, but little is said about encrypting the traffic between the terminal and the
gatekeeper. After the connection establishment procedures, the H.245 Call Control Channel
is opened in secured mode if that was negotiated. Information on Media Channels, including
encryption keys, is passed via the H.245 Call Control Channel.
Figure 1: H.323 Network Architecture
Two types of authentication may be used: 1) symmetric encryption-based authentication,
which requires no prior contact between the communicating entities, and 2) authentication
that relies on a prior shared secret (referred to in the H.235 recommendation as
“subscription” based). The second type can be either symmetric or asymmetric.
For symmetric encryption-based authentication, Diffie-Hellman key exchange is available;
it is intended to generate a shared secret between two entities. It may also be used by the
responder to authenticate application- or protocol-specific request messages, as opposed
to the users of terminals.
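The key-agreement step of Diffie-Hellman can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the prime P and base G are toy values chosen
here for readability, the helper names are hypothetical, and nothing in the sketch follows
the H.235 message formats. It shows only how two entities with no prior contact arrive at
the same secret.

```python
import hashlib
import secrets

# Toy parameters for illustration only (assumption, not from H.235):
# 2**127 - 1 is prime but far too small for real use, and G is an arbitrary base.
P = 2**127 - 1
G = 3

def make_keypair():
    """Return (private, public) values for one party."""
    private = secrets.randbelow(P - 3) + 2      # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def shared_secret(own_private, peer_public):
    """Combine our private value with the peer's public value and hash the result."""
    z = pow(peer_public, own_private, P)
    return hashlib.sha256(z.to_bytes(16, "big")).digest()

# Each side sends only its public value over the network.
a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()
key_a = shared_secret(a_priv, b_pub)
key_b = shared_secret(b_priv, a_pub)
print(key_a == key_b)
```

Both sides derive the same 32-byte value without ever transmitting it. The plain exchange
does not by itself prove who the peer is, which is why it is combined with other mechanisms
when identities matter.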
Subscription-based authentication has three variations, which are:
• Password-based with symmetric encryption.
• Password-based with hashing (also symmetric).
• Certificate-based with signatures (asymmetric).
Other “authentication” mechanisms that can be specified as the authentication mechanism
to be used are “IPSEC based connection”, “TLS” and “something else”.
Encryption can be done at the RTP layer (see figure 1), which is used to transport
information streams. Encryption can be applied on a packet-by-packet basis, which means
that the specified policy determines how encryption capabilities are applied to each packet.
2.2 IETF’s SIP
SIP is a text-based protocol, similar to HTTP and SMTP, for initiating interactive
communication sessions between users. Such sessions include voice, video, chat,
interactive games, and virtual reality.
Figure 2: SIP Network Architecture
All authentication mechanisms that SIP provides are challenge-response based. There
exist three alternatives, which are:
• Basic authentication;
• Digest authentication; and
• PGP authentication.
Because SIP follows a client-server architecture, communication is based on requests and
responses. In addition to requests, SIP also makes it possible to authenticate responses,
but this option is not widely used.
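As an illustration of the challenge-response idea behind Digest authentication, the sketch
below computes a response in the style of HTTP Digest without the optional qop extension:
the client hashes its credentials together with the server's nonce and the request line.
The user name, realm, password, URI and nonce are made-up values, and the function names
are hypothetical.

```python
import hashlib
import hmac

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Response = MD5(HA1:nonce:HA2), so the password itself is never sent."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

def server_accepts(stored_password, username, realm, method, uri, nonce, response):
    """The server recomputes the response from its own copy of the password."""
    expected = digest_response(username, realm, stored_password, method, uri, nonce)
    return hmac.compare_digest(expected, response)

# Hypothetical values; a real server issues a fresh random nonce per challenge.
nonce = "f84f1cec41e6cbe5aea9c8e88d359"
answer = digest_response("alice", "example.com", "s3cret",
                         "INVITE", "sip:bob@example.com", nonce)
print(server_accepts("s3cret", "alice", "example.com",
                     "INVITE", "sip:bob@example.com", nonce, answer))
```

Because the nonce is part of the hash, a response captured for one challenge is not valid
for a different one.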
Encryption capabilities of SIP itself are very limited. Only PGP-based encryption of
certain headers in a message is possible.
2.3 IETF’s and ITU-T’s MGCP/MEGACO/H.248
The approach of MGCP/MEGACO/H.248 to authentication and encryption is clear and
straightforward. According to RFC 2885 (August 2000), IPSEC “should” be used for
the authentication and encryption of protocol connections. IKE should also be used to
provide more robust keying options. Media Gateway Controllers provide the keys that
Media Gateways use to encrypt media connections, and they do so over the IPSEC-secured
protocol connections; i.e. the media connections themselves are not encrypted via IPSEC.
Figure 3: MGCP/MEGACO Network Architecture
3. Security Requirements for VoIP
The list of VoIP security requirements is given below:
• Dynamic per-call firewall control
• Dynamic per-call bandwidth control
• NAT traversal
• Signalling protocol compatibility
• Ability to handle encrypted VoIP Traffic
• WAN failover to back-up gateway for local survivability
• Adjunct Systems Health Monitoring
• Protecting the IP PBX and Adjunct Systems
• Endpoint-to-endpoint media traffic
• Preservation of location-specific IP addresses
• Network Call Usage Reporting
• VoIP Overlay Installation
• CALEA support
• Video, IM, Wireless, and other support
• Solutions for a full range of VoIP style deployments
• Redundancy options
3.1 VoIP Security Requirement Details
Dynamic per-call firewall control
The VoIP security solution should employ multiple dynamic layers of security to protect
the VoIP enabled network, including:
• Dynamically opening and closing firewall “pinholes” on a per-call basis.
• Subdividing the network in multiple security zones (for instance, separate zones
for voice and data).
• Allowing for per-user network authentication.
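A minimal sketch of per-call pinhole handling follows. It assumes a hypothetical
PinholeFirewall class that is told by the call controller when a call starts and ends; the
class, its methods and the addresses are illustrative and do not describe any particular
product.

```python
class PinholeFirewall:
    """Tracks which (ip, port) pairs are open, keyed by the call that opened them."""

    def __init__(self):
        self._pinholes = {}          # call_id -> set of (ip, port)

    def open_for_call(self, call_id, endpoints):
        """Open pinholes for the media endpoints negotiated for one call."""
        self._pinholes[call_id] = set(endpoints)

    def close_for_call(self, call_id):
        """Close every pinhole that belongs to the call when it ends."""
        self._pinholes.pop(call_id, None)

    def allows(self, ip, port):
        """A packet passes only if some active call opened this exact endpoint."""
        return any((ip, port) in eps for eps in self._pinholes.values())

fw = PinholeFirewall()
fw.open_for_call("call-1", [("192.0.2.10", 49170)])
print(fw.allows("192.0.2.10", 49170))   # media for the active call passes
fw.close_for_call("call-1")
print(fw.allows("192.0.2.10", 49170))   # the same port is closed after hang-up
```

Keying the table by call makes teardown simple: ending the call removes exactly the
pinholes it created and nothing else.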
Dynamic per-call bandwidth control
In a recent InformationWeek VoIP survey of 300 business technology executives,
performance and quality of service was the number one concern in deploying VoIP. The
VoIP security solution should address this concern by its ability to:
• Allocate bandwidth on a per-call basis.
• Allocate bandwidth based on call classification.
• Allocate bandwidth and route calls over multiple WAN links.
• WAN link fail over with automatic call policy adjustment.
• Immediate bandwidth allocation for emergency calls.
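Per-call allocation with priority for emergency calls can be sketched as a simple admission
check. The WanLink class, the reserve mechanism and all numbers (a 400 kbps link, 100 kbps
per call) are assumptions made for the example; the source does not prescribe how the
reservation is implemented.

```python
class WanLink:
    """Admits calls against a link budget, holding back a reserve for emergencies."""

    def __init__(self, capacity_kbps, emergency_reserve_kbps=0):
        self.capacity_kbps = capacity_kbps
        self.emergency_reserve_kbps = emergency_reserve_kbps
        self._calls = {}             # call_id -> allocated kbps

    @property
    def used_kbps(self):
        return sum(self._calls.values())

    def admit(self, call_id, kbps, emergency=False):
        """Ordinary calls may not touch the reserve; emergency calls may."""
        limit = self.capacity_kbps
        if not emergency:
            limit -= self.emergency_reserve_kbps
        if self.used_kbps + kbps > limit:
            return False
        self._calls[call_id] = kbps
        return True

    def release(self, call_id):
        self._calls.pop(call_id, None)

link = WanLink(capacity_kbps=400, emergency_reserve_kbps=100)
ordinary = [link.admit(f"call-{i}", 100) for i in range(4)]
print(ordinary)                                   # the fourth ordinary call is refused
print(link.admit("call-911", 100, emergency=True))
```

Allocation by call classification would extend the same check with a per-class limit
instead of the single emergency flag used here.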
NAT traversal
The use of NAT between private and public address spaces can cause call set-up
problems. The VoIP security solution should solve this issue by communicating the address
translation information to the IP PBX on a per-call basis, in effect making the IP PBX
NAT-aware.
Figure 4: IP Telephones Behind NAT and Firewall
Signaling protocol compatibility
The VoIP security solution should be able to be used in conjunction with any version of
SIP, H.323, MGCP, or other standard or proprietary signaling protocols. Thus it should
not be necessary to make changes to the VoIP security equipment as these protocols
evolve or new protocols are conceived.
Ability to handle encrypted VoIP Traffic
Since this ideal VoIP security solution does not depend upon the inspection of signaling
or media stream traffic, this traffic can be encrypted using any encryption mechanism, and
all functionality of the VoIP security solution should still be available. It is increasingly
acknowledged that the encryption of signaling traffic is quite important, since it
often contains sensitive customer-specific information including, for instance, credit card
numbers.
WAN failover to back-up gateway for local survivability
In a Hosted IP PBX environment, it may be common to minimize the amount of CPE
(Customer Premises Equipment). In such a scenario, the IP PBX is hosted in “the
network” and the only CPE may be IP Phones and a back-up gateway that enables calls to
the PSTN should that be necessary. The ideal VoIP solution should detect when the WAN
interface is down and automatically re-route the VoIP traffic (signaling and media) to the
backup gateway (instead of to the network-hosted IP PBX).
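The failover decision can be sketched as follows. The sketch assumes the WAN interface is
judged down after a number of consecutive failed probes; the WanFailover class, the
threshold of three and the host names are hypothetical choices for the example.

```python
class WanFailover:
    """Chooses where VoIP traffic goes based on recent WAN probe results."""

    def __init__(self, hosted_pbx, backup_gateway, fail_threshold=3):
        self.hosted_pbx = hosted_pbx
        self.backup_gateway = backup_gateway
        self.fail_threshold = fail_threshold
        self._failures = 0           # consecutive failed probes

    def record_probe(self, ok):
        """A successful probe resets the counter; a failed one increments it."""
        self._failures = 0 if ok else self._failures + 1

    @property
    def wan_up(self):
        return self._failures < self.fail_threshold

    def next_hop(self):
        """Signaling and media both follow the same decision."""
        return self.hosted_pbx if self.wan_up else self.backup_gateway

route = WanFailover("pbx.provider.example", "gw.branch.example")
for result in (True, False, False, False):
    route.record_probe(result)
print(route.next_hop())              # local gateway after three failed probes
route.record_probe(True)
print(route.next_hop())              # back to the hosted IP PBX once the WAN recovers
```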
Adjunct Systems Health Monitoring
A full VoIP security solution should also provide health monitoring of the IP PBX and
adjunct systems, with automatic failure notification to both the IP PBX and network
and/or security management systems via SNMP. Adjunct systems that should be
monitored include:
• Messaging (Voicemail, etc.)
• Media Servers.
• Media Gateways and Analog Terminal Adapters.
• Conferencing Servers.
• IP Phones.
• Switches and Routers.
Protecting the IP PBX and Adjunct Systems
Unlike with traditional PBXs, a Denial of Service attack on an IP PBX can effectively
isolate the IP PBX or, in the worst case, bring it down. A VoIP security solution must
protect the IP PBX server by providing:
• “Zoned” firewall protection of the IP PBX and Adjunct Systems.
• Denial of Service protection for the IP PBX and Adjunct Systems, including:
  - Rate limiting of connection attempts, allowing the IP PBX and Adjunct
    Systems to continue to service internal calls and gateways while under
    DoS attack.
  - DoS attack alarm and SNMP notification.
The VoIP security solution should not require all media streams to pass through
a centralized device in the network. In this way, the VoIP solution is less vulnerable to
Denial of Service attacks. Latency-sensitive media stream traffic can easily be disrupted
by a DoS attack, making solutions that require all traffic to pass through a single point in
the network unacceptable for reliable communications.
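Rate limiting of connection attempts is often done with a token bucket, and the sketch
below uses one bucket per source address so that a flooding source is throttled without
affecting others. The token bucket is a technique chosen for this example, not one named by
the source, and the class name, rate and burst values are assumptions. Time is passed in
explicitly to keep the example deterministic.

```python
class ConnectionRateLimiter:
    """Token bucket per source: `rate_per_sec` sustained, `burst` at once."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self._buckets = {}           # source -> (tokens, time of last attempt)

    def allow(self, source, now):
        """Return True if a connection attempt from `source` at time `now` passes."""
        tokens, last = self._buckets.get(source, (float(self.burst), now))
        # Refill for the time elapsed, never beyond the burst size.
        tokens = min(self.burst, tokens + max(0.0, now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[source] = (tokens, now)
        return allowed

limiter = ConnectionRateLimiter(rate_per_sec=2, burst=3)
flood = [limiter.allow("198.51.100.9", now=0.0) for _ in range(5)]
print(flood)                                      # only the burst gets through
print(limiter.allow("10.0.0.20", now=0.0))        # an internal phone is unaffected
```

A production limiter would also expire idle buckets and raise the alarm or SNMP
notification mentioned above when a source is throttled.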
Endpoint-to-endpoint media traffic
In addition to superior Denial of Service protection, the ability to support direct
endpoint-to-endpoint media streams (as opposed to all media streams passing through a
central point in the network) provides greater network efficiency and bandwidth utilization.
In solutions where all media streams must pass through a central point, the media streams
for all calls within a single branch must still pass out of the branch, to the network-hosted
facility, and back to the branch. Thus for this type of call, twice the bandwidth is
consumed on the narrow WAN local loop as compared to the desired VoIP security
solution.
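One way to read the bandwidth comparison is to count how many times the two one-way media
streams of a call cross the branch's WAN local loop. The sketch below does that counting;
the function name and the 80 kbps per-stream figure are assumptions for the example, and
the interpretation itself is only one possible reading of the claim.

```python
def wan_load_kbps(stream_kbps, intra_branch, media_via_central_point):
    """WAN local-loop load for one call with two one-way media streams."""
    if not intra_branch:
        crossings = 2      # one stream leaves the branch, one enters it
    elif media_via_central_point:
        crossings = 4      # each of the two streams goes out and comes back
    else:
        crossings = 0      # direct endpoint-to-endpoint media stays on the LAN
    return crossings * stream_kbps

STREAM_KBPS = 80           # hypothetical per-direction rate

external = wan_load_kbps(STREAM_KBPS, intra_branch=False, media_via_central_point=True)
hairpinned = wan_load_kbps(STREAM_KBPS, intra_branch=True, media_via_central_point=True)
direct = wan_load_kbps(STREAM_KBPS, intra_branch=True, media_via_central_point=False)
print(external, hairpinned, direct)
```

Under these assumptions an intra-branch call routed through the central point loads the
local loop twice as much as a call to an outside party, while the same call with direct
media does not use the WAN at all.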
Preservation of location-specific IP addresses
Location-specific IP addresses should be preserved for Call Controller and Presence
features that require this information.
Network Call Usage Reporting
The VoIP security devices should employ a dedicated management CPU and
communication port to collect accounting statistics and distribute them to external
Accounting, Billing, and Network Management Systems.
This accounting information can be used for:
• Remote monitoring and maintenance services.
• Network optimization and monitoring.
• Per-customer billing.
• Departmental charge back.
• Traffic studies.
VoIP Overlay Installation
When installing an IP Phone network, it is recommended to separate the voice and data
traffic into different LAN segments. Typically this is done by creating separate VLANs
for voice and data. The ideal VoIP security solution should not require the creation
of a new subnet and reconfiguration of host IP addresses.
CALEA support
The VoIP security solution should support CALEA by allowing any traffic stream to be
mirrored to a second destination.
Solutions for a full range of VoIP style deployments
The ideal VoIP security solution should be applicable for reliably and securely delivering
VoIP in a variety of deployment architectures including:
• Centrex Model.
• Multi-Branch Offices.
• Consumer long distance.
• Multi-Tenant Buildings.
Video, IM, Wireless, and other support
The ideal VoIP security solution should be applicable for more than just VoIP. All
features described herein for Voice-over-IP, including dynamic per-call (per-session)
firewall and bandwidth control, NAT traversal, usage and health monitoring, encryption
compatibility, and signalling protocol independence, should also be applicable to:
• Instant Messaging.
• 802.11 (Wi-Fi) traffic.
• Message traffic between remotely located Unified Messaging, CRM (Customer
Relationship Management), or other systems.
• Other IP PBX functions.
• Other services the Service Provider may wish to offer, such as video
surveillance and physical security and fire monitoring.
Redundancy options
The VoIP security solution should support several types of redundancy, including:
• Link Redundancy.
• Unit Redundancy.
• WAN Redundancy with automatic policy adjustment.
• Port Redundancy using Spanning Tree.
• Fan Redundancy.
• Power Supply Redundancy.
• Feed Redundancy.
4. Security Constraints of VoIP
Security constraints are reasons why predefined security requirements cannot be met,
either at all or under particular conditions, in VoIP communications. They can result
from inadequate design of VoIP protocols or from incomplete implementations of them.
The underlying architecture of the Internet itself naturally imposes some limitations as well.
Many network infrastructure elements, such as firewalls and IPSEC security gateways
using NAT, are used to add security to the networks where they are deployed.
The functioning of these devices and the restrictions they pose should be taken into
account in the design of VoIP protocols. The reverse may also hold: some tasks may
simply need to be done in a certain way, requiring a few changes to
network elements already in place.
At this point, it is worth noting that security can be implemented, and
therefore exist, at many levels. Three different levels can be identified: link-level
security, secured packets and secured fields within a packet. At each of these levels,
either all of the information or only part of it can be protected, depending on the
sensitivity of the information transferred.
When security is implemented at the link level, it is transparent to users and
applications, since neither has any knowledge of what happens to packets when
they pass the IP layer and enter the link layer. All information that goes through a given
link is secured.
Not every packet contains highly sensitive information that therefore has a high
priority to be secured. If security is implemented at the packet level, it is
no longer as transparent, for instance to network infrastructure devices, as it is at the
link level.
Sometimes only certain fields of packets are secured. This, of course, gives the weakest
level of protection compared with the two previously mentioned methods. However, if
the fields chosen include all the fields containing sensitive information, this method may
turn out to be adequate and secure enough.
4.1 Delay Sensitivity of VoIP
“Perhaps the most vexing problem in voice-over-IP, in general, has been the issue of
quality of service. The delay in conversation that many VoIP users encounter is caused
by the jitter and latency of packet delivery within the Internet itself.” Quality of
service, latency and jitter are definitely problems for VoIP. Do they have any effect
from the security point of view? Yes, indeed they do.
A prerequisite for a fully secured end-to-end connection is that the whole path, i.e. all
devices and the links between them, is protected against man-in-the-middle attacks. One
consequence of this is that all devices must perform mutual authentication. Public-key
cryptography based authentication is the only means of authentication that scales
up to arbitrarily large networks, because it makes it possible to distribute keys securely
and relatively easily through unsecured networks.
On the other hand, public-key based methods impose two major constraints. First, the
required public-key infrastructure, one that would span major parts of the globe, is not
currently in place. Second, public-key based authentication procedures consume a vast
amount of computational power. If authentication is performed at the link level by the two
endpoints of a link only every now and then, this should not be a problem. Public-key
cryptography would also provide convenient tools for the encryption of messages. If
encryption takes place between two endpoints, this should not be a problem either, because
end devices can be equipped with the required computational power, e.g. crypto chips. But
if packets need to be decrypted and encrypted again at an intermediary device (e.g. a
router), this can easily become a bottleneck and introduce additional delay into the
delay-sensitive delivery of VoIP packets.
4.2 Specified Message Format
SIP, H.323 and other VoIP protocols are used to initiate sessions, e.g. audio sessions. The
information needed to create a media channel between two entities is
encapsulated in the body of the VoIP message. This information includes IP addresses. When
NAT is used, internal addresses must be translated to external addresses to be
usable. The body must not be encrypted when a NAT device does the translation, and
this imposes a security constraint.
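The conflict between body encryption and address translation can be made concrete with a
small sketch. It assumes the body is an SDP description whose connection line carries the
private address, and it uses a toy XOR keystream purely as a stand-in for real encryption;
both helper functions and all addresses are hypothetical.

```python
import hashlib

def nat_rewrite_sdp(body, private_ip, public_ip):
    """Rewrite the SDP connection address; return None if it cannot be found."""
    old = f"c=IN IP4 {private_ip}\r\n".encode()
    new = f"c=IN IP4 {public_ip}\r\n".encode()
    if old not in body:
        return None
    return body.replace(old, new)

def toy_encrypt(data, key):
    """Toy XOR keystream, for illustration only; NOT a secure cipher."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

sdp = (b"v=0\r\n"
       b"s=-\r\n"
       b"c=IN IP4 10.0.0.5\r\n"
       b"t=0 0\r\n"
       b"m=audio 49170 RTP/AVP 0\r\n")

plain_result = nat_rewrite_sdp(sdp, "10.0.0.5", "203.0.113.7")
encrypted_result = nat_rewrite_sdp(toy_encrypt(sdp, b"end-to-end key"),
                                   "10.0.0.5", "203.0.113.7")
print(plain_result is not None, encrypted_result is None)
```

With a plaintext body the device can substitute the public address. With an encrypted body
it sees only opaque bytes, so the far end would receive an unroutable private address.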
4.3 End-to-End and Hop-by-Hop Security
Authentication and encryption can each be implemented in two interesting ways:
end-to-end and hop-by-hop. As the names imply, end-to-end security
covers the whole connection from the sender to the recipient. Hop-by-hop security instead
covers at least one hop of a connection. It may cover other hops, if more exist, but that is
not required.
Advantages of hop-by-hop security:
• No need for end users to have public keys.
• Only service providers need to have public keys.
• It models current web security that has proven to work.
Major limitation of hop-by-hop security:
• Requires transitive trust model.
In end-to-end security, messages are encrypted and/or authenticated all the way from the
sender to the recipient.
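The transitive trust limitation can be illustrated with message authentication codes. The
sketch below substitutes shared-key HMACs for the public-key mechanisms discussed above so
that it needs only the standard library; the keys, the message and the tampering proxy are
all made up for the example.

```python
import hashlib
import hmac

def tag(key, message):
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key, message, mac):
    return hmac.compare_digest(tag(key, message), mac)

def proxy_forward(message, mac, key_in, key_out, tamper=False):
    """Check the inbound hop, optionally alter the message, re-tag for the next hop."""
    if not verify(key_in, message, mac):
        raise ValueError("bad MAC on inbound hop")
    if tamper:
        message = message.replace(b"bob", b"mallory")
    return message, tag(key_out, message)

k_alice_proxy = b"hop-1 key"
k_proxy_bob = b"hop-2 key"
k_alice_bob = b"end-to-end key"
msg = b"INVITE sip:bob@example.com"

# Hop-by-hop: the recipient can only check what the last hop vouches for.
fwd, hop_mac = proxy_forward(msg, tag(k_alice_proxy, msg),
                             k_alice_proxy, k_proxy_bob, tamper=True)
hop_ok = verify(k_proxy_bob, fwd, hop_mac)

# End-to-end: the sender's own tag travels unchanged with the message.
e2e_mac = tag(k_alice_bob, msg)
e2e_ok = verify(k_alice_bob, fwd, e2e_mac)
print(hop_ok, e2e_ok)
```

The tampered message passes the hop-by-hop check because the recipient has to trust the
proxy, while the end-to-end check fails because the proxy does not hold the sender's key.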
4.4 Real-life Examples of Security Problems
H.235 provides a quite comprehensive security architecture for the H.323 protocol suite. It
provides several alternatives for authentication and also for encryption. The Call
Connection Channel is established a priori using TLS on the predefined port 1300. This
should be considered a constraint, since it is predefined and no other
security mechanism can be used for the first connection.
NAT and Firewall Traversal
Since H.323-compliant applications use dynamically allocated sockets for audio, video
and data channels, a firewall must be able to allow H.323 traffic through on an intelligent
basis. The firewall must be either H.323-enabled with an H.323 proxy, or able to “snoop”
the control channel to determine which dynamic sockets are in use for H.323 sessions, and
allow traffic as long as the control channel is active.
SIP-specific security constraints are discussed below on the basis of a presentation by
Jonathan Rosenberg, chief scientist at dynamicsoft.
Forking is a situation where A calls B and the call invitation request forks to B1 and
B2, which are different terminals that user B has specified he or she is using. The
outcome should be that both terminals B1 and B2 ring. The challenge-response mechanism
used within SIP does not work with forking; only B1 rings when challenge-response
is used. Using signed requests without challenge-response can solve the
problem, but this would require the use of PKI. Replay attacks can be achieved by
A reflection attack may happen when HTTP Digest authentication is used for both requests
and responses. If the same shared secret is used in both directions, an attacker can obtain
credentials by reflecting a challenge received in a response back in a request. Using
different secrets in each direction eliminates the attack. This kind of attack is not a
problem when PGP is used for authentication.
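The reflection problem can be shown with a simplified model of mutual challenge-response.
In this sketch HMAC-SHA256 stands in for the digest computation, and the Party class and
the attack helper are hypothetical; the point is only that a party which answers challenges
with the same secret it uses to verify answers can be made to authenticate an attacker who
knows no secret at all.

```python
import hashlib
import hmac
import secrets

def respond(secret, challenge):
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()

class Party:
    def __init__(self, verify_secret, answer_secret):
        self.verify_secret = verify_secret   # checks the peer's answers
        self.answer_secret = answer_secret   # answers the peer's challenges
        self.pending = None

    def issue_challenge(self):
        self.pending = secrets.token_bytes(16)
        return self.pending

    def answer(self, challenge):
        return respond(self.answer_secret, challenge)

    def check(self, response):
        expected = respond(self.verify_secret, self.pending)
        return hmac.compare_digest(expected, response)

def reflection_attack(victim):
    """The attacker knows no secret and only bounces messages back."""
    challenge = victim.issue_challenge()     # victim challenges the attacker
    reflected = victim.answer(challenge)     # attacker sends the same challenge back
    return victim.check(reflected)           # and replays the victim's own answer

same = Party(verify_secret=b"shared", answer_secret=b"shared")
split = Party(verify_secret=b"peer-to-me", answer_secret=b"me-to-peer")
print(reflection_attack(same), reflection_attack(split))
```

With one secret for both directions the attack succeeds; with direction-specific secrets
the reflected answer is computed under the wrong secret and is rejected.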
SIP encryption is based on the use of PGP. To be useful, this requires PKI. Encryption
does not cover many critical headers (e.g. To, From), and it does not work with forwarding.
Last but not least, the PGP-based model assumes advance knowledge of the
recipient’s public key.
CANCEL is a SIP command that cancels searches and ringing. If an attacker sends a
CANCEL to the target when the target gets an INVITE, the target can be prevented from
receiving the call.
NAT and Firewall Traversal
There is a great need for secure traversal of SIP and its signaled sessions through NAT and
firewalls. Since SIP is a session control protocol, IP addresses and TCP ports appear in the
body of the protocol messages. In networks where NAT is used, internal network addresses
must be translated to external network addresses to be usable. The body must not be
encrypted when a NAT device does the translation, and this imposes a security constraint.
This is fundamental to SIP operation and results in difficulties in NAT and firewall
traversal.
The security constraints of MGCP and the related standards are similar to those
presented for H.323 and SIP, with the exception that, according to RFC 2885,
security issues should be solved by using external security protocols, especially
IPSEC. Assuming that IPSEC is required, it follows that the security issues of IPSEC
itself determine the security constraints for MGCP and related
protocols, at least to some extent.
5. Conclusion
VoIP is still an emerging technology, so it is somewhat speculative to develop a
complete picture of what a mature worldwide VoIP network will one day look like. The
situation is analogous to the state of the Internet in the late 1980s and early 1990s.
Competing protocols and designs for the infrastructure of the net flourished at the time,
but as the purpose of the Internet became more defined with the emergence of the World
Wide Web and other staples of today’s net, the structure and protocols became standardized
and interoperability became much easier. The same may one day be true of VoIP. Although
there are currently many different architectures and protocols to choose from, eventually
a dominant standard will emerge. Converged networks are complex and drive the need
for increased security and performance requirements. A wide range of network security
devices is currently available on the market, presenting a potentially confusing array of
options. However, low-cost, highly secure VoIP security devices exist today and can
mitigate security and performance risks in a straightforward and cost-effective manner.
References
Mika Marjalaakso, Security Requirements and Constraints of VoIP.
D. Richard Kuhn, Thomas J. Walsh and Steffen Fries, Security Considerations for
Voice Over IP Systems.
Ranch Networks, What To Look For in VoIP Security.