1.2. Research Problem and Objectives of the Paper
The research problem of this paper is twofold and is best expressed as a question: “What are
the Security Requirements and Constraints of VoIP?” The objectives of the paper are first to
introduce the security characteristics of some VoIP architectures already in use, then to
identify some of the components that VoIP security comprises, and finally to analyse the
security constraints imposed by selected VoIP architectures or protocols.
1.3. Scope of the Paper
The aim of this paper is not to address every issue related to VoIP security, nor to give a
detailed presentation of all problems that exist or are likely to arise in the future. The
scope is instead to scratch the surface and introduce some of the most important existing,
and therefore already identified, security issues. A VoIP call, as understood in this paper,
ends at the latest at the point where the Internet Protocol is no longer used for addressing
messages.
1.4. Structure of the Paper
This paper is organized as follows. Section 2 briefly reviews the characteristics of
some well-known security protocols or architectures. Section 3 discusses some general
security requirements for a VoIP system. Section 4 discusses some security
constraints of VoIP. The paper ends with Section 5, where conclusions are presented.
2. Security Characteristics of Some Existing VoIP Protocols
The VoIP protocols discussed in this paper are H.323, SIP (Session Initiation
Protocol) and MGCP (Media Gateway Control Protocol).
These three protocols naturally have similarities, since they all address the same need
for VoIP communications. One area where the protocols differ from each other is the
location of intelligence in the network, i.e. where the heavy computation is done, as follows:
• Intelligence everywhere
  o H.323
• Intelligent endpoints and dumb network
  o SIP
• Intelligent network and dumb endpoints
  o MGCP
It should also be noted that the protocols differ from each other in complexity and
flexibility; for example, the H.323 specification gives implementors very little flexibility
compared to SIP.
2.2. ITU-T’s H.323
Addressing in H.323 is based on the transport address, which is composed of a network
address and a TSAP identifier. In TCP/IP networks the transport address equals an
IP address and a TCP port. H.323 uses three different types of channels, depending on
the type of information to be transferred: the Call Connection Channel, the Call
Control Channel and Media Channels. Media Channels are usually routed directly
between the participants of a call, but both the Call Control Channel and the Call
Connection Channel can be routed either via a gatekeeper or directly between the
participants, as chosen by the gatekeeper. A gatekeeper may or may not be present in
different areas of the network.
The H.235 recommendation contains provisions for entities using H.235 to support
a key recovery technique, but this is not specifically required for operation.
The Call Connection Channel is secured a priori with TLS (TCP port 1300). If no
gatekeeper is present, the Call Connection Channel is the first channel opened in the
initial connection setup. If a gatekeeper is present, the first channel to be opened is
the RAS channel with the gatekeeper. Some authentication mechanism should be used on
the RAS channel, but little is said about encrypting the traffic between the terminal
and the gatekeeper. After the connection establishment procedures, the H.245 Call
Control Channel is opened, in secured mode if so negotiated. Information on the Media
Channels, including encryption keys, is passed via the H.245 Call Control Channel.
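The order in which the channels are opened can be summarized in a short sketch. The
function name and its two boolean parameters are illustrative assumptions, not part of
H.323 or H.235; the sketch only encodes the ordering stated in the paragraph above.

```python
def h323_channel_sequence(gatekeeper_present: bool, h245_secured: bool) -> list[str]:
    """Return the order in which channels are opened, as described in the text."""
    sequence = []
    if gatekeeper_present:
        # With a gatekeeper, the RAS channel to the gatekeeper is opened first.
        sequence.append("RAS channel")
    # The Call Connection Channel is secured a priori with TLS on TCP port 1300.
    sequence.append("Call Connection Channel (TLS, TCP port 1300)")
    # The H.245 Call Control Channel follows; secured mode is used only if negotiated.
    mode = "secured" if h245_secured else "unsecured"
    sequence.append(f"H.245 Call Control Channel ({mode})")
    # Media Channel details, including encryption keys, are passed over H.245.
    sequence.append("Media Channels (keys passed via H.245)")
    return sequence
```

Calling the function with `gatekeeper_present=False` yields a sequence that starts with the
Call Connection Channel, matching the case where no gatekeeper is present.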
Figure 1, H.323 Network Architecture
There are two types of authentication that may be utilized: 1) symmetric encryption-based
authentication, which requires no prior contact between the communicating entities, and
2) authentication based on some prior shared secret (referred to in the H.235 recommendation
as “subscription”-based). The second type can be either symmetric or asymmetric.
For symmetric encryption-based authentication, Diffie-Hellman key exchange is
available; it is intended to be used to generate a shared secret between two entities.
It may also be used by the responder for the authentication of application- or
protocol-specific request messages, rather than of the users of terminals.
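As a rough illustration of how a Diffie-Hellman exchange lets two entities with no prior
contact arrive at the same secret, consider the following sketch. The modulus and generator
are toy values chosen for readability (assumptions, not H.235 parameters), and the function
names are hypothetical; a real deployment would use standardized groups and a vetted
cryptographic library.

```python
import secrets

# Toy parameters for illustration only: 2**127 - 1 is a known Mersenne prime,
# but this group is not suitable for real use.
P = 2**127 - 1
G = 3


def dh_keypair() -> tuple[int, int]:
    """Pick a random private exponent and derive the public value G^x mod P."""
    private = secrets.randbelow(P - 3) + 2  # value in the range [2, P - 2]
    public = pow(G, private, P)
    return private, public


def dh_shared_secret(own_private: int, peer_public: int) -> int:
    """Combine our own private exponent with the peer's public value."""
    return pow(peer_public, own_private, P)


# Each entity sends only its public value over the unsecured network.
a_private, a_public = dh_keypair()
b_private, b_public = dh_keypair()
assert dh_shared_secret(a_private, b_public) == dh_shared_secret(b_private, a_public)
```

Both sides compute the same value although the private exponents never leave the entities
that generated them.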
Subscription-based authentication has three variations, which are:
• password-based with symmetric encryption;
• password-based with hashing (also symmetric);
• certificate-based with signatures (asymmetric).
Other “authentication” mechanisms that can be specified as the authentication mechanism to
be used are “IPSEC based connection”, “TLS” and “something else”.
Encryption can be done at the RTP layer (see Figure 1), which is used to transport the
information streams. Encryption can be applied on a packet-by-packet basis, which means that
the specified policy determines how encryption capabilities are applied to each packet.
2.3. IETF’s SIP
SIP is a text-based protocol, similar to HTTP and SMTP, for initiating interactive
communication sessions between users. Such sessions include voice, video, chat,
interactive games, and virtual reality.
Figure 2, SIP Network Architecture
All authentication mechanisms that SIP provides are challenge-response based. There exist
three alternatives, which are:
• Basic authentication;
• Digest authentication; and
• PGP authentication.
Since SIP follows a client-server architecture, communication is based on requests and
responses. In addition to requests, SIP also makes it possible to authenticate responses,
but this option is not widely used.
The encryption capabilities of SIP itself are very limited. Only PGP-based encryption of
certain headers in a message is possible.
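To make the challenge-response idea concrete, the following sketch computes a Digest
response in the style of HTTP Digest authentication without the optional `qop` extension.
The helper names and all values used with them are made up for illustration; the point is
only that the server issues a fresh challenge (nonce) and the password itself never travels
over the network.

```python
import hashlib
import hmac
import secrets


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_nonce() -> str:
    """Server side: issue a fresh, unpredictable challenge value."""
    return secrets.token_hex(16)


def digest_response(user: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Client side: hash the shared secret together with the challenge."""
    ha1 = _md5_hex(f"{user}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


def verify(response: str, user: str, realm: str, password: str,
           method: str, uri: str, nonce: str) -> bool:
    """Server side: recompute the response from the stored secret and compare."""
    expected = digest_response(user, realm, password, method, uri, nonce)
    return hmac.compare_digest(expected, response)
```

A server would call `make_nonce()` when it challenges a request and `verify()` when the
request is repeated with credentials; Basic authentication, by contrast, sends the password
itself.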
2.4. IETF’s and ITU-T’s MGCP/MEGACO/H.248
The approach of MGCP/MEGACO/H.248 to authentication and encryption is clear and
straightforward. According to RFC2885 (as of August 2000), IPSEC “should” be used for the
authentication and encryption of protocol connections. IKE should also be used to provide
more robust keying options. The encryption keys with which Media Gateways encrypt media
connections are provided by Media Gateway Controllers over IPSEC-secured protocol
connections, i.e. the media connections themselves are not encrypted via IPSEC.
Figure 3, MGCP/MEGACO Network Architecture
3.3. Security Characteristics of VoIP vs. Traditional Telephony
1. In VoIP, provided that packets are not encrypted, all that an attacker needs to do is
pick up the appropriate packets with a packet sniffer. This packet sniffer can be a
general-purpose computer attached, for example, to a corporate local area network. In
traditional telephony, mobile telephony excluded, the attacker instead needs a
special device, which must be physically connected to a wire that is
used during the call.
2. Imagine a LAN that is used for VoIP, where people use either IP telephones or
voice-equipped PCs, all connected to the LAN, to initiate and receive phone calls.
Suppose that the total number of terminal devices in this LAN is about 100. From the
point of view of a single terminal device, the remaining 99 could, at worst, be
potential packet sniffers that are very difficult to trace, because they are supposed
to be there and connected to the LAN.
3. The Internet is widely considered insecure. Circuit-switched networks are not
fully secure either, but people do not worry too much about that.
3.4. Functional Security Requirements
1. Information exchanged among the participants of a call should be kept
confidential, and access to such information should be impossible for any third
party.
The motivation for this clause is easy to see: without confidentiality, third parties may
misuse information that they were able to eavesdrop on during a conversation.
2. Only the service provider should have access to statistical information, and such
information should be safeguarded against attacks from third parties.
Who would like to get caught calling sex or escort service numbers?
3.5. Technical Security Requirements
The security requirements that should be met and fully supported by any VoIP protocol
for it to be considered secure are summarized in the list below:
1. All connections between network elements should be encrypted;
2. The endpoints of all connections should always be authenticated in both
directions to prevent man-in-the-middle attacks;
3. End-to-end user authentication should be provided at terminal devices;
4. Both clients and servers should be protected against Denial of Service
attacks.
4. Security Constraints of VoIP
Security constraints are the reasons why predefined security requirements cannot be met in
VoIP communications, either at all or under special conditions. Security constraints can
result from inadequate design of VoIP protocols or from incomplete implementations of them.
Naturally, the underlying architecture of the Internet itself also imposes some limitations.
Many network infrastructure elements, such as firewalls and IPSEC security gateways using
NAT, are used to add security to the networks in which they are deployed. The functioning of
these devices and the restrictions they pose should be taken into account in the design of
VoIP protocols, and perhaps vice versa as well: some tasks may simply need to be done in a
certain way, requiring a few changes to network elements already in place.
At this point it is worth noting that security can be implemented, and therefore exist, at
many levels. Three different levels can be identified: link-level security, secured
packets and secured chosen fields in a packet. At each of these levels, either all of the
information or only part of it can be secured, depending on the sensitivity of the
information.
When security is implemented at the link level, it is transparent to users and
applications, since neither has any knowledge of what happens to packets once they
pass the IP layer and enter the link layer. All information that goes through a certain link
is secured.
Not every packet contains highly sensitive information that therefore has a high priority
for being secured. If security is implemented at the packet level, it is no longer as
transparent, for instance to network infrastructure devices, as it is at the link level.
Sometimes only certain fields of packets are secured. This, of course, gives the weakest
level of protection compared to the two previously mentioned methods. However, if the chosen
fields include all the fields containing sensitive information, this method may turn out to
be adequate and secure enough.
4.1. Delay Sensitivity of VoIP
“Perhaps the most vexing problem in voice-over-IP, in general, has been the issue of quality
of service. The delay in conversation that many VoIP users encounter is caused by the jitter
and latency of packet delivery within the Internet itself.” Quality of service, latency and
jitter are definitely problems for VoIP. Do they have any effect from the security point of
view? Yes, indeed they do.
A prerequisite for a fully secured end-to-end connection is that the whole path, i.e. all
devices and the links between them, is protected against man-in-the-middle attacks.
One consequence of this is that all devices must perform mutual authentication.
Authentication based on public-key cryptography is the one and only means of
authentication that scales up to arbitrarily large networks, because it makes it
possible to distribute keys securely and relatively easily through unsecured networks.
On the other hand, public-key based methods set two major constraints. First, the
required public-key infrastructure, spanning major parts of the globe, is not
currently in place. Secondly, public-key based authentication procedures require a
vast amount of computational power. If authentication is performed at the link level
by the two endpoints of a link every now and then, this should not be a problem.
Public-key cryptography would also give convenient tools for the encryption of
messages. If the encryption takes place between two endpoints, this should not be a
problem, because end devices can be equipped with the required computational power,
e.g. crypto chips. But if packets need to be encrypted at an intermediary device
(e.g. a router), this can easily become a bottleneck and introduce additional delay
to the delay-sensitive delivery of VoIP packets.
4.2. Specified Message Format
SIP, H.323 and other VoIP protocols are used to initiate sessions, e.g. audio sessions. The
information needed for creating a media channel between two entities is encapsulated in
the body of a VoIP message. This information includes IP addresses. When NAT is used,
internal addresses require translation to external addresses to be applicable. The body
must not be encrypted if a NAT device is to do the translation, and this imposes a security
constraint.
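The constraint can be illustrated with a small sketch of what an address-translating device
has to do. The session description below uses the SDP connection (`c=`) line; the
addresses, the example body and the function name are hypothetical, and only the `c=` line
is handled to keep the sketch short. If the body arrived encrypted, the device would see
only opaque bytes and would have nothing to rewrite.

```python
SDP_BODY = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 10.0.0.5",
    "s=-",
    "c=IN IP4 10.0.0.5",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"


def rewrite_connection_address(body: str, internal: str, external: str) -> str:
    """Replace the internal IPv4 address on SDP connection (c=) lines."""
    rewritten = []
    for line in body.splitlines():
        parts = line.split()
        # A connection line has the form: c=IN IP4 <address>
        if len(parts) == 3 and parts[:2] == ["c=IN", "IP4"] and parts[2] == internal:
            line = f"c=IN IP4 {external}"
        rewritten.append(line)
    return "\r\n".join(rewritten) + "\r\n"


translated = rewrite_connection_address(SDP_BODY, "10.0.0.5", "192.0.2.10")
```

The translation works only because the device can parse the body as text, which is exactly
why the body cannot be encrypted end to end in this scenario.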
4.3. End-to-End and Hop-by-Hop Security
There are two interesting ways in use to implement both authentication and
encryption, namely end-to-end and hop-by-hop. As the names imply, end-to-end
covers the whole connection from the sender to the recipient. Hop-by-hop
instead covers at least one hop of a connection. It may cover other hops, if more
exist, but that is not guaranteed.
Advantages of hop-by-hop security:
• No need for end users to have public keys;
• Only service providers need to have public keys; and
• It models current web security that has proven to work.
Major limitation of hop-by-hop security:
• Requires transitive trust model.
In end-to-end security, messages are encrypted and/or authenticated all the way from
the sender to the recipient.
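The difference in trust between the two models can be sketched as follows. The function
name, the mode strings and the example path are illustrative assumptions; the sketch simply
lists which nodes on a path have to be trusted with the message contents in each model.

```python
def intermediaries_to_trust(path: list[str], mode: str) -> list[str]:
    """List the nodes that handle an unprotected message on its way along `path`.

    `path` runs from the sender (first element) to the recipient (last element).
    """
    if mode == "end-to-end":
        # Protection is applied by the sender and removed only by the recipient.
        return []
    if mode == "hop-by-hop":
        # Protection is removed and re-applied at every intermediary, so each
        # one must be trusted: trust is transitive along the path.
        return path[1:-1]
    raise ValueError(f"unknown mode: {mode!r}")


example_path = ["caller", "proxy-a.example", "proxy-b.example", "callee"]
```

For `example_path`, the hop-by-hop model requires trusting both proxies, while the
end-to-end model requires trusting neither of them.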
4.4. Real-life Examples of Security Problems
H.235 provides a quite comprehensive security architecture for the H.323 protocol suite.
It provides several different alternatives for authentication and also for encryption.
The Call Connection Channel is established a priori using TLS on the predefined port
1300. This should be considered a constraint, since the mechanism is predefined and
no other security mechanism can be used for the first connection.
NAT and Firewall Traversal
Since H.323-compliant applications use dynamically allocated sockets for audio, video and
data channels, a firewall must be able to allow H.323 traffic through on an intelligent basis.
The firewall must be either H.323-enabled with an H.323 proxy, or able to “snoop” the control
channel to determine which dynamic sockets are in use for H.323 sessions, and allow traffic
as long as the control channel is active.
Below, SIP-specific security constraints are discussed on the basis of a presentation
by chief scientist Jonathan Rosenberg from Dynamicsoft.
Forking is a situation where A calls B and the call invitation request forks to B1
and B2, which are different terminals that user B has specified he or she is using. The
outcome should be that both terminals B1 and B2 ring. The challenge-response
mechanism used within SIP does not work with forking: only B1 rings in cases where
challenge-response is used. Using signed requests without challenge-response can
solve the problem, but this would require the use of PKI. Protection against replay
attacks can be achieved by remembering Call-IDs.
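One reading of the remark about Call-IDs is that a receiver remembers the Call-IDs it has
already accepted and rejects any request that repeats one. The class below is a minimal
sketch of that idea; the class name is hypothetical, and a real server would also need to
expire old entries, which is omitted here.

```python
class CallIdCache:
    """Remember Call-IDs that have already been accepted."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, call_id: str) -> bool:
        """Return True for a new Call-ID and False for a replayed one."""
        if call_id in self._seen:
            return False
        self._seen.add(call_id)
        return True


cache = CallIdCache()
first_attempt = cache.accept("a84b4c76e66710@host.example")   # new Call-ID
replayed = cache.accept("a84b4c76e66710@host.example")        # same Call-ID again
```

Here `first_attempt` is `True` and `replayed` is `False`, so a signed request that is
captured and sent again would be rejected.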
A reflection attack may happen when HTTP Digest authentication is used for both requests
and responses. If the same shared secret is used in both directions, an attacker can
obtain credentials by reflecting a challenge received in a response back in a request.
Using different secrets in each direction eliminates the attack. This kind of attack is
not a problem when PGP is used for authentication.
Multiple proxies on the path may challenge the user. This is useful and needed for
outsourced services. If the UAC (User Agent Client) only inserts credentials for the last
challenge, ping-ponging results. The solution is that the UAC must accumulate credentials
for all challenges to a request. Currently there is a grammar problem in the message
format, since accumulating credentials would require multiple headers to be used,
which is not yet possible.
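The ping-pong effect and its remedy can be simulated with a few lines of code. This is a
simplified model under stated assumptions: each proxy is identified by a hypothetical realm
name, proxies are visited in order, the first proxy that finds no credential for its realm
challenges the request, and the function name and attempt limit are made up for the sketch.

```python
from typing import Optional


def attempts_until_accepted(proxy_realms: list[str], accumulate: bool,
                            max_attempts: int = 10) -> Optional[int]:
    """Return the number of attempts needed, or None if the request keeps bouncing."""
    credentials: dict[str, str] = {}  # realm -> credential carried in the next request
    for attempt in range(1, max_attempts + 1):
        # Find the first proxy on the path that has no credential for its realm.
        challenger = next((r for r in proxy_realms if r not in credentials), None)
        if challenger is None:
            return attempt  # every proxy was satisfied on this attempt
        if not accumulate:
            # Keep only the answer to the latest challenge.
            credentials.clear()
        credentials[challenger] = f"credential-for-{challenger}"
    return None


realms = ["proxy-a.example", "proxy-b.example"]
with_accumulation = attempts_until_accepted(realms, accumulate=True)      # 3
without_accumulation = attempts_until_accepted(realms, accumulate=False)  # None
```

With two challenging proxies, accumulating credentials succeeds on the third attempt, while
answering only the last challenge bounces between the two proxies until the attempt limit
is reached.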
SIP encryption is based on the use of PGP. To be useful, this requires PKI.
Encryption does not cover many critical headers (e.g. To, From). It does not work with
forwarding and, last but not least, the PGP-based model assumes advance
knowledge of the recipient’s public key.
CANCEL is a SIP command that cancels searches and ringing. Simply put, if an attacker
sends CANCEL to the target when the target gets an INVITE, the target can be prevented from
NAT and Firewall Traversal
There is a great need for secure traversal of SIP and the sessions it signals through
NATs and firewalls. Since SIP is a session control protocol, IP addresses and port
numbers appear in the body of the protocol. In networks where NAT is used, internal
network addresses require translation to external network addresses to be applicable.
The body must not be encrypted if a NAT device is to do the translation, and this
imposes a security constraint. This is fundamental to SIP operation and results in
difficulties in NAT and firewall traversal.
The security constraints of MGCP and the related standards are similar to the
constraints of H.323 and SIP presented above, with the exception that, according to
RFC2885, security issues should be solved by using external security protocols,
especially IPSEC. Assuming that IPSEC is required, it follows that it is actually
the security issues of IPSEC that determine the security constraints for MGCP and
related protocols, at least to some extent.
5. Conclusions
This paper has discussed existing VoIP architectures, the security services they
provide and security requirements in general, and has finally introduced some
constraints for VoIP security. End-to-end security is difficult to provide under all
conditions. Hop-by-hop security provides a solution that is easier to implement in
practice, since only service providers need to set up PKI, but as a disadvantage it
makes trust transitive. Firewall and NAT traversal is also an issue, but solutions
for it already exist.
For the authentication of both devices and users, there is no other way than to set up
PKI and start using it. It provides the only viable way for VoIP to scale up.
References
[1] Huovinen, L., Niu, S., IP Telephony, [Referenced: 19.12.1997]
[2] Lawrence, J., MGCP Update, Presentation given at VON Europe 2000,
[Referenced: 6.7.2000], http://www.trillium.com/whats-
[3] IETF, SIP Working Group, [Referenced: 26.10.2000]
[4] IETF, RFC 2543, 1999, [Referenced: 23.11.1999]
[5] Oran, D., Sigcomm’99 Tutorial – M1: “Voice Over IP”, Cisco Systems,
Massachusetts, USA, 1999, 71 p.
[6] ITU-T, ITU-T Recommendation H.235 (02/98), Security and encryption for
H-Series (H.323 and other H.245-based) multimedia terminals, 1998
[7] IETF, Megaco Protocol version 0.8, RFC2885, 2000, [Referenced:
[8] Rosenberg, J., SIP Security, [Referenced: 8.5.2000]
[9] Thernelius, F., SIP, NAT and Firewalls, Master’s Thesis, Kungl Tekniska
Högskolan, Stockholm, 2000
[10] Kotha, S., Deploying H.323 Applications in Cisco Networks, White Paper,
[Referenced: 2.7.2000]
[11] Rosenberg, J., Computer Telephony: The Session Initiation Protocol (SIP):
A Key component for Internet Telephony. June 2000
[12] Goncalves, M., Voice Over IP Networks, McGraw-Hill, 2000
[13] This piece of information is based on numerous discussions between the
writer of this paper and some product vendors. The nature of these
discussions is confidential and therefore the vendors can not be referenced