Comprehensive VoIP Security for the Enterprise:
Not Just Encryption and Authentication
A Sipera Whitepaper
As enterprises and operators roll out real-time Internet Protocol
(IP) communications applications such as Voice-over IP (VoIP),
instant messaging (IM), video and multimedia, the need to protect
end-users and network infrastructures from multiple catastrophic
attacks, misuse, and abuse of session-based protocols is
At the same time, the encryption and authentication that many
advertise as VoIP security only scratches the surface of the required
protection. In fact, there are many VoIP-speciﬁc vulnerabilities that
have been discovered, along with thousands of threats that can
be launched against SIP/UMA/IMS networks, that encryption and
authentication alone do not address.
This white paper will look at a number of these threats that target
the enterprise network and users including reconnaissance, Denial
of Service (DoS)/Distributed Denial of Service (DDoS), Stealth
DoS/DDoS, Spooﬁng and VoIP spam in order to explore the unique
methods and techniques to protect VoIP infrastructure as well as
end users from threats that endanger the continued exchange of
time-critical, business-sensitive information.
Real-time, Internet Protocol (IP) communications applications have a signiﬁcant and obvious appeal
for enterprises and end-users because they allow the Internet and existing data networks to become
a cost-effective transport for things most people want to do such as: placing voice calls, participating
in video conferences, exchanging Instant Messages (IMs), and a host of other communications
applications. They also let enterprises realize the benefits of using a Session Initiation Protocol
(SIP) trunk for hosted Voice over IP (VoIP) services. But cost is only part of the appeal: these new
communications applications enable increased efficiencies and collaboration by integrating
soft clients on PCs, IT infrastructure such as Microsoft Live Communication Server (LCS), and voice
extranets into one converged network, as shown in Figure 1.
[Diagram labels: Call Managers, SIP Phones, Soft Clients; VoIP VLAN, Data VLAN; Voice Extranets, Internet, Road Warrior]
Figure 1: Adding VoIP to the enterprise network
These beneﬁts do not come without a signiﬁcant tradeoff as we can see by taking a step back and
looking at what happened with IP networks. Because the IP network is an ‘open’ system, any user
can freely connect to it at any time from any place with little effort or oversight. This makes the IP
network a fertile breeding ground for a wide variety of malicious and unauthorized activities that can
affect any enterprise, group, or user. Network protocols, operating systems, web browsers,
e-mail clients and other applications are persistent targets of attacks.
Traditionally, the Internet security industry reacts to these attacks by developing a collection of
piecemeal solutions to protect the enterprise from attacks. As a result, threats have been effectively
mitigated to manageable levels by the development and deployment of a number of increasingly
sophisticated solutions including ﬁrewalls, Intrusion detection/intrusion prevention system (IDS/IPS),
anti-spam ﬁlters and others.
However, problems still persist and if history is any indication, IP communications applications will
also be subject to many of the same security threats that are prevalent in traditional Internet data
applications, and to many additional ones as well. These new attacks include deliberate application-
speciﬁc assaults against the VoIP infrastructure and end-points, such as denial of service (DoS) and
distributed denial of service (DDoS) attacks as well as stealth attacks and VoIP spam.
Because of these risks, many enterprises have deployed their VoIP infrastructure as an “island”
utilizing a separate Virtual Local Area Network (VLAN) to protect it against these attacks, but this
does not allow them to realize the full potential of IP communications applications. Even worse
from a security perspective, some enterprises feel they are safe by simply using the encryption and
authentication techniques embedded into the VoIP infrastructure. While this is important, encryption
and authentication do not protect against a variety of external threats from malicious users and
spammers as well as internal threats from infected PCs. Frequently, these malicious endpoints are
“authorized” users of VoIP and will easily pass the authentication and encryption hurdles.
At the same time, it’s important to understand that IP communications applications, such as VoIP,
are very different than web applications and email, as shown in Figure 2. VoIP is real-time by its very
nature and involves complex state machines which may need to track several dozen states at the
same time. The protocols themselves, such as SIP, are feature-rich and involve the use of separate
signaling and media planes which allow devices to talk peer-to-peer rather than the traditional
client-server methods of the data world. Finally, there is an extremely low tolerance to false positives
and negatives as compared to the data world.
[Diagram labels, "VoIP is Different": peer-to-peer; separate signaling and media planes; protocol and feature rich; low tolerance to false positives and negatives; complex state machine (several dozen states)]
Figure 2: IP Communications applications are very different than data applications
It’s easy to see that IP communications applications demand a security solution that not only
“borrows” from the best security functionality of the data world but adds speciﬁc VoIP protection
techniques that take into account the real-time, peer-to-peer, and feature-rich nature of these applications.
VoIP Risks and Vulnerabilities
VoIP networks have thousands of unique vulnerabilities that can be exploited to launch a variety of
attacks. In fact, the Sipera VIPER lab, which is comprised of the most knowledgeable and capable
VoIP and security developers, architects, and engineers, has identiﬁed over 20,000 threats in the
last two years that can be launched against SIP networks, as shown in Table 1.
Attacks on infrastructure   SIP       Media    Attacks on end-users   SIP and Media
Fuzzing                     >20000    7        Misuse                 8
Reconnaissance              5         n/a      Session Anomalies      4
Flood                       >30       2        Stealth                2
Distributed Flood           >30       n/a      Spam                   2
Misuse/spoofing             n/a       6
Total                       >20065    15       Total                  16
Table 1: Unique SIP vulnerabilities as catalogued by Sipera VIPER Lab
All told, enterprises need to be aware of, and effectively protect their network from, these attacks
against their infrastructure and the additional ones against end-users which are unique to IP
communications applications. These application-speciﬁc threats are in addition to attacks such as
call hijacking, fraud and eavesdropping that are secured using encryption and authentication. Let’s
look at some of the more prevalent and potentially damaging VoIP-speciﬁc application level attacks.
Pre-DoS attacks are probes conducted against a network to ascertain its vulnerabilities, the
behavior of its equipment and users, and what services might be available for exploitation or
disruption. Once this information has been gathered, focused attacks against the network’s
assets, services, and users can then be launched. This type of ‘intelligence gathering’ or
‘probing action’ is often the first thing an attacker will do when attempting to penetrate a network.
Types of reconnaissance attacks include call walking and port scanning. Call walking is a type of
reconnaissance probe where a malicious user initiates sequential calls to a block of telephone
numbers in order to identify what assets are available for further exploitation. Port scanning
is similar to call walking in that sequential probes are made against a block of destinations.
However, port scanning does not target end-users as call walking does, but instead targets a
group of sequential ports in a network.
Depending upon the responses that are received, the attacker then can determine which exploit
attempts might or might not work to breach the network. Using these methods, an attacker can
easily identify and gather the domain names and URLs of SIP-enabled devices that populate the
network and launch attacks against those devices.
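As an illustration only, call walking could be spotted by looking for long runs of consecutive numbers dialed from one source. The following is a minimal sketch, not a description of any product's method; the function name, the `(source, extension)` log format, and the `min_run` threshold are all assumptions made for the example.

```python
from collections import defaultdict


def find_call_walkers(call_log, min_run=5):
    """Return {source: longest_run} for sources that dialed at least
    `min_run` consecutive extensions.

    call_log is an iterable of (source, dialed_extension) pairs, with
    extensions given as integers (an assumed, simplified log format).
    """
    dialed = defaultdict(set)
    for source, extension in call_log:
        dialed[source].add(extension)

    walkers = {}
    for source, extensions in dialed.items():
        longest = run = 0
        previous = None
        for ext in sorted(extensions):
            # Extend the run if this extension directly follows the last one.
            run = run + 1 if previous is not None and ext == previous + 1 else 1
            longest = max(longest, run)
            previous = ext
        if longest >= min_run:
            walkers[source] = longest
    return walkers


# Hypothetical traffic: one source probes a block, another calls normally.
calls = [("10.0.0.9", 4000 + i) for i in range(8)]
calls += [("10.0.0.5", 4002), ("10.0.0.5", 4417), ("10.0.0.5", 4090)]
print(find_call_walkers(calls))
```

The sketch ignores timing, so a real detector would presumably also limit the check to calls placed within a short period.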
Floods and Distributed Floods
Flood DoS and DDoS attacks are those in which a malicious user deliberately sends a
very large number of random messages to one or more VoIP end-points from either a
single location (DoS) or from multiple locations (DDoS), as shown in Figure 3. Typically, the
flood of incoming messages is well beyond the processing capacity of the target system, thereby
quickly exhausting its resources and denying service to its legitimate users.
In the case of DDoS attacks, the attacker(s) will use multiple sources to launch the assault or a
single source masquerading as multiple sources to attack the target system. If the system(s) from
which the DDoS attack originates have themselves somehow been compromised, then they are
referred to as zombies.
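One simple way to picture flood detection is a sliding-window counter kept per target, so that messages from a single source (DoS) and from many sources (DDoS) are counted together. The sketch below is illustrative only; the class name, the `limit` and `window` parameters, and their default values are assumptions.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Flag a target that receives more than `limit` messages in `window` seconds."""

    def __init__(self, limit=100, window=1.0):
        self.limit = limit
        self.window = window
        self.arrivals = defaultdict(deque)  # target -> timestamps of recent messages

    def observe(self, target, timestamp):
        """Record one message and return True if the target is being flooded."""
        recent = self.arrivals[target]
        recent.append(timestamp)
        # Discard arrivals that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.limit


detector = FloodDetector(limit=3, window=1.0)
results = [detector.observe("sip:bob@example.com", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)]
print(results)
```

A counter like this cannot tell a malicious flood from a legitimate burst of traffic, which is why rate alone is unlikely to be a sufficient signal.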
Oftentimes, however, a ﬂood may be caused by a valid reason (such as a power failure
precipitating a ﬂood of SIP end-point registrations or a ﬂood caused by an improperly conﬁgured
[Diagram labels: DoS Attack on End-point (Malicious User, SIP Phone); DDoS Attack on Call Server (Malicious User, SIP Server)]
Figure 3: Malicious users can launch DoS and DDoS flood attacks against end-users or infrastructure
Fuzzing is a legitimate method of testing software systems for bugs and is accomplished
essentially by providing an application with semi-valid input to see what its reaction will be.
Then appropriate ﬁxes can be implemented, if necessary.
Malicious users, however, employ this same methodology to exploit vulnerabilities in a target
system. They do this by sending messages whose content, in most cases, is good enough that the
target will assume it’s valid. In reality, the message is ‘broken’ or ‘fuzzed’ enough that when the
target system attempts to parse or process it, various failures result instead. These can include
application delays, information leaks, or even catastrophic system crashes.
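To make the idea of a 'fuzzed' message concrete, the sketch below performs a few structural checks on a SIP request before it would reach a parser. It is a simplified illustration, not a complete SIP validator: the function name and the line-length limit are assumptions, and compact header names and folded header lines are deliberately not handled. The six required headers are those that SIP (RFC 3261) mandates in every request.

```python
import re

REQUEST_LINE = re.compile(r"^[A-Z]+ \S+ SIP/2\.0$")
REQUIRED_HEADERS = {"via", "from", "to", "call-id", "cseq", "max-forwards"}
MAX_LINE_LENGTH = 1024  # assumed limit for this sketch


def sip_request_problems(raw):
    """Return a list of structural problems found in a SIP request (empty if none)."""
    problems = []
    head = raw.split("\r\n\r\n", 1)[0]  # ignore the message body
    lines = head.split("\r\n")
    if not REQUEST_LINE.match(lines[0]):
        problems.append("malformed request line")
    seen = set()
    for line in lines[1:]:
        if len(line) > MAX_LINE_LENGTH:
            problems.append("oversized header line")
        name, sep, _value = line.partition(":")
        if not sep:
            problems.append("header without a colon")
            continue
        seen.add(name.strip().lower())
    missing = REQUIRED_HEADERS - seen
    if missing:
        problems.append("missing headers: " + ", ".join(sorted(missing)))
    return problems


good = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.com\r\n"
    "Max-Forwards: 70\r\n"
    "To: <sip:bob@example.com>\r\n"
    "From: <sip:alice@example.com>;tag=1\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 1 INVITE\r\n\r\n"
)
fuzzed = "INVITE sip:bob@example.com SIP/2.0\r\nVia" + "A" * 5000 + "\r\n\r\n"
print(sip_request_problems(good))
print(sip_request_problems(fuzzed))
```

Checks of this kind catch only crude malformations; messages that are fuzzed more subtly would need deeper, protocol-aware inspection.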
Misuse involves taking over someone’s call or making calls on their behalf, which is more
commonly called spoofing. This is done by deliberately inserting fake data into the source IP
address field of the packet to hide the true source of the call. In this way the attacker
can ‘spoof’ a legitimate user and hijack the current session, which results in the call being either
redirected or terminated, as shown in Figure 4. Spoofing results in misuse or abuse of the system
and a denial of service (DoS) to the legitimate user.
[Diagram labels: Original Call Session; Caller A; Caller B; Call Session; Call Session]
Figure 4: Malicious user hijacks the current session and redirects the call
Session anomalies occur when the messages do not come in the correct sequence and therefore
neither the end-points nor the call server know how to handle the calls. When hackers or
malicious users do this intentionally, it will result in a session abuse for the VoIP system, similar
Stealth attacks are those in which one or more speciﬁc end-points are deliberately attacked from
one (DoS) or more (DDoS) sources, although at a much lower call volume than is characteristic
of flood-type attacks. Detecting stealth attacks, like detecting VoIP spam, is vital for VoIP
systems because both have the potential to be far more annoying than their counterparts in the
data world. VoIP security solutions therefore need to be more sophisticated and use different techniques to
protect against stealth attacks and VoIP spam.
VoIP spam or Spam-over-Internet Telephony (SPIT) is unsolicited and unwanted bulk messages
broadcast over VoIP to an enterprise network’s end-users. In addition to being annoying and
having the potential to signiﬁcantly impinge upon the availability and productivity of the end-
point resource, high-volume bulk calls routed over IP are often very difﬁcult to trace and have the
inherent capacity for fraud, unauthorized resource use, and privacy violations.
[Diagram labels: Call Managers, SIP Phones, Soft Clients; VoIP VLAN, Data VLAN; Voice Extranets, Road Warrior]
Figure 5: Unique VoIP threats exist from both internal and external sources
These attacks can be from external sources such as hackers, malicious users and spammers or
internal threats from disgruntled employees, infected PCs or email attachments, as shown in
Figure 5. What’s required to protect against them is a proactive approach: anticipate
and catalogue the threats and attacks, then use this expertise as the foundation of a
comprehensive solution that protects against them. The VoIP security solution must also have the
ability to be updated with vaccines against previously unidentified threats.
Drawbacks to Today’s VoIP Security
Although core VoIP assets and related infrastructure can be protected to a certain degree from direct
assault through a variety of currently available techniques, such as hardening the underlying IP
network and deploying session border controllers (SBCs), none can protect against the increasing
sophistication of attacks against the numerous vulnerabilities inherent in VoIP and related IP
Implementing a comprehensive security solution to deal with both internal and external threats from
DoS, DDoS, stealth and spam is a formidable challenge. As mentioned at the outset, the biggest
mistake an enterprise can make with securing its VoIP infrastructure is to assume that encryption
and authentication are enough to protect the network and end-users against attacks. This is not to
say that authentication and encryption are not important, but they do not protect against zombie and
As well, viruses, worms and other malicious activities frequently utilize end-user equipment to
penetrate the network, even when perimeter security mechanisms like ﬁrewalls and session border
controllers are employed. Complicating the matter further, new and emerging technologies such
as IM now represent an ever larger emerging threat to networks that completely bypass perimeter
defense devices. This has led enterprises to look for alternative security solutions.
Many of the security products which are currently available primarily focus on remediating threats by
employing various disparate technologies such as ﬁrewalls, IDS/IPS, and other security devices that
are upgraded to support VoIP in addition to their main data protection responsibilities. An example
of how a typical VoIP security solution is deployed using these equipment elements to mitigate the
inherent vulnerabilities of an IP network is shown in Figure 6.
[Diagram labels: Call Managers, SIP Phones, Soft Clients; DoS Filter, IDS/IPS, Firewall, Spam Filter; Voice Extranets, Internet, Road Warrior]
Figure 6: Typical multi-product VoIP security solution
At best these solutions protect against OS, IP and TCP layer vulnerabilities and attacks such as TCP
SYN flood, exhaustion of resources with multiple TCP, UDP DoS attacks, HTTP attacks, TCP FIN/RST
close socket attacks and others.
These traditional solutions are not at all effective for application-level vulnerabilities in that they
cannot provide the needed functionality to effectively detect and protect against VoIP-speciﬁc
attacks such as ﬂoods, protocol fuzzing, stealth, and VoIP spam. At the same time, they cannot
protect against vulnerabilities that may be found in encrypted trafﬁc as they are unable to decrypt
and analyze the trafﬁc in real-time.
As well, because this solution represents a layered-approach to network security, in addition to the
extra hardware (application-aware ﬁrewall, IDS/IPS, and DoS protection systems) required to secure
the network, additional software must also be installed at different points to allow the hardware
components to function properly and to coordinate security monitoring and reporting functions.
Not only do these additional levels of complexity add more points of potential vulnerability, it’s easy
to see that they do not integrate well with a VoIP network due to the fact that the delay introduced
by every device collectively exceeds the security budget (2 ms for signaling and 100 µs for
media) allowed to still ensure toll quality transmission. As well, many of these devices use a store
and forward method to examine the trafﬁc which is just not feasible in the real-time world of IP
To quickly summarize the points above, existing solutions of this type are decidedly deﬁcient in a
number of critical ways:
• they cannot function in real-time;
• they cannot process encrypted trafﬁc;
• they do not have the capacity to detect attacks on end users;
• they result in a higher TCO as you need to upgrade multiple boxes; and
• they cannot keep in sync with new IP features or applications offered by the VoIP
Existing security measures for IP networks are at best only effective for traditional types of trafﬁc
(web access, e-mail, etc.). However, as VoIP becomes increasingly more prevalent and feature-rich,
the need for more effective and robust security solutions becomes obvious.
Comprehensive VoIP Security
Instead of deploying ineffective ‘point’ solutions, a complete security solution is required that
seamlessly incorporates all existing approaches into a single, comprehensive system, as shown in Figure 7.
[Diagram labels: Firewall, Intrusion Prevention System, Denial of Service Prevention, Intrusion Detection System, and Network Level Correlation, each covering OS, IP, Web, email, or database, combined for IP communications (VoIP, IM, Video)]
Figure 7: Single, comprehensive VoIP security solution
When deployed in the enterprise, this single, comprehensive device replaces the 3 or 4 point
solutions at each location in the network, as shown in Figure 8. In most cases a ﬁrewall will still be
deployed to protect against layer 3 and 4 attacks, but not against the long list of VoIP-specific application-level
ones that were discussed above. You can immediately see the operational simplicity and
obvious cost-effectiveness compared to the solution in Figure 6.
[Diagram labels: Call Managers, SIP Phones, Soft Clients; Voice Extranets, Internet, Road Warrior]
Figure 8: Simplified, comprehensive VoIP security solution for enterprise
The ideal comprehensive VoIP security solution would incorporate the best practices of data
security, from ﬁrewall, IDS/IPS, DoS prevention, network level correlation and spam ﬁltering, while
implementing sophisticated techniques to ensure unique VoIP threats are proactively recognized,
detected, and eliminated. This single solution for securing IP Communications applications would
also include the following features:
All of this functionality needs to be incorporated into a single device that is built from the ground
up using specialized hardware for real-time performance. The appliance must be able to decrypt
packets at wire-speed so that the network can be protected against threats that exist even in
encrypted trafﬁc. And it must securely store and manage these encryption keys in a separate,
tamper-proof, hardware module.
Not a point of failure
It’s also preferable that the device functions as a “bump-on-a-wire” so that no conﬁguration
changes are required to either the call manager, the VoIP phones or to any other element in the
IP network. Another high-availability feature is fail-safe port bypass functionality which ensures
the device is never an additional point-of-failure in the network.
Sophisticated behavior learning and veriﬁcation
An ability to continuously learn call patterns and end-point ﬁngerprints, in addition to being
able to constantly analyze raw event data based upon speciﬁc user-deﬁnable criteria and take
automatic action, would give the security solution the ability to evolve and adapt on its own to
effectively counter any new or existing threat. This would vastly increase its level of effectiveness
in ensuring that vulnerabilities are mitigated before any threat can proliferate.
This level of sophistication is really the only way to identify both stealth attacks and VoIP spam,
a capability that is vital for any VoIP security system. These types of attacks and service abuse are difficult
to detect because the real-time nature of VoIP does not allow the security system the luxury of storing
the call while it is analyzed before sending it on, as is the case with email.
The VoIP security system needs to identify and verify these anomalies in real-time before passing
on the call. Once a potential anomaly is detected, it should be scrutinized further using various
veriﬁcation techniques to determine if it is in fact an attack which should be dropped or Spam
that should be sent to a speciﬁc bulk voice mailbox.
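A minimal sketch of behavior learning, under stated assumptions, is a per-subscriber baseline of calls per interval that flags counts far above what has been learned. The class name, the mean-plus-deviation rule, and the `tolerance` and `min_samples` parameters are illustrative choices and are not taken from any particular product.

```python
from statistics import mean, pstdev


class CallRateBaseline:
    """Learn a subscriber's typical calls-per-interval and flag unusual intervals."""

    def __init__(self, tolerance=3.0, min_samples=5):
        self.tolerance = tolerance      # allowed deviations above the learned mean
        self.min_samples = min_samples  # intervals to observe before judging
        self.history = []

    def is_anomalous(self, calls_this_interval):
        """Return True if the count is far above the learned behavior."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            average = mean(self.history)
            spread = max(pstdev(self.history), 1.0)  # floor avoids a zero-width band
            anomalous = calls_this_interval > average + self.tolerance * spread
        if not anomalous:
            # Only intervals judged normal are allowed to update the baseline.
            self.history.append(calls_this_interval)
        return anomalous


baseline = CallRateBaseline()
learning = [baseline.is_anomalous(count) for count in [2, 3, 1, 2, 4, 3]]
print(learning)
print(baseline.is_anomalous(12))
```

Twelve calls in an interval is far below flood volume, yet it stands out against this subscriber's learned pattern, which is the kind of signal a stealth attack would produce. A flagged interval would then go on to the verification step rather than being dropped outright.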
Detection of VoIP spam
Machine-generated calls are a popular tool for mass marketing concerns, although the recipients
of their messages more often than not ﬁnd the calls to be highly intrusive and annoying. In
addition, machine-generated calls are oftentimes used as automated attack tools by malicious
users to overwhelm a system and deprive its legitimate users of services. Machine-generated calls
can be detected by performing sophisticated VoIP Turing tests in the suspected trafﬁc, as shown
in Figure 9. However, when combined with behavior learning and veriﬁcation, the VoIP Turing test
can be used selectively rather than before every call which minimizes its intrusiveness.
[Diagram: Human Can Meet Challenge: 1. incoming call, 2. challenge caller ("What is the number between 1 and 3?"), 3. answers question, 4. rings phone. Machine Can't Meet Challenge: 1. incoming call, 2. challenge caller ("What is the number between 1 and 3?")]
Figure 9: VoIP Turing tests distinguish between machine and human callers
With a VoIP Turing test, the caller is challenged to respond to a question (e.g., “What is the number
between 1 and 3?”), which a machine cannot do. This test is very similar to the Turing tests that
you may have seen on the web when you buy tickets or register for email addresses. Many times
you are asked to enter some random numbers or letters that have been smudged or distorted.
By entering these characters, you assure the web site issuing the challenge that you are a human and not a
machine trying to buy blocks of tickets or register hundreds of email addresses.
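The challenge-and-verify step can be sketched in a few lines. This is an illustration only: it assumes the caller answers on the keypad (DTMF), and the function names and the routing labels are hypothetical.

```python
import random


def make_challenge(rng=random):
    """Return (prompt, expected_digit) for a simple 'number between' challenge."""
    low = rng.randint(1, 7)  # keeps the answer a single keypad digit (2 to 8)
    prompt = f"What is the number between {low} and {low + 2}?"
    return prompt, str(low + 1)


def screen_call(dtmf_response, expected_digit):
    """Decide what to do with a challenged call based on the keypad reply."""
    if dtmf_response is not None and dtmf_response.strip() == expected_digit:
        return "ring phone"
    return "send to bulk voice mailbox"


prompt, expected = make_challenge(random.Random(0))
print(prompt)
print(screen_call(expected, expected))  # a human who answers correctly
print(screen_call(None, expected))      # a machine that gives no answer
```

Randomizing the question on each call keeps a machine from passing simply by replaying a fixed answer.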
Network level intelligence
A network level intelligence node needs to collect and correlate multiple events and activities
from different nodes and end-points in the network to accurately detect attacks which otherwise
might have escaped unnoticed if reported only by a single point in the network. This capability
can inspect the sequence and content of messages to detect protocol anomalies and any
instances of end-point scanning.
The primary purpose of the intelligence node is to receive the variously formatted event and alarm
reports from the different security components in the network and to store, normalize, aggregate
and correlate that information into a comprehensive format. It then passes the attack information
back to the security nodes which take the action needed to protect the network and end users, as
shown in Figure 10. This allows distributed attacks to be effectively detected and mitigated.
[Diagram labels: Device 2; Sipera IPCS; Intelligence; "Far more calls being received than Subscriber D's learned behavior suggests"; "Challenge calls to Subscriber D briefly"]
Figure 10: Network level intelligence gives all nodes the same information in real-time
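The value of correlation is easiest to see when no single node observes enough traffic to raise an alarm on its own. The sketch below sums per-node call counts for each target and reports the targets whose network-wide total exceeds a threshold; the report format, the function name, and the threshold value are assumptions made for illustration.

```python
from collections import defaultdict


def correlate(reports, network_threshold):
    """Return {target: [reporting nodes]} for targets whose combined call
    count across all nodes exceeds `network_threshold`.

    reports is an iterable of (node, target, call_count) tuples.
    """
    totals = defaultdict(int)
    reporters = defaultdict(set)
    for node, target, call_count in reports:
        totals[target] += call_count
        reporters[target].add(node)
    return {
        target: sorted(reporters[target])
        for target, total in totals.items()
        if total > network_threshold
    }


# Hypothetical reports: each node sees fewer than 50 calls to subscriber-d.
reports = [
    ("device-1", "subscriber-d", 40),
    ("device-2", "subscriber-d", 35),
    ("device-3", "subscriber-d", 45),
    ("device-1", "subscriber-a", 6),
]
print(correlate(reports, network_threshold=100))
```

In this example the combined 120 calls cross the threshold even though each node's share looks unremarkable, so the intelligence node could tell all three devices to start challenging calls to that subscriber.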
Not only would a single, comprehensive security solution completely replace each of the individual
VoIP security components required by the traditional solution, it inherently capitalizes on the fact
that its fundamental design philosophy is based upon a comprehensive monitoring and protection
paradigm for real-time communications. This allows the single device to protect the network
infrastructure and its end-users against attacks and other unauthorized user behavior in real-time and
ensures that vulnerabilities are mitigated before any threat can proliferate.
Currently, VoIP security solutions are merely an extension of existing data security products and
fail to adequately address the increasing complexity of VoIP networks. These traditional products
are simply not equipped to address the real-time, mission-critical nature of IP communications
applications and provide, at best, a piecemeal approach where an entire network is not secured,
leaving signiﬁcant parts of it exposed and vulnerable to attack.
Unlike data communications, VoIP is a real-time service and requires security infrastructure to
provide automated, immediate security responses to preserve the high availability and quality-
of-service (QoS) expected by telephony users. In light of these considerations, any effective and
comprehensive VoIP security system must offer:
• comprehensive protection with real-time performance;
• easy deployment without becoming a point-of-failure;
• automatic user behavior learning;
• network level intelligence;
• effective handling of VoIP spam; and
• interoperability with major VoIP infrastructure vendors.
At the same time, each of these features must be provided to the network in a manner that does not
exceed the allowable security budget (2 ms for signaling and 100 µs for media) that ensures a high
QoS to the VoIP and multimedia user.
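The budget constraint is additive: every inline device contributes its own delay, and the sum must stay within the stated limits. The short sketch below uses the 2 ms and 100 µs figures from the text; the per-device delays are hypothetical values chosen only to show the arithmetic.

```python
SIGNALING_BUDGET_US = 2_000  # 2 ms for signaling, from the text
MEDIA_BUDGET_US = 100        # 100 microseconds for media, from the text


def within_budget(device_delays_us, budget_us):
    """Return (total_delay_us, fits) for a chain of inline devices."""
    total = sum(device_delays_us.values())
    return total, total <= budget_us


# Hypothetical per-device media delays in microseconds (illustrative only).
chain = {"DoS filter": 40, "IDS/IPS": 60, "firewall": 30, "spam filter": 50}
total, fits = within_budget(chain, MEDIA_BUDGET_US)
print(f"media path: {total} of {MEDIA_BUDGET_US} microseconds, fits={fits}")
```

With these made-up numbers, four devices in series add up to 180 microseconds, well over the media budget, even though each device alone would fit.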
In the end, the only way to provide the required level of protection is to incorporate a variety of
sophisticated VoIP-speciﬁc security techniques and methodologies that include anomaly detection,
ﬁltering, behavior learning, and veriﬁcation into a single, comprehensive security device. Together,
these practices proactively protect the enterprise network from VoIP attacks, misuse and service
abuse which networks and end-users face today and in the future.