A Ready Market: Introducing H.264-SVC
Next-Generation Technology for Videoconferencing Over IP and 3G Networks
Copyright © 2006 Wainhouse Research, LLC
About the Author
Andrew W. Davis, founder and Managing Partner of Wainhouse Research LLC, has more
than fifteen years of experience as a technology consultant and industry analyst. Prior to
founding Wainhouse Research, Andrew held senior marketing positions with several large
and small high-technology companies. He has published over 250 trade journal articles and
opinion columns on multimedia communications, image and signal processing,
videoconferencing, and corporate strategies as well as numerous market research reports and
is the editor of the conferencing industry's leading newsletter, The Wainhouse Research
Bulletin. Andrew specializes in videoconferencing, rich media communications, strategy
consulting, and new business development. Mr. Davis holds B.S. and M.S. degrees in
engineering from Cornell University and a Masters of Business Administration from
Harvard University.
About Wainhouse Research
Wainhouse Research is an independent market research firm that focuses on critical issues in
rich media conferencing and unified collaboration. The company conducts multi-client
and custom research studies for industry vendors, consults with end users on key
implementation issues, publishes a newsletter, white papers, and market statistics, and
delivers public and private seminars at industry group meetings.
About the Sponsor
Vidyo™ creates VidyoConferencing™ solutions that provide quality experiences to all
environments from the home-office desktop to the dedicated corporate video-conferencing
facility. Vidyo, the first company to apply the H.264 Scalable Video Coding (SVC) standard
to video conferencing, delivers HD/Telepresence quality enhanced by industry-best
resilience and low latency, and it manages to achieve all this over general-purpose IP
networks. Vidyo’s family of end-to-end solutions for OEMs and organizations can support
multi-point connections that include a variety of different platforms ranging from Mac &
Windows desktops to dedicated room solutions. No dedicated networks required.
For more information, visit www.vidyo.com.
© Vidyo, Inc 2006 - Confidential A Ready Market - 2
Contents
Overview: Key Market Trends
  Enterprise Solutions
  Consumer Applications
  3G Wireless
Technology Innovations
Challenges Facing Visual Communications
  Resilience
  Communications Quality
  Scalability
  Ease of Use
  Cost
Introducing H.264 – SVC
How SVC addresses today’s challenges
Conclusion
Sponsor Information
Overview: Key Market Trends
The convergence of multimedia technology with the Internet and the rapid adoption of
broadband IP services for both consumer and enterprise communications is creating strong
demand for improved digital media delivery. Video is at the forefront for home users as well
as enterprise knowledge workers. Both constituencies are now demanding higher quality
and easier-to-use video telephony and videoconferencing solutions. As a result, vendor and
service provider interest is exploding along several different deployment and service models.
Enterprise Solutions
Leading enterprise software vendors such as IBM Lotus, Microsoft, Oracle, and SAP are
working to embed voice and video communications into their next-generation workflow
tools. Future users will be able to launch multimedia calls without leaving their high-level
business process applications. Video will soon become a “feature” of everyday
productivity solutions such as Microsoft Word and Microsoft Excel as well as CRM and
customized solutions. For example, financial services companies and online retailers are
looking to deploy next-generation call center solutions to tens of thousands of desktops
where video will provide the next level of customer intimacy.
Cisco, Avaya, Nortel, Alcatel and others are now introducing systems that deploy video
as enhancements to their IP PBX telephony solutions, a development that will expand
videoconferencing beyond the dedicated conference room. This telephony-based
approach promises to make video as easy to use as voice calling, but will require high
quality delivery over networks with variable performance. The IP PBX market is the
fastest growing segment of the PBX industry, with approximately 5,000,000 handsets
shipped in 2005 across a wide range of customers. Many of these customers are looking
to deploy video as part of the IP promise of enhanced solutions.
Cisco, Polycom, RADVISION, and a dozen other vendors are developing visual
collaboration portals that work in conjunction with directory services and other
infrastructure products and services from IBM and Microsoft. This approach promises
to deliver the “point and click” interface that customers demand for easy-to-use
scheduled and ad-hoc conferencing, but will demand solutions that operate over multiple
networks with different bandwidth limitations. Several large pharmaceutical,
manufacturing, and government enterprises are looking at this approach since it
leverages their investments in Microsoft or IBM Lotus Notes-Sametime infrastructure
while extending their communications capabilities to include video. IBM claims
20,000,000 Sametime licensees in the field, and Microsoft is working ardently to top this
number with its Live Communications Server product line; bringing video to just a small
fraction of this installed base represents a huge challenge and a huge business
opportunity as well.
All of these enterprise deployment models – embedded collaboration, PBX-based
multimedia calling, and portal solutions – promise to move desktop video from trials on
scores of desktops to real deployments available to millions of enterprise workers. IT
managers will focus on cost, scalability, and resiliency in order to make video cost-effective
and efficient while also delivering the high quality that users expect.
Consumer Applications
Instant messaging is widely adopted in the consumer space, with over 800 million user
accounts in 2005 generating over 10 billion messages per day. And the trend is clear –
consumer IM is growing from text-only to include voice and video calling enhancements.
Consumer webcam sales have surpassed the 20 million per year mark, and video instant
messaging is increasingly popular, with sessions currently running in
the billions per year. Today, AOL, MSN, and Yahoo! all support video chat; and Skype,
popular with both consumers and traveling enterprise workers, has recently introduced
video calling and limited videoconferencing. All of these services will create strong
demand for better video technology that works across a variety of consumer Internet
access services.
Many believe that the future of consumer television lies with “Internet TV” and the
ability to deliver hundreds of on-demand channels with high quality and within reasonable
bandwidth constraints. Service providers in this market will need video technology that
can deliver in environments with widely varying performance parameters.
3G Wireless
In many countries, wireless networks are replacing wired ones as the primary
communications infrastructure, and now 3G networks, for which carriers have
committed hundreds of billions of dollars, are promising to provide mobile voice, data,
and multimedia. In fact, over 40 wireless service providers around the world are
currently running 3G multimedia trials and video is considered a key element in driving
demand for 3G services and 3G-capable endpoints. Success for these service providers will
require video technology that can provide consumers reasonable quality in the highly
unreliable wireless world.
Technology Innovations
Despite new video compression standards such as H.264 and ever-more powerful processors,
major challenges remain with respect to the transmission of real-time voice and video over
packet networks and the Internet in particular. Packet-switched network-supported
multimedia applications require many different transmission capabilities (bandwidth, latency,
jitter, packet loss, etc.) while being delivered to a wide variety of endpoint devices operating
in homogeneous and heterogeneous bandwidth environments. Conventional video coding
systems encode video content using a fixed bit-rate tailored to a specific application. As a
consequence, conventional video coding does not fulfill the requirements of flexible digital
media applications. Hence, traditional technologies have impeded the wide-scale adoption
of video-enabled communications over IP networks. A new approach is needed.
Scalable video coding (SVC) is emerging as that technology: an IP network-friendly coding
approach that can satisfy underlying transmission requirements ranging all the way from
HDTV over the enterprise LAN to two-way video chat sessions over an unreliable cell
phone network. SVC theory is not new. In fact, SVC has been included in MPEG-2 and
MPEG-4 and other video standards. But only recently, with advances in algorithms and
processors, has the technology been shown to be practical for real-time, two-way video
communications. SVC has the potential to be a disruptive technology, one that promises to
improve vastly the videoconferencing experience in systems where bandwidth constraints
and error resilience have limited user acceptance in the past.
The potential impact for a new technology that promises improved video quality over IP
networks is huge. While today’s videoconferencing market is approximately 125,000 room
systems and an equivalent number of enterprise desktop systems per year, the potential
desktop market for IP-savvy videoconferencing applications in the enterprise is between
5,000,000 and 10,000,000 units per year for PC-based applications, an equivalent number on
the PC consumer side, and perhaps five times that number for 3G mobile telephony users.
Challenges Facing Visual Communications
Major barriers still exist that are preventing enterprises from deploying videoconferencing
and visual collaboration tools everywhere while also slowing the adoption of Internet video
chat, mobile-phone video telephony, and friends-and-family video calling over the Internet.
These barriers fall into three technology categories (resilience, communications quality,
and scalability), one human-factors category (ease of use), and a separate category for
cost.
Resilience
Videoconferencing and video streaming applications can suffer from high packet loss when
traveling over an IP (or 3G) network due to the underlying “best-effort” model of the
Internet protocol. In addition, bit errors can have a devastating effect on video quality. Over
the years, many packet-loss and bit error recovery mechanisms have been used in
conjunction with unreliable transport protocols to improve real-time IP network applications.
When unrecoverable packet losses occur, it is highly desirable to have a video coding
scheme that is resilient to such losses. A resilient video scheme will present gradual
degradation in video quality rather than frozen or tiled video when network-based packet
loss occurs.
Related to packet loss is bandwidth variability, a common occurrence on IP networks where
loads can fluctuate widely from moment to moment. This issue is particularly serious on
wireless networks where throughput can be hampered by multi-path fading, interference, or
noise. When a 400 kbps video stream is suddenly faced with a network availability of 300
kbps, packet loss is inevitable. In this situation it is highly desirable to have a video
compression scheme that is capable of adapting to unpredictable variations in bandwidth.
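The 400 kbps/300 kbps example can be made concrete with a minimal sketch. It is not any vendor's rate-control algorithm. It assumes a hypothetical three-layer scalable stream: a 150 kbps base layer plus enhancement layers of 100 and 150 kbps, 400 kbps in total. The sketch keeps as many layers, base first, as the available bandwidth allows. A single-layer stream encoded at a fixed 400 kbps cannot shed bits this way without re-encoding.

```python
# Hypothetical layer bitrates in kbps: index 0 is the base layer, the rest are
# enhancement layers in dependency order (each needs every layer before it).
LAYER_KBPS = [150, 100, 150]  # 400 kbps when all layers are sent


def layers_that_fit(available_kbps: float, layer_kbps: list[int]) -> int:
    """Count how many layers, base first, fit within the available bandwidth."""
    total = 0
    count = 0
    for rate in layer_kbps:
        if total + rate > available_kbps:
            break  # higher layers depend on this one, so stop here
        total += rate
        count += 1
    return count


def sent_kbps(available_kbps: float, layer_kbps: list[int]) -> int:
    """Bitrate actually transmitted after dropping the layers that do not fit."""
    return sum(layer_kbps[: layers_that_fit(available_kbps, layer_kbps)])


if __name__ == "__main__":
    for link in (400, 300, 200, 100):
        n = layers_that_fit(link, LAYER_KBPS)
        print(f"{link} kbps link: {n} layer(s), {sent_kbps(link, LAYER_KBPS)} kbps sent")
```

At 300 kbps the sketch sends the base layer and the first enhancement layer (250 kbps) instead of overflowing the link. At 100 kbps even the assumed base layer does not fit.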
Communications Quality
While objective measurements of quality in a videoconferencing session have always been
difficult to come by, users accustomed to “TV quality” video and “toll quality” voice over
the PSTN quickly realize that today’s technology for delivering real-time communications
over packet-switched networks often falls short. Communications quality in a
videoconference is driven largely by three factors: delay, frame rate, and resolution.
Delay (latency) creates an unnatural communications environment, often causing people to
talk over one another. Maintaining acceptable quality requires staying within an acceptable
total delay budget. Total delay is the sum of network delay, the time it takes a packet to
traverse the network from point A to point B, and compression latency, the time it takes a
video codec to perform all the algorithms required to compress the original digital signal.
Experience has shown that a key challenge is keeping overall delay below 200 ms, which is
feasible on a point-to-point call but beyond the state of the art with today’s IP-based
videoconferencing bridges. Many research
projects have investigated alternative approaches to real-time video compression, with the
goal of reducing delay while maximizing video quality and minimizing computational
complexity.
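Because the delay budget described above is additive, checking a call against the 200 ms target is simple arithmetic. The sketch below models only the two components named in the text, plus an optional, hypothetical bridge term. The millisecond figures in the usage lines are illustrative assumptions, not measurements. A real budget would also include decoding, jitter buffering, and capture and display delay.

```python
from dataclasses import dataclass

TARGET_MS = 200.0  # overall delay target cited in the text


@dataclass
class DelayBudget:
    network_ms: float       # packet transit time from point A to point B
    compression_ms: float   # time the codec needs to compress the signal
    bridge_ms: float = 0.0  # hypothetical extra delay from a multipoint bridge

    def total_ms(self) -> float:
        """Total delay as the sum of its components."""
        return self.network_ms + self.compression_ms + self.bridge_ms

    def meets_target(self, target_ms: float = TARGET_MS) -> bool:
        """True when the total delay stays below the target."""
        return self.total_ms() < target_ms


if __name__ == "__main__":
    # Illustrative numbers only.
    point_to_point = DelayBudget(network_ms=80.0, compression_ms=60.0)
    via_bridge = DelayBudget(network_ms=80.0, compression_ms=60.0, bridge_ms=100.0)
    print(point_to_point.total_ms(), point_to_point.meets_target())  # 140.0 True
    print(via_bridge.total_ms(), via_bridge.meets_target())          # 240.0 False
```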
The North American television standard (NTSC) specifies a frame rate of approximately 30
frames per second (fps) while the European standard (PAL) is set to 25 fps. These two
frame rates have become known as “real time” frame rates and are also associated with the
term “TV quality video.” However, the definition of “real time” is not set in stone. For
example, the movie industry settled long ago on a standard of 24 fps. High frame rates,
generally above 20 fps, are required to preserve motion quality, maintain lip sync, and
provide the benefits of low delay. In general, all other things being equal, higher frame rates
require higher transmission speeds or network bandwidth.
Because of the popularity of consumer still digital cameras, most people today are familiar
with the concept of image resolution; they understand that a 5 megapixel (MP) camera will
produce better images than a 1 MP camera and will enable larger print sizes while requiring
more image storage and transport bandwidth. These same concepts hold true for video. In
the case of visual communications however, independent bodies have specified certain fixed
image sizes in order to enable interoperability between vendors and systems. Many of these
standards are shown in the table below.
Most videoconferencing systems today use SIF or CIF while most laptops support XGA
screen resolutions. HD videoconferencing, with 9x the resolution of CIF, is a recent
introduction. Increasing resolution delivers sharper images with more visible detail, enables
images to be displayed in larger sizes, and creates a more pleasing visual communications
experience overall, but requires more processing power and more network bandwidth. The
increased bandwidth needs associated with high definition further stress the requirements for
network resiliency and efficient multipoint processing.
Image Format    Pixel Resolution     Megapixels per Image
SQCIF           128 x 96             0.012
QCIF            176 x 144            0.025
SIF             352 x 240 (NTSC)     0.084
CIF             352 x 288            0.101
4CIF            704 x 576            0.406
4SIF            704 x 480 (NTSC)     0.338
D1              720 x 480 (NTSC)     0.346
D1              720 x 576 (PAL)      0.415
HDTV            1280 x 720           0.922
VGA             640 x 480            0.307
XGA             1024 x 768           0.786
Figure 1 Pixel resolution for different image standards
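The megapixel column in Figure 1 follows directly from the pixel dimensions. So does the statement that HD has roughly 9x the resolution of CIF. The short Python check below uses HDTV to mean the 1280 x 720 format listed in the table.

```python
# Pixel dimensions (width, height) as listed in Figure 1.
FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "SIF": (352, 240),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "4SIF": (704, 480),
    "D1 (NTSC)": (720, 480),
    "D1 (PAL)": (720, 576),
    "HDTV": (1280, 720),
    "VGA": (640, 480),
    "XGA": (1024, 768),
}


def megapixels(name: str) -> float:
    """Pixels per image, in millions."""
    width, height = FORMATS[name]
    return width * height / 1_000_000


def resolution_ratio(a: str, b: str) -> float:
    """How many times more pixels format `a` has than format `b`."""
    return megapixels(a) / megapixels(b)


if __name__ == "__main__":
    for name in FORMATS:
        print(f"{name:10s} {megapixels(name):.3f}")
    print(f"HDTV / CIF = {resolution_ratio('HDTV', 'CIF'):.2f}")  # 9.09
```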
Scalability
When enterprise managers talk about scalability of videoconferencing, they are referring to
the need to support large numbers of information worker desktops with a wide range of CPU
and memory resources connected over IP networks with varying network loads. The
heterogeneous nature of receivers makes it difficult to deliver a single video stream to all
with acceptable quality. These concerns are even more pressing for service
providers attempting to support hundreds of thousands or even millions of consumers using
broadband connections to the home.
Scalability also involves support for multipoint conferences – conferences involving more
than two endpoints. Multipoint chat sessions have introduced many consumers to the value
of such calls, and videoconferencing users have long demanded such capabilities in their
systems. However, multipoint requirements significantly raise the challenges for call
reliability and fault tolerance because multipoint calls often connect endpoints with very
different capabilities on networks with different performance levels into a single conference.
Ease of Use
Enterprise workers have long been disappointed with the ease of use of desktop
videoconferencing. Products have not been able to support ad-hoc calling or multipoint
conferences, and desktop solutions for video have not generally been able to support the call
answering, call forwarding, hold and transfer, and voice mail functionality that most
information workers expect. The result has been a lack of acceptance and a stagnant market.
This is changing, however, as major enterprise vendors like Avaya, IBM,
Cisco, Nortel, Microsoft, and others are announcing unified communications products that
integrate voice, video, data, and instant messaging into one easy-to-use application. On the
consumer front, video chat has been integrated into free services from AOL, MSN, Yahoo,
and Google while free videoconferencing services from Skype, SightSpeed, GlowPoint and
others have made video calling easier than ever. Consumer videoconferencing is ready for
the next leap in quality with systems optimized for video delivery over packet networks.
Cost
Cost is always a factor in deploying advanced communications solutions. With the
proliferation of high performance personal computers, the use of dedicated silicon for audio-
video processing on the desktop is becoming less and less price-competitive. Furthermore,
as video communications proliferate through enterprise IP-PBX systems and web
conferencing solutions as well as through consumer IM and chat services, the need to
provide a cost-effective multipoint solution becomes an equally important challenge.
Current multipoint options force the customer to choose between 1) feature-rich,
expensive solutions (about $4K per concurrent user) that reduce quality and add delay
and 2) lower-cost solutions that offer minimal features and video quality. These options will
give way to next-generation solutions that combine features, performance, and error
resilience with a much lower implementation cost.
Introducing H.264 – SVC
Scalable video coding (SVC), a technique that enables a video stream to be broken into
multiple resolutions, quality levels and frame rates, is appealing for applications where the
bandwidth available cannot be guaranteed – for example Internet video, video telephony,
and wireless communications. SVC designs were first offered for systems intended for one-
way delivery of video over packet-switched networks; Vidyo is the first company to apply
SVC technology to two-way video communications and specifically to the challenges of
point-to-point and multipoint videoconferencing. We should note, however, that there is a
standardization process under way jointly between ITU-T VCEG and ISO MPEG to develop
a recommendation for H.264-SVC; this process is expected to lead to a ratified standard by
the end of 2006.
The purpose of any video compression algorithm is to exploit both the spatial and temporal
redundancy of video information so that acceptable video quality can be received at the far
end while using as few bits as possible to transmit the signal. A non-scalable video encoder
generates a single compressed bitstream. A scalable video encoder compresses a raw video
sequence into multiple layers (see Figure 2). One of the compressed layers is the base layer.
The base layer can be independently decoded and can provide a relatively low level of video
quality. Additional compressed layers are enhancement layers that provide additional
quality to the received video stream. Enhancement layers can be decoded only in
conjunction with the base layer. The complete bitstream would consist of the base layer and
all the enhancement layers and would provide the very best video quality.
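The decoding rule above can be expressed in a few lines: the base layer stands alone, and an enhancement layer is useful only together with the layers beneath it. This sketch assumes the simplest case, a single chain in which each enhancement layer depends on every layer below it. The real H.264-SVC bitstream identifies layers along several dimensions and allows more elaborate dependencies.

```python
def decodable_layers(received: set[int], num_layers: int) -> list[int]:
    """Return the layers that can be decoded from what was received.

    Simplifying assumption: layer 0 is the base layer and enhancement layer k
    depends on layers 0..k-1, so decoding stops at the first missing layer.
    """
    usable: list[int] = []
    for layer in range(num_layers):
        if layer not in received:
            break  # every higher layer depends on this missing one
        usable.append(layer)
    return usable


if __name__ == "__main__":
    print(decodable_layers({0, 1, 2, 3}, 4))  # [0, 1, 2, 3]: full quality
    print(decodable_layers({0, 1, 3}, 4))     # [0, 1]: layer 3 is useless without 2
    print(decodable_layers({1, 2, 3}, 4))     # []: nothing decodes without the base layer
```

The last case shows why the base layer is the part of the stream most worth protecting.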
On a QoS-enabled IP network, it would even be possible to send the base layer with a higher
priority than the other layers.
Figure 2 Scalable video coding principles
Enhancement layers can be created in the temporal, spatial, and quality realms. Hence,
compared to decoding the complete bitstream, decoding the base layer and only some of the
enhancement layers produces video with lower fidelity (quality scalability, also called
signal-to-noise ratio or SNR scalability), smaller images (spatial scalability), or a lower
frame rate (temporal scalability).
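Temporal scalability is commonly built on a hierarchical (dyadic) prediction structure, which H.264-SVC supports. The sketch assumes a group of pictures of 8 frames. Every 8th frame is in temporal layer 0, the frame halfway between is in layer 1, and so on. Each additional temporal layer therefore doubles the frame rate, and dropping the top layer halves it. The helper names and the layer assignment are illustrative, not taken from any particular encoder.

```python
def temporal_layer(frame_index: int, gop_size: int = 8) -> int:
    """Temporal layer of a frame in a dyadic hierarchy (gop_size a power of two)."""
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    levels = gop_size.bit_length() - 1  # log2(gop_size): highest layer index
    pos = frame_index % gop_size
    if pos == 0:
        return 0  # key frames form the base layer
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros  # odd positions land in the top layer


def frames_kept(num_frames: int, max_layer: int, gop_size: int = 8) -> list[int]:
    """Frame indices that remain when layers above max_layer are discarded."""
    return [i for i in range(num_frames) if temporal_layer(i, gop_size) <= max_layer]


def frame_rate(full_fps: float, max_layer: int, gop_size: int = 8) -> float:
    """Frame rate after keeping temporal layers 0..max_layer."""
    levels = gop_size.bit_length() - 1
    return full_fps / 2 ** (levels - max_layer)


if __name__ == "__main__":
    print([temporal_layer(i) for i in range(8)])  # [0, 3, 2, 3, 1, 3, 2, 3]
    print(frames_kept(16, 1))                      # [0, 4, 8, 12]
    for layer in range(4):
        print(layer, frame_rate(30.0, layer))      # prints 0 3.75 / 1 7.5 / 2 15.0 / 3 30.0
```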
The SVC algorithm sits on top of the base video compression scheme enabling
videoconferencing vendors to offer SVC as an enhancement to their existing products, and
more importantly, to position H.264-SVC as backwards compatible with non-scalable H.264
products in the field.
While SVC can theoretically be implemented as a direct client-to-client communication
system, future implementations will benefit greatly from some intelligence in the network.
Routers, for example, could understand which packets are base-layer packets and treat these
with higher priority, or routers could signal encoders that the network is congested so that
the coder could degrade gracefully to fit available bandwidth. But a more likely
implementation is a client-server architecture based on industry standard hardware and
operating systems. Such an H.264-SVC server could not only provide multipoint
capabilities for videoconferencing but also adjust the bitstream to each endpoint
individually, depending on that endpoint’s coding and network capabilities, and provide
improved resiliency for packet loss.
Because an H.264-SVC server sees each participant in a multipoint call as an individual
stream, each endpoint has the capability of signaling the server how it wants to receive each
stream. For example, in a four-way call, endpoint A might be able to decode the full
bitstreams from endpoints B, C, and D and display each of the four video windows in VGA
resolution. The 2x2 array on the monitor would therefore be the equivalent of 1280 x 960,
comparable to HDTV (1280 x 720), but without the need for HD cameras. Meanwhile, endpoint B,
with less processing power and network bandwidth, might decode only the base layer and
one enhancement layer and hence display all the videos in CIF format.
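The per-endpoint behavior in this four-way example can be sketched as a layer-selection step on the server. For each receiver, the server forwards the largest set of layers, base first, that fits both the receiver's signaled decoding capability and its share of downlink bandwidth. Everything concrete below is a hypothetical assumption for illustration:

* the three operating points (QCIF, CIF, VGA) and their cumulative bitrates;
* the endpoint bandwidths;
* the equal split of downlink bandwidth across incoming streams;
* the names `Endpoint`, `layers_for`, and `plan`.

In this sketch the server only forwards or drops layers. It never decodes or re-encodes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical operating points of each sender's scalable stream:
# (display resolution, bitrate in kbps of this layer plus all layers below it).
OPERATING_POINTS = [("QCIF", 128), ("CIF", 384), ("VGA", 768)]


@dataclass
class Endpoint:
    name: str
    downlink_kbps: int
    max_layers: int  # how many layers this endpoint says it can decode


def layers_for(endpoint: Endpoint, num_streams: int) -> int:
    """Largest number of layers (base first) that fits capability and bandwidth."""
    share = endpoint.downlink_kbps / num_streams  # assumed equal split per stream
    count = 0
    for i, (_, cumulative_kbps) in enumerate(OPERATING_POINTS[: endpoint.max_layers]):
        if cumulative_kbps > share:
            break
        count = i + 1
    return count


def plan(receiver: Endpoint, senders: list[str]) -> dict[str, Optional[str]]:
    """Resolution the server forwards to `receiver` from each other participant."""
    n = layers_for(receiver, len(senders))
    resolution = OPERATING_POINTS[n - 1][0] if n else None
    return {sender: resolution for sender in senders}


if __name__ == "__main__":
    a = Endpoint("A", downlink_kbps=3000, max_layers=3)
    b = Endpoint("B", downlink_kbps=1200, max_layers=2)
    print(plan(a, ["B", "C", "D"]))  # {'B': 'VGA', 'C': 'VGA', 'D': 'VGA'}
    print(plan(b, ["A", "C", "D"]))  # {'A': 'CIF', 'C': 'CIF', 'D': 'CIF'}
```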
How SVC addresses today’s challenges
While H.264-SVC does not directly address issues surrounding ease-of-use, the technology
folds seamlessly into today’s videoconferencing architectures and addresses the issues
surrounding visual communications on unreliable networks. H.264-SVC will make video
usable on a wide variety of consumer and enterprise endpoints and networks.
Network Resilience Challenge
Besides the ability to decode with different bandwidth and computation resources,
scalable video coding technology offers graceful degradation and the ability to
withstand error bursts.
Because of this architecture, SVC is extremely resistant to packet loss. Rather than
lose an entire frame or series of frames, performance degrades gradually because the
user continues to receive the base layer and possibly one or more enhancement
layers as well. When packet loss occurs, an H.264-SVC image will lose frame rate,
quality, or image resolution smoothly rather than freeze or drop out entirely.
While video from non-scalable coders typically breaks down completely at less than
1% packet loss without some error recovery mechanisms, and at approximately 5-8%
with sophisticated error correction schemes (that require additional bandwidth),
H.264-SVC has been shown to deliver more than usable video with up to 60% packet
loss.
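Part of the graceful degradation described above comes from how frames reference one another. The sketch below compares two reference structures. One is a conventional IPPP chain, in which each frame is predicted from the previous one. The other is the hierarchical temporal structure often used for scalable coding. For each, the sketch reports which frames become undecodable when a single frame is lost.

The sketch is limited in three ways:

* It covers only the temporal dimension.
* It uses simplified reference structures over a 9-frame window. These are assumptions, not the H.264-SVC specification.
* It does not model the packet-loss figures quoted above.

```python
from typing import Callable


def ippp_refs(i: int) -> list[int]:
    """Conventional chain: frame 0 is intra-coded, each later frame uses the previous one."""
    return [] if i == 0 else [i - 1]


def hierarchical_refs(i: int, gop: int = 8) -> list[int]:
    """Dyadic hierarchy over frames 0..gop (gop a power of two).

    Frame 0 is intra-coded, frame `gop` is a key frame predicted from frame 0,
    and every other frame is predicted from the nearest lower-layer frames.
    """
    if i == 0:
        return []
    if i == gop:
        return [0]
    step = i & -i  # lowest set bit = distance to the neighbouring lower-layer frames
    return [i - step, i + step]


def broken_frames(lost: int, num_frames: int, refs: Callable[[int], list[int]]) -> set[int]:
    """Frames that cannot be decoded because they depend, directly or not, on `lost`."""
    broken = {lost}
    changed = True
    while changed:  # propagate until no new frame is affected
        changed = False
        for f in range(num_frames):
            if f not in broken and any(r in broken for r in refs(f)):
                broken.add(f)
                changed = True
    return broken


if __name__ == "__main__":
    print(sorted(broken_frames(3, 9, ippp_refs)))          # [3, 4, 5, 6, 7, 8]
    print(sorted(broken_frames(3, 9, hierarchical_refs)))  # [3]
    print(sorted(broken_frames(4, 9, hierarchical_refs)))  # [1, 2, 3, 4, 5, 6, 7]
```

In this structure, half of all frames sit in the top layer. Losing one of them costs only that frame, a brief dip in frame rate. Losing a lower-layer frame costs more.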
Quality Challenge
H.264-SVC provides extremely low delay conferencing. This is particularly
important for multipoint calls where delay is a major contributor to the perception of
poor quality.
An H.264-SVC client/server architecture is resolution-independent and can support
HD resolution images. The performance of the endpoint depends only on the
total number of pixels displayed: a single 4CIF image or four CIF images will result
in the same performance. The performance of the server is not dependent on the
resolution being displayed.
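Take at face value the statement above that endpoint load tracks the total number of pixels displayed. On that assumption, one 4CIF image and four CIF images are equivalent by simple arithmetic. The same arithmetic gives the 1280 x 960 total for the 2x2 VGA layout in the four-way example. The Python check below uses pixel dimensions from Figure 1. It models the stated assumption, not measured decoder performance.

```python
# Pixel dimensions (width, height) from Figure 1.
SIZES = {"CIF": (352, 288), "4CIF": (704, 576), "VGA": (640, 480), "HDTV": (1280, 720)}


def pixels_displayed(windows: list[str]) -> int:
    """Total pixels an endpoint renders for a layout of video windows."""
    return sum(width * height for width, height in (SIZES[name] for name in windows))


if __name__ == "__main__":
    print(pixels_displayed(["4CIF"]))     # 405504
    print(pixels_displayed(["CIF"] * 4))  # 405504
    print(pixels_displayed(["VGA"] * 4))  # 1228800, i.e. 1280 x 960
    print(pixels_displayed(["HDTV"]))     # 921600
```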
Scalability Challenge
SVC has built-in adaptive rate control and intelligent routing capabilities that can
send each participant the video stream that it can handle best. This minimizes
network bandwidth use while also maximizing video efficiency.
SVC fits well with current network architectures, where quality of service can be
assigned to different media types. Running the base video layer with high priority
and the enhancement video layers with lower priority can save network bandwidth
and deployment costs while making large-scale video deployments more feasible.
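The priority scheme described above can be sketched as a strict-priority drop policy at a congested link. When only `capacity` packets can be forwarded in an interval, base-layer packets go first and the highest enhancement layers are dropped first. This is a simplified model for illustration. Real networks express priority through mechanisms such as DiffServ markings and router queuing policies, which this code does not implement. The `Packet` type, `forward`, and the layer pattern are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int    # sequence number (arrival order)
    layer: int  # 0 = base layer, 1+ = enhancement layers (lower number = higher priority)


def forward(packets: list[Packet], capacity: int) -> list[Packet]:
    """Forward at most `capacity` packets, dropping the highest layers first.

    sorted() is stable, so arrival order is preserved within each layer.
    """
    by_priority = sorted(packets, key=lambda p: p.layer)
    kept = by_priority[:capacity]
    return sorted(kept, key=lambda p: p.seq)  # restore sequence order for delivery


if __name__ == "__main__":
    burst = [Packet(i, layer) for i, layer in enumerate([0, 1, 2, 2, 0, 1, 2, 2])]
    print([p.seq for p in forward(burst, 5)])  # [0, 1, 2, 4, 5]
    print([p.seq for p in forward(burst, 2)])  # [0, 4]: only the base layer survives
```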
H.264-SVC servers can run on today’s industry standard hardware and software
platforms and fit into common enterprise management systems. The technology is low
cost and easily distributed.
The software system is cost-effective for both point-to-point and multipoint IP video
communications. Simplified media mixing eliminates the need for
compression/decompression in the server when supporting multipoint calls while
also substantially reducing latency. The SVC approach provides all or nearly all the
features associated with traditional high-end bridging systems such as rate matching
and personal layout control without introducing multipoint delay or degraded video
quality and without requiring specialized, expensive hardware.
Cost Challenge
H.264-SVC addresses costs in several significant ways. Because the algorithm can
be implemented as software on industry-standard processors, H.264-SVC systems
do not require specialized, expensive hardware. H.264-SVC clients can run on
ordinary personal computers as well as 3G handsets and PDAs. Moreover,
important infrastructure components such as multipoint bridges can support H.264-
SVC connectivity on industry-standard server platforms.
H.264-SVC can provide superior video performance on ordinary network
connections provided to consumers and SOHO customers as well as on typical
enterprise LAN architectures, reducing the need for specialized, high-cost network
designs or services.
Conclusion
Video-telephony and videoconferencing have never reached their full potential in either the
enterprise market or the consumer space because the technology has been hampered by
network issues and video quality limitations. While the ultimate acceptance of video
communications will rely on IP protocols and packet-switched networks, video compression
technology has been based on designs and thinking associated with traditional circuit-
switched networks. H.264-SVC is a new technology that combines the elegant performance
of H.264 with the network resiliency and scalability that come with scalable video coding.
H.264-SVC can be implemented with today’s industry-standard, low-cost hardware
platforms and promises to take video communications to the enterprise desktop as well as
the 3G mobile videophone.
Sponsor Information
Contact Information
433 Hackensack Avenue
Hackensack, NJ 07601
732 290 7468
info@vidyo.com
www.vidyo.com