architekture.com, inc.                     TM




           design with intelligence




Optimizing Video Conferences with
  Macromedia Flash Technologies

                                   Jim Cheng
                   jim.cheng@architekture.com

                                   Allen Ellison
                 allen.ellison@architekture.com




                             February 2005
Copyright © 2005 Architekture.com, Inc. All rights reserved.

This white paper is for information purposes only. ARCHITEKTURE.COM MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Macromedia, Macromedia Flash, Flash Communication Server, and Flash Player are either trademarks
or registered trademarks of Macromedia, Inc. in the United States and/or other countries. The names of
actual companies and products mentioned herein may be the trademarks of their respective owners.

ARCHITEKTURE.COM, INC.
600 GRANT STREET SUITE 850
DENVER, CO 80203
(720) 231-3166


INTRODUCTION
It is well known that the combination of Macromedia Flash Communication Server
and Macromedia Flash Player offers many exciting possibilities for live video
conferencing. Choosing optimal hardware and software settings, however, has
remained burdensome and arcane. All too often, developers must contend with
audio synchronization problems, frozen video images, and lag. Even for
seasoned Macromedia Flash developers, the task of implementing quality Flash-based
video conferencing applications becomes a challenge when confronted with the
bewildering selection of cameras, network configurations, and software settings.

However, the ability to create high-quality video conferencing experiences in Flash is
essential to meeting client expectations for many of today’s cutting-edge Flash
Communication Server applications. In the course of developing such applications for
a variety of clients during 2004, Architekture.com has conducted significant research
on optimizing high-bandwidth video conferencing applications with the goal of finding
a good balance between video and sound quality, and limiting the use of CPU and
network resources to mitigate problems associated with skipped frames, lag, or out-of-
sync sound. We are pleased to present our findings and recommendations to the
Flash developer community in this white paper.

Architekture.com is a leading Macromedia Flash development firm with recognized
expertise in Flash Communication Server. Our world-class development team creates
cutting-edge solutions that push the limits of what is thought possible. We specialize in
the development of immersive, real-time multi-player simulations, as well as rapid
prototype development and real-time business collaboration applications.




CONTENTS
Introduction........................................................................................................ iii
Why Optimization Matters ................................................................................... 1
Focusing on the Client Side ................................................................................. 1
Testing Environment ............................................................................................ 2
Hardware........................................................................................................... 2
   Cameras ........................................................................................................ 2
   Microphones................................................................................................... 8
   Networking..................................................................................................... 8
Software Settings ................................................................................................ 9
   Camera Settings.............................................................................................. 9
      Camera.setMode() ....................................................................................... 9
      Camera.setQuality()................................................................................... 10
      Camera.setKeyFrameInterval()..................................................................... 13
   Microphone Settings ...................................................................................... 13
      Microphone.setRate() ................................................................................. 13
      Microphone.setGain() and Microphone.setSilenceLevel()................................ 13
      Microphone.setUseEchoSuppression() .......................................................... 14
   Buffer Times.................................................................................................. 14
   Embedded Video Sizes................................................................................... 14
   MovieClip.attachAudio() ................................................................................ 15
   Stream Latency.............................................................................................. 15
Scaling ............................................................................................................ 16
   Flash Communication Server Limitations.......................................................... 16
   Network Limitations ....................................................................................... 17
   Client Machine Limitations ............................................................................. 18
   CPU Utilization and Resolution ....................................................................... 19
Summary ......................................................................................................... 21
Appendix A: Error Margins and Significance ........................................................ 22
Appendix B: Detailed Experimental Setups and Results.......................................... 23
   Camera Testing ............................................................................................ 23
   Encoding/Decoding and CPU Utilization ......................................................... 27
   Video Settings ............................................................................................... 30
   Scaling......................................................................................................... 34
Appendix C: Where to Download Test Files ......................................................... 38
Appendix D: IIDC/DCAM Camera List ................................................................ 39




WHY OPTIMIZATION MATTERS
Many-to-many video conferencing on desktop computers requires significant
quantities of resources, both in terms of processor utilization and network bandwidth.
In order to achieve optimal results, it is necessary to find a good balance between
video and sound quality that limits the use of resources to a level where processor and
network loads do not introduce deleterious effects such as frame skipping, lag, or out-
of-sync sound into the video conference experience.

Poor choices in hardware selection and improper software settings often contribute to
a poor video conferencing experience, and the bewildering number of options often
makes it seem next to impossible to create high-quality video conferencing
experiences, even with best-of-breed tools. This discourages both clients and
developers alike, and convinces many that even with today’s technologies, video
conferencing applications are difficult to use and cannot meet the promise of rich
audio and visual communication between groups of individuals.

Judicious choices of optimal hardware configuration and software settings, however,
can make all the difference between a glitchy and nearly useless video conference
application, and an impressive high-quality experience that exceeds client
expectations. In the course of developing rich video conferencing applications using
Macromedia technologies, we at Architekture.com have spent many hours
determining best choices in specifying and configuring collaborative video
conferencing products for our clients. We hope that sharing our results with the Flash
developer community will lead to the development and release of many high-quality
video conferencing applications in the future.


FOCUSING ON THE CLIENT SIDE
Although Flash Communication Server plays a crucial role in facilitating video
conferencing with Flash technologies, for the most part it only serves to relay streams
from one client machine to another in live video conferencing situations. In our testing
environments, we have noted that even fairly modest server hardware setups such as a
single 2.8 GHz Pentium 4 system with 512 MB of RAM can easily accommodate
relatively intensive video conferencing situations that push the limit of a typical
professional license.

The limitations affecting video conferencing performance are instead mainly
concentrated on the client side, because this is where the bulk of the work is done.
When publishing a stream, the client machine has to acquire video and audio data,
encode it, and push it across the network to the server, all in real time. And in a many-
to-many conferencing situation, the same machine will need to subscribe to streams
published by all of the other participants, decode them in real time, and present the
results onscreen and through the speakers or headphones—this too in real time (or as
close to it as possible). Consequently, our optimization research and
recommendations focus nearly entirely on the client-side systems.

TESTING ENVIRONMENT
Principal testing was conducted in the Architekture.com development laboratory on a
hyper-threaded 2.8 GHz Pentium 4 computer running Windows XP Professional SP1
with 1.25 GB of RAM. The Flash Communication Server application ran on a similar
processor with 512 MB of RAM under Windows Server 2003. These machines are
connected on a 100 Mbps Ethernet LAN through a switch, and tests were conducted
with in-house testing utilities running under Flash Player 7.0.19.0. We also conducted
some additional testing on machines belonging to clients for proprietary video
conferencing applications.

HARDWARE
A developer's ability to make or suggest hardware configurations for use with an
application will vary depending on client requirements. However, we have found that
the choice of hardware goes a long way in affecting the overall video conferencing
experience. Even if you are building a video conferencing application for the web and
have no control over the hardware configurations of client machines, these findings
may help in determining minimum system requirements and in optimizing software
settings for an expected range of client machines and network configurations.

Our goal in making effective hardware choices for optimal performance is to minimize
the load on the client processor and network while maintaining a high-quality audio
and video stream. During our tests, we found that high processor loads were strongly
correlated with poor performance, because the CPU’s time became divided between
processes supporting the video conference and other applications contending for
processor time.

Maintaining reasonable network loads is an important secondary consideration,
particularly in low-bandwidth settings, because available network bandwidth directly
limits the amount of data that can be transferred between the client machine and
Flash Communication Server.

CAMERAS
Cameras play a basic role in acquiring the video signal for conferencing applications.
However, the video signal itself usually requires some degree of additional processing
by the CPU before it is ready for use by the client Flash Player. Equally important are
the drivers used to interface the camera with the operating system, because poorly



written camera drivers coupled with a camera’s high data throughput can place even
greater demands on the processor.

For most video conferencing applications, camera resolutions greater than 640 x 480
and frame rates greater than 30 frames per second (fps) are generally not necessary.
Furthermore, consumer-level cameras intended for use with video conferencing
applications seldom provide resolutions and frame rates higher than these for real-
time video feeds. Because of this, we will limit our discussion to these cameras and will
not consider those with higher resolutions or frame rates that are typically used for
scientific and industrial applications.

Most cameras designed for video conferencing use one of two serial bus architectures
for communication with the client machine: USB (typically the faster 2.0 specification),
or Firewire, also known as IEEE 1394. Firewire cameras can also be further divided in
two categories based on data transfer protocol: DV (digital video) cameras, which
provide a compressed data stream to the computer, and IIDC/DCAM cameras, which
output uncompressed data streams and also offer camera hardware control over the
Firewire bus.
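As a practical aside, the capture devices (and hence their drivers) that Flash Player
can see are enumerable from ActionScript, which helps confirm which bus and driver a
given camera is using. A minimal sketch against the Flash Player 7 camera API:

```actionscript
// List the video capture devices that Flash Player detects; the name
// strings typically identify the driver, which helps confirm whether a
// camera is being accessed over USB, DV, or IIDC/DCAM.
var cameraNames:Array = Camera.names;
for (var i:Number = 0; i < cameraNames.length; i++) {
    trace(i + ": " + cameraNames[i]);
}

// Camera.get() with no argument returns the user's default device; an
// index from Camera.names selects a specific one.
var cam:Camera = Camera.get();
```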

Our tests, as well as available documentation, suggest that there are significant
differences in terms of overall processor demands between the various protocols used
to transfer data from the camera to the computer. To determine the processor use
required to handle video acquisition for different cameras, we conducted tests with
three representative cameras using different bus and protocol combinations for
transferring data to the client machine under identical resolution and frame rate
settings.

For our tests, we used the following cameras: Apple iSight, an IIDC/DCAM-compliant
webcam that connects through a 400-Mbit Firewire bus; Sony DCR-TRV460, a DV-
compliant consumer camcorder that also connects through 400-Mbit Firewire bus;
and Creative Labs NX Ultra, a higher-quality USB webcam.

All cameras were specified by their manufacturers as having a maximum live video
resolution of 640 x 480 pixels as well as the capability of yielding streams of up to 30
fps (with the exception of the Creative NX Ultra camera, which was limited to 15 fps
according to manufacturer specifications). Although the Sony DCR-TRV460 camera
also sports a USB connection, we only used its Firewire DV connection for our tests.
Table 1 provides an overview of the cameras we used for our tests.




Table 1: Basic Camera Capabilities

              Camera                   Data Bus                   Max. Resolution      Max. FPS
            Apple iSight            1394 IIDC/DCAM                  640x480              30
         Sony DCR-TRV460               1394 DV                      640x480              30
          Creative NX Ultra                USB                      640x480              15

We measured CPU utilization for locally visualizing video output at varying resolutions
and frame rates using each camera. To isolate the processor requirements needed to
process the video signal and import it into Flash, we conducted these tests entirely
locally using a simple Flash application running under Flash Player 7.0.19.0 without
Flash Communication Server integration.

Resolutions tested were all at the standard definition ratio of 4:3: 160 x 120, 200 x
150, 240 x 180, 320 x 240, 400 x 300, and 640 x 480 at frame rates of 1, 5, 10,
15, 24, and 30 fps. CPU utilization was measured using Windows Task Manager and
averaged over roughly 30 seconds of video acquisition with all other applications and
non-essential processes disabled.

Although data points were obtained for all cameras at our test resolutions and frame
rates, no camera supported all the resolutions natively. Actual resolution and frame
rate can be assessed programmatically after making a Camera.setMode() call
through the camera object’s width, height, and currentFps properties for
comparison. When an unsupported resolution or frame rate was requested, Flash
typically returned the video stream from the camera at a lower resolution and
scaled it up for display with fairly obvious pixelization.

Figure 1 shows example frame captures illustrating this pixelization effect.




       Creative Labs NX Ultra 240 x 180                                  Apple iSight 240 x 180
       (Camera Resolution: 176 x 132)                                (Camera Resolution: 240 x 180)

                     Figure 1: Sample frame captures illustrating pixelization
In this example, a resolution of 240x180 was requested of both the Creative Labs NX
Ultra and the Apple iSight cameras. The NX Ultra, which does not support a 240x180
capture resolution, is instead yielding a 176 x 132 stream, resulting in pixelization as
Flash scales up the image to the display resolution of 240 x 180. On the other hand,
Apple iSight natively supports a 240 x 180 capture resolution, resulting in significantly
better picture quality.

Table 2 lists the supported resolutions for each camera in the test set.

                           Table 2: Supported Camera Resolutions

                       160 x 120     200 x 150        240 x 180     320 x 240     400 x 300   640 x 480
  Apple iSight            Yes            Yes            Yes             Yes          No         Yes
  Sony DCR-TRV460         Yes            No             No              Yes          No         Yes
  Creative NX Ultra       Yes            No             No              Yes          No         Yes

The cameras tested do not all support the same range of resolutions and frame rates.
For this reason, we focused our analysis on configurations supported by multiple
cameras to determine comparative performance, even though data points were
obtained for a significantly larger set of configurations. In particular, the 160 x 120,
320 x 240, and 640 x 480 resolutions allowed commensurate comparisons between
the cameras at various frame rates up to 15 fps for all cameras, and up to 30 fps for
the Sony DCR-TRV460 and the Apple iSight cameras.

We also made a number of fairly interesting observations with regard to frame rates.
In the case of the Creative NX Ultra camera, Flash was able to request and receive
video streams at frame rates reported as high as 30 fps by the camera object's
currentFps property, although the camera itself is specified as having a maximum
frame rate of 15 fps. We suspect this might be due to inaccurate reporting on the part
of the driver or software-level interpolation. The results from our experiments do not
yield conclusive evidence for either possibility.

Also, although the Apple iSight camera is not officially supported on the Windows
platform, we were able to use it with the default Microsoft drivers for 1394 desktop
cameras. However, when using this driver, the frame rate was capped at a maximum
frame rate of 15 fps. Using the third-party Unibrain Fire-i IIDC/DCAM driver instead
enabled us to reach the specified hardware maximum frame rate of 30 fps as shown
in Figure 2.

It should also be noted that the Creative Labs NX Ultra camera yielded significantly
noisier CPU utilization data than the other cameras during testing. We presume this is
due to USB bus usage by other devices, including our keyboard and mouse, but could
not conclusively determine the source.

Overall, the processor load results came in strongly in favor of the IIDC/DCAM-
compliant Apple iSight camera. Processor utilization for image acquisition and
importing in Flash was roughly half that required for the other two cameras at the
same resolution and frame rate in all comparable cases, with the Unibrain Fire-i driver
slightly outperforming the Microsoft driver.

Processor utilization was roughly comparable between the Sony DCR-TRV460 and the
Creative NX Ultra cameras at low resolutions. At a resolution of 320 x 240, the DV-
compliant Sony DCR-TRV460 camera came out in the middle and outperformed the
Creative Labs NX Ultra camera, although at 640 x 480, the Sony DCR-TRV460
camera came in last when used with higher frame rates.

Also, as expected, processor utilization increases with higher resolutions and frame
rates.

From a hardware perspective, we recommend the use of IIDC/DCAM-compliant
cameras, because the uncompressed data stream appears to significantly reduce the
overhead needed to process the image for consumption by Flash, particularly if
processor resources are at a premium (for example, slower machines, visually rich
user interfaces, or video conferences involving more than two participants).

Figure 2 shows graphs of experimental results for various requested resolutions at
reported frame rates of 15, 24, and 30 fps (lower CPU utilization is better). Note that
resolutions other than 160 x 120, 320 x 240, and 640 x 480 are not directly
commensurable between cameras due to differences in actual hardware resolution.




[Graphs omitted: three panels plotting % CPU utilization (0 to 25 percent) against
requested resolution (160 x 120 through 640 x 480) at reported frame rates of 15,
24, and 30 fps.]

                           Figure 2: Resolution versus CPU utilization graphs



MICROPHONES
One of the most common problems we encountered with microphones used for video
conferencing was the introduction of unwanted echoes and background noise.
Although Flash does provide an option for echo suppression via software, we have
found that we were able to obtain significantly better results when we reduced the
incidence of echoes and irrelevant background noise on the hardware level through
proper microphone selection. Echoes and ambient noise are particularly undesirable,
because they not only make speech less intelligible, but the unwanted sounds also
interfere with our ability to accurately set the silence level needed to toggle the
microphone's activity state.

In the course of developing video conferencing applications for our clients, we have
experimented with a number of different microphone setups, including analog
headsets, USB headsets, and discrete microphone and speaker combinations to
determine the best configurations for obtaining high-quality sound capture while
minimizing unwanted noise. The best setup for reducing echo and ambient noise we
have found so far seems to be with noise-canceling USB headsets.

Additional improvements to audio quality that can be made through software will be
discussed later.

NETWORKING
Our video conferencing application development is, for the most part, geared towards
high-bandwidth intranet applications. For this reason, we primarily conduct our testing
over 100 Mbit Ethernet connections, with and without non-RTMP “noise” traffic. In our
experiments with up to 5 actual participants and simulated conferences involving up to
10 participants, we have not encountered any problems with network saturation thus
far. For LAN-based intranet applications, a 100 Mbit Ethernet setup appears to be
quite sufficient for video conferencing. We have not tested other local network
technologies such as 802.11, but we expect results to be similar to those we have
obtained, given ample bandwidth and network latencies commensurate with 100 Mbit
Ethernet connections.

High-quality live video conferencing over high-bandwidth Ethernet connections is
possible even at relatively high resolutions such as 320 x 240 for small numbers of
simultaneous participants. Additionally, bandwidth utilization can be capped at
reasonably low levels (for example, 38,400 bytes per second per video stream)
without significant loss of video quality given a judicious choice of video encoding
parameters as we describe later.

For lower bandwidth usage such as across the Internet, available bandwidth will be
markedly lower than that available on a LAN, and latency—the amount of time
elapsed from when the video has been encoded on one machine to when the video
has been decoded on the recipient machine—will be increased. These issues are
essentially the facts of life when developing Internet-based applications. However, they
can be dealt with fairly effectively by minimizing bandwidth usage and allowing for
increased latency.

It should also be noted that for many-to-many video conferencing, the total
bandwidth required grows quadratically with the number of participants, since each
participant publishes one stream and subscribes to one from every other. We discuss this
issue in greater depth shortly when we consider network limitations on scaling. This is
particularly relevant for cases of limited bandwidth, but is an important concern when
dealing with collaborative video conferences with increasing numbers of participants.
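The growth is easy to sketch. Assuming every participant publishes one stream and
subscribes to the streams of all others, the server-side stream count and the
per-client bandwidth work out as follows (a rough model that ignores protocol
overhead):

```python
def relayed_streams(n: int) -> int:
    """Streams the server must relay: each of n participants publishes one
    stream, and each subscribes to the n - 1 streams of the others."""
    return n * (n - 1)


def client_bandwidth(n: int, bytes_per_stream: int) -> int:
    """Approximate bytes/sec per client: one upstream publish plus
    n - 1 downstream subscriptions."""
    return bytes_per_stream + (n - 1) * bytes_per_stream


# With streams capped at 38,400 bytes per second:
for n in (2, 5, 10):
    print(n, relayed_streams(n), client_bandwidth(n, 38400))
```

Note that while each client's load grows only linearly, the server's relayed stream
count grows with the square of the participant count.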


SOFTWARE SETTINGS
We have experimented with a large number of the possible software settings in Flash
Player 7 for video conferencing and have documented our observations in this section.
In particular, we have found that many of the typical glitches observed in video
conferencing can be addressed with changes in the settings used in the Flash Player
client-side communication objects. We also review several other interesting items that
we have found in engineering video conferencing applications.

CAMERA SETTINGS
The principal methods for manipulating the camera object in Flash Player are
setMode(), setQuality(), and setKeyFrameInterval(). As the camera
object is responsible for generating the bulk of the data needed to be streamed to
Flash Communication Server, the settings here have a significant effect on both the
video quality and the overall video conferencing experience.

We’ll consider each of these methods in turn and discuss the possible options for each
setting and our observations, test results, and recommendations for configuring an
optimal video conferencing experience.

Camera.setMode()
The Camera.setMode() method allows specification of the desired resolution and
frame rate for the video data being collected. Of course, only certain resolutions and
frame rates are supported natively by each physical camera due to hardware
limitations. If the settings specified are not intrinsically supported by the camera, Flash
Player will instead fall back to the closest possible setting. Capture size is favored
over frame rate by default, but this preference can be reversed through the optional
favorArea flag. While this behavior does
allow specification of practically any resolution and frame rate, we have found that
using unsupported resolutions is undesirable, because it usually results in a pixelated
image (as shown in Figure 1 earlier).


From experience, we have found that resolutions of 160 x 120 and 320 x 240 tend to
be good choices because they seem to be supported natively by many typical cameras
used for video conferencing applications, and they are small enough to function well
when encoding for streaming. It is possible to detect programmatically whether the
specified size and frame rate were actually used for the camera hardware by
inspecting the read-only width, height, and currentFps properties.

From our previous tests conducting basic video capture without encoding for network
transport, we observed that lower resolutions and frame rates reduce the processor
demand on the machine. With this in mind, we recommend choosing the lowest
acceptable capture size and frame rate for an application. For high-bandwidth
intranet applications, we have found that a resolution of 320 x 240 at 24 fps works
relatively well for up to five simultaneous participants. For conferences intended to be
conducted across the Internet through broadband connections, capture size and frame
rate will need to be scaled down accordingly.
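These recommendations can be sketched as follows; the fallback check uses the
read-only properties mentioned earlier (currentFps is only meaningful once capture
has begun):

```actionscript
var cam:Camera = Camera.get();

// Request 320 x 240 at 24 fps; if the camera cannot supply this mode
// natively, Flash Player falls back to the closest supported setting.
cam.setMode(320, 240, 24);

// Verify what the hardware actually delivered.
trace("Capturing at " + cam.width + " x " + cam.height +
      ", " + cam.currentFps + " fps");
```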

Camera.setQuality()
Camera.setQuality() allows specification of both the maximum bandwidth per
second to be used by an outgoing video stream, and the required video quality of the
outgoing stream. By default, these are 16384 and 0, respectively. These settings allow
for the choice of different setups, each with its own benefits.

Either parameter can be set to zero to allow Flash to automatically use as much
bandwidth as necessary to maintain a specified video quality, or to throttle video
quality to avoid exceeding the given bandwidth cap. The video quality can also be set
to 100 to use the lossless non-compressing codec instead. Also, an exact bandwidth
limit and a required video quality can be specified when both are equally important.
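In ActionScript, these setups look like the following sketch; the numeric values are
illustrative, not prescriptive:

```actionscript
var cam:Camera = Camera.get();

// Setup 1: fix the frame quality and let Flash use whatever bandwidth
// is needed to maintain it (0 = no bandwidth cap).
cam.setQuality(0, 80);

// Setup 2: cap outgoing bandwidth (in bytes per second) and let Flash
// vary frame quality to stay under the cap (0 = quality may vary).
cam.setQuality(38400, 0);

// Setup 3: specify both when bandwidth and quality are equally important.
cam.setQuality(400000, 75);
```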

We have been unable to determine significant differences in processor utilization
between the various setups. However, our experiments revealed marked differences in
how Flash handles the edge cases where quality or bandwidth must be sacrificed to
remain within the specified limits. In particular, we focused on settings intended for use
in intranets with high-bandwidth network connectivity.

For the case where both a maximum bandwidth and desired frame quality are
specified, we found that a bandwidth limit between 400,000 and 900,000 bytes per
second and a frame quality setting of 60 to 85 gave very acceptable results with
smooth playback and no audio synchronization issues.

Lower frame quality settings yielded increasingly pixelated video as expected. Low
bandwidth limits, however, yielded skipped frames as described in the camera object’s
documentation.




We also note that in cases where we chose relatively high bandwidth caps, the actual
outgoing bandwidth usage seemed to reach a maximal upper limit below the specified
cap. For example, we observed total bandwidth usage to seldom exceed 250,000
bytes per second for a 320 x 240 stream captured at 24 fps despite the fact that
maximum bandwidth was allocated for video and that the server-to-client maximum
total bandwidth on the Flash Communication Server application was set to higher
values.

With the frame quality specified and the bandwidth left to Flash (set to zero), we
conducted a series of experiments to determine actual bandwidth usage and observed
video quality for various frame quality settings. Video conferencing conditions were
simulated by publishing and self-subscribing to the same stream, using the settings
recommended by Giacomo “Peldi” Guilizzoni on his weblog.

                         Table 3: Camera.setQuality() Basic Settings

                     Bandwidth:                                                0
                     FPS:                                                     24
                     Favor Size:                                               0
                     Frame Quality:                                     As Below
                     Key Frame Interval:                                       8
                     Camera Width:                                          280
                     Camera Height:                                         208
                     Buffer Time:                                              0
                     Audio Rate:                                         22 kHz

Table 4 shows the results we obtained for each specified frame quality. Outgoing
bandwidth usage per second and processor utilization were averaged over 30 seconds
of simulated video conference usage with intermittent audio and relatively little
physical motion.

                           Table 4: Variable Frame Quality Results

 Frame Quality   Bandwidth/Sec.        CPU Util. (%)                       Subjective Findings
     100            250,000                 33               Excellent picture, marked frame skipping
      90             68,000                 29               Excellent picture, some frame skipping
      80             36,000                 30               Excellent picture, occasional frame skipping
      70             24,000            Not Measured          Faint pixelization, smooth playback
      60             19,000            Not Measured          Mild pixelization, smooth playback
      50             13,000            Not Measured          Medium pixelization, smooth playback
      40             11,000            Not Measured          Loss of fine detail, smooth playback
      30             10,000            Not Measured          Moderate loss of detail, smooth playback
      20              9,000            Not Measured          Severe loss of detail, smooth playback
      10              8,000                 27               Loss of gross detail, smooth playback




From the data, we observed that CPU utilization dropped rather slowly with decreasing
frame quality. High frame quality yielded very high quality pictures at the cost of frame
skipping, whereas specifying lower frame quality yielded smooth playback by
sacrificing detail. The sweet spot, as it were, seems to be a frame quality
between 70 and 80.

It is also rather interesting to note that at a frame quality of 100 (using the lossless
codec with no compression, and causing exceptionally high bandwidth consumption),
the CPU utilization seems to be somewhat greater than when the frame quality is set to
lower values and the video data compressed.

Using similar settings with the frame quality set to 80 and varying the specified
bandwidth, we repeated the experiment to obtain the results shown in Table 5.

                             Table 5: Variable Bandwidth Results

Spec. Bandwidth    CPU Use (%)                               Subjective Findings
    19,200              30              Smooth, significant pixelization upon movement
    38,400         Not Measured         Smooth, some pixelization upon movement
    51,200         Not Measured         Occasional frame skips, pixelization on gross movement
    76,800         Not Measured         Frequent frame skips, pixelization with extreme movement
   128,000         Not Measured         Frequent frame skips, high-quality picture
   192,000         Not Measured         Frequent frame skips, high-quality picture
   256,000         Not Measured         Very frequent frame skips, high-quality picture
   384,000              30              Constant frame skip, high-quality picture

Here, the trade-off seems to be in smooth video playback versus greater pixelization
upon movement. If the video image is very still over time, a high-quality picture can be
obtained for practically all the specified bandwidths. However, this is somewhat
impractical for most video conferencing applications where one would expect at least
a small amount of movement. The sweet spot here for a frame quality of 80 is
apparently somewhere between 38,400 and 51,200 bytes per second, though 38,400
is quite acceptable if you don’t mind momentary pixelization upon a video conference
participant’s sudden movement. Processor utilization, however, appears to be fairly
constant throughout.

Allowing Flash to modulate the frame quality as needed has the considerable benefit
of keeping the bandwidth usage capped to relatively low levels without significantly
sacrificing image quality. This is particularly important for low-bandwidth usage
scenarios, such as video conferencing over the Internet, and for scaling video
conferences to larger numbers of simultaneous participants for intranet use. It is our
preferred setting, because we consider momentary pixelization upon gross
movements preferable to frequent and unpredictable frame skipping.

However, each application may benefit from experimentation with various bandwidth
and frame quality settings, depending on requirements and preferences. Alternatively,
Guilizzoni offers a rather handy calculator for choosing these settings with a number
of configurable options at:

        http://www.peldi.com/blog/archives/2005/01/generating_opti_1.html

Camera.setKeyFrameInterval()
The key frame interval determines how often a full key frame is published to the
stream, as opposed to an interpolated frame generated by the codec. Flash allows
values ranging from 1 to 48, with the default being 15 (every fifteenth frame is a key
frame). Testing with varying key frame intervals indicates that low intervals tend to
contribute to increased frame skipping, as additional bandwidth is used to transmit
full key frames more often. Large intervals, by contrast, reduce or eliminate frame
skipping but introduce longer normalization times in cases where the frame quality
was automatically throttled down in response to motion. For applications demanding
very high quality video, we typically set the key frame interval equal to or greater
than our frame rate, because we feel that occasionally lowered frame quality is
preferable to frequent frame skipping.
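A typical high-quality intranet capture setup reflecting these recommendations might look like this in ActionScript (the values are illustrative):

```actionscript
// Illustrative high-quality intranet capture settings.
var cam:Camera = Camera.get();
cam.setMode(320, 240, 24);    // capture size and frame rate
cam.setQuality(38400, 0);     // bandwidth cap; quality left to Flash
cam.setKeyFrameInterval(24);  // at or above the frame rate: one key frame per second
```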

MICROPHONE SETTINGS
There are several settings that can be specified for the microphone object from within
Flash. Specifically, these are the sampling rate, the gain, the silence level and
timeout, and whether to enable echo suppression in the audio codec. These settings affect
sound acquisition and encoding for publishing to Flash Communication Server.

Microphone.setRate()
This method determines the sampling rate used to acquire sound from the microphone
in kilohertz (kHz). Flash allows settings of 5, 8, 11, 22, and 44 kHz, with 8 being the
default in most cases. In general, higher sampling rates yield more natural-sounding
audio with increased bandwidth usage. We generally use settings of 22 or 44 kHz to
achieve relatively high-quality audio transmission and haven’t noticed significant
performance increases with lower sampling rates.

Microphone.setGain() and Microphone.setSilenceLevel()
The gain on the microphone is applied as a multiplier for boosting the input much like
a volume knob works, with zero silencing the audio, the default level of 50 leaving the
signal strength unchanged, and a maximum value of 100. This setting is used in
conjunction with the silence level, which determines the threshold above which the
microphone is activated for publishing audio data. Optionally, the
Microphone.setSilenceLevel() method can also take a second parameter to
specify the silence timeout, which is the time in milliseconds that audio should
continue to be published after the sound level drops below the specified silence level.
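As a sketch, these settings might be configured as follows (the particular values shown are examples only and, as noted below, will require per-machine tuning):

```actionscript
// Illustrative microphone setup; gain and silence level are highly
// hardware-dependent and should be tuned for each machine.
var mic:Microphone = Microphone.get();
mic.setRate(22);                // 22 kHz sampling
mic.setGain(50);                // leave signal strength unchanged
mic.setSilenceLevel(20, 1000);  // keep publishing for 1000 ms after silence
```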

We have noted that oftentimes it can be rather difficult to set the audio gain and
silence levels as precisely as we would like to enable the microphone to toggle state
correctly. In some cases, the sweet spot for the silence level has been as narrow as
one unit, with too low a value causing the microphone to be keyed on constantly and
picking up all manner of ambient noise, while a slightly higher value would not
accurately detect a video conference participant’s voice at normal conversational
volume.

The proper choice of gain and silence level values seems to differ significantly between
individual machines and microphone setups, so we are unable to recommend specific
values outside of experimentation with particular hardware setups. We do, however,
recommend implementation of a tell-tale “talk” light in many cases so a participant
can see whether his or her audio signal is being broadcast. Too frequently, we have
seen the case of a video of a participant mouthing words silently on-screen, unaware
that the microphone remains deactivated.
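One possible sketch of such a talk light uses the microphone's onActivity callback, which fires when the microphone crosses the silence threshold in either direction (talkLight_mc is a hypothetical MovieClip serving as the indicator):

```actionscript
// Toggle a "talk light" whenever the microphone keys on or off.
var mic:Microphone = Microphone.get();
mic.onActivity = function(active:Boolean):Void {
    talkLight_mc._visible = active;
};
```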

If it is necessary to silence the audio programmatically in response to low activity levels
or to implement a push-to-talk feature, setting the gain to zero is an effective means of
doing this. However, we have not found setting the silence level to 100 to be effective
in all instances, because very loud microphone input can raise the activity level to 100
and thus breach the threshold.
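A minimal push-to-talk sketch along these lines:

```actionscript
// Mute by zeroing the gain rather than raising the silence level,
// since setSilenceLevel(100) can still be breached by loud input.
var mic:Microphone = Microphone.get();
function setTransmitting(on:Boolean):Void {
    mic.setGain(on ? 50 : 0);  // 50 restores the default gain
}
```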

Microphone.setUseEchoSuppression()
Flash allows for optional echo suppression through the audio codec to be toggled on
and off using ActionScript. We usually enable this with good results, although we have
found that a more effective solution to echo reduction is to use USB headsets with
noise cancellation over analog headsets or discrete microphone and speaker setups.
This has the added benefit of filtering out the majority of background noise before it
hits Flash, making it easier to get the silenceLevel setting right.

BUFFER TIMES
The NetStream object allows a buffer time to be set on both publishing and
subscribing but with significantly different effects. If set on publishing, it determines the
maximum size of the outgoing buffer that, when full, will cause the remaining frames
to be dropped. The Macromedia documentation states that this is generally not a
concern on high-bandwidth connections and we have found this to be the case in our
use. On the subscribing end, the buffer time determines the amount of data to be
buffered prior to display. We have typically set both of these to zero with excellent
results for use with live video conferencing applications.
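For illustration (the connection URI and stream names are placeholders):

```actionscript
// Zero buffers on both the publishing and subscribing streams.
var nc:NetConnection = new NetConnection();
nc.connect("rtmp://localhost/myConference");
var ns_out:NetStream = new NetStream(nc);
ns_out.setBufferTime(0);  // publishing: minimal outgoing buffer
var ns_in:NetStream = new NetStream(nc);
ns_in.setBufferTime(0);   // subscribing: display frames as they arrive
```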

EMBEDDED VIDEO SIZES
Our experience with sizing embedded videos suggests that processor load is
minimized when the embedded video object is sized to match the subscribed video
stream’s resolution exactly. In experiments where the displayed video is sized to be
both larger and smaller than the published resolution, we have observed increased
processor utilization. Given that the camera resolution on a publishing machine can
be changed easily, we recommend matching subscribers’ embedded video object
sizes with the stream’s video resolution. The stream’s native resolution can be
determined programmatically on the subscriber machine by examining the attached
Video object’s width and height properties.
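As a sketch (my_video is a Video object placed on the stage and ns_in a subscribed stream; note that the width and height properties are populated only once video data begins arriving):

```actionscript
// Match the embedded Video object's display size to the incoming
// stream's native resolution to minimize processor load.
my_video.attachVideo(ns_in);
my_video._width  = my_video.width;   // stream's native width
my_video._height = my_video.height;  // stream's native height
```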

MOVIECLIP.ATTACHAUDIO()
In order to control certain aspects of a stream’s incoming sound (such as volume and
panning), a developer can use the MovieClip.attachAudio() method to attach
the incoming sound to a MovieClip and then control it through a Sound object as
suggested in the client-side development documentation. However, in our experience,
we have found that while such technique does provide for additional control over the
incoming sound, it also has an unfortunate tendency to desynchronize the audio
playback from the video playback. We have not found an adequate solution for this
problem as of yet and recommend against using MovieClip.attachAudio() on
live video conferencing streams.

STREAM LATENCY
Latency can be a significant problem with many video conferencing situations, and
manifests itself as the delay between events captured at the publishing machine and
their arrival and display on a subscriber machine. Because there is no native provision
for a client-side determination of latency, we measure latency by broadcasting a
message using the NetStream.send() method on a publishing machine and
measuring the elapsed time between the initial broadcast and the subsequent receipt
of the message on a second, self-subscribed stream. While this technique measures
data latency, all of our observations thus far indicate this directly coincides with video
latency. Therefore, we have also taken to interpreting data latency as video latency.
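Our measurement technique can be sketched as follows (ns_out and ns_in are placeholder names for the publishing and self-subscribed streams, and onPing is an arbitrary handler name; because both streams run on the same machine, getTimer() values are directly comparable):

```actionscript
// Publisher side: periodically send the current timer value.
function sendPing():Void {
    ns_out.send("onPing", getTimer());
}
setInterval(sendPing, 2000);  // probe every two seconds

// Self-subscribed side: the handler receives the original send time.
ns_in.onPing = function(sentAt:Number):Void {
    var latencyMs:Number = getTimer() - sentAt;
    trace("Measured data latency: " + latencyMs + " ms");
};
```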

In the course of our research, we have noted that, upon subscribing to a live stream,
latency typically averages below 50 milliseconds (ms) when audio data is entirely
absent. However, upon playback of streamed audio data, latency will typically
increase rapidly to several hundred milliseconds with little to no recovery to previous
levels, even after audio data has ceased. We have also observed that in some cases
with continuous audio data (for example, when the microphone is always keyed on
because of significant volume or too low a silence level), the measured latency
increases slowly in a continuous manner.

While in many cases the latency will tend to level out at 200 to 400 ms (values that we
find acceptable), latency will sometimes continue to grow into the seconds, yielding a
very poor-quality video conferencing experience. While we typically can restore the
latency to low levels by closing the subscribed stream and resubscribing, such a
solution is not particularly appealing because it interrupts the video and audio for
several seconds while the stream is reconnected. To date, we have not found an
adequate solution for capping latency at manageable levels.

It is also important to note that we have not discovered a way of automating the
measurement of audio latency, and aside from implementing a questionable
hardware-based solution such as feeding the speaker jack into the microphone jack
and monitoring the audio activity level, we are at a loss on how to measure audio
latency. A means of determining audio latency would be extremely valuable, of
course, because we could then identify and measure audio sync issues as they
occurred through automated means.

SCALING
While it is relatively easy to create a high-quality video conferencing experience for
two simultaneous participants, the demands on both the network and the machines
increase quickly as an application is scaled to involve greater numbers of
simultaneous participants. Specifically, the bandwidth needed to support many-to-
many video conferencing grows exponentially relative to the number of participants
such that n2 streams are required for n participants. (For more information on
bandwidth usage, see Brian Hock’s Macromedia white paper entitled Calculating Your
Bandwidth and Software License Needs for the Macromedia Flash Communication
Server MX.) Additionally, each client machine will need to dedicate additional
resources to handle the decoding of each subscribed stream.

These factors place upper limits on the maximum number of possible participants in a
single video conference on several fronts: the Flash Communication Server itself, the
network infrastructure’s available bandwidth, and the capabilities of the client machines.

FLASH COMMUNICATION SERVER LIMITATIONS
Flash Communication Server is licensed in increments of 10 Mbit per second or 2,500
simultaneous connections, so the primary consideration when it comes to scaling
Flash Communication Server to accommodate increasing numbers of video
conference participants is adequate bandwidth support by its current license(s).

For video conferencing applications, the 10 Mbit per second peak bandwidth limit will
almost surely be reached before coming close to making 2,500 simultaneous
connections. There aren’t any limits on the number of streams served, just peak
bandwidth usage and total simultaneous connections.

A single professional license offers 10 Mbit per second, or about 1.19 megabytes per
second in available bandwidth. To calculate the usage for a hypothetical case, let us
assume a fairly typical high-bandwidth video conferencing stream with a maximum of
38,400 bytes per second allocated to video data, and a 44 kHz audio sampling rate.
Experimentally, this utilizes a peak maximum of roughly 50 kilobytes per second.

Using 50 kilobytes per second as our estimated bandwidth usage, for increasing
numbers of participants, we can generate the total streams and estimated maximum
bandwidth usage per second in Table 6.

                   Table 6: Example Bandwidth Calculations for n Participants

            Participants    Total Streams              Max. Bandwidth (Bytes per Sec.)
                  2                4                             200,000
                  3                9                             450,000
                  4               16                             800,000
                  5               25                            1,250,000
                  6               36                            1,800,000
                  7               49                            2,450,000
                  8               64                            3,200,000
                  9               81                            4,050,000
                 10             100                             5,000,000

Of course, these numbers are a rough estimate and probably err slightly on the high
side, because we are assuming that all streams are simultaneously reaching their
expected peak bandwidth utilization. However, we can use these figures to estimate
the bandwidth load on the Flash Communication Server software.

Given our earlier assumptions, a single professional license will likely become
saturated somewhere between four and five simultaneous participants. To
accommodate larger numbers of participants, the maximum bandwidth cap on Flash
Communication Server would need to be increased by stacking additional licenses or
purchasing higher capacity licenses from Macromedia.

In practice, actual bandwidth usage will depend on the choice of settings and how the
application is actually used. As screen real estate on the client side is also expected to
diminish with increasing numbers of video conference participants, we recommend a
strategy of reducing per-stream bandwidth usage with increasing numbers of
participants by scaling down the capture resolution and frame rate, video bandwidth
cap, or frame quality as the number of participants in a video conference grows.

Even with an unlimited capacity license on Flash Communication Server, the
limitations on hardware, operating system, and processor performance will eventually
impose a hard ceiling on the number of simultaneous participants supported for a
video conferencing application.

NETWORK LIMITATIONS
Much of our research has been focused on video conferencing in high-bandwidth
intranet configurations with ample network headroom. However, network limitations
should be kept in mind when scaling video conferencing applications for deployment
on all network configurations, particularly those in heavily used environments or
across the Internet. Also, when comparing bandwidth utilization reported by Flash
Communication Server to actual bandwidth used on the physical network, some
degree of additional network overhead used for packet envelopes, retransmitted
packets, and control messages should be taken into account.

In our experience, video conferencing works very well in an intranet setting. However,
in busy local network environments, you will need to take into account additional,
non-video conference traffic such as e-mail, web browsing, and file transfers also
contending for network bandwidth. Depending on local traffic volume and the network
architecture, you may encounter lower available bandwidth and quality of service than
might be expected in ideal conditions.

While we have not encountered any problems traceable to network congestion in test
cases involving both shared and dedicated 100 Mbit Ethernet connections for our
video conferencing tests, we do suggest testing to ensure that an application runs well
in its specific network environment.

When video conferencing is conducted over the Internet, other factors come into play.
First, significantly greater latency and lower available bandwidth can be expected than
those achievable in a local network configuration, even for users with broadband
connections. Also, some users may have asymmetric upload and download bandwidth
capacities. These limitations place additional constraints on the size and quality of
video streams that can be delivered to each client. As recommended by Guilizzoni and
Hock, lowering the capture size, bandwidth and video quality of your streams will be
necessary to accommodate the limitations of Internet-based conferencing.

CLIENT MACHINE LIMITATIONS
On the client machines, the principal consideration in scaling to larger numbers of
participants is the incremental growth of the number of streams that need to be
decoded and displayed. We have conducted a number of simulated tests on single
video conference clients subscribing to and displaying up to 10 live streams without
significant problems when used with reasonable bandwidth and quality settings. We
observed that the settings in Table 7 yield very acceptable performance with smooth
playback when decoding and rendering up to 10 incoming streams on our test
machine.
                       Table 7: Recommended 10-Participant Settings

                     Bandwidth:                                            38400
                     FPS:                                                     15
                     Favor Size:                                               0
                     Frame Quality:                                            0
                     Key Frame Interval:                                      30
                     Camera Width:                                           160
                     Camera Height:                                          120
                     Buffer Time:                                             0
                      Audio Rate:                                         22 kHz

Average processor utilization for 10 streams utilizing the settings in Table 7 was only
36%, demonstrating that a high-quality 10-participant video conference is entirely
possible on current systems using Macromedia Flash technologies. We have also
conducted additional tests with varying parameters, but have found this combination
of settings to yield the best results.

CPU UTILIZATION AND RESOLUTION
We wanted to determine the effect of stream resolution on processor usage and
determine optimal resolutions to use with different numbers of simultaneous
participants. Using matched publishing and display resolutions, we measured
averaged CPU utilization on our test machine over 60 to 90 seconds when subscribed
to 4, 6, or 8 simulated video conferencing streams at various resolutions in 4:3 aspect
ratios using the settings in Table 8.

                   Table 8: CPU Utilization versus Resolution Basic Settings

                     Bandwidth:                                           38400
                     FPS:                                                    24
                     Favor Size:                                              0
                     Frame Quality:                                           0
                     Key Frame Interval:                                     48
                     Buffer Time:                                             0
                      Audio Rate:                                         44 kHz

Figure 3 shows the plotted results obtained for the 4, 6, and 8 stream cases at various
resolutions.




Figure 3: CPU utilization versus stream resolution area graph

The x-axis in this graph is measured in somewhat unusual units, the area resolution of
a stream’s video feed in thousands of pixels. For example, a video resolution of 320 x
240 would yield an area of 76.8 kilopixels (320 x 240 = 76,800). To convert back
from the area to the original 4:3 aspect ratio dimensions, divide the area by 12 and
take the square root of the resulting value. This can be multiplied by 4 to obtain the
width, and by 3 to obtain the height. This unit of measurement was used so that we
could quantitatively compare various resolutions against each other. The numeric
results are provided in Appendix B.
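The conversion described above can be expressed as a small helper function:

```actionscript
// Recover 4:3 dimensions from a pixel area: divide by 12, take the
// square root, then multiply by 4 and 3.
function dimensionsFromArea(area:Number):Object {
    var unit:Number = Math.sqrt(area / 12);
    return { width: unit * 4, height: unit * 3 };
}
var d:Object = dimensionsFromArea(76800);
trace(d.width + " x " + d.height);  // 320 x 240
```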

The positions of the 160 x 120 and 320 x 240 capture resolutions that are typically
supported at the hardware level by many commonly used video conferencing cameras
are indicated on the graph to assist in reading.

At present, we suspect that the appearance of shelves, where CPU utilization remains
fairly stable across relatively substantial changes in resolution with marked changes
between certain resolutions, stems from the encoding algorithm used by Flash in
compressing video. However, we do not have sufficient information to determine
conclusively whether this is the case.




SUMMARY
In summary, we offer these findings of optimal hardware and software configurations
for use in live conferencing applications using Flash Communication Server:

   •   Cameras for video conferencing differ significantly in the processor load
       needed for video acquisition. We have found that Firewire cameras using the
       IIDC/DCAM protocol perform significantly better than USB cameras or DV
       Firewire cameras.

   •   USB headsets with active noise cancellation are preferred, because they
       provide superior sound quality and echo reduction compared to analog
       headsets or discrete setups.

   •   Resolutions natively supported by the camera hardware are preferable in order
       to avoid pixelization. Typically, 160 x 120 and 320 x 240 are supported and
       work reasonably well for streaming.

   •   Bandwidth utilization should be carefully balanced with image quality.
       Maximizing either or both tends to yield poor results. A bandwidth limit of
       about 38,400 bytes per second with an unspecified frame quality and a key
       frame interval at or above the camera frame rate serves our purposes rather
       well. Experimentation may be in order to find the configuration best fitting a
       given application’s requirements. Giacomo Guilizzoni has provided an easy-
       to-use calculator that recommends values for various setups at:

       http://www.peldi.com/blog/archives/2005/01/generating_opti_1.html

   •   Microphone sampling rates of 22 or 44 kHz work well. Low sampling rates,
       while reducing bandwidth usage, also result in poor audio quality.

   •   Embedded videos used for displaying subscribed streams should be sized to
       match the originating camera resolution for optimal performance.

   •   MovieClip.attachAudio() should not be used to manipulate the audio
       from a subscribed stream. This has a tendency to introduce unwanted
       synchronization issues.




APPENDIX A: ERROR MARGINS AND SIGNIFICANCE
Most of our test results, particularly those of processor utilization read from Windows
Task Manager, were obtained by manual estimates of averages from values provided
from various tools on a periodic basis. Additionally, video conferences were typically
simulated by speaking into our USB headsets in front of our test cameras in a calm
manner for up to several minutes. Unfortunately, such practices limit our ability to
reproduce visual and audio inputs exactly for each test case.

As such, we have assumed a moderate error margin and have refrained from reading
significance into cases where only marginal differences were observed due to our
inability to obtain results with high precision or granularity.

We are actively working to obtain results with greater statistical rigor through research
into tools that would yield both more reproducible test cases and more precise results.
Using such tools, we would be able to analyze for significant variations more
effectively.

To alleviate some of the problems that our current methods introduce, we provide
detailed experimental results and community access to our experimental tools in these
appendixes so that our tests can be reproduced and the results compared by others
in the Flash Communication Server development community.




APPENDIX B: DETAILED EXPERIMENTAL SETUPS AND
RESULTS
CAMERA TESTING
For our camera tests, three representative cameras supporting different protocols were
used in conjunction with our CamTest tool: Apple iSight, an IIDC/DCAM-compliant
webcam that connects via Firewire; Sony DCR-TRV460, a DV-compliant camcorder
that also connects via Firewire; and Creative Labs NX Ultra, a USB webcam. All
cameras were specified as having a maximum live video resolution of 640 x 480
pixels and the capability of yielding streams of up to 30 fps (with the exception of
the Creative NX Ultra, which was limited to 15 fps). Although the Sony DCR-TRV460
camcorder also supports a USB connection, we only tested it using its DV connection.

                         Table 9: Basic Camera Specifications

    Camera               Data Bus      Max. Resolution   Max. FPS
    Apple iSight         IIDC/DCAM     640x480           30
    Sony DCR-TRV460      DV            640x480           30
    Creative NX Ultra    USB           640x480           15

CPU utilization for locally visualizing each camera's video output at varying
resolutions and frame rates was measured with the Windows Task Manager, with all
non-essential processes disabled. To isolate the processor requirements needed to
process the video signal into Flash, these tests were conducted entirely locally using a
simple Flash application running under Flash Player 7.0.19.0 with no Flash
Communication Server integration. Resolutions tested were all at the standard
definition ratio of 4:3: 160 x 120, 200 x 150, 240 x 180, 320 x 240, 400 x 300,
and 640 x 480 at rates of 1, 5, 10, 15, 24, and 30 fps. CPU utilization was averaged
over roughly 30 seconds of video acquisition.
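The test matrix just described can be sketched programmatically. The following Python
snippet is our own illustration (it is not part of the CamTest tool); it enumerates the
resolution and frame-rate combinations and checks the 4:3 constraint:

```python
# Enumerate the camera-test matrix described above and check that
# every requested resolution keeps the standard-definition 4:3 ratio.
from fractions import Fraction
from itertools import product

RESOLUTIONS = [(160, 120), (200, 150), (240, 180),
               (320, 240), (400, 300), (640, 480)]
FRAME_RATES = [1, 5, 10, 15, 24, 30]

def is_4_to_3(width, height):
    """True when width:height reduces exactly to 4:3."""
    return Fraction(width, height) == Fraction(4, 3)

# Every resolution in the test set satisfies the 4:3 constraint.
assert all(is_4_to_3(w, h) for w, h in RESOLUTIONS)

# 6 resolutions x 6 frame rates = 36 test cases per camera.
test_cases = list(product(RESOLUTIONS, FRAME_RATES))
print(len(test_cases))  # -> 36
```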

Table 10 provides the supported resolutions for each camera among the test set. The
footnotes provide additional details on the actual sizes of the video streams when the
given resolution was requested.




            Table 10: Supported Resolutions for Test Cameras (Extended)

                         160x120   200x150   240x180   320x240   400x300   640x480
    Apple iSight         Yes       Yes       Yes       Yes       No [1]    Yes
    Sony DCR-TRV460      Yes       No [2]    No [2]    Yes       No [1]    Yes [3]
    Creative NX Ultra    Yes [4]   No [5]    No [5]    Yes [4]   No [6]    Yes [4]

As a result, the CPU utilization observations obtained for the 200 x 150, 240 x 180,
and 400 x 300 resolutions should be interpreted with some caution compared to the
resolutions for which all tested cameras provided matched video streams. It is
probable that the scaling of lower-resolution video streams to the originally requested
size in the Flash Player contributes somewhat to the overall CPU utilization.

Additionally, we had some issues with frame rates. In the case of the Creative NX
Ultra, although the camera itself is specified as having a maximum frame rate of 15
fps, Flash was successfully able to request and receive video streams at frame rates up
to 30 fps. We suspect this might be due to inaccurate reporting on the part of the
driver or software-level interpolation. The results from our experiments do not yield
conclusive evidence for either possibility.

In the case of the Apple iSight camera, we were only able to attain a maximum frame
rate of 15 fps, although the technical specifications state that a frame rate of 30 fps
was possible. This was likely due to the use of the generic Windows 1394 Desktop
Camera driver, because a manufacturer-supplied driver for the Windows operating
system was not available. Resolution and frame rate testing for the Apple iSight
camera was therefore limited to frame rates of 15 fps and below for the tests
described here, though at a later point, we were able to obtain a 30 fps frame rate
from the Apple iSight camera using the Unibrain third-party Fire-i drivers for 1394
IIDC/DCAM cameras as described in the main body of this white paper.

It should also be noted that results for the Creative NX Ultra camera were significantly
noisier than for the other cameras, presumably due to interference from additional
USB devices connected to the test machine.

Figure 4 presents graphs of our experimental results (lower CPU utilization is better).




[1] A video stream of 320 x 240 was obtained when a 400 x 300 stream was requested.
[2] Video streams of 160 x 120 were obtained when 200 x 150 and 240 x 180 streams were requested.
[3] The Sony DCR-TRV460 camera produces an interlaced video stream at 640 x 480.
[4] The Creative NX Ultra camera produced slightly letterboxed frames at 160 x 120, 320 x 240, and 640 x 480.
[5] Video streams of 176 x 132 were obtained when 200 x 150 and 240 x 180 streams were requested.
[6] A video stream of 352 x 264 was obtained when a 400 x 300 stream was requested.


[Figure 4: Frame rate versus CPU utilization graphs (lower is better). Six panels, one
per requested resolution (160x120, 200x150, 240x180, 320x240, 400x300, and
640x480), each plotting % CPU utilization (0-25) against frames per second (0-30)
for the Apple iSight (1394 IIDC/DCAM), Sony DCR-TRV460 (1394 DV), and Creative
NX Ultra (USB).]




Figure 5 shows the results of graphing the same data to compare CPU utilization at
15, 24, and 30 fps.

[Figure 5: Resolution versus utilization graphs. Panels plot % CPU utilization (0-25)
against resolution (160x120 through 640x480) at the compared frame rates; at 24
and 30 FPS only the Sony DCR-TRV460 and Creative NX Ultra appear, as the Apple
iSight was limited to 15 fps.]



ENCODING/DECODING AND CPU UTILIZATION
With an understanding of the effects of various cameras and video stream formats on
processor utilization, we next used our FCSDiag tool to analyze the CPU utilization
incurred by publishing audio and video to Flash Communication Server, as well as that
needed to subscribe to a stream from Flash Communication Server.

We tested a broadcasting-only configuration (with no local visualization) and a simple
loopback configuration in which the published stream was resubscribed and rendered
by the same machine, under several different video setting configurations that have
proven to give high-quality results. The loopback case effectively simulates the load for
a participant machine in a simple 1-to-1 video conference.

The first test was conducted with a configuration that yields, as we have found through
prior work in Flash video conferencing, a relatively high-quality experience. Two
additional tests were also conducted, the first having the camera bandwidth set to
38,400 and allowing Flash to throttle the video quality dynamically, and the second
having the video quality set to 90 and the bandwidth unspecified by being set to zero,
as recent experiments have shown that these configurations also yielded relatively
high-quality results.

Table 11 lists the configurations used for each of these tests. In the graphed results
that follow, the “Publish Only” CPU utilization encompasses the CPU use needed for
video acquisition and publishing of the encoded stream to Flash Communication
Server, while the “Loopback” CPU utilization adds on the additional processor use
needed to subscribe and display the same stream on the test machine.
                  Table 11: Encoding/Decoding Test Configurations

                          Test Configuration A   Test Configuration B   Test Configuration C
    Bandwidth:            400,000                38,400                 0
    FPS:                  24 [7]                 24 [7]                 24 [7]
    Favor Size:           0                      0                      0
    Frame Quality:        85                     0                      90
    Key Frame Interval:   48                     48                     48
    Camera Width:         320                    320                    320
    Camera Height:        240                    240                    240
    Buffer Time:          0.01                   0.01                   0.01
    Audio Rate:           22 kHz                 22 kHz                 22 kHz

Figure 6 shows the results graphically.




[7] In practice, this results in an actual frame rate of 15 FPS for the Apple iSight due to driver limitations.


[Figure 6: Encoding/Decoding graphs. Three panels, one per test configuration (A, B,
and C), each plotting % CPU utilization (0-30) for the Publish Only and Loopback
cases against the camera/player combinations Apple iSight - FP7, Sony DCR-TRV460
- FP7, and Creative NX Ultra - FP7.]




Our three test configurations yielded comparable results in terms of CPU utilization
despite the differences in settings. Because the Apple iSight camera was operating at
only 15 fps in these tests (as described earlier), we believe that the CPU utilization in
tests employing it is artificially lowered to a certain extent. As such, there do not
appear to be substantial differences in the amount of work needed to encode or
decode video from the cameras tested. In terms of subjective quality, all tested
configurations utilizing the different cameras were quite acceptable, as we had
anticipated.

Additionally, the data from this series of experiments enable us to break down CPU
usage into its constituent parts when combined with our earlier results for CPU
utilization during video acquisition with our test cameras. Applying this to the data for
Test Configuration A, we can derive the breakdown shown in Figure 7. Results for the
other test configurations are similar.
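The breakdown follows from simple subtraction across the three measurement
conditions: acquisition-only CPU comes from the camera tests, encoding cost is the
publish-only figure minus acquisition, and decoding cost is the loopback figure minus
publish-only. A minimal Python sketch of that arithmetic (the sample inputs are
illustrative placeholders, not our measured values):

```python
# Decompose loopback CPU load into acquisition, encoding, and decoding
# components by subtracting across the three measurement conditions.
# The example inputs below are illustrative, not measured data.

def cpu_breakdown(acquisition, publish_only, loopback):
    """Return (acquisition, encoding, decoding) CPU percentages."""
    encoding = publish_only - acquisition   # added cost of encoding and publishing
    decoding = loopback - publish_only      # added cost of subscribing and rendering
    return acquisition, encoding, decoding

acq, enc, dec = cpu_breakdown(acquisition=3.0, publish_only=8.0, loopback=16.0)
print(acq, enc, dec)  # -> 3.0 5.0 8.0
```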
[Figure 7: CPU utilization breakdown for Test Configuration A. Stacked bars of
acquisition, encoding, and decoding CPU use per camera on Flash Player 7:
approximately 2.5/3.5/10% for the Apple iSight, 6.5/2.5/10% for the Sony
DCR-TRV460, and 9/3/13% for the Creative NX Ultra.]


While there are slight differences in the CPU usage for encoding and decoding in the
three cases shown, the greatest factor affecting total CPU utilization in these simulated
simplest-case 1-to-1 video conferencing tests remains the choice of camera.

VIDEO SETTINGS
From prior experience with video conferences involving five participants, we had
usually set both frame quality and the maximum bandwidth for the camera during
testing and used a static video resolution of 320 x 240 pixels at 24 fps (the same as
the Flash movie’s frame rate). After significant trial and error, we had arrived at the
settings shown in Table 12. These settings yielded the best overall performance with
minimal frozen frames and synchronization issues.


                          Table 12: Initial Video Settings

                     Bandwidth:                              400,000-900,000
                     FPS:                                                 24
                     Favor Size:                                           0
                     Frame Quality:                                    60-85
                     Key Frame Interval:                                  48
                     Camera Width:                                       320
                     Camera Height:                                      240
                     Buffer Time:                                       0.01
                     Audio Rate:                                      22 kHz

In FCSDiag loopback tests employing both video and audio input, the CPU utilization
and average latency (time for a NetStream.send call to reach the Flash
Communication Server application and return) did not show significant variation within
the range of bandwidth and frame quality settings given in Table 12 and were
essentially the same as the results obtained in Table 8 for Test Configuration A in the
encoding/decoding tests.

Subjectively, the video stream appeared very smooth and no frozen frames or
problems with audio synchronization were observed. At lower frame quality settings,
some pixelization was observed as expected.

Table 13 lists typical CPU utilization and average latency obtained using these settings
on Flash Player 7.

                      Table 13: CPU Utilization and Latency for Cameras

                   Camera                 Avg. Latency (ms)              % CPU Utilization
                 Apple iSight                   150                            13
              Sony DCR-TRV460                   180                            21
               Creative NX Ultra                180                            25

It should be noted that the average latency tends to remain fairly stable, with the
loopback signal being delayed about 150 to 180 ms from real time once audio data
has been introduced to the stream. On some occasions, latency will increase to
markedly higher values (~1,500 ms) for unknown reasons and yield unsatisfactory
results as the received stream lags over a second behind real time.

We have also experimented with setting only the maximum bandwidth or only the
frame quality, allowing Flash to manage the other in real time. We were introduced to
this possibility by Giacomo Guilizzoni's weblog, where he presented a calculator for
optimal Flash Communication Server video settings under different conditions.



Adapting his results to our needs, we conducted a number of tests to quantify the
effects of each parameter under such regimes. Our experimental results indicate that
these approaches also produce relatively high-quality output with properly chosen
settings. Because these tests were done using a different program than the earlier
tests (in order to measure and graph bandwidth utilization in real time), the resultant
CPU utilization measures are not directly comparable to the data obtained in previous
experiments. We used the Creative NX Ultra for these experiments. For our initial
battery of tests, we set the bandwidth to 0 and throttled the frame quality from 100
down to 0 with the audio muted (to keep latency relatively constant) under the
conditions given in Table 14.

                           Table 14: Variable Frame Quality Settings

                        Bandwidth:                                              0
                        FPS:                                                   24
                        Favor Size:                                             0
                        Frame Quality:                                   As below
                        Key Frame Interval:                                     8
                        Camera Width:                                         280
                        Camera Height:                                        208
                        Buffer Time:                                            0
                        Audio Rate:                                        22 kHz

We obtained the following results, with the average bandwidth utilization in bytes,
selected average CPU utilizations, and subjective findings for each test listed in Table
15.

                           Table 15: Variable Frame Quality Results

Frame Quality   Bandwidth/Sec     CPU Util. (%)                           Subjective Findings
    100           250,000              33                High-quality picture, marked frame skipping
     90            68,000              29                High-quality picture, some frame skipping
     80            36,000              30                High-quality picture, occasional frame skipping
     70            24,000         Not Measured           Faint pixelization, smooth playback
     60            19,000         Not Measured           Mild pixelization, smooth playback
     50            13,000         Not Measured           Medium pixelization, smooth playback
     40            11,000         Not Measured           Loss of fine detail, smooth playback
     30            10,000         Not Measured           Moderate loss of detail, smooth playback
     20             9,000         Not Measured           Severe loss of detail, smooth playback
     10             8,000              27                Loss of gross detail, smooth playback

Here, CPU utilization seems to drop rather slowly with decreasing frame quality. High
frame quality yielded very high-quality pictures at the cost of frame skipping, whereas
lower frame quality yielded smooth playback by sacrificing detail. The sweet spot, as it
were, seems to be at a frame quality of about 70 to 80. It is also interesting to note
that at a frame quality of 100 (zero compression, accompanied by exceptionally high
bandwidth consumption), CPU utilization is somewhat greater than when the frame
quality is set to lower values and the video data is compressed.

Subsequently, we performed another battery of experiments, this time varying the
specified bandwidth while keeping the frame quality set to 80, with settings otherwise
identical to those given in Table 14. Although a frame quality of 80 had produced
occasional frame skipping, as shown in Table 15, previous experience suggested that
this value strikes a decent balance between bandwidth and CPU utilization on the one
hand and picture quality on the other, so it was chosen for this set of experiments.
Table 16 lists the results.
                             Table 16: Variable Bandwidth Results

  Spec. Bandwidth     CPU Util. (%)                           Subjective Findings
      19,200               30             Smooth, significant pixelization upon movement
      38,400          Not Measured        Smooth, some pixelization upon movement
      51,200          Not Measured        Occasional frame skips, pixelization on gross movement
      76,800          Not Measured        Frequent frame skips, pixelization with extreme movement
     128,000          Not Measured        Frequent frame skips, high quality picture
     192,000          Not Measured        Frequent frame skips, high quality picture
     256,000          Not Measured        Very frequent frame skips, high quality picture
     384,000               30             Constant frame skip, high quality picture

The trade-off is between smooth video playback and greater pixelization upon
movement. If the video image is very still over time, a high-quality picture can be
obtained at practically all of the specified bandwidths. The sweet spot for a frame
quality of 80 appears to lie between 38,400 and 51,200 bytes per second, although
38,400 works well if momentary pixelization upon a video conference participant's
sudden movement is tolerable.

Such settings also have the benefit of keeping the bandwidth usage capped relatively
low without significantly sacrificing image quality. This is of particular benefit as we
assume that keeping the bandwidth usage in check becomes increasingly necessary
when scaling the video conference to greater numbers of participants.
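For reference, the bandwidth figures throughout these tables are in bytes per second,
matching the units of the Camera.setQuality() bandwidth parameter. Converting them
to the kilobits per second commonly used in network planning is a one-line calculation
(a small sketch of our own, not part of the test tools):

```python
# Convert a Camera.setQuality() bandwidth cap, given in bytes per
# second, to kilobits per second for network-capacity planning.

def bytes_per_sec_to_kbps(bytes_per_sec):
    return bytes_per_sec * 8 / 1000

# The 38,400-51,200 bytes/sec sweet spot identified above:
print(bytes_per_sec_to_kbps(38400))  # -> 307.2
print(bytes_per_sec_to_kbps(51200))  # -> 409.6
```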

Additionally, several ad hoc tests indicate that a low key-frame interval tends to
contribute to increased frame skipping, whereas high key-frame intervals, particularly
ones higher than the frame rate, result in decreased frame skipping but introduce
somewhat longer normalization times in cases where the video image has become
pixelated due to motion.
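The interval-to-frame-rate ratio explains this behavior: the time between full (key)
frames is the key-frame interval divided by the frame rate, so an interval above the
frame rate means less than one key frame per second, which lengthens recovery from
a pixelated image. A quick illustration (our own sketch):

```python
# Seconds between key frames = key-frame interval / frame rate.

def keyframe_period_seconds(key_frame_interval, fps):
    return key_frame_interval / fps

# The interval of 48 at 24 fps used in our test configurations
# yields one key frame every 2 seconds:
print(keyframe_period_seconds(48, 24))  # -> 2.0

# The low interval of 8 used in the variable-quality tests sends
# a key frame every third of a second:
print(keyframe_period_seconds(8, 24))   # roughly 0.33 s
```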

Although these tests were not repeated on the Apple iSight camera or the Sony
DCR-TRV460 camcorder, the results obtained here led to the configurations chosen
for Test Configurations B and C in the encoding/decoding tests described earlier,
which replicate a subset of these batteries for the two additional cameras.

SCALING
The other major goal of our research was to determine the feasibility of scaling Flash
video conferencing to support up to 10 simultaneous participants. To do this, we
conducted a number of tests using our FCSDiag suite of test applications.

Due to both screen size and network bandwidth constraints, we focused primarily on
a resolution of 160 x 120 for each participant's video stream. The principal
considerations in finding optimal settings for a 10-participant conference are keeping
CPU utilization relatively low, as each machine must encode its own stream as well as
decode 10 incoming streams, and minimizing network bandwidth utilization, as
aggregate bandwidth requirements grow quadratically with the number of
participants.
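To make the growth concrete: in a server-relayed conference, each of n participants
publishes one stream and subscribes to the n - 1 others, so the server relays n(n - 1)
streams in total. A simple Python model of our own (assuming, for illustration, that
every stream is capped at the same rate):

```python
# Bandwidth model for a server-relayed n-way conference in which each
# participant publishes one stream and subscribes to all the others.

def conference_bandwidth(n_participants, bytes_per_stream):
    """Return (per-client bytes/sec, total server bytes/sec)."""
    per_client = n_participants * bytes_per_stream  # 1 up + (n - 1) down
    server_total = n_participants * (n_participants - 1) * bytes_per_stream
    return per_client, server_total

# A 10-way conference with every stream capped at 38,400 bytes/sec:
per_client, server_total = conference_bandwidth(10, 38400)
print(per_client)    # -> 384000 (bytes/sec per client)
print(server_total)  # -> 3456000 (bytes/sec through the server)
```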

Some of the initial scaling tests documented in the following tables were performed
before we determined that the Apple iSight camera was significantly better at
reducing the CPU overhead of video acquisition. These initial tests used the Creative
NX Ultra camera with relatively naïve video settings and yielded only marginally
acceptable results.

Significantly better results were obtained in tests conducted with the Apple iSight
camera, incorporating refinements to the video configuration learned through earlier
testing. Our efforts to determine optimal configurations for scaling video
conferences to 10 participants are described below.

All tests were conducted with the test machine publishing its own stream while
subscribing to and displaying n streams (with n varying between 1 and 10) broadcast
with identical video settings from a second participant machine through Flash
Communication Server. This effectively simulates the load on a participant machine
in a conference with n + 1 participants where the participant machine is not
monitoring a loopback stream.
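The equivalence between the test setup and a real conference can be checked with a simple count of the streams each machine must encode and decode. The following is a toy model for illustration, not part of our FCSDiag test harness:

```python
def stream_load(publishes: int, subscribes: int) -> tuple:
    """Return (encoded, decoded) stream counts for a client machine."""
    return (publishes, subscribes)

def participant_load(conference_size: int, monitor_loopback: bool = False) -> tuple:
    """Load on one participant in a conference of `conference_size` machines.

    Each participant encodes its own stream and decodes one stream per
    other participant, plus its own relayed loopback stream if monitored.
    """
    decoded = conference_size - 1 + (1 if monitor_loopback else 0)
    return stream_load(1, decoded)

# The test machine publishes 1 stream and subscribes to n = 10 streams,
# matching a participant in an (n + 1)-way conference without loopback.
n = 10
assert stream_load(1, n) == participant_load(n + 1, monitor_loopback=False)
```

Monitoring a server-relayed loopback stream would add one more decoded stream per participant, which is why our tests exclude it.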

A second participant machine was used to provide the streams subscribed to on the
test machine, as this allowed us to focus the second machine’s camera (a Logitech
QuickCam Orbit) on ambient street traffic outside our facility. With large numbers
of video feeds, it was significantly easier to assess frame skipping when imaging
steadily moving vehicles than when imaging facial movements. Audio data was
collected and published by both machines from ambient sound in the room.

Our initial test (Test 1) was conducted with the configuration shown in Table 17,
chosen to sacrifice video quality temporarily when necessary in order to keep
bandwidth usage within reasonable limits.


                                                 34
                         Copyright 2005, Architekture.com, All Rights Reserved.
architekture.com, inc.
architekture.com, inc.
architekture.com, inc.
architekture.com, inc.
architekture.com, inc.
architekture.com, inc.
architekture.com, inc.

More Related Content

Similar to architekture.com, inc.

State of the UC Union- Allwave AV.pdf
State of the UC Union- Allwave AV.pdfState of the UC Union- Allwave AV.pdf
State of the UC Union- Allwave AV.pdfAll Wave AV Systems
 
Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Videoguy
 
Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Videoguy
 
Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Videoguy
 
Network Configuration Example: Configuring Assured Forwarding for High-Defini...
Network Configuration Example: Configuring Assured Forwarding for High-Defini...Network Configuration Example: Configuring Assured Forwarding for High-Defini...
Network Configuration Example: Configuring Assured Forwarding for High-Defini...Juniper Networks
 
Utf 8'en'ibm sametime 9 - voice and video deployment
Utf 8'en'ibm sametime 9 - voice and video deployment Utf 8'en'ibm sametime 9 - voice and video deployment
Utf 8'en'ibm sametime 9 - voice and video deployment a8us
 
Flash-based audio and video communication
Flash-based audio and video communicationFlash-based audio and video communication
Flash-based audio and video communicationKundan Singh
 
Scopia Infrastructure Guide
Scopia Infrastructure GuideScopia Infrastructure Guide
Scopia Infrastructure GuideMotty Ben Atia
 
Video streaming
Video streamingVideo streaming
Video streamingVideoguy
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Videoguy
 
Streaming Video Solutions White Paper
Streaming Video Solutions White PaperStreaming Video Solutions White Paper
Streaming Video Solutions White PaperVideoguy
 
Polycom Video Communications
Polycom Video CommunicationsPolycom Video Communications
Polycom Video CommunicationsVideoguy
 
Collaborative conferencing options available to LTER Network ...
Collaborative conferencing options available to LTER Network ...Collaborative conferencing options available to LTER Network ...
Collaborative conferencing options available to LTER Network ...Videoguy
 
Video Conferencing Demo Application for Dialogic® Multimedia ...
Video Conferencing Demo Application for Dialogic® Multimedia ...Video Conferencing Demo Application for Dialogic® Multimedia ...
Video Conferencing Demo Application for Dialogic® Multimedia ...Videoguy
 
Video Conferencing, The Enterprise and You
Video Conferencing, The Enterprise and YouVideo Conferencing, The Enterprise and You
Video Conferencing, The Enterprise and YouVideoguy
 
Network Planning Worksheets for Video Conferencing
Network Planning Worksheets for Video ConferencingNetwork Planning Worksheets for Video Conferencing
Network Planning Worksheets for Video ConferencingVideoguy
 
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayerTaking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayerDaniel Krook
 
Whitepaper multipoint video_conferencing_june2012_wr
Whitepaper multipoint video_conferencing_june2012_wrWhitepaper multipoint video_conferencing_june2012_wr
Whitepaper multipoint video_conferencing_june2012_wrJohn Shim
 
Kranky Geek - Virtual Collaboration - Igor Pavlov
Kranky Geek - Virtual Collaboration - Igor PavlovKranky Geek - Virtual Collaboration - Igor Pavlov
Kranky Geek - Virtual Collaboration - Igor PavlovIgor Pavlov
 

Similar to architekture.com, inc. (20)

State of the UC Union- Allwave AV.pdf
State of the UC Union- Allwave AV.pdfState of the UC Union- Allwave AV.pdf
State of the UC Union- Allwave AV.pdf
 
Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...
 
Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...
 
Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...Deployment Of Multi-Network Video And Voice Conferencing On A ...
Deployment Of Multi-Network Video And Voice Conferencing On A ...
 
Network Configuration Example: Configuring Assured Forwarding for High-Defini...
Network Configuration Example: Configuring Assured Forwarding for High-Defini...Network Configuration Example: Configuring Assured Forwarding for High-Defini...
Network Configuration Example: Configuring Assured Forwarding for High-Defini...
 
Utf 8'en'ibm sametime 9 - voice and video deployment
Utf 8'en'ibm sametime 9 - voice and video deployment Utf 8'en'ibm sametime 9 - voice and video deployment
Utf 8'en'ibm sametime 9 - voice and video deployment
 
Flash-based audio and video communication
Flash-based audio and video communicationFlash-based audio and video communication
Flash-based audio and video communication
 
Scopia Infrastructure Guide
Scopia Infrastructure GuideScopia Infrastructure Guide
Scopia Infrastructure Guide
 
Video streaming
Video streamingVideo streaming
Video streaming
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...
 
Streaming Video Solutions White Paper
Streaming Video Solutions White PaperStreaming Video Solutions White Paper
Streaming Video Solutions White Paper
 
Polycom Video Communications
Polycom Video CommunicationsPolycom Video Communications
Polycom Video Communications
 
Collaborative conferencing options available to LTER Network ...
Collaborative conferencing options available to LTER Network ...Collaborative conferencing options available to LTER Network ...
Collaborative conferencing options available to LTER Network ...
 
Video Conferencing Demo Application for Dialogic® Multimedia ...
Video Conferencing Demo Application for Dialogic® Multimedia ...Video Conferencing Demo Application for Dialogic® Multimedia ...
Video Conferencing Demo Application for Dialogic® Multimedia ...
 
Video Conferencing, The Enterprise and You
Video Conferencing, The Enterprise and YouVideo Conferencing, The Enterprise and You
Video Conferencing, The Enterprise and You
 
Network Planning Worksheets for Video Conferencing
Network Planning Worksheets for Video ConferencingNetwork Planning Worksheets for Video Conferencing
Network Planning Worksheets for Video Conferencing
 
Intro to Video Conferencing
Intro to Video ConferencingIntro to Video Conferencing
Intro to Video Conferencing
 
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayerTaking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
Taking the Next Hot Mobile Game Live with Docker and IBM SoftLayer
 
Whitepaper multipoint video_conferencing_june2012_wr
Whitepaper multipoint video_conferencing_june2012_wrWhitepaper multipoint video_conferencing_june2012_wr
Whitepaper multipoint video_conferencing_june2012_wr
 
Kranky Geek - Virtual Collaboration - Igor Pavlov
Kranky Geek - Virtual Collaboration - Igor PavlovKranky Geek - Virtual Collaboration - Igor Pavlov
Kranky Geek - Virtual Collaboration - Igor Pavlov
 

More from Videoguy

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingVideoguy
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresVideoguy
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingVideoguy
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksVideoguy
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streamingVideoguy
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideoguy
 
Video Streaming
Video StreamingVideo Streaming
Video StreamingVideoguy
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader AudienceVideoguy
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGVideoguy
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingVideoguy
 
Application Brief
Application BriefApplication Brief
Application BriefVideoguy
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Videoguy
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second LifeVideoguy
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming SoftwareVideoguy
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoguy
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video FormatenVideoguy
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareVideoguy
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxVideoguy
 
Video and Streaming in Nokia Phones v1.0
Video and Streaming in Nokia Phones v1.0Video and Streaming in Nokia Phones v1.0
Video and Streaming in Nokia Phones v1.0Videoguy
 

More from Videoguy (20)

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video Streaming
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_Pres
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video Streaming
 
Adobe
AdobeAdobe
Adobe
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streaming
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A Survey
 
Video Streaming
Video StreamingVideo Streaming
Video Streaming
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader Audience
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video Streaming
 
Application Brief
Application BriefApplication Brief
Application Brief
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second Life
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming Software
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions Cookbook
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video Formaten
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming Software
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - Firefox
 
Video and Streaming in Nokia Phones v1.0
Video and Streaming in Nokia Phones v1.0Video and Streaming in Nokia Phones v1.0
Video and Streaming in Nokia Phones v1.0
 

architekture.com, inc.

  • 1. architekture.com, inc. TM design with intelligence Optimizing Video Conferences with Macromedia Flash Technologies Jim Cheng jim.cheng@architekture.com Allen Ellison allen.ellison@architekture.com February 2005
  • 2. Copyright © 2005 Architekture.com, Inc. All rights reserved. This white paper is for information purposes only. ARCHITEKTURE.COM MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT. Macromedia, Macromedia Flash, Flash Communication Server, and Flash Player are either trademarks or registered trademarks of Macromedia, Inc. in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners. ARCHITEKTURE.COM, INC. 600 GRANT STREET SUITE 850 DENVER, CO 80203 (720) 231-3166 ii Copyright 2005, Architekture.com, All Rights Reserved.
  • 3. INTRODUCTION It is well known that the combination of Macromedia Flash Communication Server and Macromedia Flash Player offers many exciting possibilities for live video conferencing. The task of choosing optimal hardware selections and software settings, however, has remained quite burdensome and arcane. All too often, developers have to deal with audio synchronization, frozen video images, and lag issues. Even for seasoned Macromedia Flash developers, the task of implementing quality Flash-based video conferencing applications becomes a challenge when confronted with the bewildering selection of cameras, network configurations, and software settings. However, the ability to create high-quality video conferencing experiences in Flash is essential to meeting client expectations for many of today’s cutting-edge Flash Communication Server applications. In the course of developing such applications for a variety of clients during 2004, Architekture.com has conducted significant research on optimizing high-bandwidth video conferencing applications with the goal of finding a good balance between video and sound quality, and limiting the use of CPU and network resources to mitigate problems associated with skipped frames, lag, or out-of- sync sound. We are pleased to present our findings and recommendations to the Flash developer community in this white paper. Architekture.com is a leading Macromedia Flash development firm with recognized expertise in Flash Communication Server. Our world-class development team creates cutting-edge solutions that push the limits of what is thought possible. We specialize in the development of immersive, real-time multi-player simulations, as well as rapid prototype development and real-time business collaboration applications. iii Copyright 2005, Architekture.com, All Rights Reserved.
  • 4. CONTENTS Introduction........................................................................................................ iii Why Optimization Matters ................................................................................... 1 Focusing on the Client Side ................................................................................. 1 Testing Environment ............................................................................................ 2 Hardware........................................................................................................... 2 Cameras ........................................................................................................ 2 Microphones................................................................................................... 8 Networking..................................................................................................... 8 Software Settings ................................................................................................ 9 Camera Settings.............................................................................................. 9 Camera.setMode() ....................................................................................... 9 Camera.setQuality()................................................................................... 10 Camera.setKeyFrameInterval()..................................................................... 13 Microphone Settings ...................................................................................... 13 Microphone.setRate() ................................................................................. 13 Microphone.setGain() and Microphone.setSilenceLevel()................................ 13 Microphone.setUseEchoSuppression() .......................................................... 14 Buffer Times.................................................................................................. 
14 Embedded Video Sizes................................................................................... 14 MovieClip.attachAudio() ................................................................................ 15 Stream Latency.............................................................................................. 15 Scaling ............................................................................................................ 16 Flash Communication Server Limitations.......................................................... 16 Network Limitations ....................................................................................... 17 Client Machine Limitations ............................................................................. 18 CPU Utilization and Resolution ....................................................................... 19 Summary ......................................................................................................... 21 Appendix A: Error Margins and Significance ........................................................ 22 Appendix B: Detailed Experimental Setups and Results.......................................... 23 Camera Testing ............................................................................................ 23 Encoding/Decoding and CPU Utilization ......................................................... 27 Video Settings ............................................................................................... 30 Scaling......................................................................................................... 34 Appendix C: Where to Download Test Files ......................................................... 38 Appendix D: IIDC/DCAM Camera List ................................................................ 39 iv Copyright 2005, Architekture.com, All Rights Reserved.
  • 5. WHY OPTIMIZATION MATTERS Many-to-many video conferencing on desktop computers requires significant quantities of resources, both in terms of processor utilization and network bandwidth. In order to achieve optimal results, it is necessary to find a good balance between video and sound quality that limits the use of resources to a level where processor and network loads do not introduce deleterious effects such as frame skipping, lag, or out- of-sync sound into the video conference experience. Poor choices in hardware selection and improper software settings often contribute to a poor video conferencing experience, and the bewildering number of options often makes it seem next to impossible to create high-quality video conferencing experiences, even with best-of-breed tools. This discourages both clients and developers alike, and convinces many that even with today’s technologies, video conferencing applications are difficult to use and cannot meet the promise of rich audio and visual communication between groups of individuals. Judicious choices of optimal hardware configuration and software settings, however, can make all the difference between a glitchy and nearly useless video conference application, and an impressive high-quality experience that exceeds client expectations. In the course of developing rich video conferencing applications using Macromedia technologies, we at Architekture.com have spent many hours determining best choices in specifying and configuring collaborative video conferencing products for our clients. We hope that sharing our results with the Flash developer community will lead to the development and release of many high-quality video conferencing applications in the future. 
FOCUSING ON THE CLIENT SIDE Although Flash Communication Server plays a crucial role in facilitating video conferencing with Flash technologies, for the most part it only serves to relay streams from one client machine to another in live video conferencing situations. In our testing environments, we have noted that even fairly modest server hardware setups such as a single 2.8 GHz Pentium 4 system with 512 MB of RAM can easily accommodate relatively intensive video conferencing situations that push the limit of a typical professional license. The limitations affecting video conferencing performance are instead mainly concentrated on the client side, because this is where the bulk of the work is done. When publishing a stream, the client machine has to acquire video and audio data, encode it, and push it across the network to the server, all in real time. And in a many- to-many conferencing situation, the same machine will need to subscribe to streams published by all of the other participants, decode them in real time, and present the 1 Copyright 2005, Architekture.com, All Rights Reserved.
  • 6. results onscreen and through the speakers or headphones—this too in real time (or as close to it as possible). Consequently, our optimization research and recommendations focus nearly entirely on the client-side systems. TESTING ENVIRONMENT Principal testing was conducted in the Architekture.com development laboratory on a hyper-threaded 2.8 GHz Pentium 4 computer running Windows XP Professional SP1 with 1.25 GB of RAM. The Flash Communications Server application runs on a similar processor with 512 MB of RAM under Windows Server 2003. These machines are connected on a 100 Mbps Ethernet LAN through a switch, and tests were conducted with in-house testing utilities running under Flash Player 7.0.19.0. We also conducted some additional testing on machines belonging to clients for proprietary video conferencing applications. HARDWARE A developer's ability to make or suggest hardware configurations for use with an application will vary depending on client requirements. However, we have found that the choice of hardware goes a long way in affecting the overall video conferencing experience. Even if you are building a video conferencing application for the web and have no control over the hardware configurations of client machines, these findings may help in determining minimum system requirements and in optimizing software settings for an expected range of client machines and network configurations. Our goal in making effective hardware choices for optimal performance is to minimize the load on the client processor and network while maintaining a high-quality audio and video stream. During our tests, we found that high processor loads were strongly correlated with poor performance, because the CPU’s time became divided between processes supporting the video conference and other applications contending for processor time. 
Maintaining reasonable network loads is an important secondary consideration, particularly in low-bandwidth settings, because available network bandwidth directly limits the amount of data that can be transferred between the client machine and Flash Communication Server. CAMERAS Cameras play a basic role in acquiring the video signal for conferencing applications. However, the video signal itself usually requires some degree of additional processing by the CPU before it is ready for use by the client Flash Player. Equally important are the drivers used to interface the camera with the operating system, because poorly 2 Copyright 2005, Architekture.com, All Rights Reserved.
  • 7. written camera drivers coupled with a camera’s high data throughput can place even greater demands on the processor. For most video conferencing applications, camera resolutions greater than 640 x 480 and frame rates greater than 30 frames per second (fps) are generally not necessary. Furthermore, consumer-level cameras intended for use with video conferencing applications seldom provide resolutions and frame rates higher than these for real- time video feeds. Because of this, we will limit our discussion to these cameras and will not consider those with higher resolutions or frame rates that are typically used for scientific and industrial applications. Most cameras designed for video conferencing use one of two serial bus architectures for communication with the client machine: USB (typically the faster 2.0 specification), or Firewire, also known as IEEE 1394. Firewire cameras can also be further divided in two categories based on data transfer protocol: DV (digital video) cameras, which provide a compressed data stream to the computer, and IIDC/DCAM cameras, which output uncompressed data streams and also offer camera hardware control over the Firewire bus. Our tests, as well as available documentation, suggest that there are significant differences in terms of overall processor demands between the various protocols used to transfer data from the camera to the computer. To determine the processor use required to handle video acquisition for different cameras, we conducted tests with three representative cameras using different bus and protocol combinations for transferring data to the client machine under identical resolution and frame rate settings. For our tests, we used the following cameras: Apple iSight, an IIDC/DCAM-compliant webcam that connects through a 400-Mbit Firewire bus; Sony DCR-TRV460, a DV- compliant consumer camcorder that also connects through 400-Mbit Firewire bus; and Creative Labs NX Ultra, a higher-quality USB webcam. 
All cameras were specified by their manufacturers as having a maximum live video resolution of 640 x 480 pixels as well as the capability of yielding streams of up to 30 fps (with the exception of the Creative NX Ultra camera, which was limited to 15 fps according to manufacturer specifications). Although the Sony DCR-TRV460 camera also sports a USB connection, we only used its Firewire DV connection for our tests. Table 1 provides an overview of the cameras we used for our tests. 3 Copyright 2005, Architekture.com, All Rights Reserved.
  • 8. Table 1: Basic Camera Capabilities Camera Data Bus Max. Resolution Max. FPS Apple iSight 1394 IIDC/DCAM 640x480 30 Sony DCR-TRV460 1394 DV 640x480 30 Creative NX Ultra USB 640x480 15 We measured CPU utilization for locally visualizing video output at varying resolutions and frame rates using each camera. To isolate the processor requirements needed to process the video signal and import it into Flash, we conducted these tests entirely locally using a simple Flash application running under Flash Player 7.0.19.0 without Flash Communication Server integration. Resolutions tested were all at the standard definition ratio of 4:3: 160 x 120, 200 x 150, 240 x 180, 320 x 240, 400 x 300, and 640 x 480 at frame rates of 1, 5, 10, 15, 24, and 30 fps. CPU utilization was measured using Windows Task Manager and averaged over roughly 30 seconds of video acquisition with all other applications and non-essential processes disabled. Although data points were obtained for all cameras at our test resolutions and frame rates, no camera supported all the resolutions natively. Actual resolution and frame rate can be assessed programmatically after making a Camera.setMode() call through the camera object’s width, height, and currentFps properties for comparison. When unsupported resolutions or frame rates were requested, Flash typically causes the video stream to be returned from the camera at a lower resolution and scaled up for display with fairly obvious pixelization. Figure 1 shows example frame captures illustrating this pixelization effect. Creative Labs NX Ultra 240 x 180 Apple iSight 240 x 180 (Camera Resolution: 176 x 132) (Camera Resolution: 240 x 180) Figure 1: Sample frame captures illustrating pixelization In this example, a resolution of 240x180 was requested of both the Creative Labs NX Ultra and the Apple iSight cameras. The NX Ultra, which does not support a 240x180 4 Copyright 2005, Architekture.com, All Rights Reserved.
capture resolution, is instead yielding a 176 x 132 stream, resulting in pixelization as Flash scales up the image to the display resolution of 240 x 180. On the other hand, the Apple iSight natively supports a 240 x 180 capture resolution, resulting in significantly better picture quality. Table 2 lists the supported resolutions for each camera in the test set.

Table 2: Supported Camera Resolutions

Camera              160x120   200x150   240x180   320x240   400x300   640x480
Apple iSight        Yes       Yes       Yes       Yes       No        Yes
Sony DCR-TRV460     Yes       No        No        Yes       No        Yes
Creative NX Ultra   Yes       No        No        Yes       No        Yes

The cameras tested do not all support the same range of resolutions and frame rates. For this reason, we focused our analysis on configurations supported by multiple cameras to determine comparative performance, even though data points were obtained for a significantly larger set of configurations. In particular, the 160 x 120, 320 x 240, and 640 x 480 resolutions allowed commensurate comparisons between the cameras at frame rates up to 15 fps for all cameras, and up to 30 fps for the Sony DCR-TRV460 and the Apple iSight cameras.

We also made a number of interesting observations with regard to frame rates. In the case of the Creative NX Ultra camera, Flash was able to request and receive video streams at frame rates up to 30 fps as reported by the Camera.fps property, although the camera itself is specified as having a maximum frame rate of 15 fps. We suspect this might be due to inaccurate reporting on the part of the driver or to software-level interpolation; our experiments do not yield conclusive evidence for either possibility. Also, although the Apple iSight camera is not officially supported on the Windows platform, we were able to use it with the default Microsoft drivers for 1394 desktop cameras. When using this driver, however, the frame rate was capped at a maximum of 15 fps.
Using the third-party Unibrain Fire-i IIDC/DCAM driver instead enabled us to reach the specified hardware maximum frame rate of 30 fps, as shown in Figure 2. It should also be noted that the Creative Labs NX Ultra camera yielded significantly noisier CPU utilization data than the other cameras during testing. We presume this is due to USB bus usage by other devices, including our keyboard and mouse, but could not conclusively determine the source.

Overall, the processor load results came in strongly in favor of the IIDC/DCAM-compliant Apple iSight camera. Processor utilization for image acquisition and importing in Flash was roughly half that required for the other two cameras at the
same resolution and frame rate in all comparable cases, with the Unibrain Fire-i driver slightly outperforming the Microsoft driver.

Processor utilization was roughly comparable between the Sony DCR-TRV460 and the Creative NX Ultra cameras at low resolutions. At a resolution of 320 x 240, the DV-compliant Sony DCR-TRV460 camera came out in the middle and outperformed the Creative Labs NX Ultra camera, although at 640 x 480 the Sony DCR-TRV460 came in last at higher frame rates. Also, as expected, processor utilization increases with higher resolutions and frame rates.

From a hardware perspective, we recommend the use of IIDC/DCAM-compliant cameras, because the uncompressed data stream appears to significantly reduce the overhead needed to process the image for consumption by Flash, particularly if processor resources are at a premium (for example, slower machines, visually rich user interfaces, or video conferences involving more than two participants). Figure 2 shows graphs of experimental results for various requested resolutions at reported frame rates of 15, 24, and 30 fps (lower CPU utilization is better). Note that resolutions other than 160 x 120, 320 x 240, and 640 x 480 are not directly commensurable between cameras due to differences in actual hardware resolution.
Figure 2: Resolution versus CPU utilization graphs
(Three panels plot % CPU utilization, from 0 to 25%, against requested resolutions from 160 x 120 to 640 x 480 at 15, 24, and 30 fps.)
MICROPHONES

One of the most common problems we encountered with microphones used for video conferencing was the introduction of unwanted echoes and background noise. Although Flash does provide an option for echo suppression via software, we have obtained significantly better results by reducing the incidence of echoes and irrelevant background noise at the hardware level through proper microphone selection. Echoes and ambient noise are particularly undesirable because they not only make speech less intelligible, but also interfere with our ability to accurately set the silence level needed to toggle the microphone activity state.

In the course of developing video conferencing applications for our clients, we have experimented with a number of different microphone setups, including analog headsets, USB headsets, and discrete microphone and speaker combinations, to determine the best configurations for obtaining high-quality sound capture while minimizing unwanted noise. The best setup we have found so far for reducing echo and ambient noise is a noise-canceling USB headset. Additional improvements to audio quality that can be made through software are discussed later.

NETWORKING

Our video conferencing application development is, for the most part, geared towards high-bandwidth intranet applications. For this reason, we primarily conduct our testing over 100 Mbit Ethernet connections, with and without non-RTMP "noise" traffic. In our experiments with up to 5 actual participants and simulated conferences involving up to 10 participants, we have not encountered any problems with network saturation thus far. For LAN-based intranet applications, a 100 Mbit Ethernet setup appears to be quite sufficient for video conferencing.
We have not tested other local network technologies such as 802.11, but we would expect similar results given ample bandwidth and network latencies commensurate with 100 Mbit Ethernet connections. High-quality live video conferencing over high-bandwidth Ethernet connections is possible even at relatively high resolutions such as 320 x 240 for small numbers of simultaneous participants. Additionally, bandwidth utilization can be capped at reasonably low levels (for example, 38,400 bytes per second per video stream) without significant loss of video quality, given a judicious choice of video encoding parameters as we describe later.

For conferencing across the Internet, available bandwidth will be markedly lower than that available on a LAN, and latency—the amount of time elapsed from when the video has been encoded on one machine to when the video
has been decoded on the recipient machine—will be increased. These issues are essentially facts of life when developing Internet-based applications, but they can be dealt with fairly effectively by minimizing bandwidth usage and allowing for increased latency.

It should also be noted that for many-to-many video conferencing, the required bandwidth grows quadratically with the number of participants. We discuss this issue in greater depth shortly when we consider network limitations on scaling. This is particularly relevant in cases of limited bandwidth, but it is an important concern for any collaborative video conference with increasing numbers of participants.

SOFTWARE SETTINGS

We have experimented with a large number of the possible software settings in Flash Player 7 for video conferencing and have documented our observations in this section. In particular, we have found that many of the typical glitches observed in video conferencing can be addressed with changes in the settings used in the Flash Player client-side communication objects. We also review several other interesting items that we have found in engineering video conferencing applications.

CAMERA SETTINGS

The principal methods for manipulating the camera object in Flash Player are setMode(), setQuality(), and setKeyFrameInterval(). Because the camera object is responsible for generating the bulk of the data to be streamed to Flash Communication Server, the settings here have a significant effect on both the video quality and the overall video conferencing experience. We consider each of these methods in turn and discuss the possible options for each setting, along with our observations, test results, and recommendations for configuring an optimal video conferencing experience.

Camera.setMode()

The Camera.setMode() method allows specification of the desired resolution and frame rate for the video data being collected.
Of course, only certain resolutions and frame rates are supported natively by each physical camera due to hardware limitations. If the settings specified are not intrinsically supported by the camera, Flash Player will instead fall back to the closest possible setting. The capture size is favored over frame rate by default, but this preference can be reversed through the optional favorSize flag. While this behavior does allow specification of practically any resolution and frame rate, we have found that using unsupported resolutions is undesirable, because it usually results in a pixelated image (as shown in Figure 1 earlier).
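This fallback can be detected programmatically. The following is a minimal ActionScript 2 sketch; the requested 240 x 180 mode is just an illustrative value:

```actionscript
// Request a capture mode, then inspect what the camera actually
// delivered (ActionScript 2, Flash Player 7).
var cam:Camera = Camera.get();
cam.setMode(240, 180, 15); // requested resolution and frame rate

// The read-only properties reflect the mode actually in use. If they
// differ from the request, Flash has fallen back to the closest
// supported mode and will scale the image up for display.
if (cam.width != 240 || cam.height != 180) {
    trace("Native capture is only " + cam.width + " x " + cam.height
          + "; expect pixelization at 240 x 180.");
}
trace("Actual frame rate: " + cam.currentFps + " fps");
```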
From experience, we have found that resolutions of 160 x 120 and 320 x 240 tend to be good choices: they are supported natively by many typical cameras used for video conferencing applications, and they are small enough to encode well for streaming. It is possible to detect programmatically whether the specified size and frame rate were actually used by the camera hardware by inspecting the read-only width, height, and currentFps properties. From our previous tests conducting basic video capture without encoding for network transport, we observed that lower resolutions and frame rates reduce the processor demand on the machine. With this in mind, we recommend choosing the lowest acceptable capture size and frame rate for an application. For high-bandwidth intranet applications, we have found that a resolution of 320 x 240 at 24 fps works relatively well for up to five simultaneous participants. For conferences conducted across the Internet through broadband connections, capture size and frame rate will need to be scaled down accordingly.

Camera.setQuality()

Camera.setQuality() allows specification of both the maximum bandwidth per second to be used by an outgoing video stream and the required video quality of the outgoing stream. By default, these are 16384 and 0, respectively. These settings allow for the choice of different setups, each with its own benefits. Either parameter can be set to zero, allowing Flash to automatically use as much bandwidth as necessary to maintain a specified video quality, or to throttle video quality to avoid exceeding the given bandwidth cap. The video quality can also be set to 100 to use the lossless, non-compressing codec instead. Alternatively, an exact bandwidth limit and a required video quality can both be specified when both are equally important. We have been unable to determine significant differences in processor utilization between the various setups.
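In ActionScript 2, the setups just described correspond to the following calls (a sketch; the numeric values are illustrative, not prescriptive):

```actionscript
var cam:Camera = Camera.get();

// Fixed quality, unconstrained bandwidth: Flash uses as much bandwidth
// as needed to hold every frame at quality 80.
cam.setQuality(0, 80);

// Fixed bandwidth cap, variable quality: Flash throttles frame quality
// as needed to stay under 38,400 bytes per second.
cam.setQuality(38400, 0);

// Both constrained: an exact bandwidth limit and a required quality.
cam.setQuality(51200, 70);

// Quality 100 selects the lossless, non-compressing codec.
cam.setQuality(0, 100);
```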
However, our experiments revealed marked differences in how Flash handles the edge cases where quality or bandwidth must be sacrificed to remain within the specified limits. In particular, we focused on settings intended for use in intranets with high-bandwidth network connectivity. For the case where both a maximum bandwidth and desired frame quality are specified, we found that a bandwidth limit between 400,000 and 900,000 bytes per second and a frame quality setting of 60 to 85 gave very acceptable results with smooth playback and no audio synchronization issues. Lower frame quality settings yielded increasingly pixelated video as expected. Low bandwidth limits, however, yielded skipped frames as described in the camera object's documentation.
We also note that in cases where we chose relatively high bandwidth caps, the actual outgoing bandwidth usage seemed to reach a practical upper limit below the specified cap. For example, we observed total bandwidth usage to seldom exceed 250,000 bytes per second for a 320 x 240 stream captured at 24 fps, despite the fact that maximum bandwidth was allocated for video and that the server-to-client maximum total bandwidth on the Flash Communication Server application was set to higher values.

With the frame quality specified and bandwidth usage left up to Flash (set to zero), we conducted a series of experiments to determine actual bandwidth usage and observed video quality for various frame quality settings under simulated video conferencing conditions, publishing and self-subscribing to the same stream with the settings recommended by Giacomo "Peldi" Guilizzoni on his weblog.

Table 3: Camera.setQuality() Basic Settings

Bandwidth:           0
FPS:                 24
Favor Size:          0
Frame Quality:       As below
Key Frame Interval:  8
Camera Width:        280
Camera Height:       208
Buffer Time:         0
Audio Rate:          22 kHz

Table 4 shows the results we obtained for each specified frame quality. Outgoing bandwidth usage per second and processor utilization were averaged over 30 seconds of simulated video conference usage with intermittent audio and relatively little physical motion.

Table 4: Variable Frame Quality Results

Frame Quality   Bandwidth/Sec.   CPU Util. (%)   Subjective Findings
100             250,000          33              Excellent picture, marked frame skipping
90              68,000           29              Excellent picture, some frame skipping
80              36,000           30              Excellent picture, occasional frame skipping
70              24,000           Not measured    Faint pixelization, smooth playback
60              19,000           Not measured    Mild pixelization, smooth playback
50              13,000           Not measured    Medium pixelization, smooth playback
40              11,000           Not measured    Loss of fine detail, smooth playback
30              10,000           Not measured    Moderate loss of detail, smooth playback
20              9,000            Not measured    Severe loss of detail, smooth playback
10              8,000            27              Loss of gross detail, smooth playback
From the data, we observed that CPU utilization dropped rather slowly with decreasing frame quality. High frame quality yielded very high quality pictures at the cost of frame skipping, whereas lower frame quality yielded smooth playback by sacrificing detail. The sweet spot seems to be at a frame quality between 70 and 80. It is also interesting to note that at a frame quality of 100 (using the lossless codec with no compression, and causing exceptionally high bandwidth consumption), CPU utilization is somewhat greater than when the frame quality is set to lower values and the video data is compressed.

Using similar settings with the frame quality set to 80 and varying the specified bandwidth, we repeated the experiment to obtain the results shown in Table 5.

Table 5: Variable Bandwidth Results

Spec. Bandwidth   CPU Use (%)    Subjective Findings
19,200            30             Smooth, significant pixelization upon movement
38,400            Not measured   Smooth, some pixelization upon movement
51,200            Not measured   Occasional frame skips, pixelization on gross movement
76,800            Not measured   Frequent frame skips, pixelization with extreme movement
128,000           Not measured   Frequent frame skips, high-quality picture
192,000           Not measured   Frequent frame skips, high-quality picture
256,000           Not measured   Very frequent frame skips, high-quality picture
384,000           30             Constant frame skipping, high-quality picture

Here, the trade-off is smooth video playback versus greater pixelization upon movement. If the video image is very still over time, a high-quality picture can be obtained at practically all the specified bandwidths. However, this is somewhat impractical for most video conferencing applications, where one would expect at least a small amount of movement.
The sweet spot here for a frame quality of 80 is apparently somewhere between 38,400 and 51,200 bytes per second, though 38,400 is quite acceptable if you don't mind momentary pixelization upon a video conference participant's sudden movement. Processor utilization, however, appears to be fairly constant throughout.

Allowing Flash to modulate the frame quality as needed has the considerable benefit of keeping bandwidth usage capped at relatively low levels without significantly sacrificing image quality. This is particularly important for low-bandwidth scenarios, such as video conferencing over the Internet, and for scaling video conferences to larger numbers of simultaneous participants on an intranet. It is our preferred setting, because we consider momentary pixelization upon gross movements preferable to frequent and unpredictable frame skipping. However, each application may benefit from experimentation with various bandwidth and frame quality settings, depending on requirements and preferences. Alternatively,
Guilizzoni offers a rather handy calculator for choosing these settings with a number of configurable options at:

http://www.peldi.com/blog/archives/2005/01/generating_opti_1.html

Camera.setKeyFrameInterval()

The key frame interval determines how often a full key frame is published to the stream, as opposed to a delta frame generated by the codec. Flash allows values ranging from 1 to 48, with the default being 15 (every fifteenth frame is a key frame). Testing with varying values for the key frame interval indicates that low key frame intervals tend to contribute to increased frame skipping (as additional bandwidth is used to transmit full key frames more often), whereas large intervals yield little to no frame skipping but introduce longer normalization times in cases where the frame quality is automatically throttled down in response to motion. For applications demanding very high quality video, we typically set the key frame interval equal to or greater than our frame rate, because we feel that occasionally lowered frame quality is preferable to frequent frame skipping.

MICROPHONE SETTINGS

There are several settings that can be specified for the microphone object from within Flash: the sampling rate, the gain, the silence level and timeout, and whether to enable echo suppression in the audio codec. These settings affect sound acquisition and encoding for publishing to Flash Communication Server.

Microphone.setRate()

This method determines the sampling rate used to acquire sound from the microphone in kilohertz (kHz). Flash allows settings of 5, 8, 11, 22, and 44 kHz, with 8 being the default in most cases. In general, higher sampling rates yield more natural-sounding audio with increased bandwidth usage. We generally use settings of 22 or 44 kHz to achieve relatively high-quality audio transmission and have not noticed significant performance increases with lower sampling rates.
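Pulling together the camera and microphone recommendations made so far, a high-bandwidth intranet configuration might be sketched in ActionScript 2 as follows. The values are the ones suggested in the text, not the only workable choices:

```actionscript
var cam:Camera = Camera.get();
var mic:Microphone = Microphone.get();

// 320 x 240 at 24 fps: natively supported by many conferencing cameras
// and workable for up to about five intranet participants.
cam.setMode(320, 240, 24);

// Cap video at 38,400 bytes/sec and let Flash modulate frame quality
// (0), trading momentary pixelization for smooth playback.
cam.setQuality(38400, 0);

// Key frame interval at or above the frame rate to reduce skipping.
cam.setKeyFrameInterval(24);

// 22 kHz sampling for relatively high-quality audio.
mic.setRate(22);
```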
Microphone.setGain() and Microphone.setSilenceLevel()

The gain on the microphone is applied as a multiplier for boosting the input, much like a volume knob, with zero silencing the audio, the default level of 50 leaving the signal strength unchanged, and a maximum value of 100. This setting is used in conjunction with the silence level, which determines the threshold above which the microphone is activated for publishing audio data. Optionally, the Microphone.setSilenceLevel() method can also take a second parameter to specify the silence timeout: the time in milliseconds that audio should continue to be published after the sound level drops below the specified silence level.

We have noted that it can often be rather difficult to set the audio gain and silence levels precisely enough for the microphone to toggle state
correctly. In some cases, the sweet spot for the silence level has been as narrow as one unit, with too low a value causing the microphone to be keyed on constantly, picking up all manner of ambient noise, while a slightly higher value would not reliably detect a video conference participant's voice at normal conversational volume. The proper choice of gain and silence level values seems to differ significantly between individual machines and microphone setups, so we cannot recommend specific values outside of experimentation with particular hardware setups. We do, however, recommend implementing a tell-tale "talk" light in many cases so a participant can see whether his or her audio signal is being broadcast. Too frequently, we have seen a participant mouthing words silently on-screen, unaware that the microphone remains deactivated.

If it is necessary to silence the audio programmatically in response to low activity levels, or to implement a push-to-talk feature, setting the gain to zero is an effective means of doing so. We have not found setting the silence level to 100 to be effective in all instances, because very loud microphone input can raise the activity level to 100 and thus breach the threshold.

Microphone.setUseEchoSuppression()

Flash allows optional echo suppression in the audio codec to be toggled on and off using ActionScript. We usually enable this with good results, although we have found that a more effective approach to echo reduction is to use noise-canceling USB headsets rather than analog headsets or discrete microphone and speaker setups. This has the added benefit of filtering out the majority of background noise before it reaches Flash, making it easier to get the silenceLevel setting right.

BUFFER TIMES

The NetStream object allows a buffer time to be set on both publishing and subscribing, but with significantly different effects.
When set on a publishing stream, it determines the maximum size of the outgoing buffer; when the buffer is full, remaining frames are dropped. The Macromedia documentation states that this is generally not a concern on high-bandwidth connections, and we have found this to be the case in our use. On the subscribing end, the buffer time determines the amount of data to be buffered prior to display. We typically set both of these to zero, with excellent results, for live video conferencing applications.

EMBEDDED VIDEO SIZES

Our experience with sizing embedded videos suggests that processor load is minimized when the embedded video object is sized to match the subscribed video stream's resolution exactly. In experiments where the displayed video is sized either larger or smaller than the published resolution, we have observed increased
processor utilization. Given that the camera resolution on a publishing machine can be changed easily, we recommend matching subscribers' embedded video object sizes with the stream's video resolution. The stream's native resolution can be determined programmatically on the subscriber machine by examining the attached Video object's width and height properties.

MOVIECLIP.ATTACHAUDIO()

To control certain aspects of a stream's incoming sound (such as volume and panning), a developer can use the MovieClip.attachAudio() method to attach the incoming sound to a MovieClip and then control it through a Sound object, as suggested in the client-side development documentation. However, in our experience, while this technique does provide additional control over the incoming sound, it also has an unfortunate tendency to desynchronize the audio playback from the video playback. We have not yet found an adequate solution for this problem and recommend against using MovieClip.attachAudio() on live video conferencing streams.

STREAM LATENCY

Latency can be a significant problem in many video conferencing situations, and manifests itself as the delay between events captured at the publishing machine and their arrival and display on a subscriber machine. Because there is no native provision for a client-side determination of latency, we measure latency by broadcasting a message using the NetStream.send() method on a publishing machine and timing the difference between the initial broadcast and the subsequent receipt of the message on a second, self-subscribed stream. While this technique measures data latency, all of our observations thus far indicate that it directly coincides with video latency, so we have taken to interpreting data latency as video latency.
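The probe just described can be sketched in ActionScript 2 as follows. This is an illustrative reconstruction rather than our verbatim test harness; the application URI, stream name, and "ping" handler name are placeholders:

```actionscript
// Latency probe sketch: publish a timestamped message over a live
// stream and time its arrival on a self-subscribed copy.
var nc:NetConnection = new NetConnection();
nc.connect("rtmp://localhost/latencyTest");

var outStream:NetStream = new NetStream(nc);
outStream.setBufferTime(0);          // no outgoing buffering
outStream.attachVideo(Camera.get());
outStream.attachAudio(Microphone.get());
outStream.publish("probe", "live");

var inStream:NetStream = new NetStream(nc);
inStream.setBufferTime(0);           // display immediately
inStream.play("probe");

// NetStream.send() messages invoke same-named handlers on the
// subscribing stream object.
inStream.ping = function(sentAt:Number):Void {
    trace("Data latency: " + (getTimer() - sentAt) + " ms");
};

// Broadcast the current time on the published stream once per second.
setInterval(function ():Void {
    outStream.send("ping", getTimer());
}, 1000);
```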
In the course of our research, we have noted that, upon subscribing to a live stream, latency typically averages below 50 milliseconds (ms) when audio data is entirely absent. However, upon playback of streamed audio data, latency will typically increase rapidly to several hundred milliseconds with little to no recovery to previous levels, even after audio data has ceased. We have also observed that in some cases with continuous audio data (for example, when the microphone is always keyed on because of significant volume or too low a silence level), the measured latency increases slowly in a continuous manner. While in many cases the latency will tend to level out at 200 to 400 ms (values that we find acceptable), latency will sometimes continue to grow into the seconds, yielding a very poor-quality video conferencing experience. While we typically can restore the latency to low levels by closing the subscribed stream and resubscribing, such a solution is not particularly appealing because it interrupts the video and audio for
several seconds while the stream is reconnected. To date, we have not found an adequate solution for capping latency at manageable levels.

It is also important to note that we have not discovered a way of automating the measurement of audio latency; aside from implementing a questionable hardware-based solution such as feeding the speaker jack into the microphone jack and monitoring the audio activity level, we are at a loss as to how to measure it. A means of determining audio latency would be extremely valuable, because we could then identify and measure audio sync issues as they occur through automated means.

SCALING

While it is relatively easy to create a high-quality video conferencing experience for two simultaneous participants, the demands on both the network and the machines increase quickly as an application is scaled to involve greater numbers of simultaneous participants. Specifically, the bandwidth needed to support many-to-many video conferencing grows quadratically with the number of participants: n² streams are required for n participants. (For more information on bandwidth usage, see Brian Hock's Macromedia white paper entitled Calculating Your Bandwidth and Software License Needs for the Macromedia Flash Communication Server MX.) Additionally, each client machine must dedicate additional resources to decoding each subscribed stream. These factors place upper limits on the maximum number of participants in a single video conference on several fronts: the Flash Communication Server itself, the network infrastructure's available bandwidth, and the capabilities of the client machines.
FLASH COMMUNICATION SERVER LIMITATIONS

Flash Communication Server is licensed in increments of 10 Mbit per second or 2,500 simultaneous connections, so the primary consideration when scaling Flash Communication Server to accommodate increasing numbers of video conference participants is adequate bandwidth support by its current license(s). For video conferencing applications, the 10 Mbit per second peak bandwidth limit will almost surely be reached well before approaching 2,500 simultaneous connections. There are no limits on the number of streams served, just peak bandwidth usage and total simultaneous connections.

A single professional license offers 10 Mbit per second, or about 1.19 megabytes per second, in available bandwidth. To calculate the usage for a hypothetical case, let us assume a fairly typical high-bandwidth video conferencing stream with a maximum of 38,400 bytes per second allocated to video data and a 44 kHz audio sampling rate. Experimentally, such a stream peaks at roughly 50 kilobytes per second.
Using 50 kilobytes per second as our estimated per-stream bandwidth usage, we can compute the total streams and estimated maximum bandwidth usage per second for increasing numbers of participants, as shown in Table 6.

Table 6: Example Bandwidth Calculations for n Participants

Participants   Total Streams   Max. Bandwidth (Bytes per Sec.)
2              4               200,000
3              9               450,000
4              16              800,000
5              25              1,250,000
6              36              1,800,000
7              49              2,450,000
8              64              3,200,000
9              81              4,050,000
10             100             5,000,000

Of course, these numbers are a rough estimate and probably err slightly on the high side, because we are assuming that all streams simultaneously reach their expected peak bandwidth utilization. However, we can use these figures to estimate the bandwidth load on the Flash Communication Server software. Given our earlier assumptions, a single professional license will likely become saturated somewhere between four and five simultaneous participants. To accommodate larger numbers of participants, the maximum bandwidth cap on Flash Communication Server would need to be increased by stacking additional licenses or purchasing higher capacity licenses from Macromedia.

In practice, actual bandwidth usage will depend on the choice of settings and how the application is actually used. As screen real estate on the client side is also expected to diminish with increasing numbers of video conference participants, we recommend reducing per-stream bandwidth usage as the number of participants grows by scaling down the capture resolution and frame rate, the video bandwidth cap, or the frame quality. Even with an unlimited capacity license on Flash Communication Server, the limitations of hardware, operating system, and processor performance will eventually impose a hard ceiling on the number of simultaneous participants supported by a video conferencing application.
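The figures in Table 6 follow directly from n² streams at an assumed 50 kilobytes per second each; the arithmetic is easy to check in a few lines of plain ECMAScript (the same expression family as ActionScript 2):

```javascript
// Reproduce Table 6: n^2 total streams for n participants (n published
// plus n * (n - 1) subscribed), each assumed to peak at 50,000 bytes/sec.
function totalStreams(participants) {
  return participants * participants;
}

function peakBandwidth(participants, bytesPerStream) {
  return totalStreams(participants) * (bytesPerStream || 50000);
}

// A 10 Mbit/sec license provides roughly 1,250,000 bytes per second,
// which this estimate saturates between four and five participants.
console.log(peakBandwidth(4)); // 800000
console.log(peakBandwidth(5)); // 1250000
```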
NETWORK LIMITATIONS

Much of our research has been focused on video conferencing in high-bandwidth intranet configurations with ample network headroom. However, network limitations should be kept in mind when scaling video conferencing applications for deployment on all network configurations, particularly those in heavily used environments or
across the Internet. Also, when comparing bandwidth utilization reported by Flash Communication Server to actual bandwidth used on the physical network, some degree of additional network overhead for packet envelopes, retransmitted packets, and control messages should be taken into account.

In our experience, video conferencing works very well in an intranet setting. However, in busy local network environments, you will need to take into account additional, non-video-conference traffic such as e-mail, web browsing, and file transfers also contending for network bandwidth. Depending on local traffic volume and the network architecture, you may encounter lower available bandwidth and quality of service than might be expected under ideal conditions. While we have not encountered any problems traceable to network congestion in test cases involving both shared and dedicated 100 Mbit Ethernet connections, we do suggest testing to ensure that an application runs well in its specific network environment.

When video conferencing is conducted over the Internet, other factors come into play. First, significantly greater latency and lower available bandwidth can be expected than are achievable in a local network configuration, even for users with broadband connections. Also, some users may have asymmetric upload and download bandwidth capacities. These limitations place additional constraints on the size and quality of video streams that can be delivered to each client. As recommended by Guilizzoni and Hock, lowering the capture size, bandwidth, and video quality of your streams will be necessary to accommodate the limitations of Internet-based conferencing.

CLIENT MACHINE LIMITATIONS

On the client machines, the principal consideration in scaling to larger numbers of participants is the incremental growth of the number of streams that need to be decoded and displayed.
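As a rough illustration of this per-client growth, assume the 50 kilobytes per second per-stream estimate used in the server calculations, and the usual many-to-many arrangement in which each client publishes one stream and subscribes to the other n - 1 (an assumption on our part, not a measured figure):

```javascript
// Rough per-client bandwidth for an n-participant conference,
// assuming ~50,000 bytes/sec per stream: upload is one published
// stream; download grows linearly as (n - 1) subscribed streams.
function clientBandwidth(participants, bytesPerStream) {
  var rate = bytesPerStream || 50000;
  return {
    uploadBytesPerSec: rate,
    downloadBytesPerSec: (participants - 1) * rate
  };
}

console.log(clientBandwidth(10).downloadBytesPerSec); // 450000
```

Under this estimate, a 10-participant client downloads roughly nine streams' worth of data, which is why decoding capacity, not upload bandwidth, tends to be the client-side bottleneck.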
We have conducted a number of simulated tests on single video conference clients subscribing to and displaying up to 10 live streams without significant problems when reasonable bandwidth and quality settings were used. We observed that the settings in Table 7 yield very acceptable performance, with smooth playback, when decoding and rendering up to 10 incoming streams on our test machine.

Table 7: Recommended 10-Participant Settings

Bandwidth: 38400
FPS: 15
Favor Size: 0
Frame Quality: 0
Key Frame Interval: 30
Camera Width: 160
Camera Height: 120
Buffer Time: 0
Audio Rate: 22 kHz

Average processor utilization for 10 streams using the settings in Table 7 was only 36%, demonstrating that a high-quality 10-participant video conference is entirely possible on current systems using Macromedia Flash technologies. We have also conducted additional tests with varying parameters, but have found this combination of settings to yield the best results.

CPU UTILIZATION AND RESOLUTION

We wanted to determine the effect of stream resolution on processor usage and find optimal resolutions for different numbers of simultaneous participants. Using matched publishing and display resolutions, we measured average CPU utilization on our test machine over 60 to 90 seconds when subscribed to 4, 6, or 8 simulated video conferencing streams at various 4:3 resolutions using the settings in Table 8.

Table 8: CPU Utilization versus Resolution Basic Settings

Bandwidth: 38400
FPS: 24
Favor Size: 0
Frame Quality: 0
Key Frame Interval: 48
Buffer Time: 0
Audio Rate: 44 kHz

Figure 3 shows the plotted results obtained for the 4, 6, and 8 stream cases at various resolutions.
Figure 3: CPU utilization versus stream resolution area graph

The x-axis in this graph is measured in somewhat unusual units: the area resolution of a stream's video feed in thousands of pixels. For example, a video resolution of 320 x 240 yields an area of 76.8 kilopixels (320 x 240 = 76,800). To convert back from the area to the original 4:3 dimensions, divide the area by 12 and take the square root of the result; multiplying that value by 4 gives the width, and by 3 gives the height. This unit of measurement lets us quantitatively compare various resolutions against one another. The numeric results are provided in Appendix B. The positions of the 160 x 120 and 320 x 240 capture resolutions that are typically supported at the hardware level by many commonly used video conferencing cameras are indicated on the graph to assist in reading.

At present, we suspect that the appearance of shelves, where CPU utilization remains fairly stable across relatively substantial changes in resolution with marked jumps between certain resolutions, stems from the encoding algorithm Flash uses to compress video. However, we do not have sufficient information to determine this conclusively.
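The area-to-dimensions conversion described above can be sketched as a small function (shown here in JavaScript-style ECMAScript, the same family as ActionScript; the function name areaToDimensions is our own illustrative choice):

```javascript
// Convert a 4:3 stream's pixel area back to its width and height.
// For a 4:3 frame, width = 4s and height = 3s for some scale s,
// so area = 12 * s^2 and s = sqrt(area / 12).
function areaToDimensions(area) {
    var scale = Math.sqrt(area / 12);
    return { width: 4 * scale, height: 3 * scale };
}

// 76.8 kilopixels corresponds to the familiar 320 x 240 resolution.
var dims = areaToDimensions(76800);
```

Running the conversion on 76,800 pixels recovers 320 x 240, matching the worked example above.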
SUMMARY

In summary, we offer these findings on optimal hardware and software configurations for live conferencing applications using Flash Communication Server:

• Cameras for video conferencing differ significantly in the processor load needed for video acquisition. We have found that Firewire cameras using the IIDC/DCAM protocol perform significantly better than USB cameras or DV Firewire cameras.
• USB headsets with active noise cancellation are preferred, because they provide superior sound quality and echo reduction compared to analog headsets or discrete setups.
• Resolutions natively supported by the camera hardware are preferable in order to avoid pixelization. Typically, 160 x 120 and 320 x 240 are supported and work reasonably well for streaming.
• Bandwidth utilization should be carefully balanced with image quality; maximizing either or both tends to yield poor results. A bandwidth limit of about 38,400 bytes per second with an unspecified frame quality and a key frame interval at or above the camera frame rate serves our purposes rather well. Experimentation may be in order to find the configuration best fitting a given application's requirements. Giacomo Guilizzoni has provided an easy-to-use calculator that recommends values for various setups at: http://www.peldi.com/blog/archives/2005/01/generating_opti_1.html
• Microphone sampling rates of 22 or 44 kHz work well. Lower sampling rates reduce bandwidth usage but result in poor audio quality.
• Embedded videos used for displaying subscribed streams should be sized to match the originating camera resolution for optimal performance.
• MovieClip.attachAudio() should not be used to manipulate the audio from a subscribed stream, as it tends to introduce unwanted synchronization issues.
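As a sketch only, the recommendations above map onto the ActionScript 2.0 Camera and Microphone APIs roughly as follows. The NetStream and Video instance names (publishNS, subscribeNS, remoteVideo) are hypothetical, and the specific values are the ones discussed in this paper rather than universal defaults:

```
// Publishing side: camera-native resolution, capped bandwidth,
// Flash-managed quality, key frame interval at/above the frame rate.
var cam:Camera = Camera.get();
cam.setMode(320, 240, 15);
cam.setQuality(38400, 0);      // ~38,400 bytes/sec; quality throttled by Flash
cam.setKeyFrameInterval(15);

var mic:Microphone = Microphone.get();
mic.setRate(22);               // 22 kHz sampling

publishNS.attachVideo(cam);
publishNS.attachAudio(mic);    // attach to the NetStream directly;
                               // avoid MovieClip.attachAudio() on subscribers

// Subscribing side: size the embedded Video object on the stage
// to match the originating camera resolution (here 320 x 240).
remoteVideo.attachVideo(subscribeNS);
```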
APPENDIX A: ERROR MARGINS AND SIGNIFICANCE

Most of our test results, particularly the processor utilization figures read from Windows Task Manager, were obtained by manually estimating averages from values reported periodically by various tools. Additionally, video conferences were typically simulated by speaking calmly into our USB headsets in front of our test cameras for up to several minutes. Unfortunately, such practices limit our ability to reproduce visual and audio inputs exactly for each test case. As such, we have assumed a moderate error margin and have refrained from reading significance into cases where only marginal differences were observed, given our inability to obtain results with high precision or granularity.

We are actively working to obtain results with greater statistical rigor by researching tools that would yield both more reproducible test cases and more precise results. Using such tools, we would be able to test for significant variations more effectively. To alleviate some of the problems our current methods introduce, we provide detailed experimental results and community access to our experimental tools in these appendixes so that our tests can be reproduced and the results compared by others in the Flash Communication Server development community.
APPENDIX B: DETAILED EXPERIMENTAL SETUPS AND RESULTS

CAMERA TESTING

For our camera tests, three representative cameras supporting different protocols were used in conjunction with our CamTest tool: the Apple iSight, an IIDC/DCAM-compliant webcam that connects via Firewire; the Sony DCR-TRV460, a DV-compliant camcorder that also connects via Firewire; and the Creative Labs NX Ultra, a USB webcam. All cameras were specified as having a maximum live video resolution of 640 x 480 pixels and the capability of yielding streams of up to 30 fps (with the exception of the Creative NX Ultra, which was limited to 15 fps). Although the Sony DCR-TRV460 camcorder also supports a USB connection, we tested it only over its DV connection.

Table 9: Basic Camera Specifications

Camera              Data Bus     Max. Resolution   Max. FPS
Apple iSight        IIDC/DCAM    640x480           30
Sony DCR-TRV460     DV           640x480           30
Creative NX Ultra   USB          640x480           15

CPU utilization for locally visualizing video output at varying resolutions and frame rates was measured for each camera using the Windows Task Manager with all non-essential processes disabled. To isolate the processor requirements for bringing the video signal into Flash, these tests were conducted entirely locally using a simple Flash application running under Flash Player 7.0.19.0 with no Flash Communication Server integration. The resolutions tested were all at the standard-definition 4:3 ratio, 160 x 120, 200 x 150, 240 x 180, 320 x 240, 400 x 300, and 640 x 480, at rates of 1, 5, 10, 15, 24, and 30 fps. CPU utilization was averaged over roughly 30 seconds of video acquisition. Table 10 lists the supported resolutions for each camera in the test set; the footnotes provide additional details on the actual sizes of the video streams when a given resolution was requested.
Table 10: Supported Resolutions for Test Cameras (Extended)

Camera              160x120   200x150   240x180   320x240   400x300   640x480
Apple iSight        Yes       Yes       Yes       Yes       No [1]    Yes
Sony DCR-TRV460     Yes       No [2]    No [2]    Yes       No [1]    Yes [3]
Creative NX Ultra   Yes [4]   No [5]    No [5]    Yes [4]   No [6]    Yes [4]

As a result, the CPU utilization observations obtained for the 200 x 150, 240 x 180, and 400 x 300 resolutions should be interpreted with some caution compared to those resolutions for which all tested cameras provided matched video streams. It is probable that the scaling of lower-resolution video streams to the originally requested size in the Flash Player contributes somewhat to overall CPU utilization.

Additionally, we had some issues with frame rates. In the case of the Creative NX Ultra, although the camera itself is specified as having a maximum frame rate of 15 fps, Flash was able to request and receive video streams at frame rates up to 30 fps. We suspect this might be due to inaccurate reporting on the part of the driver or to software-level interpolation; the results from our experiments do not yield conclusive evidence for either possibility.

In the case of the Apple iSight camera, we were only able to attain a maximum frame rate of 15 fps, although the technical specifications state that 30 fps should be possible. This was likely due to the use of the generic Windows 1394 Desktop Camera driver, because no manufacturer-supplied driver for the Windows operating system was available. Resolution and frame rate testing for the Apple iSight was therefore limited to frame rates of 15 fps and below for the tests described here, though at a later point we were able to obtain 30 fps from the Apple iSight using the third-party Unibrain Fire-i drivers for 1394 IIDC/DCAM cameras, as described in the main body of this white paper.
It should also be noted that results for the Creative NX Ultra camera were significantly noisier than for the other cameras, presumably due to noise from additional USB devices connected to the test machine. Figure 4 presents graphs of our experimental results (lower CPU utilization is better).

[1] A video stream of 320 x 240 was obtained when a 400 x 300 stream was requested.
[2] Video streams of 160 x 120 were obtained when 200 x 150 and 240 x 180 streams were requested.
[3] The Sony DCR-TRV460 produces an interlaced video stream at 640 x 480.
[4] The Creative NX Ultra produced slightly letterboxed frames at 160 x 120, 320 x 240, and 640 x 480.
[5] Video streams of 176 x 132 were obtained when 200 x 150 and 240 x 180 streams were requested.
[6] A video stream of 352 x 264 was obtained when a 400 x 300 stream was requested.
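A minimal ActionScript 2.0 sketch of this kind of local acquisition test (no Flash Communication Server involved) might look like the following; localVideo is a hypothetical Video instance on the stage:

```
// Acquire from the default camera and render locally only.
var cam:Camera = Camera.get();
cam.setMode(320, 240, 15);     // requested capture size and frame rate

localVideo.attachVideo(cam);   // display the raw feed; nothing is published

// The camera reports the mode it actually settled on, which may
// differ from the requested one (compare Table 10's footnotes).
trace(cam.width + "x" + cam.height + " at " + cam.fps + " fps");
```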
[Figure 4: Frame rate versus CPU utilization graphs. Six panels plot percent CPU utilization against frame rate (0 to 30 fps) for the Apple iSight (1394 IIDC/DCAM), Sony DCR-TRV460 (1394 DV), and Creative NX Ultra (USB) at resolutions of 160x120, 200x150, 240x180, 320x240, 400x300, and 640x480.]
Figure 5 shows the results of graphing the same data to compare CPU utilization at 15, 24, and 30 fps.

[Figure 5: Resolution versus CPU utilization graphs. Panels plot percent CPU utilization against resolution (160x120 through 640x480) for the Sony DCR-TRV460 and Creative NX Ultra at fixed frame rates, including panels at 24 and 30 fps.]

ENCODING/DECODING AND CPU UTILIZATION

With an understanding of the effects of various cameras and video stream formats on processor utilization, we next analyzed the CPU utilization incurred by publishing audio and video to Flash Communication Server, as well as that needed for subscribing to a stream from Flash Communication Server, using our FCSDiag tool. We tested a broadcasting-only configuration (with no local visualization) and a simple loopback configuration, where the published stream was resubscribed and rendered by the same machine, under several different video setting configurations that have
proven to give high-quality results. The loopback case effectively simulates the load on a participant machine in a simple 1-to-1 video conference.

The first test was conducted with a configuration that, as we have found through prior work in Flash video conferencing, yields a relatively high-quality experience. Two additional tests were also conducted: the first with the camera bandwidth set to 38,400 and Flash allowed to throttle the video quality dynamically, and the second with the video quality set to 90 and the bandwidth left unspecified by being set to zero, as recent experiments have shown that these configurations also yield relatively high-quality results. Table 11 lists the configurations used for each of these tests.

In the graphed results that follow, the "Publish Only" CPU utilization encompasses the CPU use needed for video acquisition and publishing of the encoded stream to Flash Communication Server, while the "Loopback" CPU utilization adds the additional processor use needed to subscribe to and display the same stream on the test machine.

Table 11: Encoding/Decoding Test Configurations

                     Test Config. A   Test Config. B   Test Config. C
Bandwidth:           400,000          38,400           0
FPS:                 24 [7]           24 [7]           24 [7]
Favor Size:          0                0                0
Frame Quality:       85               0                90
Key Frame Interval:  48               48               48
Camera Width:        320              320              320
Camera Height:       240              240              240
Buffer Time:         0.01             0.01             0.01
Audio Rate:          22 kHz           22 kHz           22 kHz

[7] In practice, this results in an actual frame rate of 15 fps for the Apple iSight due to driver limitations.

Figure 6 shows the results graphically.
[Figure 6: Encoding/decoding graphs. Three panels, one per test configuration (A, B, and C), plot percent CPU utilization for the "Publish Only" and "Loopback" cases for the Apple iSight, Sony DCR-TRV460, and Creative NX Ultra, each under Flash Player 7.]
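A loopback test of this sort can be sketched in ActionScript 2.0 as follows; nc is an assumed connected NetConnection, loopVideo a hypothetical Video instance, and the three Camera.setQuality calls correspond to the test configurations in Table 11:

```
var cam:Camera = Camera.get();
cam.setMode(320, 240, 24);
cam.setKeyFrameInterval(48);
cam.setQuality(400000, 85);    // Config A: cap bandwidth and fix quality
// cam.setQuality(38400, 0);   // Config B: cap bandwidth, Flash manages quality
// cam.setQuality(0, 90);      // Config C: fix quality, bandwidth unconstrained

var mic:Microphone = Microphone.get();
mic.setRate(22);

// Publish the stream to Flash Communication Server...
var pubNS:NetStream = new NetStream(nc);
pubNS.attachVideo(cam);
pubNS.attachAudio(mic);
pubNS.publish("loopback", "live");

// ...then resubscribe to it on the same machine.
var subNS:NetStream = new NetStream(nc);
subNS.setBufferTime(0.01);
subNS.play("loopback");
loopVideo.attachVideo(subNS);
```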
Our three test configurations yielded comparable results in terms of CPU utilization despite the differences in settings. Because the Apple iSight camera was operating at only 15 fps for these tests (as described earlier), we believe that the CPU utilization in tests employing it is artificially lowered to a certain extent. As such, there do not appear to be substantial differences in the amount of work needed to encode or decode video from the cameras tested. In terms of subjective quality, all tested configurations were quite acceptable with each of the cameras, as we had anticipated.

Additionally, when combined with our earlier results for CPU utilization during video acquisition with our test cameras, the data from this series of experiments let us break total CPU usage down into its constituent parts. Applying this to the data for Test Configuration A, we can derive the breakdown shown in Figure 7. Results for the other test configurations are similar.

[Figure 7: CPU utilization breakdown for Test Configuration A. A stacked bar chart divides each camera's loopback CPU utilization into acquisition, encoding, and decoding components for the Apple iSight, Sony DCR-TRV460, and Creative NX Ultra, each under Flash Player 7.]

While there are slight differences in the CPU usage for encoding and decoding in the three cases shown, the greatest factor affecting total CPU utilization in these simulated simplest-case 1-to-1 video conferencing tests remains the choice of camera.

VIDEO SETTINGS

From prior experience with video conferences involving five participants, we had usually set both the frame quality and the maximum bandwidth for the camera during testing, and used a static video resolution of 320 x 240 pixels at 24 fps (the same as the Flash movie's frame rate). After significant trial and error, we had arrived at the settings shown in Table 12, which yielded the best overall performance with minimal frozen frames and synchronization issues.
Table 12: Initial Video Settings

Bandwidth: 400,000-900,000
FPS: 24
Favor Size: 0
Frame Quality: 60-85
Key Frame Interval: 48
Camera Width: 320
Camera Height: 240
Buffer Time: 0.01
Audio Rate: 22 kHz

In FCSDiag loopback tests employing both video and audio input, CPU utilization and average latency (the time for a NetStream.send call to reach the Flash Communication Server application and return) did not vary significantly within the range of bandwidth and frame quality settings given in Table 12, and were essentially the same as the results obtained for Test Configuration A in the encoding/decoding tests. Subjectively, the video stream appeared very smooth, and no frozen frames or problems with audio synchronization were observed. At lower frame quality settings, some pixelization was observed, as expected. Table 13 lists typical CPU utilization and average latency obtained with these settings on Flash Player 7.

Table 13: CPU Utilization and Latency for Cameras

Camera              Avg. Latency (ms)   % CPU Utilization
Apple iSight        150                 13
Sony DCR-TRV460     180                 21
Creative NX Ultra   180                 25

It should be noted that the average latency tends to remain fairly stable, with the loopback signal delayed about 150 to 180 ms from real time once audio data has been introduced to the stream. On some occasions, latency will increase to markedly higher values (~1,500 ms) for unknown reasons and yield unsatisfactory results, with the received stream lagging over a second behind real time.

We have also experimented with setting only the maximum bandwidth or only the frame quality, allowing Flash to manage the other in real time. We were introduced to this possibility by Giacomo Guilizzoni's weblog, where he presented a calculator that recommends optimal Flash Communication Server video settings for different scenarios.
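Returning to the latency figures in Table 13: one way to approximate such a measurement is to stamp the publishing stream with NetStream.send and read the stamp back on the subscribing side of the loopback. In this sketch, pubNS and subNS are the hypothetical publishing and subscribing NetStreams from such a setup, and "ping" is an arbitrary handler name:

```
// Subscriber side: the handler named in NetStream.send() is invoked
// on the subscribing NetStream when the message arrives.
subNS.ping = function(sentAt:Number):Void {
    trace("loopback latency: " + (getTimer() - sentAt) + " ms");
};

// Publisher side: stamp the stream once per second.
setInterval(function():Void {
    pubNS.send("ping", getTimer());
}, 1000);
```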
Adapting his results to our needs, we conducted a number of tests to quantify the effects of each parameter under such regimes. Our experimental results indicate that these approaches also produce relatively high-quality results with properly chosen settings. Because these tests were done using a different program than the one employed earlier, in order to measure and graph bandwidth utilization in real time, the resulting CPU utilization measures are not directly comparable to the data obtained in previous experiments. We used the Creative NX Ultra for these experiments.

For our initial battery of tests, we set the bandwidth to 0 and throttled the frame quality from 100 down to 0 with the audio muted (to keep latency relatively constant) under the conditions given in Table 14.

Table 14: Variable Frame Quality Settings

Bandwidth: 0
FPS: 24
Favor Size: 0
Frame Quality: As Below
Key Frame Interval: 8
Camera Width: 280
Camera Height: 208
Buffer Time: 0
Audio Rate: 22 kHz

Table 15 lists the average bandwidth utilization in bytes per second, selected average CPU utilizations, and subjective findings for each test.

Table 15: Variable Frame Quality Results

Frame Quality   Bandwidth/Sec   CPU Util. (%)   Subjective Findings
100             250,000         33              High-quality picture, marked frame skipping
90              68,000          29              High-quality picture, some frame skipping
80              36,000          30              High-quality picture, occasional frame skipping
70              24,000          Not Measured    Faint pixelization, smooth playback
60              19,000          Not Measured    Mild pixelization, smooth playback
50              13,000          Not Measured    Medium pixelization, smooth playback
40              11,000          Not Measured    Loss of fine detail, smooth playback
30              10,000          Not Measured    Moderate loss of detail, smooth playback
20              9,000           Not Measured    Severe loss of detail, smooth playback
10              8,000           27              Loss of gross detail, smooth playback

Here, CPU utilization drops only slowly with decreasing frame quality.
High frame quality yielded very high-quality pictures at the cost of frame skipping, whereas specifying lower frame quality yielded smooth playback by sacrificing detail. The sweet spot, as it were, seems to be at about a frame quality of 70 to 80. It is also interesting to note that at a frame quality of 100 (zero compression, accompanied by 32 Copyright 2005, Architekture.com, All Rights Reserved.
exceptionally high bandwidth consumption), the CPU utilization seems to be somewhat greater than when the frame quality is set to lower values and the video data is compressed.

Subsequently, we performed another battery of experiments, this time varying the specified bandwidth while keeping the frame quality set to 80, with settings otherwise identical to those given in Table 14. Although a frame quality of 80 had produced the occasional frame skipping shown in Table 15, previous experience indicates that this value typically yields a decent trade-off between high bandwidth and CPU utilization on the one hand and low picture quality on the other, so it was chosen for this set of experiments. Table 16 lists the results.

Table 16: Variable Bandwidth Results

Spec. Bandwidth   CPU Util. (%)   Subjective Findings
19,200            30              Smooth, significant pixelization upon movement
38,400            Not Measured    Smooth, some pixelization upon movement
51,200            Not Measured    Occasional frame skips, pixelization on gross movement
76,800            Not Measured    Frequent frame skips, pixelization with extreme movement
128,000           Not Measured    Frequent frame skips, high-quality picture
192,000           Not Measured    Frequent frame skips, high-quality picture
256,000           Not Measured    Very frequent frame skips, high-quality picture
384,000           30              Constant frame skipping, high-quality picture

The trade-off is between smooth video playback and greater pixelization upon movement. If the video image is very still over time, a high-quality picture can be obtained at practically any of the specified bandwidths. The sweet spot for a frame quality of 80 is apparently somewhere between 38,400 and 51,200 bytes per second, although 38,400 is quite acceptable if momentary pixelization upon a video conference participant's sudden movement can be tolerated. Such settings also have the benefit of keeping bandwidth usage capped relatively low without significantly sacrificing image quality.
This is of particular benefit, as keeping bandwidth usage in check becomes increasingly necessary when scaling a video conference to greater numbers of participants. Additionally, several ad hoc tests indicate that a low key frame interval tends to contribute to increased frame skipping, whereas high key frame intervals, particularly ones higher than the frame rate, result in decreased frame skipping but introduce somewhat longer normalization times in cases where the video image has become pixelated due to motion.

Although these tests were not repeated on the Apple iSight camera or the Sony DCR-TRV460 camcorder, the results obtained here led to the configurations chosen for Test Configurations B and C in the encoding/decoding tests described earlier, which replicate a subset of these batteries for the two additional cameras.
SCALING

The other major goal of our research was to determine the feasibility of scaling Flash video conferencing to support up to 10 simultaneous participants. To do this, we conducted a number of tests using our FCSDiag suite of test applications. Due to both screen size and network bandwidth constraints, we primarily looked at a resolution of 160 x 120 for each participant's video stream.

The principal considerations in finding optimal settings for a 10-participant conference are maintaining a relatively low CPU utilization, as each machine will need to encode its own stream as well as decode 10 incoming streams, and minimizing network bandwidth utilization, as total bandwidth requirements grow with the square of the number of participants.

Some of the initial scaling tests documented in the following tables were performed prior to our determination that the Apple iSight camera performed significantly better in reducing the CPU overhead involved in video acquisition. Our initial tests were done using the Creative NX Ultra camera with relatively naïve video settings, with marginally acceptable results. Significantly better results were obtained in tests conducted with the Apple iSight camera, incorporating refinements in the video configuration learned through testing. Our efforts to determine optimal configurations for scaling video conferences to 10 participants are described below.

All tests were conducted with the test machine publishing its own stream and subscribing to and displaying n (varying between 1 and 10) streams with identical video settings broadcast from a second participant machine through Flash Communication Server. This effectively simulates the load on a participant machine in a conference with n + 1 participants where the participant machine is not monitoring a loopback stream.
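To illustrate why per-client and total bandwidth climb so quickly, the stream counts for a hypothetical full-mesh conference (every participant publishes one stream and subscribes to every other) can be tallied as follows; the 38,400 bytes-per-second figure is the per-stream cap discussed earlier, and the function name is our own:

```javascript
// Stream and bandwidth arithmetic for an n-participant full mesh.
function meshLoad(participants, bytesPerStream) {
    var incomingPerClient = participants - 1;              // streams each client decodes
    var totalDelivered = participants * incomingPerClient; // streams the server must deliver
    return {
        incomingPerClient: incomingPerClient,
        downstreamPerClient: incomingPerClient * bytesPerStream,
        totalDelivered: totalDelivered
    };
}

// An 11-participant conference: each client decodes 10 incoming streams.
var load = meshLoad(11, 38400);
```

At 11 participants, each client's downstream alone is 10 x 38,400 = 384,000 bytes per second, and the server is delivering 110 streams in total.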
A second participant machine was used to provide the streams to be subscribed to on the test machine, as this allowed us to focus the second machine's camera (a Logitech QuickCam Orbit) on ambient street traffic outside our facility. With large numbers of video feeds, it was significantly easier to assess frame skipping when imaging steadily moving vehicles than when imaging facial movements. Audio data was collected and published by both machines from ambient sound in the room.

Our initial test (Test 1) was conducted with the configuration shown in Table 17, chosen to sacrifice video quality momentarily if necessary to contain bandwidth usage within reasonable limits.