2. software as services. Cloud ser- For multimedia computing in
MULTIMEDIA PROCESSING IN A CLOUD
vice providers rent data-center a cloud, simultaneous bursts of
hardware and software to deliver
IMPOSES GREAT CHALLENGES. multimedia data access, process-
storage and computing services ing, and transmission in the
through the Internet. By using cloud computing, Internet users cloud would create a bottleneck in a general-purpose cloud
can receive services from a cloud as if they were employing a because of stringent multimedia QoS requirements and large
super computer. They can store their data in the cloud instead amounts of users’ simultaneous accesses at the Internet scale.
of on their own devices, making ubiquitous data access possible. In today’s cloud computing, the cloud uses a utilitylike mecha-
They can run their applications on much more powerful cloud- nism to allocate how much computing (e.g., CPU) and storage
computing platforms with software deployed in the cloud, miti- resources one needs, which is very effective for general data ser-
gating the users’ burden of full software installation and vices. However, for multimedia applications, in addition to the
continual upgrade on their local devices. CPU and storage requirements, another very important factor is
With the development of Web 2.0, Internet multimedia is the QoS requirement in terms of bandwidth, delay, and jitter.
emerging as a service. To provide rich media services, Therefore, using a general-purpose cloud in the Internet to deal
multimedia computing has emerged as a noteworthy technolo- with multimedia services may suffer from unacceptable media
gy to generate, edit, process, and search media contents, such QoS or QoE [3]. Mobile devices have limitations in memory,
as images, video, audio, graphics, and so on. For multimedia computing power, and battery life; thus, they have even more
applications and services over the Internet and mobile wireless prominent needs to use a cloud to address the tradeoff between
networks, there are strong demands for cloud computing computation and communication. It is foreseen that cloud com-
because of the significant amount of computation required for puting could become a disruptive technology for mobile appli-
serving millions of Internet or mobile users at the same time. cations and services [4]. More specifically, in mobile media
In this new cloud-based multimedia-computing paradigm, applications and services, because of the power requirement for
users store and process their multimedia application data in multimedia [5] and the time-varying characteristics of the wire-
the cloud in a distributed manner, eliminating full installation less channels, QoS requirements in cloud computing for mobile
of the media application software on the users’ computer or multimedia applications and services become more stringent
device and thus alleviating the burden of multimedia software than those for the Internet cases. In summary, for multimedia
maintenance and upgrade as well as sparing the computation computing in a cloud, the key is how to provide QoS provision-
of user devices and saving the battery of mobile phones. ing and support for multimedia applications and services over
Multimedia processing in a cloud imposes great challenges. the (wired) Internet and mobile Internet.
Several fundamental challenges for multimedia computing in To meet multimedia’s QoS requirements in cloud comput-
the cloud are highlighted as follows. ing for multimedia services over the Internet and mobile wire-
1) Multimedia and service heterogeneity: As there exist dif- less networks, we introduce the principal concepts of
ferent types of multimedia and services, such as voice over multimedia cloud computing for multimedia computing and
IP (VoIP), video conferencing, photo sharing and editing, communications, shown in Figure 1. Specifically, we propose
multimedia streaming, image search, image-based render- a multimedia-cloud-computing framework that leverages
ing, video transcoding and adaptation, and multimedia con- cloud computing to provide multimedia applications and ser-
tent delivery, the cloud shall support different types of vices over the Internet and mobile Internet with QoS provi-
multimedia and multimedia services for millions of users sioning. We address multimedia cloud computing from
simultaneously. multimedia-aware cloud (media cloud) and cloud-aware mul-
2) QoS heterogeneity: As different multimedia services have timedia (cloud media) perspectives. A multimedia-aware cloud
different QoS requirements, the cloud shall provide QoS pro- focuses on how the cloud can provide QoS provisioning for
visioning and support for various types of multimedia servic- multimedia applications and services. Cloud-aware multime-
es to meet different multimedia QoS requirements. dia focuses on how multimedia can perform its content stor-
3) Network heterogeneity: As different networks, such as age, processing, adaptation, rendering, and so on, in the cloud
Internet, wireless local area network (LAN), and third genera- to best utilize cloud-computing resources, resulting in high
tion wireless network, have different network characteristics, QoE for multimedia services. Figure 2 depicts the relationship
such as bandwidth, delay, and jitter, the cloud shall adapt of the media cloud and cloud media services. More specifically,
multimedia contents for optimal delivery to various types of the media cloud provides raw resources, such as hard disk,
devices with different network bandwidths and latencies. CPU, and GPU, rented by the media service providers (MSPs)
4) Device heterogeneity: As different types of devices, such to serve users. MSPs use media cloud resources to develop
as TVs, personal computers (PCs), and mobile phones, have their multimedia applications and services, e.g., storage, edit-
different capabilities for multimedia processing, the cloud ing, streaming, and delivery.
shall have multimedia adaptation capability to fit different On the media cloud, we propose an MEC architecture to
types of devices, including CPU, GPU, display, memory, reduce latency, in which media contents and processing are
storage, and power. pushed to the edge of the cloud based on user’s context or
IEEE SIGNAL PROCESSING MAGAZINE [60] MAY 2011
3. Audio Video Image
Media Cloud
Storage
CPU GPU
Clients
[FIG1] Fundamental concept of multimedia cloud computing.
profile. In this architecture, an MEC is a cloudlet with data multimedia computing over grids addresses infrastructure
centers physically placed at the edge. The MEC stores, process- computing for multimedia from a high-performance comput-
es, and transmits media data at the edge, thus achieving a ing (HPC) aspect [7]. The CDN addresses how to deliver multi-
shorter delay. The media cloud is composed of MECs, which media at the edge so as to reduce the delivery latency or
can be managed in a centralized or peer-to-peer (P2P) manner. maximize the bandwidth for the clients to access the data.
First, to better handle various types of media services in an Examples include Akamai Technologies, Amazon CloudFront,
MEC, we propose to place similar types of media services into and Limelight Networks. YouTube uses Akamai’s CDN to deliv-
a cluster of servers based on the properties of media services. er videos. Server-based multimedia computing addresses desk-
Specifically, we propose to use the distributed hash table (DHT top computing, in which all multimedia computing is done in
[6]) for data storage while using CPU or GPU clusters for mul- a set of servers, and the client interacts only with the servers
timedia computing. Second, for computing efficiency in the [8]. Examples include Microsoft Remote Display Protocol and
MEC, we propose a distributed parallel processing model for AT&T Virtual Network Computing. P2P multimedia comput-
multimedia applications and services in GPU or CPU clusters. ing refers to a distributed application architecture that parti-
Third, at the proxy/edge server of the MEC, we propose media tions multimedia-computing tasks or workloads between
adaptation/transcoding for media services to heterogeneous peers. Examples include Skype, PPlive, and Coolstream. The
devices to achieve high QoE. media cloud presented in this article addresses how the cloud
On cloud media, media applications and services in the can provide QoS provisioning for multimedia computing in a
cloud can be conducted either completely or partially in the cloud environment.
cloud. In the former case, the cloud will
do all the multimedia computing, e.g., for
the case of thin-client mobile phones. In
Cloud Media
the latter case, the key problem is how to Authoring/Editing Sharing/Streaming
allocate multimedia-computing (e.g., Service Service
CPU and GPU) resources between the cli-
Storage Service Delivery Service
ents and cloud, which will involve client– MSPs
cloud resource partition for multimedia
computing. In this article, we present
how multimedia applications, such as
Media Cloud
processing, adaptation, rendering, and so
on, can optimally utilize cloud-comput- Hard Disk Resource Allocator
ing resources to achieve high QoE.
CPU
RELATED WORKS Load Balancer
GPU
Multimedia cloud computing is general-
ly related to multimedia computing over
grids, content delivery network (CDN),
server-based computing, and P2P multi-
media computing. More specifically, [FIG2] The relationship of the media cloud and cloud media services.
IEEE SIGNAL PROCESSING MAGAZINE [61] MAY 2011
4. To our knowledge, there compared to all multimedia
MULTIMEDIA CLOUD COMPUTING
exist very few works on multi- contents that are located at the
media cloud computing in the
IS GENERALLY RELATED TO central cloud. As depicted in
literature. IBM had an initiative MULTIMEDIA COMPUTING OVER Figure 3(a) and (b), respective-
for cloud computing (http:// GRIDS, CONTENT DELIVERY NETWORK, ly, an MEC has two types of
w w w. i b m . c o m / i b m / c l o u d / SERVER-BASED COMPUTING, AND P2P architectures: one is where all
resources.html#3). Trajkovska MULTIMEDIA COMPUTING. users’ media data are stored in
et al. proposed a joint P2P and MECs based on their user pro-
cloud-computing architecture file or context, while all the
realization for multimedia streaming with QoS cost information of the associated users and content locations is
functions [9]. communicated by its head through P2P; the other one is
where the central administrator (master) maintains all the
MULTIMEDIA-AWARE CLOUD information of the associated users and content locations,
The media cloud needs to have the following functions: 1) QoS while the MEC distributedly holds all the content data.
provisioning and support for various types of multimedia servic- Within an MEC, we adopt P2P technology for distributed
es with different QoS requirements, 2) distributed parallel mul- media data storage and computing. With the P2P architec-
timedia processing, and 3) multimedia QoS adaptation to fit ture, each node is equally important and, thus, the MEC is of
various types of devices and network bandwidth. In this section, high scalability, availability, and robustness for media data
we first present the architecture of the media cloud. Then we storage and media computing. To support mobile users, we
discuss the distributed parallel multimedia processing in the propose a cloud proxy that resides at the edge of an MEC or
media cloud and how the cloud can provide QoS support for in the gateway, as depicted in Figure 3(a) and (b), to perform
multimedia applications and services. multimedia processing (e.g., adaptation and transcoding) and
caching to compensate for mobile devices’ limitations on
MEDIA-CLOUD-COMPUTING computational power and battery life.
ARCHITECTURE
In this section, we present an MEC-computing architecture DISTRIBUTED PARALLEL
that aims at handling multimedia computing from a QoS per- MULTIMEDIA PROCESSING
spective. An MEC is a cloudlet with servers physically placed Traditionally, multimedia processing is conducted on client or
at the edge of a cloud to provide media services with high proprietary servers. With multimedia cloud computing, multi-
QoS (e.g., low latency) to users. Note that an MEC architec- media processing is usually on the third party’s cloud data cen-
ture is like the CDN edge servers architecture, with the differ- ters (unless one likes building a private cloud, which is costly).
ence being that CDN is for multimedia delivery, while MEC is With multimedia processing moved to the cloud, one of the
for multimedia computing. Using CDN edge servers to deliver key challenges of multimedia cloud computing is how the
multimedia to the end users can result in less latency than cloud can provide distributed parallel processing of multime-
direct delivery from the original servers at the center. Thus, it dia data for millions of users including mobile users. To
can be seen that multimedia computing in an MEC can pro- address this problem, multimedia storage and computing need
duce less multimedia traffic and reduce latency when to be executed in a distributed and parallel manner. In the
MEC MEC MEC MEC
Head Head
Virtual Center Cloud Master
MEC
Head Head
MEC MEC MEC
MEC
HeadMEC
Proxy Proxy
(a) (b)
[FIG3] Architecture of (a) P2P-based MEC computing and (b) central-controlled MEC computing.
IEEE SIGNAL PROCESSING MAGAZINE [62] MAY 2011
5. MEC, we propose to use DHT for media data storage in the
storage cluster and employ multimedia processing program Storage Cluster
parallel execution model with a media load balancer for media
computing in CPU or GPU clusters. As illustrated in Figure Data
4(a), for media storage, we assign unique keys associated with MSP
data to the data tracker, which manages how media data will
Tracker
be distributed in the storage cluster. For media computing, we CPU Cluster
use the program tracker or the so-called load balancer to Clients Program
schedule media tasks distributedly, and then the media tasks
are executed in CPU or GPU clusters in the MEC. Moreover, we Tracker
use parallel distributed multimedia processing for large-scale
multimedia program execution in an MEC, as illustrated in
Figure 4(b). Specifically, first, we can perform media load bal-
ancing at the users’ level [10]. Second, we can do parallel
media processing at the multimedia task level. In other words,
our proposed approach not only brings distributed load balanc-
GPU Cluster
ing at users’ levels but also performs multimedia task parallel-
ization at the multimedia task levels. This is demonstrated in (a)
the following case study.
…..
CASE STUDY
To demonstrate what we discussed earlier, we will use
Photosynth as an example to illustrate how multimedia par-
allel processing works and how it outperforms the traditional
approach. Photosynth (http://www.photosynth.net/) is a soft-
ware application that analyzes digital photographs and Task Manager
generates a three-dimensional (3-D) model of the photos ….. …..
[11], [12]. Users are able to generate their own models using
the software on the client side and then upload them to the
PhotoSynth Web site. Currently, computing Photosynth im-
ages is done in a local PC. The major computation tasks of
Photosynth are image conversion, feature extraction, image
matching, and reconstruction. In the traditional approach,
the aforementioned four tasks are performed sequentially in …..
a local client. In the cloud-computing environment, we pro- (b)
pose that the four computation tasks of Photosynth be con-
ducted in an MEC. This is particularly important for mobile [FIG4] (a) Data and program trackers for multimedia service.
devices because of their low computation capability and (b) Distributed parallel multimedia processing.
battery constraints. To reduce the com-
putation time when dealing with a large
number of users, we propose cloud-
based parallel synthing with a load bal-
ancer in which the PhotoSynth Image Conversion Phase 1
algorithms are run in a parallel pipeline
in an MEC. Our proposed parallel synth- Feature Extraction Phase 2
ing consists of user- and task-level par-
Server 1 Image Matching Phase 3
allelization. F igu re 5 shows the
user-level parallel synthing in which all Photo Set
Reconstruction Phase 4
tasks of synthing from one user are allo-
Load Server 2
cated to one server to compute, but the
Balancer
tasks from all users can be done simul-
taneously in parallel in the MEC. Figure
6 shows the task-level parallel synthing Server N
in which all tasks of synthing from one
user are allocated to N servers to compute [FIG5] User-level parallel synthing in an MEC.
IEEE SIGNAL PROCESSING MAGAZINE [63] MAY 2011
6. Image Feature Image Reconstruction
Conversion Extraction Matching
Server 1
Images Image Feature Image
Conversion Extraction Matching
Load Server 2
Balancer
Image Feature Image
Conversion Extraction Matching
Server N
Time
Phase 1 Phase 2 Phase 3 Phase 4
Task 1 Task 2 Task 3 Task 4
[FIG6] Task-level parallel synthing in the MEC.
in parallel. More specifically, the tasks of image conversion, 400 images, respectively. From Tables 1 and 2, we can see that
feature extraction, and images matching are computed in N for the two-server case we can achieve a 1.65 times computa-
servers. Figure 6 shows that, theoretically, when the commu- tional gain over the traditional sequential approach, and for the
nication overhead is omitted, image conversion, feature ex- nine-server case, we can achieve a 4.25 times computational
traction, and image matching can save N times less gain over the traditional approach.
computation time using task-level parallelization with N
servers than that with the one-server case. MEDIA CLOUD QOS
We performed a preliminary simulation in which we ran the
Another key challenge in the media cloud/MEC is QoS. There
parallel photosynthing code in an HPC cluster node with nine
are two ways of providing QoS provisioning for multimedia:
servers. Tables 1 and 2 list the simulation results for 200 and
one is to add QoS to the current cloud-computing infrastruc-
ture within the cloud and the other is
to add QoS middleware between the
[TABLE 1] SIMULATION RESULTS OF 200 IMAGES PARALLEL SYNTHING cloud infrastructure and multimedia
IN AN HPC CLUSTER.
applications. In the former case, it
ONE SERVER TWO SERVERS NINE SERVERS focuses on the cloud infrastructure
QoS, providing QoS pro visioning in
TASKS TIME (MS) TIME (MS) GAIN TIME (MS) GAIN
the cloud infrastructure to support
IMAGE CONVERSION 65,347.25 41,089.77 1.59 15,676.95 4.17
FEATURE EXTRACTION 34,864.50 18,140.31 1.92 7,414.13 4.70 multimedia applications and services
IMAGE MATCHING 47,812.50 30,103.78 1.59 11,510.03 4.15 with different media QoS require-
TOTAL TIME/GAIN 148,024.25 89,333.86 1.66 34,601.11 4.28
ments. In the latter case, it focuses on
improving cloud QoS in the middle
layers, such as QoS in the transport
[TABLE 2] SIMULATION RESULTS OF 400 IMAGES PARALLEL SYNTHING
IN AN HPC CLUSTER.
layer and QoS mapping between the
cloud infrastructure and media appli-
ONE SERVER TWO SERVERS NINE SERVERS cations (e.g., overlay).
TASKS TIME (MS) TIME (MS) GAIN TIME (MS) GAIN
The cloud infrastructure QoS is a
IMAGE CONVERSION 150,509.30 91,233.67 1.65 32,702.28 4.60
new area where more research is
FEATURE EXTRACTION 101,437.00 51,576.00 1.97 20,391.11 4.97 needed to provide stringent QoS pro-
IMAGE MATCHING 207,696.33 135,965.17 1.53 55,194.38 3.76 visioning for multimedia applica-
TOTAL TIME/GAIN 459,642.63 278,774.84 1.65 108,287.77 4.24
tions and services in the cloud. In
IEEE SIGNAL PROCESSING MAGAZINE [64] MAY 2011
7. this article, we focus on how a shared through the Internet.
THE MEDIA CLOUD PROXY IS
cloud can provide QoS support The huge success of YouTube
for multimedia applications
DESIGNED TO DEAL WITH MOBILE demonstrates the popularity of
and services. Specifically, in an MULTIMEDIA COMPUTING AND Internet media.
MEC, according to the proper- CACHING FOR MOBILE PHONES. Before the cloud-computing
ties of media services, we orga- era, media storage, processing,
nize similar types of media services into a cluster of servers and dissemination services were provided by different service
that has the best capability to process them. An MEC con- providers with their proprietary server farms. Now, various ser-
sists of three clusters: storage, CPU, and GPU clusters. For vice providers have a choice to be users of public clouds. The
example, media applications related to graphics can go to “pay-as-you-go” model of a public cloud would greatly facilitate
the GPU cluster, normal media processing can go to the small businesses and multimedia fanciers. For small businesses,
CPU cluster, and storage types of media applications can go they pay just for the computing and storage they have used,
to the storage cluster. As a result, an MEC can provide QoS rather than maintaining a large set of servers only for peak
support for different types of media with different QoS loads. For individuals, cloud utility can provide a potentially
requirements. unlimited storage space and is more convenient to use than
To improve multimedia QoS performance in a media cloud, buying hard disks.
in addition to moving media content and computation to the In the following, we will present storage and sharing (stor-
MEC to reduce latency and to perform content adaptation to age and dissemination), authoring and mashup (storage and
heterogeneous devices, a media cloud proxy is proposed in our processing), adaptation and delivery (processing and dissemina-
architecture to further reduce latency and best serve different tion), and media rendering, respectively.
types of devices with adaptation especially for mobile devices.
The media cloud proxy is designed to deal with mobile multi- STORAGE AND SHARING
media computing and caching for mobile phones. As a mobile Cloud storage has the advantage of being “always-on” so that
phone has a limited battery life and computation power, the users can access their files from any device and can share their
media cloud proxy is used to perform mobile multimedia com- files with friends who may access the content at an arbitrary
puting in full or part to compensate for the mobile phone’s time. It is also an important feature that cloud storage pro-
limitations mentioned above, including QoS adaptation to dif- vides a much higher level of reliability than local storage.
ferent types of terminals. In addition, the media cloud proxy Cloud storage service can be categorized into consumer- and
can also provide a media cache for the mobile phone. Future developer-oriented services. Within the category of consumer-
research directions include cloud infrastructure QoS and oriented cloud storage services, some cloud providers deploy
cloud QoS overlay for multimedia services, dynamic multime- their own server farm, while some others operate based on
dia load balancing, and so on. user-contributed physical storage. IOmega (www.iomega.com)
is a consumer-oriented cloud storage service, which holds the
CLOUD-AWARE MULTIMEDIA storage service on its own servers and, thus, offers a for-pay
APPLICATIONS only service. AllMyData (www.amdwebhost.com) trades 1 GB
The emergence of cloud computing will have a profound impact of hosted space for 10 GB of donated space from users. The
on the entire life cycle of multimedia contents. As shown in major investment is software or services, such as data encrypt-
Figure 7, a typical media life cycle is composed of acquisition, ing, fragmentation and distribution, and backup. Amazon S3
storage, processing, dissemination, and presentation. (https://s3.amazonaws.com/) and Openomy (http://www.openo-
For a long time, high-quality media contents could only my.com/) are developer-oriented cloud storage services.
be acquired by professional organizations with efficient Amazon S3 goes with the typical cloud provisioning “pay only
devices, and the distribution of media
contents relied on hard copies, such as
film, video compact disc (VCD), and
DVD. During the recent decade, the
availability of low-cost commodity digi-
tal cameras and camcorders has sparked Storage
an explosion of user-generated media
Acquisition Dissemination Presentation
contents. Most recently, cyber-physical
systems [13] offer a new way of data Processing
acquisition through sensor networks,
which significantly increases the vol-
ume and diversity of media data. Riding
the Web 2.0 wave, digital media con-
tents can now be easily distributed or [FIG7] A typical media life cycle.
IEEE SIGNAL PROCESSING MAGAZINE [65] MAY 2011
8. for what you use.” There is no minimum fee and startup cost. bandwidth demand of multimedia contents calls for dynamic
They charge for both storage and bandwidth per gigabyte. In load balancing algorithms for cloud-based streaming.
Openomy, the files are organized purely by tags. A publicly
callable application programming interface (API) and strong AUTHORING AND MASHUP
focus on tags rather than the classic file system hierarchy is Multimedia authoring is the process of editing segments of
the differentiating feature of Openomy. The general-purpose multimedia contents, while mashup deals with combining
cloud providers, such as Microsoft Azure (http://www.micro- multiple segments from different multimedia sources. To
soft.com/windowsazure/), also allow developers to build stor- date, authoring and mashup tools are roughly classified into
age services on top of them. For example, NeoGeo Company two categories: one is offline tools, such as Adobe Premiere
builds its digital asset management software neoMediaCenter. and Windows Movie Maker, and the other is online services,
NET based on Microsoft Azure. such as Jaycut (http://www.jaycut.com). The former provides
Sharing is an integral part of cloud service. The request of more editing functions, but the client usually needs editing
easy sharing is the main reason the multimedia contents occu- software maintenance. The latter provides fewer functions,
py a large portion of cloud storage space. Conventionally, mul- but the client need not bother about its software maintenance.
timedia sharing happens only when the person who shares the Authoring and mashup are generally time consuming and
contents and the person who is shared with are both online multimedia contents occupy large amount of storages. A
and have a high-data-rate con- cloud can make online author-
nection. Cloud computing is ing and mashup very effective,
THE ABILITY TO PERFORM ONLINE
now turning this synchronous providing more functions to cli-
MEDIA COMPUTATION IS A MAJOR
process into an asynchronous ents, since it has powerful com-
one and is making one-to-many
DIFFERENTIATING CHARACTERISTIC putation and storage resources
sharing more efficient. The per- OF MEDIA CLOUD FROM that are widely distributed geo-
son who shares simply uploads TRADITIONAL CDNS. graphically. Moreover, cloud-
the contents to the cloud stor- based multimedia authoring
age at his or her convenience and then sends a hyperlink to the and mashup can avoid preinstallation of editing software in
persons being shared with. The latter can then access the con- clients. In this article, we present a cloud-based online multi-
tents whenever they like, since the cloud is always on. Sharing media authoring and mashup framework. In this framework,
through a cloud also increases media QoS because cloud–client users will conduct editing and mashup in the media cloud.
connections almost always provide a higher bandwidth and One of the key challenges in cloud-based authoring and
shorter delay than client–client connections, not to mention mashup is the computing and communication costs in pro-
the firewall and network address translation traversal problems cessing multiple segments from single source or multiple
commonly encountered in client–client communications. The sources. To address this challenge, inspired by the idea of
complexities of cloud-based sharing mainly reside in naming, [15], we present an extensible markup language (XML)-based
addressing, and access control. representation file format for cloud-based media authoring
Instantaneous music and video sharing can be achieved via and mashup. As illustrated in Figure 8, this is not a multime-
streaming. Compared to the conventional streaming services dia data stream but a description file, indicating the organiza-
operating through proprietary server farms of streaming service tion of different multimedia contents. The file can be logically
providers, cloud-based streaming can potentially achieve much regarded as a multilayer container. The layers can be entity
a lower latency and provide much a higher bandwidth. This is layers, such as video, audio, graphic, and transition and effect
because cloud providers own a large number of servers deployed layers. Each segment of a layer is represented as a link to the
in wide geographical areas. For example, streamload targets to original one, which maintains associated data in the case of
those who are interested in rich media streaming and hosting. being deleted or moved, as well as some more descriptions.
They offer unlimited storage for paying users, and the charging The transmission and effects are either a link with parame-
plan is based on the downloading bandwidth. In this article, we ters or a description considering personalized demands. Since
propose a cloud-based streaming approach. In the architecture a media cloud holds large original multimedia contents and
of cloud-based streaming, compared with the architecture of frequently used transmission and effect templates [16], it will
video streaming in [14], the media cloud/MEC has a distributed be beneficial to use the link-based presentation file. Thus, the
media storage module for storing the compressed video and process of authoring or mashup is to edit the presentation
audio streams for millions of users and a QoS adaptation mod- file, by which the computing load on the cloud side will be
ule for different types of devices, such as mobile phones, PCs, significantly reduced. In our approach, we will select an MEC
and TVs. While our media cloud/MEC-based streaming architec- to serve authoring or mashup service to all heterogeneous
ture presents a possible solution to cloud streaming QoS provi- clients including mobile phone users. By leveraging edge
sioning, there are still many research issues to be tackled. servers’ assistance in the MEC with proxy to mobile phone, it
Considering the cloud network properties, a new cloud trans- allows mobile editing to achieve good QoE. Future research
port protocol may need to be developed. In addition, the high on cloud-based multimedia authoring and mashup needs to
IEEE SIGNAL PROCESSING MAGAZINE [66] MAY 2011
9. Media Cloud
Video
Representation File
Videom Videon Video Layer
Audio Audiom Audion Audio Layer
Effectm Transition Effectn Transition and
Effect Layer
Transition/Effect
[FIG8] Cloud-based multimedia authoring and mashup.
tackle distributed storage and processing in the cloud, online cloud shall take charge of collecting customized parameters,
previewing on the client, especially for mobile phones and such as screen size, bandwidth, and generating various ver-
slate devices. sions according to their parameters either offline or on the fly.
Note that the former needs more storage, while the latter
ADAPTATION AND DELIVERY needs dynamic video adaptation upon delivery. The ability to
As there exist various types of terminals, such as PCs, mobile perform online media computation is a major differentiating
phones, and TVs, and heterogeneous networks, such as characteristic of media cloud from traditional CDNs. We
Ethernet, WLAN, and cellular networks, how to effectively would like to point out that the processing capability at
deliver multimedia contents to heterogeneous devices via one media-edge servers makes it possible for service providers to
cloud is becoming very important and challenging. Video pay more attention to QoE under dynamic network conditions
adaptation [17], [18] plays an important role in multimedia than just to some predefined QoS metrics. In the presented
delivery. It transforms input video(s) into an output video in a framework, adaptation for single-layer and multilayer video
form that meets the user’s needs. In general, video adaptation will be performed differently. If the video is of a single layer,
needs a large amount of computing and is difficult to perform video adaptation needs to adjust bit rate, frame rate, and reso-
especially when there are a vast number of consumers lution to meet different types of terminals. For scalable video
requesting service simultaneously. Although offline transcod- coding, a cloud can generate various forms of videos by trun-
ing from one video into multiple versions for different condi- cating its scalable layers according to the clients’ network
tions works well sometimes, it needs larger storage. Besides, bandwidth. How to perform video adaptation on the fly can be
offline transcoding is unable to serve live video such as one of the future research topics.
Internet protocol television (IPTV) that
needs to run in real time.
Because of the strong computing and
storage power of the cloud, both offline Bit Rate, Frame Rate, and
Resolution
and online media adaptation to different
types of terminals can be conducted in a Media Cloud
cloud. Cloudcoder is a good example of a Live Videos Nonlive Videos
cloud-based video adaptation service that Coding Standards
was built on the Microsoft Azure plat-
form [19]. The cloudcoder is integrated
Video Adaptation
into the origin digital central manage-
ment platform while offloading much of
the processing to the cloud. The number
of transcoder instances automatically
scales to handle the increased or
decreased volume. In this article, we Clients
present a framework of cloud-based
video adaptation for delivery, as illustrat-
ed in Figure 9. Video adaption in a media [FIG9] Cloud-based video adaptation and transcoding.
IEEE SIGNAL PROCESSING MAGAZINE [67] MAY 2011
10. an intermediate stream for further client rendering, according
to the client’s rendering capability. More specifically, an MEC
with a proxy can serve mobile clients with high QoE since
Media Cloud
rending (e.g., view interpolation) can be done in an MEC
proxy. Research challenges and opportunities include how to
Fully Render Partially Render efficiently and dynamically allocate the rendering resources
between the client and cloud. One of the future research
directions is to study how an MEC proxy can assist mobile
Resource
Allocation/Partition phones on rendering computation, since the mobile phones
have limitations in battery life, memory size, and computa-
Partially Render tion capability.
Multimedia retrieval, such as content-based image
Display Display
retrieval (CBIR), is a good application example of cloud com-
Clients puting as well. In the following, we will present how CBIR
can leverage cloud computing. CBIR [23] is used to search
digital images in a large database based on image content
[FIG10] Cloud-based multimedia rendering.
other than metadata and text annotation. The research topics
in CBIR include feature extraction, similarity measurement,
MEDIA RENDERING and relevance feedback. There are two key challenges in
Traditionally, multimedia rendering is conducted at the client CBIR: how to improve the search quality and how to reduce
side, and the client is usually capable of doing rendering tasks, computation complexity (or computation time). As there
such as geometry modeling, texture mapping, and so on. In exists a semantic gap between the low-level underlying visual
some cases, however, the clients lack the ability required to ren- features and semantics, it is difficult to get high search quali-
der the multimedia. For instance, a free viewpoint video [20], ty. Because the Internet image database is becoming larger,
which allows users to interactively change the viewpoint in any searching in such a database is becoming computationally
3-D position or within a certain range, is difficult to be rendered intensive. However, in general, there is a tradeoff between
on a mobile phone. This is because the wireless bandwidth is quality and complexity. Usually, a higher quality is achieved
quite limited, and the mobile phone has limited computing at the price of a higher complexity and vice versa. Leveraging
capability, memory size, and battery life. Shu et al. [21] pro- the strong computing capability of a media cloud, one can
posed a rendering proxy to perform rendering for the mobile achieve a higher quality with acceptable computation time,
phone. For example, Web browsing is an essential mechanism resulting in better performance from the client’s perspective.
to access the information on the World Wide Web. However, it is
relatively difficult for mobile phones to render Web pages, espe- SUMMARY
cially the asynchronous JavaScript and XML (AJAX) Web page. This article presented the fundamental concept and a frame-
Lehtonen et al. [22] proposed a proxy-based architecture for work of multimedia cloud computing. We addressed multime-
mobile Web browsing. When the client sends a Web request, the dia cloud computing from multimedia-aware cloud and
proxy fetches the remote Web page, renders it, and responds cloud-aware multimedia perspectives. On the multimedia-aware
with a package containing a page description, which includes a cloud, we presented how a cloud can provide QoS support, dis-
page miniature image, the content, and coordinates of the most tributed parallel processing, storage, and load balancing for var-
important elements on the page. In essence, in both examples, a ious multimedia applications and services. Specifically, we
resource-allocation strategy is needed such that a portion of proposed an MEC-computing architecture that can achieve high
rendering task can be shifted from the client to server, thus cloud QoS support for various multimedia services. On cloud-
eliminating computing burden of a thin client. aware multimedia, we addressed how multimedia services and
Rendering on mobile phones or computationally con- applications, such as storage and sharing, authoring and mash-
strained devices imposes great challenges owing to a limited up, adaptation and delivery, and rendering and retrieval, can
battery life and computing power as well as narrow wireless optimally utilize cloud-computing resources. The research
bandwidth. The cloud equipped with GPU can perform render- directions and problems of multimedia cloud computing were
ing due to its strong computing capability. Considering the discussed accordingly.
tradeoff between computing and communication, there are In this article, we presented some thoughts on multimedia
two types of cloud-based rendering. One is to conduct all the cloud computing and our preliminary research in this area. The
rendering in the cloud, and the other is to conduct only com- research in multimedia cloud computing is still in its infancy,
putational intensive part of the rendering in the cloud, while and many problems in this area remain open. For example,
the rest will be performed on the client. In this article, we media cloud QoS addressed in this article still needs more
present cloud-based media rendering. As illustrated in Figure investigations. Some other open research topics on multimedia
10, the media cloud can do full or partial rendering, generating cloud computing include media cloud transport protocol, media
IEEE SIGNAL PROCESSING MAGAZINE [68] MAY 2011
11. cloud overlay network, media cloud security, P2P cloud for mul- Corporation as a member of technical staff. He has authored and
timedia services, and so on. coauthored five books or book chapters and more than 200 journal
and conference papers as well as holds more than 90 patents. His
ACKNOWLEDGMENTS research interests include multimedia processing, retrieval, cod-
We thank the anonymous reviewers for their constructive com- ing, streaming, and mobility.
ments that greatly improved the article. We acknowledge
Lijing Qin and Lie Liu from Tsinghua University and Zheng Li REFERENCES
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G.
from the University of Science and Technology of China Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. (2009, Feb. 10). Above
(USTC) for their contributions to the section “Multimedia- the clouds: A Berkeley view of cloud computing. EECS Dept., Univ. Califor-
nia, Berkeley, No. UCB/EECS-2009-28 [Online]. Available: http://radlab.
Aware Cloud” when they were interns at Microsoft Research cs.berkeley.edu/
Asia. We also thank Jun Liao from Tsinghua University and [2] R. Buyya, C. S. Yeo, and S. Venugopal, “Market-oriented cloud computing:
Vision, hype, and reality for delivering it services as computing utilities,” in Proc.
Siyuan Tang from the USTC for their contributions to cloud- 10th IEEE Int. Conf. High Performance Computing and Communications, 2008,
based parallel photosynthing case study when they were pp. 5–13.
[3] K. Kilkki, “Quality of experience in communications ecosystem,” J. Universal
interns at Microsoft Research Asia. We also acknowledge Prof. Comput. Sci., vol. 14, no. 5, pp. 615–624, 2008.
Yifeng He from Ryerson University for proofreading the article [4] ABI Research. (2009, July). Mobile cloud computing [Online]. Available:
and Dr. Jingdong Wang from Microsoft Research Asia for proof- http://www.abiresearch.com/research/1003385-Mobile+Cloud+Computing
reading of CBIR. [5] Q. Zhang, Z. Ji, W. Zhu, and Y.-Q. Zhang, “Power-minimized bit allocation for
video communication over wireless channels,” IEEE Trans. Circuits Syst. Video
Technol., vol. 12, no. 6, pp. 398–410, June 2002.
AUTHORS [6] P. Maymounkov and D. Mazieres, “Kademlia: A peer-to-peer information system
based on the XOR metric,” in Proc. IPTPS02, Cambridge, Mar. 2002, pp. 53–65.
Wenwu Zhu (wenwuzhu@microsoft.com) received his Ph.D.
[7] B. Aljaber, T. Jacobs, K. Nadiminti, and R. Buyya, “Multimedia on global grids:
degree from Polytechnic Institute of New York University in A case study in distributed ray tracing,” Malays. J. Comput. Sci., vol. 20, no. 1,
pp. 1–11, June 2007.
1996. He worked at Bell Labs during 1996–1999. He was with
[8] J. Nieh and S. J. Yang, “Measuring the multimedia performance of server-
Microsoft Research Asia’s Internet Media Group and Wireless based computing,” in Proc. 10th Int. Workshop on Network and Operating Sys-
and Networking Group as research manager from 1999 to 2004. tem Support for Digital Audio and Video, 2000, pp. 55–64.
[9] I. Trajkovska, J. Salvachúa, and A. M. Velasco, “A novel P2P and cloud
He was the director and chief scientist at Intel Communication computing hybrid architecture for multimedia streaming with QoS cost func-
Technology Lab, China. He is a senior researcher at the Internet tions,” in Proc. ACM Multimedia, 2010, pp. 1227–1230.
Media Group at Microsoft Research Asia. He has published more [10] G. Cybenko, “Dynamic load balancing for distributed memory multiproces-
sors,” J. Parallel Distrib. Comput., vol. 7, no. 2, pp. 279–301, 1989.
than 200 refereed papers and filed 40 patents. He is a Fellow of
[11] N. Snavely, S. M. Seitz, and R. Szeliski, “Photo tourism: Exploring photo col-
the IEEE. His research interests include Internet/wireless mul- lections in 3D,” ACM Trans. Graph., vol. 25, no. 3, pp. 835–846, 2006.
timedia and multimedia communications and networking. [12] N. Snavely, S. M. Seitz, and R. Szeliski, “Modeling the world from Internet
photo collections,” Int. J. Comput. Vision, vol. 80, no. 2, pp. 189–210, Nov.
Chong Luo (cluo@microsoft.com) received her B.S. degree 2008.
from Fudan University in Shanghai, China, in 2000 and her [13] The National Science Foundation. (2008, Sept. 30). Cyber-physical sys-
tems. Program Announcements and Information. NSF, Arlington, Virginia
M.S. degree from the National University of Singapore in 2002. [Online]. Available: http://www.nsf.gov/publications/pub_summ.jsp?ods_
She is currently pursuing her Ph.D. degree in the Department key=nsf08611
of Electronic Engineering, Shanghai Jiao Tong University, [14] D. Wu, Y. T. Hou, W. Zhu, Y.-Q. Zhang, and J. M. Peha, “Streaming video
over the Internet: Approaches and directions,” IEEE Trans. Circuits Syst. Video
Shanghai, China. She has been with Microsoft Research Asia Technol., vol. 11, no. 3, pp. 282–300, Mar. 2001.
since 2003, where she is a researcher in the Internet Media [15] X.-S. Hua and S. Li., “Personal media sharing and authoring on the web,” in
Proc. ACM Int. Conf. Multimedia, Nov. 2005, pp. 375–378.
group. She is a Member of the IEEE. Her research interests
[16] X.-S. Hua and S. Li, “Interactive video authoring and sharing based on two-
include multimedia communications, wireless sensor networks, layer templates,” in Proc. 1st ACM Int. Workshop on Human-Centered MM 2006
and media cloud computing. (HCM’06), pp. 65–74.
Jianfeng Wang (wjf2006@mail.ustc.edu.cn) received his [17] S. F. Chang and A. Vetro, “Video adaptation: Concepts, technologies, and
open issues,” Proc. IEEE, vol. 93, no. 1, pp. 148–158, 2005.
B.Eng. degree from the Department of Electronic Engineering [18] J. Xin, C-W. Lin, and M-T. Sun, “Digital video transcoding,” Proc. IEEE, vol.
and Information Science in the University of Science and 93, no. 1, pp. 84–94, Jan. 2005.
Technology of China in 2010. He was an intern at Microsoft [19] Origin Digital. (2009, Nov. 17). Video services provider to reduce
transcoding costs up to half [Online]. Available: http://www.microsoft.com/
Research Asia from February to August in 2010. Currently, he is casestudies/Case_Study_Detail.aspx?CaseStudyID=4000005952
a master’s stu dent in MOE-Microsoft Key Laboratory of [20] A. Kubota, A. Smolic, M. Magnor, M. Tanimoto, T. Chen, and C. Zhang, “Mul-
tiview imaging and 3DTV,” IEEE Signal Processing Mag., vol. 24, no. 6, pp. 10–21,
Multimedia Computing and Communication in USTC. His Nov. 2007.
research interests include signal processing, media cloud, and [21] S. Shi, W. J. Jeon, K. Nahrstedt, and R. H. Campbel, “Real-time remote
multimedia information retrieval. rendering of 3D video for mobile devices,” in Proc. ACM Multimedia, 2009, pp.
391–400.
Shipeng Li (spli@microsoft.com) received his Ph.D. degree [22] T. Lehtonen, S. Benamar, V. Laamanen, I. Luoma, O. Ruotsalainen, J. Sa-
from Lehigh University in 1996 and his M.S. and B.S. degrees from lonen, and T. Mikkonen, “Towards user-friendly mobile browsing,” in Proc. 2nd
Int. Workshop on Advanced Architectures and Algorithms for Internet Delivery
the University of Science and Technology of China in 1991 and and Applications (ACM Int. Conf. Proc. Series, vol. 198), 2006, Article 6.
1988, respectively. He has been with Microsoft Research Asia since [23] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-
based image retrieval at the end of the early years,” IEEE Trans. Pattern Anal.
May 1999, where he was a principal researcher and research area Machine Intell., vol. 22, no. 12, pp. 1349–1380, Dec. 2000.
manager. From October 1996 to May 1999, he was with Sarnoff [SP]
IEEE SIGNAL PROCESSING MAGAZINE [69] MAY 2011