Computer Networks 51 (2007) 921–960
www.elsevier.com/locate/comnet

A survey on wireless multimedia sensor networks

Ian F. Akyildiz *, Tommaso Melodia, Kaushik R. Chowdhury

Broadband and Wireless Networking Laboratory, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States

Received 11 March 2006; received in revised form 6 August 2006; accepted 5 October 2006
Available online 2 November 2006

* Corresponding author. Tel.: +1 404 894 5141; fax: +1 404 894 7883.
E-mail addresses: email@example.com (I.F. Akyildiz), firstname.lastname@example.org (T. Melodia), email@example.com (K.R. Chowdhury).
1389-1286/$ - see front matter. doi:10.1016/j.comnet.2006.10.002

Abstract

The availability of low-cost hardware such as CMOS cameras and microphones has fostered the development of Wireless Multimedia Sensor Networks (WMSNs), i.e., networks of wirelessly interconnected devices that are able to ubiquitously retrieve multimedia content such as video and audio streams, still images, and scalar sensor data from the environment. In this paper, the state of the art in algorithms, protocols, and hardware for wireless multimedia sensor networks is surveyed, and open research issues are discussed in detail. Architectures for WMSNs are explored, along with their advantages and drawbacks. Currently off-the-shelf hardware as well as available research prototypes for WMSNs are listed and classified. Existing solutions and open research issues at the application, transport, network, link, and physical layers of the communication protocol stack are investigated, along with possible cross-layer synergies and optimizations.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Wireless sensor networks; Multimedia communications; Distributed smart cameras; Video sensor networks; Energy-aware protocol design; Cross-layer protocol design; Quality of service

1. Introduction

Wireless sensor networks (WSNs) have drawn the attention of the research community in the last few years, driven by a wealth of theoretical and practical challenges. This growing interest can be largely attributed to new applications enabled by large-scale networks of small devices capable of harvesting information from the physical environment, performing simple processing on the extracted data, and transmitting it to remote locations. Significant results in this area over the last few years have ushered in a surge of civil and military applications. As of today, most deployed wireless sensor networks measure scalar physical phenomena like temperature, pressure, humidity, or location of objects. In general, most of the applications have low bandwidth demands, and are usually delay tolerant.

More recently, the availability of inexpensive hardware such as CMOS cameras and microphones that are able to ubiquitously capture multimedia content from the environment has fostered the
development of Wireless Multimedia Sensor Networks (WMSNs) [54,90], i.e., networks of wirelessly interconnected devices that allow retrieving video and audio streams, still images, and scalar sensor data. With rapid improvements and miniaturization in hardware, a single sensor device can be equipped with audio and visual information collection modules. As an example, the Cyclops image capturing and inference module is designed for extremely light-weight imaging and can be interfaced with a host mote such as Crossbow's MICA2 or MICAz. In addition to the ability to retrieve multimedia data, WMSNs will also be able to store, process in real time, correlate, and fuse multimedia data originating from heterogeneous sources.

Wireless multimedia sensor networks will not only enhance existing sensor network applications such as tracking, home automation, and environmental monitoring, but they will also enable several new applications such as:

• Multimedia surveillance sensor networks. Wireless video sensor networks will be composed of interconnected, battery-powered miniature video cameras, each packaged with a low-power wireless transceiver that is capable of processing, sending, and receiving data. Video and audio sensors will be used to enhance and complement existing surveillance systems against crime and terrorist attacks. Large-scale networks of video sensors can extend the ability of law enforcement agencies to monitor areas, public events, private properties, and borders.
• Storage of potentially relevant activities. Multimedia sensors could infer and record potentially relevant activities (thefts, car accidents, traffic violations), and make video/audio streams or reports available for future query.
• Traffic avoidance, enforcement and control systems. It will be possible to monitor car traffic in big cities or on highways and deploy services that offer traffic routing advice to avoid congestion. In addition, smart parking advice systems based on WMSNs will allow monitoring available parking spaces and provide drivers with automated parking advice, thus improving mobility in urban areas. Moreover, multimedia sensors may monitor the flow of vehicular traffic on highways and retrieve aggregate information such as average speed and number of cars. Sensors could also detect violations and transmit video streams to law enforcement agencies to identify the violator, or buffer images and streams in case of accidents for subsequent accident scene analysis.
• Advanced health care delivery. Telemedicine sensor networks can be integrated with 3G multimedia networks to provide ubiquitous health care services. Patients will carry medical sensors to monitor parameters such as body temperature, blood pressure, pulse oximetry, ECG, and breathing activity. Furthermore, remote medical centers will perform advanced remote monitoring of their patients via video and audio sensors, location sensors, and motion or activity sensors, which can also be embedded in wrist devices.
• Automated assistance for the elderly and family monitors. Multimedia sensor networks can be used to monitor and study the behavior of elderly people as a means to identify the causes of illnesses that affect them, such as dementia. Networks of wearable or video and audio sensors can infer emergency situations and immediately connect elderly patients with remote assistance services or with relatives.
• Environmental monitoring. Several projects on habitat monitoring that use acoustic and video feeds are being envisaged, in which information has to be conveyed in a time-critical fashion. For example, arrays of video sensors are already used by oceanographers to determine the evolution of sandbars via image processing techniques.
• Person locator services. Multimedia content such as video streams and still images, along with advanced signal processing techniques, can be used to locate missing persons, or identify criminals or terrorists.
• Industrial process control. Multimedia content such as imaging, temperature, or pressure, amongst others, may be used for time-critical industrial process control. Machine vision is the application of computer vision techniques to industry and manufacturing, where information can be extracted and analyzed by WMSNs to support a manufacturing process such as those used for semiconductor chips, automobiles, food, or pharmaceutical products. For example, in quality control of manufacturing processes, details or final products are automatically inspected to find defects. In addition, machine vision systems can detect the position and orientation of parts of the product to be picked up by a robotic arm. The integration of machine vision
systems with WMSNs can simplify and add flexibility to systems for visual inspections and automated actions that require high-speed, high-magnification, and continuous operation.

As has been observed elsewhere, WMSNs will stretch the horizon of traditional monitoring and surveillance systems by:

• Enlarging the view. The Field of View (FoV) of a single fixed camera, or the Field of Regard (FoR) of a single moving pan-tilt-zoom (PTZ) camera, is limited. Instead, a distributed system of multiple cameras and sensors enables perception of the environment from multiple disparate viewpoints, and helps overcome occlusion effects.
• Enhancing the view. The redundancy introduced by multiple, possibly heterogeneous, overlapped sensors can provide enhanced understanding and monitoring of the environment. Overlapped cameras can provide different views of the same area or target, while the joint operation of cameras and audio or infrared sensors can help disambiguate cluttered situations.
• Enabling multi-resolution views. Heterogeneous media streams with different granularity can be acquired from the same point of view to provide a multi-resolution description of the scene and multiple levels of abstraction. For example, static medium-resolution camera views can be enriched by views from a zoom camera that provides a high-resolution view of a region of interest. Such a feature could be used, for instance, to recognize people based on their facial characteristics.

Many of the above applications require the sensor network paradigm to be rethought in view of the need for mechanisms to deliver multimedia content with a certain level of quality of service (QoS). Since the need to minimize the energy consumption has driven most of the research in sensor networks so far, mechanisms to efficiently deliver application level QoS, and to map these requirements to network layer metrics such as latency and jitter, have not been primary concerns in mainstream research on classical sensor networks.

Conversely, algorithms, protocols, and techniques to deliver multimedia content over large-scale networks have been the focus of intensive research in the last 20 years, especially in ATM wired and wireless networks. Later, many of the results derived for ATM networks have been readapted, and architectures such as Diffserv and Intserv for Internet QoS delivery have been developed. However, there are several main peculiarities that make QoS delivery of multimedia content in sensor networks an even more challenging, and largely unexplored, task:

• Resource constraints. Sensor devices are constrained in terms of battery, memory, processing capability, and achievable data rate. Hence, efficient use of these scarce resources is mandatory.
• Variable channel capacity. While in wired networks the capacity of each link is assumed to be fixed and pre-determined, in multi-hop wireless networks the attainable capacity of each wireless link depends on the interference level perceived at the receiver. This, in turn, depends on the interaction of several functionalities that are distributively handled by all network devices, such as power control, routing, and rate policies. Hence, capacity and delay attainable at each link are location dependent, vary continuously, and may be bursty in nature, thus making QoS provisioning a challenging task.
• Cross-layer coupling of functionalities. In multi-hop wireless networks, there is a strict interdependence among functions handled at all layers of the communication stack. Functionalities handled at different layers are inherently and strictly coupled due to the shared nature of the wireless communication channel. Hence, the various functionalities aimed at QoS provisioning should not be treated separately when efficient solutions are sought.
• Multimedia in-network processing. Processing of multimedia content has mostly been approached as a problem isolated from the network-design problem, with a few exceptions such as joint source-channel coding and channel-adaptive streaming. Hence, research that addressed the content delivery aspects has typically not considered the characteristics of the source content and has primarily studied cross-layer interactions among lower layers of the protocol stack. However, the processing and delivery of multimedia content are not independent, and their interaction has a major impact on the levels of QoS that can be delivered. WMSNs will allow performing multimedia in-network processing algorithms on the raw data. Hence, the QoS required at the application level will be delivered by means of a combination of both cross-layer optimization of the
communication process, and in-network processing of raw data streams that describe the phenomenon of interest from multiple views, with different media, and on multiple resolutions. Hence, it is necessary to develop application-independent and self-organizing architectures to flexibly perform in-network processing of multimedia contents.

Efforts from several research areas will need to converge to develop efficient and flexible WMSNs, and this, in turn, will significantly enhance our ability to interact with the physical environment. These include advances in the understanding of energy-constrained wireless communications, and the integration of advanced multimedia processing techniques in the communication process. Another crucial issue is the development of flexible system architectures and software to allow querying the network to specify the required service (thus providing abstraction from implementation details). At the same time, it is necessary to provide the service in the most efficient way, which may be in contrast with the need for abstraction.

In this paper, we survey the state of the art in algorithms, protocols, and hardware for the development of wireless multimedia sensor networks, and discuss open research issues in detail. In particular, in Section 2 we point out the characteristics of wireless multimedia sensor networks, i.e., the major factors influencing their design. In Section 3, we suggest possible architectures for WMSNs and describe their characterizing features. In Section 4, we discuss and classify existing hardware and prototypal implementations for WMSNs, while in Section 5 we discuss possible advantages and challenges of multimedia in-network processing. In Sections 6–10 we discuss existing solutions and open research issues at the application, transport, network, link, and physical layers of the communication stack, respectively. In Section 11, we discuss cross-layer synergies and possible optimizations, while in Section 12 we discuss additional complementary research areas such as actuation, synchronization, and security. Finally, in Section 13 we conclude the paper.

2. Factors influencing the design of multimedia sensor networks

Wireless Multimedia Sensor Networks (WMSNs) will be enabled by the convergence of communication and computation with signal processing and several branches of control theory and embedded computing. This cross-disciplinary research will enable distributed systems of heterogeneous embedded devices that sense, interact with, and control the physical environment. There are several factors that mainly influence the design of a WMSN, which are outlined in this section.

• Application-specific QoS requirements. The wide variety of applications envisaged on WMSNs will have different requirements. In addition to data delivery modes typical of scalar sensor networks, multimedia data include snapshot and streaming multimedia content. Snapshot-type multimedia data contain event-triggered observations obtained in a short time period. Streaming multimedia content is generated over longer time periods and requires sustained information delivery. Hence, a strong foundation is needed in terms of hardware and supporting high-level algorithms to deliver QoS and consider application-specific requirements. These requirements may pertain to multiple domains and can be expressed, amongst others, in terms of a combination of bounds on energy consumption, delay, reliability, distortion, or network lifetime.
• High bandwidth demand. Multimedia content, especially video streams, requires transmission bandwidth that is orders of magnitude higher than that supported by currently available sensors. For example, the nominal transmission rate of state-of-the-art IEEE 802.15.4 compliant components such as Crossbow's MICAz or TelosB motes is 250 kbit/s. Data rates at least one order of magnitude higher may be required for high-end multimedia sensors, with comparable power consumption. Hence, high data rate and low-power consumption transmission techniques need to be leveraged. In this respect, the ultra wide band (UWB) transmission technique seems particularly promising for WMSNs, and its applicability is discussed in Section 10.
• Multimedia source coding techniques. Uncompressed raw video streams require excessive bandwidth for a multi-hop wireless environment. For example, a single monochrome frame in the NTSC-based Quarter Common Intermediate Format (QCIF, 176 × 120 pixels) requires around 21 Kbyte, assuming 8 bits per pixel, and at 30 frames per second (fps) a video stream requires over 5 Mbit/s. Hence, it is apparent that efficient processing techniques for
lossy compression are necessary for multimedia sensor networks. Traditional video coding techniques used for wireline and wireless communications are based on the idea of reducing the bit rate generated by the source encoder by exploiting source statistics. To this aim, encoders rely on intra-frame compression techniques to reduce redundancy within one frame, while they leverage inter-frame compression (also known as predictive encoding or motion estimation) to exploit redundancy among subsequent frames to reduce the amount of data to be transmitted and stored, thus achieving good rate-distortion performance. Since predictive encoding requires complex encoders and powerful processing algorithms, and entails high energy consumption, it may not be suited for low-cost multimedia sensors. However, it has recently been shown that the traditional balance of complex encoder and simple decoder can be reversed within the framework of so-called distributed source coding, which exploits the source statistics at the decoder and, by shifting the complexity to the decoder, allows the use of simple encoders. Clearly, such algorithms are very promising for WMSNs and especially for networks of video sensors, where it may not be feasible to use existing video encoders at the source node due to processing and energy constraints.
• Multimedia in-network processing. WMSNs allow performing multimedia in-network processing algorithms on the raw data extracted from the environment. This requires new architectures for collaborative, distributed, and resource-constrained processing that allow for filtering and extraction of semantically relevant information at the edge of the sensor network. This may increase the system scalability by reducing the transmission of redundant information and by merging data originating from multiple views, on different media, and with multiple resolutions. For example, in video security applications, information from uninteresting scenes can be compressed to a simple scalar value or not be transmitted altogether, while in environmental applications, distributed filtering techniques can create a time-elapsed image. Hence, it is necessary to develop application-independent architectures to flexibly perform in-network processing of the multimedia content gathered from the environment. For example, IrisNet uses application-specific filtering of sensor feeds at the source, i.e., each application processes its desired sensor feeds on the CPU of the sensor nodes where data are gathered. This dramatically reduces the bandwidth consumed, since instead of transferring raw data, IrisNet sends only a potentially small amount of processed data. However, the cost of multimedia processing algorithms may be prohibitive for low-end multimedia sensors. Hence, it is necessary to develop scalable and energy-efficient distributed filtering architectures to enable processing of redundant data as close as possible to the periphery of the network.
• Power consumption. Power consumption is a fundamental concern in WMSNs, even more than in traditional wireless sensor networks. In fact, sensors are battery-constrained devices, while multimedia applications produce high volumes of data, which require high transmission rates and extensive processing. While the energy consumption of traditional sensor nodes is known to be dominated by the communication functionalities, this may not necessarily be true in WMSNs. Therefore, protocols, algorithms, and architectures to maximize the network lifetime while providing the QoS required by the application are a critical issue.
• Flexible architecture to support heterogeneous applications. WMSN architectures will support several heterogeneous and independent applications with different requirements. It is necessary to develop flexible, hierarchical architectures that can accommodate the requirements of all these applications in the same infrastructure.
• Multimedia coverage. Some multimedia sensors, in particular video sensors, have larger sensing radii and are sensitive to direction of acquisition (directivity). Furthermore, video sensors can capture images only when there is unobstructed line of sight between the event and the sensor. Hence, coverage models developed for traditional wireless sensor networks are not sufficient for pre-deployment planning of a multimedia sensor network.
• Integration with Internet (IP) architecture. It is of fundamental importance for the commercial development of sensor networks to provide services that allow querying the network to retrieve useful information from anywhere and at any time. For this reason, future WMSNs will be remotely accessible from the Internet, and will therefore need to be integrated with the IP
architecture. The characteristics of WSNs rule out the possibility of all-IP sensor networks and recommend the use of application level gateways or overlay IP networks as the best approach for integration between WSNs and the Internet.
• Integration with other wireless technologies. Large-scale sensor networks may be created by interconnecting local "islands" of sensors through other wireless technologies. This needs to be achieved without sacrificing the efficiency of the operation within each individual technology.

3. Network architecture

The problem of designing a scalable network architecture is of primary importance. Most proposals for wireless sensor networks are based on a flat, homogeneous architecture in which every sensor has the same physical capabilities and can only interact with neighboring sensors. Traditionally, the research on algorithms and protocols for sensor networks has focused on scalability, i.e., how to design solutions whose applicability would not be limited by the growing size of the network. Flat topologies may not always be suited to handle the amount of traffic generated by multimedia applications including audio and video. Likewise, the processing power required for data processing and communications, and the power required to operate it, may not be available on each node.

3.1. Reference architecture

Fig. 1. Reference architecture of a wireless multimedia sensor network.

In Fig. 1, we introduce a reference architecture for WMSNs, where three sensor networks with different characteristics are shown, possibly deployed in different physical locations. The first cloud on the left shows a single-tier network of homogeneous video sensors. A subset of the deployed sensors have higher processing capabilities, and are thus referred to as processing hubs. The union of the processing hubs constitutes a distributed processing architecture. The multimedia content gathered is relayed to a wireless gateway through a multi-hop path. The gateway is interconnected to a storage hub, which is in charge of storing multimedia content locally for subsequent retrieval. Clearly, more complex architectures for distributed storage can be implemented when allowed by the environment and the
application needs, which may result in energy savings since, by being stored locally, the multimedia content does not need to be wirelessly relayed to remote locations. The wireless gateway is also connected to a central sink, which implements the software front-end for network querying and tasking. The second cloud represents a single-tiered clustered architecture of heterogeneous sensors (only one cluster is depicted). Video, audio, and scalar sensors relay data to a central clusterhead, which is also in charge of performing intensive multimedia processing on the data (processing hub). The clusterhead relays the gathered content to the wireless gateway and to the storage hub. The last cloud on the right represents a multi-tiered network with heterogeneous sensors. Each tier is in charge of a subset of the functionalities. Resource-constrained, low-power scalar sensors are in charge of performing simpler tasks, such as detecting scalar physical measurements, while resource-rich, high-power devices are responsible for more complex tasks. Data processing and storage can be performed in a distributed fashion at each different tier.

3.2. Single-tier vs. multi-tier sensor deployment

One possible approach for designing a multimedia sensor application is to deploy homogeneous sensors and program each sensor to perform all possible application tasks. Such an approach yields a flat, single-tier network of homogeneous sensor nodes. An alternative, multi-tier approach is to use heterogeneous elements. In this approach, resource-constrained, low-power elements are in charge of performing simpler tasks, such as detecting scalar physical measurements, while resource-rich, high-power devices take on more complex tasks. For instance, a surveillance application can rely on low-fidelity cameras or scalar acoustic sensors to perform motion or intrusion detection, while high-fidelity cameras can be woken up on demand for object recognition and tracking. A multi-tier architecture has been advocated for video sensor networks for surveillance applications. The architecture is based on multiple tiers of cameras with different functionalities, with the lower tier constituted of low-resolution imaging sensors, and the higher tier composed of high-end pan-tilt-zoom cameras. It is argued, and shown by means of experiments, that such an architecture offers considerable advantages with respect to a single-tier architecture in terms of scalability, lower cost, better coverage, higher functionality, and better reliability.

3.3. Coverage

In traditional WSNs, sensor nodes collect information from the environment within a pre-defined sensing range, i.e., a roughly circular area defined by the type of sensor being used.

Multimedia sensors generally have larger sensing radii and are also sensitive to the direction of data acquisition. In particular, cameras can capture images of objects or parts of regions that are not necessarily close to the camera itself. However, the image can obviously be captured only when there is an unobstructed line of sight between the event and the sensor. Furthermore, each multimedia sensor/camera perceives the environment or the observed object from a different and unique viewpoint, given the different orientations and positions of the cameras relative to the observed event or region. A preliminary investigation of the coverage problem for video sensor networks has been conducted, in which the concept of sensing range is replaced with the camera's field of view, i.e., the maximum volume visible from the camera. It is also shown that an algorithm designed for traditional sensor networks does not perform well with video sensors in terms of coverage preservation of the monitored area.

4. Multimedia sensor hardware

In this section, we review and classify existing imaging, multimedia, and processing wireless devices that will find application in next generation wireless multimedia sensor networks. In particular, we discuss existing hardware, with a particular emphasis on video capturing devices, review existing implementations of multimedia sensor networks, and discuss current possibilities for energy harvesting for multimedia sensor devices.

4.1. Enabling hardware platforms

High-end pan-tilt-zoom cameras and high resolution digital cameras are widely available on the market. However, while such sophisticated devices can find application as high-quality tiers of multimedia sensor networks, we concentrate on low-cost, low-energy consumption imaging and processing devices that will be densely deployed and provide detailed
visual information from multiple disparate viewpoints, help overcome occlusion effects, and thus enable enhanced interaction with the environment.

4.1.1. Low-resolution imaging motes

The recent availability of CMOS imaging sensors that capture and process an optical image within a single integrated chip, thus eliminating the need for many separate chips required by the traditional charge-coupled device (CCD) technology, has enabled the massive deployment of low-cost visual sensors. CMOS image sensors are already in many industrial and consumer sectors, such as cell phones, personal digital assistants (PDAs), and consumer and industrial digital cameras. CMOS image quality is now matching CCD quality in the low- and mid-range, while CCD is still the technology of choice for high-end image sensors. The CMOS technology allows integrating a lens, an image sensor, and image processing algorithms, including image stabilization and image compression, on the same chip. With respect to CCD, CMOS cameras are smaller, lighter, and consume less power. Hence, they constitute a suitable technology to realize imaging sensors to be interfaced with wireless motes.

However, existing CMOS imagers are still designed to be interfaced with computationally rich host devices, such as cell phones or PDAs. For this reason, the objective of the Cyclops module is to fill the gap between CMOS cameras and computationally constrained devices. Cyclops is an electronic interface between a CMOS camera module and a wireless mote such as MICA2 or MICAz, and contains programmable logic and memory for high-speed data communication. Cyclops consists of an imager (CMOS Agilent ADCM-1700 CIF camera), an 8-bit ATMEL ATmega128L microcontroller (MCU), a complex programmable logic device (CPLD), an external SRAM, and an external Flash. The MCU controls the imager, configures its parameters, and performs local processing on the image to produce an inference. Since image capture requires faster data transfer and address generation than the 4 MHz MCU can provide, a CPLD is used to provide access to the high-speed clock. Cyclops firmware is written in the nesC language, based on the TinyOS libraries. The module is connected to a host mote, to which it provides a high level interface that hides the complexity of the imaging device from the host mote. Moreover, it can perform simple inference on the image data and present it to the host.

Researchers at Carnegie Mellon University are developing the CMUcam 3, which is an embedded camera endowed with a CIF resolution (352 × 288) RGB color sensor that can load images into memory at 26 frames per second. CMUcam 3 has software JPEG compression and a basic image manipulation library, and can be interfaced with an 802.15.4 compliant TelosB mote.

The design of an integrated mote for wireless image sensor networks has also been described. The design is driven by the need to endow motes with adequate processing power and memory size for image sensing applications. It is argued that 32-bit processors are better suited for image processing than their 8-bit counterparts, which are used in most existing motes. It is shown that the time needed to perform operations such as 2-D convolution on an 8-bit processor such as the ATMEL ATmega128 clocked at 4 MHz is 16 times higher than with a 32-bit ARM7 device clocked at 48 MHz, while the power consumption of the 32-bit processor is only six times higher. Hence, an 8-bit processor turns out to be both slower and more energy-consuming: taking 16 times longer at one sixth of the power, it spends roughly 16/6 ≈ 2.7 times the energy on the same operation. Based on these premises, a new image mote is developed based on an ARM7 32-bit CPU clocked at 48 MHz, with external FRAM or Flash memory and an 802.15.4 compliant Chipcon CC2420 radio, that is interfaced with mid-resolution ADCM-1670 CIF CMOS sensors and low-resolution 30 × 30 pixel optical sensors.

The same conclusion is drawn from a comparison of the energy consumption of the 8-bit Atmel AVR processor clocked at 8 MHz with that of the PXA255 32-bit Intel processor, embedded on a Stargate platform and clocked at 400 MHz. Three representative algorithms are selected as benchmarks, i.e., the cyclic redundancy check, a finite impulse response filter, and a fast Fourier transform. Surprisingly, it is shown that even for such relatively simple algorithms the energy consumption of an 8-bit processor is between one and two orders of magnitude higher.

4.1.2. Medium-resolution imaging motes based on the Stargate platform

Intel has developed several prototypes that constitute important building platforms for WMSN applications. The Stargate board is a high-performance processing platform designed for sensor, signal processing, control, robotics, and sensor network applications. It is designed by Intel and produced by Crossbow. Stargate is based on Intel's PXA-255 XScale 400 MHz RISC processor, which
is the same processor found in many handheld computers including the Compaq IPAQ and the Dell Axim. Stargate has 32 Mbyte of Flash memory, 64 Mbyte of SDRAM, and an on-board connector for Crossbow's MICA2 or MICAz motes as well as PCMCIA Bluetooth or IEEE 802.11 cards. Hence, it can work as a wireless gateway and as a computational hub for in-network processing algorithms. When connected with a webcam or other capturing device, it can function as a medium-resolution multimedia sensor, although its energy consumption is still high, as documented in . Moreover, although efficient software implementations exist, XScale processors do not have hardware support for floating point operations, which may be needed to efficiently perform multimedia processing algorithms.

Intel has also developed two prototypal generations of wireless sensors, known as Imote and Imote2. Imote is built around an integrated wireless microcontroller consisting of a 32-bit 12 MHz ARM7 processor, a Bluetooth radio, 64 Kbyte RAM and 32 Kbyte FLASH memory, as well as several I/O options. The software architecture is based on an ARM port of TinyOS. The second generation of Intel motes has a common core to the next generation Stargate 2 platform, and is built around a new low-power 32-bit PXA271 XScale processor at 320/416/520 MHz, which enables performing DSP operations for storage or compression, and an IEEE 802.15.4 ChipCon CC2420 radio. It has large on-board RAM and Flash memories (32 Mbyte), additional support for alternate radios, and a variety of high-speed I/O to connect digital sensors or cameras. Its size is also very limited, 48 × 33 mm, and it can run the Linux operating system and Java applications.

4.2. Energy harvesting

As mentioned before, techniques for prolonging the lifetime of battery-powered sensors have been the focus of a vast amount of literature in sensor networks. These techniques include hardware optimizations such as dynamic optimization of voltage and clock rate, wake-up procedures to keep electronics inactive most of the time, and energy-aware protocol development for sensor communications. In addition, energy-harvesting techniques, which extract energy from the environment where the sensor itself lies, offer another important means to prolong the lifetime of sensor devices.

Systems able to perpetually power sensors based on simple COTS photovoltaic cells coupled with supercapacitors and rechargeable batteries have already been demonstrated . In , the state of the art in more unconventional techniques for energy harvesting (also referred to as energy scavenging) is surveyed. Technologies to generate energy from background radio signals, thermoelectric conversion, vibrational excitation, and the human body, are overviewed.

As far as collecting energy from background radio signals is concerned, unfortunately, an electric field of 1 V/m yields only 0.26 µW/cm² (the plane-wave power density E²/Z₀, with Z₀ ≈ 377 Ω), as opposed to 100 µW/cm² produced by a crystalline silicon solar cell exposed to bright sunlight. Electric fields of intensity of a few volts per meter are only encountered close to strong transmitters. Another practice, which consists in broadcasting RF energy deliberately to power electronic devices, is severely limited by legal limits set by health and safety concerns.

While thermoelectric conversion may not be suitable for wireless devices, harvesting energy from vibrations in the surrounding environment may provide another useful source of energy. Vibrational magnetic power generators based on moving magnets or coils may yield powers that range from tens of microwatts when based on microelectromechanical system (MEMS) technologies to over a milliwatt for larger devices. Other vibrational microgenerators are based on charged capacitors with moving plates, and depending on their excitation and power conditioning, yield power on the order of 10 µW. In , it is also reported that recent analysis  suggested that 1 cm³ vibrational microgenerators can be expected to yield up to 800 µW/cm³ from machine-induced stimuli, which is orders of magnitude higher than what is provided by currently available microgenerators. Hence, this is a promising area of research for small battery-powered devices.

While these techniques may provide an additional source of energy and help prolong the lifetime of sensor devices, they yield power that is several orders of magnitude lower than the power consumption of state-of-the-art multimedia devices. Hence, they may currently be suitable only for very-low duty cycle devices.

4.3. Examples of deployed multimedia sensor networks

There have been several recent experimental studies, mostly limited to video sensor networks.
Panoptes  is a system developed for environmental observation and surveillance applications, based on Intel StrongARM PDA platforms with a Logitech webcam as a video capture device. Here, video sensors are high-end devices with Linux operating system, 64 Mbyte of memory, and are networked through 802.11 networking cards. The system includes spatial compression (but not temporal), distributed filtering, buffering, and adaptive priorities for the video stream.

In , a system whose objective is to limit the computation, bandwidth, and human attention burdens imposed by large-scale video surveillance systems is described. In-network processing is used on each camera to filter out uninteresting events locally, avoiding disambiguation and tracking of irrelevant environmental distractors. A resource allocation algorithm is also proposed to steer pan-tilt cameras to follow interesting targets while maintaining awareness of possibly emerging new targets.

In , the design and implementation of SensEye, a multi-tier network of heterogeneous wireless nodes and cameras, is described. The surveillance application consists of three tasks: object detection, recognition and tracking. The objective of the design is to demonstrate that a camera sensor network containing heterogeneous elements provides numerous benefits over traditional homogeneous sensor networks. For this reason, SensEye follows a three-tier architecture, as shown in Fig. 2. The lowest tier consists of low-end devices, i.e., MICA2 Motes equipped with 900 MHz radios interfaced with scalar sensors, e.g., vibration sensors. The second tier is made up of motes equipped with low-fidelity Cyclops  or CMUcam  camera sensors. The third tier consists of Stargate  nodes equipped with webcams. Each Stargate is equipped with an embedded 400 MHz XScale processor that runs Linux and a webcam that can capture higher fidelity images than tier 2 cameras. Tier 3 nodes also perform gateway functions, as they are endowed with a low data rate radio to communicate with motes in tiers 1–2 at 900 MHz, and an 802.11 radio to communicate with tier 3 Stargate nodes. An additional fourth tier may consist of a sparse deployment of high-resolution, high-end pan-tilt-zoom cameras connected to embedded PCs. The camera sensors at this tier can be used to track moving objects, and can be utilized to fill coverage gaps and provide additional redundancy. The underlying design principle is to map each task requested by the application to the lowest tier with sufficient resources to perform the task. Devices from higher tiers are woken up on-demand only when necessary. For example, a high-resolution camera can be woken up to retrieve high resolution images of an object that has been previously detected by a lower tier. It is shown that the system can achieve an order of magnitude reduction in energy consumption while providing comparable surveillance accuracy with respect to single-tier surveillance systems.

Fig. 2. The multi-tier architecture of SensEye . (Diagram labels: Tier 1, scalar sensors + mote; Tier 2, low-res cam + mote; Tier 3, webcam + Stargate; "wakeup" between tiers; "handoff" and "video stream" at Tier 3.)

In , experimental results on the energy consumption of a video sensor network testbed are presented. Each sensing node in the testbed consists of a Stargate board equipped with an 802.11 wireless network card and a Logitech QuickCam Pro 4000 webcam. The energy consumption is assessed using a benchmark that runs basic tasks such as processing, flash memory access, image acquisition, and communication over the network. Both steady state and transient energy consumption behavior obtained by direct measurements of current with a digital multimeter are reported. In the steady state, it is shown that communication-related tasks are less energy-consuming than intensive processing and flash access when the radio modules are loaded. Interestingly, and unlike in traditional wireless sensor networks , the processing-intensive benchmark results in the highest current requirement, and transmission is shown to be only about 5% more energy-consuming than reception. Experimental results also show that delay and additional amount of energy consumed due to transitions (e.g., to go to sleep mode) are not negligible and must be accounted for in network and protocol design.
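The last observation, that transition delay and energy are not negligible, can be illustrated with a simple break-even calculation. The sketch below is not taken from the cited study; it assumes a hypothetical two-state power model (idle and sleep) with a fixed energy and time cost for the round trip into and out of sleep, and all numeric values are illustrative rather than measured. Under these assumptions, putting a node to sleep only saves energy when the idle interval exceeds a break-even time.

```python
def break_even_idle_time(p_idle_w, p_sleep_w, e_transition_j, t_transition_s):
    """Shortest idle interval (seconds) for which sleeping saves energy.

    Assumed model (hypothetical, for illustration only):
      - staying idle for T seconds costs p_idle_w * T joules;
      - sleeping costs e_transition_j joules for the round trip
        (entering and leaving sleep, lasting t_transition_s seconds in total)
        plus p_sleep_w * (T - t_transition_s) joules while asleep.
    Sleeping wins when
      p_idle_w * T > e_transition_j + p_sleep_w * (T - t_transition_s).
    """
    if p_idle_w <= p_sleep_w:
        raise ValueError("sleep power must be lower than idle power")
    t = (e_transition_j - p_sleep_w * t_transition_s) / (p_idle_w - p_sleep_w)
    # The node cannot sleep for less than the transition itself takes.
    return max(t, t_transition_s)


def should_sleep(idle_s, p_idle_w, p_sleep_w, e_transition_j, t_transition_s):
    """Return True if sleeping through an idle gap of idle_s seconds saves energy."""
    threshold = break_even_idle_time(
        p_idle_w, p_sleep_w, e_transition_j, t_transition_s
    )
    return idle_s > threshold


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from any testbed.
    params = dict(p_idle_w=1.5, p_sleep_w=0.5,
                  e_transition_j=1.0, t_transition_s=0.5)
    print("break-even idle time [s]:", break_even_idle_time(**params))
    print("sleep through a 1.0 s gap?", should_sleep(1.0, **params))
    print("sleep through a 0.6 s gap?", should_sleep(0.6, **params))
```

A protocol that schedules sleep periods shorter than this threshold would spend more energy on transitions than it saves, which is one concrete way in which transition costs enter network and protocol design.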
IrisNet (Internet-scale Resource-Intensive Sensor Network Services)  is an example software platform to deploy heterogeneous services on WMSNs. IrisNet allows harnessing a global, wide-area sensor network by performing Internet-like queries on this infrastructure. Video sensors and scalar sensors are spread throughout the environment, and collect potentially useful data. IrisNet allows users to perform Internet-like queries to video sensors and other data. The user views the sensor network as a single unit that can be queried through a high-level language. Each query operates over data collected from the global sensor network, and allows simple Google-like queries as well as more complex queries involving arithmetic and database operators.

The architecture of IrisNet is two-tiered: heterogeneous sensors implement a common shared interface and are called sensing agents (SA), while the data produced by sensors is stored in a distributed database that is implemented on organizing agents (OA). Different sensing services are run simultaneously on the architecture. Hence, the same hardware infrastructure can provide different sensing services. For example, a set of video sensors can provide a parking space finder service, as well as a surveillance service. Sensor data is represented in the Extensible Markup Language (XML), which allows easy organization of hierarchical data. A group of OAs is responsible for a sensing service, collects data produced by that service, and organizes the information in a distributed database to answer the class of relevant queries. IrisNet also allows programming sensors with filtering code that processes sensor readings in a service-specific way. A single SA can execute several such software filters (called senselets) that process the raw sensor data based on the requirements of the service that needs to access the data. After senselet processing, the distilled information is sent to a nearby OA.

We have recently built an experimental testbed at the Broadband and Wireless Networking (BWN) Laboratory at Georgia Tech based on currently off-the-shelf advanced devices to demonstrate the efficiency of algorithms and protocols for multimedia communications through wireless sensor networks.

The testbed is integrated with our scalar sensor network testbed, which is composed of a heterogeneous collection of imotes from Intel and MICAz motes from Crossbow. Although our testbed already includes 60 scalar sensors, we plan to increase its size to deploy a larger-scale testbed that allows testing more complex algorithms and assessing the scalability of the communication protocols under examination.

The WMSN-testbed includes three different types of multimedia sensors: low-end imaging sensors, medium-quality webcam-based multimedia sensors, and pan-tilt cameras mounted on mobile robots.

Low-end imaging sensors such as CMOS cameras can be interfaced with Crossbow MICAz motes. Medium-end video sensors are based on Logitech webcams interfaced with Stargate platforms (see Fig. 3).

Fig. 3. Stargate board interfaced with a medium resolution camera. Stargate hosts an 802.11 card and a MICAz mote that functions as a gateway to the sensor network.

Fig. 4. Acroname GARCIA, a mobile robot with a mounted pan-tilt camera and endowed with 802.11 as well as Zigbee interfaces.

The high-end video sensors consist of pan-tilt cameras installed on an Acroname GARCIA
robotic platform , which we refer to as actor, and shown in Fig. 4. Actors constitute a mobile platform that can perform adaptive sampling based on event features detected by low-end motes. The mobile actor can redirect high-resolution cameras to a region of interest when events are detected by lower-tier, low-resolution video sensors that are densely deployed, as seen in Fig. 5.

Fig. 5. GARCIA deployed on the sensor testbed. It acts as a mobile sink, and can move to the area of interest for closer visual inspection. It can also coordinate with other actors and has built-in collision avoidance capability.

The testbed also includes storage and computational hubs, which are needed to store large multimedia content and perform computationally intensive multimedia processing algorithms.

5. Collaborative in-network processing

As discussed previously, collaborative in-network multimedia processing techniques are of great interest in the context of a WMSN. It is necessary to develop architectures and algorithms to flexibly perform these functionalities in-network with minimum energy consumption and limited execution time. The objective is usually to avoid transmitting large amounts of raw streams to the sink by processing the data in the network to reduce the communication volume.

Given a source of data (e.g., a video stream), different applications may require diverse information (e.g., raw video stream vs. simple scalar or binary information inferred by processing the video stream). This is referred to as application-specific querying and processing. Hence, it is necessary to develop expressive and efficient querying languages, and to develop distributed filtering and in-network processing architectures, to allow real-time retrieval of useful information.

Similarly, it is necessary to develop architectures that efficiently allow performing data fusion or other complex processing operations in-network. Algorithms for both inter-media and intra-media data aggregation and fusion need to be developed, as simple distributed processing schemes developed for existing scalar sensors are not suitable for computation-intensive processing required by multimedia contents. Multimedia sensor networks may require computation-intensive processing algorithms (e.g., to detect the presence of suspicious activity from a video stream). This may require considerable processing to extract meaningful information and/or to perform compression. A fundamental question to be answered is whether this processing can be done on sensor nodes (i.e., a flat architecture of multi-functional sensors that can perform any task), or if the need for specialized devices, e.g., computation hubs, arises.

In what follows, we discuss a non-exhaustive set of significative examples of processing techniques that would be applicable distributively in a WMSN, and that will likely drive research on architectures and algorithms for distributed processing of raw sensor data.

5.1. Data alignment and image registration

Data alignment consists of merging information from multiple sources. One of the most widespread data alignment concepts, image registration , is a family of techniques, widely used in areas such as remote sensing, medical imaging, and computer vision, to geometrically align different images (reference and sensed images) of the same scene taken at different times, from different viewpoints, and/or by different sensors:

• Different Viewpoints (Multi-view Analysis). Images of the same scene are acquired from different viewpoints, to gain a larger 2D view or a 3D representation of the scene of interest. Main applications are in remote sensing, computer vision and 3D shape recovery.
• Different times (multi-temporal analysis). Images of the same scene are acquired at different times. The aim is to find and evaluate changes in time in the scene of interest. The main applications are in computer vision, security monitoring, and motion tracking.
• Different sensors (multi-modal analysis). Images of the same scene are acquired by different sensors. The objective is to integrate the information obtained from different source streams to gain more complex and detailed scene representation.

Registration methods usually consist of four steps, i.e., feature detection, feature matching, transform model estimation, and image resampling and transformation. In feature detection, distinctive objects such as closed-boundary regions, edges, contours, line intersections, corners, etc. are detected. In feature matching, the correspondence between the features detected in the sensed image and those detected in the reference image is established. In transform model estimation, the type and parameters of the so-called mapping functions, which align the sensed image with the reference image, are estimated. The parameters of the mapping functions are computed by means of the established feature correspondence. In the last step, image resampling and transformation, the sensed image is transformed by means of the mapping functions.

These functionalities can clearly be prohibitive for a single sensor. Hence, research is needed on how to perform these functionalities on parallel architectures of sensors to produce single data sets.

5.2. WMSNs as distributed computer vision systems

Computer vision is a subfield of artificial intelligence, whose purpose is to allow a computer to extract features from a scene, an image or multi-dimensional data in general. The objective is to present this information to a human operator or to control some process (e.g., a mobile robot or an autonomous vehicle). The image data that is fed into a computer vision system is often a digital image, a video sequence, a 3D volume from a tomography device or other multimedia content. Traditional computer vision algorithms require extensive computation, which in turn entails high power consumption.

WMSNs enable a new approach to computer vision, where visual observations across the network can be performed by means of distributed computations on multiple, possibly low-end, vision nodes. This requires tools to interface with the user such as new querying languages and abstractions to express complex tasks that are then distributively accomplished through low-level operations on multiple vision nodes. To this aim, it is necessary to coordinate computations across the vision nodes and return the integrated results, which will consist of metadata information, to the final user.

In , the proposed Deep Vision network performs operations including object detection or classification, image segmentation, and motion analysis through a network of low-end MICA motes equipped with Cyclops cameras . Information such as the presence of an intruder, the number of visitors in a scene or the probability of presence of a human in the monitored area is obtained by collecting the results of these operations. Deep Vision provides a querying interface to the user in the form of declarative queries. Each operation is represented as an attribute that can be executed through an appropriate query. In this way, low-level operations and processing are encapsulated in a high-level querying interface that enables simple interaction with the video network. As an example, the vision network can be deployed in areas with public and restricted access spaces. The task of detecting objects in the restricted-access area can be expressed as a query that requests the result of object detection computations such as

    SELECT Object,Location
    REPORT = 30
    FROM Network
    WHERE Access = Restricted
    PERIOD = 30.

The above query triggers the execution of the object detection process on the vision nodes that are located in the restricted-access areas in 30 s intervals.

6. Application layer

The functionalities handled at the application layer of a WMSN are characterized by high heterogeneity, and encompass traditional communication problems as well as more general system challenges. The services offered by the application layer include: (i) providing traffic management and admission control functionalities, i.e., prevent applications from establishing data flows when the network resources needed are not available; (ii) performing source coding according to application requirements and hardware constraints, by leveraging advanced multimedia encoding techniques; (iii) providing flexible and efficient system software, i.e., operating systems and middleware, to export services for higher-layer
applications to build upon; (iv) providing primitives for applications to leverage collaborative, advanced in-network multimedia processing techniques. In this section, we provide an overview of these challenges.

6.1. Traffic classes

Admission control has to be based on QoS requirements of the overlying application. We envision that WMSNs will need to provide support and differentiated service for several different classes of applications. In particular, they will need to provide differentiated service between real-time and delay-tolerant applications, and loss-tolerant and loss-intolerant applications. Moreover, some applications may require a continuous stream of multimedia data for a prolonged period of time (multimedia streaming), while some other applications may require event triggered observations obtained in a short time period (snapshot multimedia content). The main traffic classes that need to be supported are:

• Real-time, Loss-tolerant, Multimedia Streams. This class includes video and audio streams, or multi-level streams composed of video/audio and other scalar data (e.g., temperature readings), as well as metadata associated with the stream, that need to reach a human or automated operator in real-time, i.e., within strict delay bounds, and that are however relatively loss tolerant (e.g., video streams can be within a certain level of distortion). Traffic in this class usually has high bandwidth demand.
• Delay-tolerant, Loss-tolerant, Multimedia Streams. This class includes multimedia streams that, being intended for storage or subsequent offline processing, do not need to be delivered within strict delay bounds. However, due to the typically high bandwidth demand of multimedia streams and to limited buffers of multimedia sensors, data in this traffic class needs to be transmitted almost in real-time to avoid excessive losses.
• Real-time, Loss-tolerant, Data. This class may include monitoring data from densely deployed scalar sensors such as light sensors whose monitored phenomenon is characterized by spatial correlation, or loss-tolerant snapshot multimedia data (e.g., images of a phenomenon taken from several viewpoints at the same time). Hence, sensor data has to be received in a timely manner but the application is moderately loss-tolerant. The bandwidth demand is usually between low and moderate.
• Real-time, Loss-intolerant, Data. This may include data from time-critical monitoring processes such as distributed control applications. The bandwidth demand varies between low and moderate.
• Delay-tolerant, Loss-intolerant, Data. This may include data from critical monitoring processes, with low or moderate bandwidth demand, that require some form of offline post processing.
• Delay-tolerant, Loss-tolerant, Data. This may include environmental data from scalar sensor networks, or non-time-critical snapshot multimedia content, with low or moderate bandwidth demand.

QoS requirements have recently been considered as application admission criteria for sensor networks. In , an application admission control algorithm is proposed whose objective is to maximize the network lifetime subject to bandwidth and reliability constraints of the application. An application admission control method is proposed in , which determines admissions based on the added energy load and application rewards. While these approaches address application level QoS considerations, they fail to consider multiple QoS requirements (e.g., delay, reliability, and energy consumption) simultaneously, as required in WMSNs. Furthermore, these solutions do not consider the peculiarities of WMSNs, i.e., they do not try to base admission control on a tight balancing between communication optimizations and in-network computation. There is a clear need for new criteria and mechanisms to manage the admission of multimedia flows according to the desired application-layer QoS.

6.2. Multimedia encoding techniques

There exists a vast literature on multimedia encoding techniques. The captured multimedia content should ideally be represented in such a way as to allow reliable transmission over lossy channels (error-resilient coding), using algorithms that minimize processing power and the amount of information to be transmitted. The main design objectives of a coder for multimedia sensor networks are thus:

• High compression efficiency. Uncompressed raw video streams require high data rates and thus consume excessive bandwidth and energy. It is
necessary to achieve a high ratio of compression to effectively limit bandwidth and energy consumption.
• Low complexity. Multimedia encoders are embedded in sensor devices. Hence, they need to be low complexity to reduce cost and form factors, and low-power to prolong the lifetime of sensor nodes.
• Error resiliency. The source coder should provide robust and error-resilient coding of source data.

To achieve a high compression efficiency, the traditional broadcasting paradigm for wireline and wireless communications, where video is compressed once at the encoder and decoded several times, has been dominated by predictive encoding techniques. These, used in the widely spread ISO MPEG schemes, or the ITU-T recommendations H.263  and H.264  (also known as AVC or MPEG-4 part 10), are based on the idea of reducing the bit rate generated by the source encoder by exploiting source statistics. Hence, intra-frame compression techniques are used to reduce redundancy within one frame, while inter-frame compression (also known as predictive encoding or motion estimation) exploits correlation among subsequent frames to reduce the amount of data to be transmitted and stored, thus achieving good rate-distortion performance. Since the computational complexity is dominated by the motion estimation functionality, these techniques require complex encoders, powerful processing algorithms, and entail high energy consumption, while decoders are simpler and loaded with lower processing burden. For typical implementations of state-of-the-art video compression standards, such as MPEG or H.263 and H.264, the encoder is 5–10 times more complex than the decoder . It is easy to see that to realize low-cost, low-energy-consumption multimedia sensors it is necessary to develop simpler encoders, and still retain the advantages of high compression efficiency.

However, it is known from information-theoretic bounds established by Slepian and Wolf for lossless coding  and by Wyner and Ziv  for lossy coding with decoder side information, that efficient compression can be achieved by leveraging knowledge of the source statistics at the decoder only. This way, the traditional balance of complex encoder and simple decoder can be reversed . Techniques that build upon these results are usually referred to as distributed source coding. Distributed source coding refers to the compression of multiple correlated sensor outputs that do not communicate with each other . Joint decoding is performed by a central entity that receives data independently compressed by different sensors. However, practical solutions have not been developed until recently. Clearly, such techniques are very promising for WMSNs and especially for networks of video sensors. The encoder can be simple and low-power, while the decoder at the sink will be complex and loaded with most of the processing and energy burden. The reader is referred to [131,50] for excellent surveys on the state of the art of distributed source coding in sensor networks and in distributed video coding, respectively. Other encoding and compression schemes that may be considered for source coding of multimedia streams, including JPEG with differential encoding, distributed coding of images taken by cameras having overlapping fields of view, or multi-layer coding with wavelet compression, are discussed in . Here, we focus on recent advances on low complexity encoders based on Wyner–Ziv coding , which are promising solutions for distributed networks of video sensors that are likely to have a major impact in future design of protocols for WMSNs.

The objective of a Wyner–Ziv video coder is to achieve lossy compression of video streams and achieve performance comparable to that of inter-frame encoding (e.g., MPEG), with complexity at the encoder comparable to that of intra-frame coders (e.g., Motion-JPEG).

6.2.1. Pixel-domain Wyner–Ziv encoder

In [14,15], a practical Wyner–Ziv encoder is proposed as a combination of a pixel-domain intra-frame encoder and inter-frame decoder system for video compression. A block diagram of the system is reported in Fig. 6. A regularly spaced subset of frames is coded using a conventional intra-frame coding technique, such as JPEG, as shown at the bottom of the figure. These are referred to as key frames. All frames between the key frames are referred to as Wyner–Ziv frames and are intra-frame encoded but inter-frame decoded. The intra-frame encoder for Wyner–Ziv frames (shown on top) is composed of a quantizer followed by a Slepian–Wolf coder. Each Wyner–Ziv frame is quantized and blocks of symbols are sent to the Slepian–Wolf coder, which is implemented through rate-compatible punctured turbo codes (RCPT). The parity bits generated by the RCPT coder are stored in a buffer.
Fig. 6. Block diagram of a pixel-domain Wyner–Ziv encoder . (Diagram labels: intra-frame encoder with quantizer and Slepian–Wolf coder, i.e., RCPT encoder and buffer; inter-frame decoder with RCPT decoder, reconstruction, side information obtained by interpolation and extrapolation, and decoder feedback requesting additional bits; key frames pass through an intra-frame encoder (e.g., JPEG) and an intra-frame decoder.)

A subset of these bits is then transmitted upon request from the decoder. This allows adapting the rate based on the temporally varying statistics between the Wyner–Ziv frame and the side information. The parity bits generated by the RCPT coder are in fact used to "correct" the frame interpolated at the decoder. For each Wyner–Ziv frame, the decoder generates the side information frame by interpolation or extrapolation of previously decoded key frames and Wyner–Ziv frames. The side information is leveraged by assuming a Laplacian distribution of the difference between the individual pixels of the original frame and the side information. The parameter defining the Laplacian distribution is estimated online. The turbo decoder combines the side information and the parity bits to reconstruct the original sequence of symbols. If reliable decoding of the original symbols is impossible, the turbo decoder requests additional parity bits from the encoder buffer.

Compared to predictive coding such as MPEG or H.26X, pixel-domain Wyner–Ziv encoding is much simpler. The Slepian–Wolf encoder only requires two feedback shift registers and an interleaver. Its performance, in terms of peak signal-to-noise ratio (PSNR), is 2–5 dB better than conventional motion-JPEG intra-frame coding. The main drawback of this scheme is that it relies on online feedback from the receiver. Hence it may not be suitable for applications where video is encoded and stored for subsequent use. Moreover, the feedback may introduce excessive latency for video decoding in a multi-hop network.

6.2.2. Transform-domain Wyner–Ziv encoder

In conventional source coding, a source vector is typically decomposed into spectral coefficients by using orthonormal transforms such as the Discrete Cosine Transform (DCT). These coefficients are then individually coded with scalar quantizers and entropy coders. In , a transform-domain Wyner–Ziv encoder is proposed. A block-wise DCT of each Wyner–Ziv frame is performed. The transform coefficients are independently quantized, grouped into coefficient bands, and then compressed by a Slepian–Wolf turbo coder. As in the pixel-domain encoder described in the previous section, the decoder generates a side information frame based on previously reconstructed frames. Based on the side information, a bank of turbo decoders reconstructs the quantized coefficient bands independently. The rate-distortion performance is between conventional intra-frame transform coding and conventional motion-compensated transform coding.

A different approach consists of allowing some simple temporal dependence estimation at the encoder to perform rate control without the need for feedback from the receiver. In the PRISM scheme , the encoder selects the coding mode based on the frame difference energy between the current frame and a previous frame. If the energy of the difference is very small, the block is not encoded. If the block difference is large, the block is intra-coded. Between these two situations, one of different encoding modes with different rates is selected. The rate estimation does not involve motion
compensation and hence is necessarily inaccurate, if motion compensation is used at the decoder. Further, the flexibility of the decoder is restricted.

6.3. System software and middleware

The development of efficient and flexible system software to make functional abstractions and information gathered by scalar and multimedia sensors available to higher-layer applications is one of the most important challenges faced by researchers to manage complexity and heterogeneity of sensor systems. As in , the term system software is used here to refer to operating systems, virtual machines, and middleware, which export services to higher-layer applications. Different multimedia sensor network applications are extremely diverse in their requirements and in the way they interact with the components of a sensor system. Hence, the main desired characteristics of a system software for WMSNs can be identified as follows:

• Provides a high-level interface to specify the behavior of the sensor system. This includes semantically rich querying languages that allow specifying what kind of data is requested from the sensor network, the quality of the required data, and how it should be presented to the user;
• Allows the user to specify application-specific algorithms to perform in-network processing on the multimedia content. For example, the user should be able to specify particular image processing algorithms or multimedia coding format;
• Long-lived, i.e., needs to smoothly support evolutions of the underlying hardware and software;
• Shared among multiple heterogeneous applications;
• Shared among heterogeneous sensors and platforms. Scalar and multimedia sensor networks should coexist in the same architecture, without compromising on performance;
• Scalable.

There is an inherent trade-off between degrees of flexibility and network performance. Platform-independence is usually achieved through layers of abstraction, which usually introduce redundancy and prevent the developer from accessing low-level details and functionalities. However, WMSNs are characterized by the contrasting objectives of optimizing the use of the scarce network resources and not compromising on performance. The principal design objective of existing operating systems for sensor networks such as TinyOS is high performance. However, their flexibility, inter-operability and reprogrammability are very limited. There is a need for research on systems that allow for this integration.

We believe that it is of paramount importance to develop efficient, high-level abstractions that will enable easy and fast development of sensor network applications. An abstraction similar to the famous Berkeley TCP sockets, which fostered the development of Internet applications, is needed for sensor systems. However, unlike Berkeley sockets, it is necessary to retain control over the efficiency of the low-level operations performed on battery-limited and resource-constrained sensor nodes. As a first step in this direction, Chu et al. recently proposed Sdlib, a sensor network data and communications library built upon the nesC language for applications that require best-effort collection of large-size data such as video monitoring applications. The objective of the effort is to identify common functionalities shared by several sensor network applications and to develop a library of thoroughly-tested, reusable and efficient nesC components that abstract high-level operations common to most applications, while leaving differences among them to adjustable parameters. The library is called Sdlib, Sensor Data Library, as an analogy to the traditional C++ Standard Template Library. Sdlib provides an abstraction for common operations in sensor networks while the developer is still able to access low-level operations, which are implemented as a collection of nesC components, when desired. Moreover, to retain the efficiency of operations that is so critical given the battery lifetime and resource constraints of sensor networks, Sdlib exposes policy decisions such as resource allocation and rate of operation to the developer, while hiding the mechanisms of policy enforcement.

6.4. Open research issues

• While theoretical results on Slepian–Wolf and Wyner–Ziv coding have existed for 30 years, there is still a lack of practical solutions. The net benefits and the practicality of these techniques still need to be demonstrated.
• It is necessary to fully explore the trade-offs between the achieved fidelity in the description of the phenomenon observed, and the resulting energy consumption. As an example, the video
distortion perceived by the final user depends on source coding (frame rate, quantization), and on channel coding strength. For example, in a surveillance application, the objective of maximizing the event detection probability is in contrast with the objective of minimizing the power consumption.
• As discussed above, there is a need for high-layer abstractions that will allow fast development of sensor applications. However, due to the resource-constrained nature of sensor systems, it is necessary to control the efficiency of the low-level operations performed on battery-limited and resource-constrained sensor nodes.
• There is a need for simple yet expressive high-level primitives for applications to leverage collaborative, advanced in-network multimedia processing techniques.

7. Transport layer

In applications involving high-rate data, the transport layer assumes special importance by providing end-to-end reliability and congestion control mechanisms. Particularly, in WMSNs, the following additional considerations are in order to accommodate both the unique characteristics of the WSN paradigm and multimedia transport requirements.

• Effects of congestion. In WMSNs, the effect of congestion may be even more pronounced as compared to traditional networks. When a bottleneck sensor is swamped with packets coming from several high-rate multimedia streams, apart from temporary disruption of the application, it may cause rapid depletion of the node's energy. While applications running on traditional wireless networks may only experience performance degradation, the energy loss (due to collisions and retransmissions) can result in network partition. Thus, congestion control algorithms may need to be tuned for immediate response and yet avoid oscillations of data rate along the affected path.
• Packet re-ordering due to multi-path. Multiple paths may exist between a given source-sink pair, and the order of packet delivery is strongly influenced by the characteristics of the route chosen. As an additional challenge, in real-time video/audio feeds or streaming media, information that cannot be used in the proper sequence becomes redundant, thus stressing the need for transport layer packet reordering.

We next explore the functionalities and support provided by the transport layer to address these and other challenges of WMSNs. The following discussion is classified into (1) TCP/UDP and TCP friendly schemes for WMSNs, and (2) application-specific and non-standardized protocols. Fig. 7 summarizes the discussion in this section.

7.1. TCP/UDP and TCP friendly schemes for WMSNs

For real-time applications like streaming media, the User Datagram Protocol (UDP) is preferred over TCP as timeliness is of greater concern than

Transport Layer
  TCP/UDP and TCP Friendly Schemes
    • TCP may be preferred over UDP unlike traditional wireless networks
    • Compatible with the TCP rate control mechanism
    E.g., STCP, MPEG-TFRCP
  Application Specific and Non-standard Protocols
    Reliability
      • Per-packet delivery guarantee for selected packet types
      • Redundancy by caching at intermediate nodes
      E.g., RMST, PSFQ, (RT)²
    Congestion Control
      • Spatio-temporal reporting
      • Adjusting of reporting frequency based on current congestion levels
      E.g., ESRT
    Use of Multipath
      • Better load balancing and robustness to channel state variability
      • Need to regulate multiple sources monitoring the same event
      E.g., CODA, MRTP
Fig. 7. Classification of existing transport layer protocols.
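The packet re-ordering consideration listed above can be illustrated with a small resequencing buffer at the sink. This is a minimal sketch under stated assumptions, not one of the protocols classified in Fig. 7: the class name `ReorderBuffer`, the integer sequence numbers, and the policy of silently dropping duplicates are all hypothetical choices made for illustration.

```python
class ReorderBuffer:
    """Releases payloads in sequence order, whatever path each packet took."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number owed to the application
        self.pending = {}           # out-of-order packets, keyed by sequence number

    def push(self, seq, payload):
        """Accept one packet and return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []               # already delivered or already buffered: drop it
        self.pending[seq] = payload
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


# Packets 0..3 arrive out of order over two hypothetical paths.
buf = ReorderBuffer()
out_a = buf.push(1, "b")   # held back until packet 0 arrives
out_b = buf.push(0, "a")   # releases packets 0 and 1 together
out_c = buf.push(0, "a")   # duplicate of an already delivered packet
out_d = buf.push(3, "d")   # held back until packet 2 arrives
out_e = buf.push(2, "c")   # releases packets 2 and 3 together
```

For real-time feeds, a practical variant would also skip a missing sequence number once its playout deadline has passed, since multimedia data that arrives too late is of no use to the application.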