This paper investigates reversing the traditional order of compressing data before encrypting it. Using principles of coding with side information, the authors show that data can be encrypted first and then compressed without loss of compression efficiency or security. They prove the theoretical feasibility of this approach, describe a system that implements compression of encrypted data, and demonstrate its performance through computer simulations. The main contributions are the identification of a connection to distributed source-coding theory and the demonstration that, in some scenarios, reversing the order of encryption and compression compromises neither effectiveness nor security.
JOHNSON et al.: ON COMPRESSING ENCRYPTED DATA
Fig. 1. Conventional system: The encoder first compresses the source and then encrypts before transmitting over a public channel. The decoder first decrypts
the received bitstream and then decompresses the result.
Fig. 2. Proposed system: The source is first encrypted and then compressed. The compressor does not have access to the key used in the encryption step. At the
decoder, decompression and decryption are performed in a single joint step.
network operator, the owner will not supply the cryptographic
key that was used to encrypt the data. The network operator is
forced to compress the data after it has been encrypted.
Our work was primarily inspired by recent code constructions
for the distributed source-coding problem [3], which we use in
the compression stage of our system. We are not aware of any
previous literature on the topic of compressing data that has al-
ready been encrypted.
The two main contributions of this work are the identification
of the connection between the stated problem and distributed
source coding, and the demonstration that in some scenarios re-
versing the order of encryption and compression does not com-
promise either the compression efficiency or the security. This
paper is organized in the following manner. Section II gives
some background information on distributed source coding. The
topics presented in that section will be used subsequently to de-
velop an efficient system for compressing encrypted data. In
Section III, the formal notion of information-theoretic security
is introduced and the performance limits of general cryptosys-
tems are established. The problem of compressing encrypted
data is formally stated in Section IV and a solution based on the
Wyner–Ziv-distributed source-coding theorem [2] is presented.
Results from computer simulations are presented in Section V
and some concluding remarks are given in Section VI. Involved
proofs of the main results have been placed in appendices to
maintain the flow of the presentation.
Notation: ℝ+ denotes the set of nonnegative real numbers. Random quantities will be denoted by capital letters (e.g., X). Specific realizations of random quantities will be denoted by small letters (e.g., x). Boldface letters will denote vectors of some generic block length n, e.g., X = (X_1, ..., X_n), x = (x_1, ..., x_n), etc. Often, (X_1, ..., X_n) and (x_1, ..., x_n) shall be abbreviated to X and x, respectively. We will denote the mathematical expectation operator by E[·] and event probabilities by P(·).
II. DISTRIBUTED SOURCE CODING
In this section, we describe the distributed source-coding
problem and provide the principles behind code constructions
for both lossless compression and compression with a fidelity
criterion. These code constructions will be used subsequently to
construct systems that implement the compression of encrypted
data.
A. Lossless Compression
Distributed source coding considers the problem of compressing sources Y and K that are correlated, but cannot communicate with each other. In this subsection, we look at the case where Y and K are to be compressed losslessly. This is possible only if Y and K are drawn from discrete alphabets, i.e., the size of the alphabets is at most countably infinite. An important special case of this problem, upon which we will focus, is when Y needs to be sent to a decoder which has access to the correlated side-information K. For this situation, the Slepian–Wolf theorem [1] gives the smallest achievable rate for communicating Y losslessly to the decoder. The Slepian–Wolf theorem asserts that the best achievable rate required to transmit Y
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 10, OCTOBER 2004
Fig. 3. Distributed source-coding problem: The side information K is
available at both the encoder and the decoder.
Fig. 4. Distributed source-coding problem: Y and K are three-bit binary
sequences which differ by at most one bit. K is available only at the decoder.
The encoder can compress Y to two bits by sending the index of the coset in
which Y occurs.
is given by the conditional entropy [4] of Y given K, denoted by H(Y|K) bits/sample.
While these results are theoretical, there has been some recent
work that provides practical code constructions to realize these
distributed compression gains [3]. We will use an example to
show the intuition behind these constructions.
We begin by looking at a problem where K is available at both the encoder and the decoder, as depicted in Fig. 3. In our example, Y and K are each uniformly distributed binary data of length 3. Furthermore, Y and K are correlated such that their Hamming distance is at most 1, i.e., they differ in at most one of the three bit positions. For example, if Y is 010, then K will equally likely be one of the four patterns {010, 110, 000, 011}.
The encoder forms the error pattern e = Y ⊕ K. Because Y and K differ in at most one bit position, the error pattern can take on only four possible values, namely {000, 100, 010, 001}. These four values can be indexed with 2 bits. That index is transmitted to the decoder, which looks up the error pattern corresponding to the index received from the encoder, and then computes Y = K ⊕ e.
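As a quick sanity check, the Fig. 3 setting, where the encoder sees both Y and K, can be sketched in a few lines of Python (the helper names are ours, not the paper's):

```python
# Error patterns within Hamming distance 1 of the all-zero word:
ERROR_PATTERNS = ["000", "100", "010", "001"]

def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

def encode_with_key(y: str, k: str) -> int:
    """Encoder with access to K: transmit the 2-bit index of e = Y xor K."""
    return ERROR_PATTERNS.index(xor_bits(y, k))

def decode_with_key(index: int, k: str) -> str:
    """Decoder: look up the error pattern e and recover Y = K xor e."""
    return xor_bits(k, ERROR_PATTERNS[index])
```

For Y = 010 and K = 110, the encoder sends index 1 (error pattern 100), and the decoder recovers 010.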
We next consider the problem in Fig. 4 where K is available at the decoder, but not at the encoder. Without K, the encoder cannot form the error pattern e. However, it is still possible for the encoder to compress Y to 2 bits and for the decoder to reconstruct Y without error. The reason behind this surprising fact is that it is unnecessary for the encoder to spend any bits to differentiate between 000 and 111. The Hamming distance of 3 between these two codewords is sufficiently large to enable the decoder to correctly decode Y based on its access to K and the knowledge that Y is within a Hamming distance of 1 from K. If the decoder knows Y to be either 000 or 111, it can resolve this ambiguity by checking which of the two is closer in Hamming distance to K, and declaring that codeword to be Y. We observe that the set {000, 111} is a 3-bit repetition code with a minimum distance of 3.
Likewise, in addition to the set {000, 111}, we can consider the following three sets: {001, 110}, {010, 101}, and {100, 011}. Each of these sets is composed of two codewords whose Hamming distance is 3. These sets are the cosets of the 3-bit repetition code. While we typically use the set {000, 111} as the 3-bit repetition code (0 is encoded as 000, and 1 as 111), it is clear that one could just as easily have used any of the other three cosets with the same performance. Also, these four sets cover the complete space of binary 3-tuples that Y can assume. Thus, instead of describing Y by its 3-bit value, we can instead encode the coset in which Y occurs. There are four cosets, so we need only 2 bits to index them. We can thus compress Y to 2 bits, just as in the case where K was available at both the encoder and decoder.
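The coset encoder of Fig. 4, which never sees K, can be sketched similarly (again with hypothetical helper names); a brute-force table of the four cosets stands in for a real syndrome former:

```python
# The four cosets of the 3-bit repetition code; each pair is distance 3 apart.
COSETS = [["000", "111"], ["001", "110"], ["010", "101"], ["100", "011"]]

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def encode_coset(y: str) -> int:
    """Encoder without K: send only the 2-bit index of the coset containing Y."""
    return next(i for i, c in enumerate(COSETS) if y in c)

def decode_coset(index: int, k: str) -> str:
    """Decoder: pick the codeword in the signalled coset closest to K."""
    return min(COSETS[index], key=lambda w: hamming(w, k))
```

Because codewords within a coset are at distance 3 while Y is within distance 1 of K, the closest-codeword rule always returns Y.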
This simple code construction can be used to compress data that has been encrypted with a one-time pad. In this problem, K is a binary pad that is used to encrypt a 3-bit data sequence X, forming an encrypted data sequence Y = X ⊕ K. If X can only take on the values {000, 100, 010, 001}, then the Hamming distance between Y and K is at most 1. We can use this construction to compress Y to 2 bits, and a decoder which has access to K will be able to correctly decode Y. The decoder can then recover the original data by computing X = Y ⊕ K.
This construction can be extended beyond the simple example
considered here. The space of all possible words is partitioned
into cosets, which are associated with the syndromes of the prin-
cipal underlying channel code (the 3-bit repetition code in the
above example). The encoding procedure is to compute the syndrome of Y with respect to the appropriate channel code and transmit this syndrome to the decoder. The choice of channel code depends on the correlation structure between Y and K: the more correlated Y and K are, the less strength is required of the code. In practice, we will use a much more complex channel code than the simple repetition code. The decoding procedure is to identify the codeword closest to K in the coset associated with the transmitted syndrome, and declare that codeword to be Y.
B. Compression with a Fidelity Criterion
The Wyner–Ziv theorem [2] extends the Slepian–Wolf re-
sult to the case of lossy coding with a distortion measure. This
theorem gives the best achievable rate-distortion pairs for the
problem of coding with side information. The theorem applies
to both discrete and continuous sources. However, the real line
is the natural alphabet for representing many signals of interest,
such as “natural” images. We are primarily interested in the case
where the source is a real number and a mean squared error dis-
tortion measure is used. We will provide an example that illus-
trates some of the intuition behind the implementation of an en-
coder/decoder pair for distributed source coding with a nonzero
fidelity criterion.
Fig. 5. Composite quantizer: The scalar quantizer with step size Δ can be thought of as three interleaved scalar quantizers with step size 3Δ.
Fig. 6. Distributed lossy compression example: The encoder quantizes Y to Ŷ and transmits the label of Ŷ, an "A." The decoder finds the reconstruction level labeled "A" which is closest to the side information K; this level is equal to Ŷ.
In this example, Y is uniformly distributed in the interval [0, 1). The side information K is correlated with Y, such that |Y − K| < Δ. The encoder will first quantize Y to Ŷ with a scalar quantizer with step size Δ, which we show in Fig. 5. Clearly, the distance between Y and Ŷ is bounded by |Y − Ŷ| ≤ Δ/2. We can think of the quantizer as consisting of three interleaved quantizers (cosets), each of step size 3Δ. In Fig. 5, we have labeled the reconstruction levels of the three quantizers as "A," "B," and "C," respectively. The encoder, after quantizing Y, will note the label of Ŷ and send this label to the decoder, which requires log2(3) bits on average.
The decoder has access to the label transmitted by the encoder and the side information K. We can bound the distance between Ŷ and K as
|Ŷ − K| ≤ |Ŷ − Y| + |Y − K| < Δ/2 + Δ = 3Δ/2.   (1)
Because Ŷ and K are within a distance of 3Δ/2 of each other and the reconstruction levels with the same label are separated by 3Δ, the decoder can correctly find Ŷ by selecting the reconstruction level with the label sent by the encoder that is closest to K. This can be seen in Fig. 6, which shows one realization of Y and K. In this figure, the encoder quantizes Y to Ŷ and transmits the label, an "A," to the decoder. The decoder finds the reconstruction level labeled "A" that is closest to K, which is in fact Ŷ.
In this example, the encoder has transmitted only log2(3) bits, and the decoder can correctly reconstruct Ŷ, an estimate within Δ/2 of the source Y. In the absence of K at the decoder, the encoder would have had to transmit log2(1/Δ) bits in order to send the index of the quantized level. This shows that the presence of the side-information K can be used to reduce the required transmission rate for meeting a target distortion. Further, observe that if the decoder had merely used K as an estimate of Y, then by definition that estimate could have been as far as Δ from the source Y. Hence, the encoder, by sending the label of the quantized source, has reduced the maximum possible distortion at the decoder by a factor of two. It should be noted that in this example we have simply chosen to use Ŷ as the best estimate of Y. In reality, the decoder can use both Ŷ and K to compute an optimal estimate of Y (using the joint statistics of Y and K). We have omitted this step here, as our intention was to highlight the gains that are achieved by transmitting the label (coset membership information) to the decoder.
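The quantizer example can be simulated directly. The sketch below assumes Y uniform on [0, 1), |Y − K| < Δ, and an illustrative step size Δ = 0.1; the function names and the brute-force label search are ours:

```python
import random

DELTA = 0.1  # quantizer step size (an illustrative choice)

def quantize(y):
    """Return (reconstruction level, label) of the step-DELTA scalar quantizer.
    Labels 0, 1, 2 play the roles of "A," "B," "C" in Fig. 5."""
    i = int(y / DELTA)
    return (i + 0.5) * DELTA, i % 3

def wz_decode(label, k):
    """Pick the reconstruction level carrying `label` that is closest to K.
    Same-label levels are 3*DELTA apart, while |K - Yhat| < 1.5*DELTA
    whenever |Y - K| < DELTA, so the closest such level is exactly Yhat."""
    i = int(k / DELTA)                 # bin containing K
    best = None
    for j in range(i - 3, i + 4):      # search the nearby bins
        if j % 3 == label:
            level = (j + 0.5) * DELTA
            if best is None or abs(level - k) < abs(best - k):
                best = level
    return best
```

Drawing many (Y, K) pairs with |Y − K| < Δ and checking that the decoder's output equals Ŷ confirms the bound (1) in practice.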
Fig. 7. General secret-key cryptosystem: A message source X of block
length n is encrypted to a bitstream B using a secret key T available to the
decoder through a secure channel. An eavesdropper has access to the bitstream
B which is being sent over a public channel operating at rate R bits per
source symbol. The goal is to design the system so that the decoder can recover
the message source to an acceptable fidelity while providing security against
eavesdropping and being efficient about utilizing system resources such as the
bandwidth of the public channel and the cardinality of the key alphabet.
In our example, we have used a scalar quantizer and the
encoder computed the label of the quantized source via a
very simple idea of alternating the levels with three labels.
In practice, we can achieve better performance by replacing
both of these methods with more complex alternatives, such
as nested linear codes. For example, the encoder can quantize
a sequence of source samples with a trellis-coded quantiza-
tion (TCQ) scheme [5]. Then it can find the syndrome of the
quantized sequence with respect to a trellis-coded modulation
(TCM) scheme [6] and transmit that syndrome. The correlation
structure between Y and K governs the amount of redundancy
that we require in these codes. Practical code constructions for
the distributed source-coding problem can be found in [3].
III. INFORMATION-THEORETIC SECURITY
In this section, we set up the problem of secure and band-
width efficient transmission of messages over channels where an
eavesdropper can listen to the messages being transmitted. We
formalize the notion of information-theoretic security and es-
tablish performance limits for key size, secrecy, and compress-
ibility for general cryptosystems. Fig. 7 shows a general model
of a secret-key cryptosystem. A (stochastic) message source X taking values in a source alphabet 𝒳 is encoded into a bitstream B in blocks of suitable length n using a cryptographic key T. The source alphabet can be arbitrary, i.e., discrete or continuous, unless otherwise noted. For simplicity, we assume that the source sequence X_1, X_2, ..., is independent and identically distributed (i.i.d.) with distribution P_X on alphabet 𝒳. Our results can, in principle, be extended to more general situations, e.g., to stationary ergodic sources. The key T is a random variable taking values in a finite alphabet 𝒯, independent of the message source. The secret key is known to the decoder through a secure channel. The encoding takes place through a rate-R bits per source symbol (bits/symbol) encoding function F_E : 𝒳^n × 𝒯 → {1, ..., 2^{nR}}. The encoded message bitstream B of rate R bits/symbol is sent to the decoder through an insecure public channel which is effectively noiseless.1 The encoding operation should be such that the decoder can recover an estimate X̂ of the source message in a reconstruction alphabet 𝒳̂, to an acceptable degree of fidelity, using B and T. The decoding takes place through a decoding function F_D : {1, ..., 2^{nR}} × 𝒯 → 𝒳̂^n.
Definition 3.1 (Cryptosystem): A cryptosystem of rate R and block length n is a triple (𝒯, F_E, F_D) consisting of i) a finite secret-key alphabet 𝒯 with an associated key distribution;2 ii) a rate-R encoder map F_E : 𝒳^n × 𝒯 → {1, ..., 2^{nR}}; and iii) a decoder map F_D : {1, ..., 2^{nR}} × 𝒯 → 𝒳̂^n.
Associated with the source and reconstruction alphabets 𝒳, 𝒳̂ is a per-symbol nonnegative distortion criterion d : 𝒳 × 𝒳̂ → ℝ+. The distortion criterion for a pair of n-length sequences belonging to 𝒳^n and 𝒳̂^n, respectively, is taken to be additive in the components, i.e., d(x, x̂) = Σ_{i=1}^n d(x_i, x̂_i). The rate-distortion function of the source is the minimum number of bits per symbol (bits/symbol) needed to index reconstructions of the source so that the expected per-symbol distortion is no more than D [4, Ch. 13]. We denote the rate-distortion function of the message source by R_X(D) bits/symbol.
An eavesdropper has access to the public channel and strives to recover information about the message source from the encoded bitstream B. The goal is to design an encoder and a decoder such that an eavesdropper who has access to the public channel bitstream B, but not the key T, learns as little as possible about the message source on the average. The idea is to provide secrecy against ciphertext-only attacks.
Associated with such a cryptosystem are several inter-related design and performance parameters of interest that one would like to optimize:
1) measure of secrecy against eavesdropping discussed below;
2) measure of the fidelity of the source reconstruction at the decoder given by (1/n) E[d(X, X̂)];
3) number of bits per source symbol transmitted over the public channel given by R;
4) number of bits of "randomness" or "uncertainty" in the secret key as measured by the bits per source symbol needed to index the key. This is related to the cardinality of the key-alphabet 𝒯. A more "random" key would impose, in general, a "greater" burden on the resources of the secure key-distribution channel.
1The effects of channel noise, which can be dealt with using error-correcting
channel codes, are not considered to be part of the cryptosystem in this work.
2We do not explicitly include the key distribution as part of the definition of
the cryptosystem because as will become clear in the sequel, “good” cryptosys-
tems will have a uniform key distribution.
A good system provides maximum secrecy with maximum fidelity using the least amount of system resources, i.e., the minimum number of bits/symbol R and the smallest key-alphabet 𝒯 necessary.
A. Notion of Perfect Secrecy
In his 1949 paper [7], Shannon provided the first rigorous sta-
tistical treatment of secrecy. The idea is that an eavesdropper
will learn nothing at all about the message source if the encoded
bitstream is statistically independent of the source messages.
An information-theoretic measure of the extent of the corre-
lation between two random quantities introduced by Shannon
is their mutual information [4, p. 18]. The larger the mutual
information, the greater is the correlation. Mutual information
is nonnegative and is zero if and only if the two associated
random quantities are statistically independent. According to
Shannon, a cryptosystem has (Shannon-sense) perfect secrecy if the encoded bitstream is statistically independent of the message source (when the secret key is unavailable), i.e., if I(X; B) = 0. In [8], Wyner introduced the following definition of perfect secrecy that we shall be using in this work.
Definition 3.2 (Measure of Secrecy and Wyner-Sense Perfect Secrecy): An information-theoretic measure of secrecy of a rate-R cryptosystem of block length n is given by (1/n) I(X; B), where I(·;·) stands for mutual information. A sequence of rate-R cryptosystems {(𝒯^(n), F_E^(n), F_D^(n))}_{n≥1} is said to have Wyner-sense perfect secrecy if
lim_{n→∞} (1/n) I(X; B) = 0.
We would like to point out that this is weaker than Shannon's definition of perfect secrecy because the independence between the source messages and the encoded bitstream holds only asymptotically as the block length goes to infinity. Shannon's definition is nonasymptotic, i.e., I(X; B) = 0 should hold for every n. In [9], Maurer proposed another asymptotic notion of perfect secrecy which is stronger than Wyner's definition, but weaker than Shannon's. According to this notion, a sequence of cryptosystems has (Maurer-sense) perfect secrecy if lim_{n→∞} I(X; B) = 0. We do not know if our results will continue to hold under this stronger notion of perfect secrecy. However, the techniques that have been developed in [10] suggest that our results can be strengthened. Also see Remark 4.7.
An information-theoretic measure of the amount of "uncertainty" or randomness in the key is the compressibility of the key in bits per source symbol. This is governed by the entropy of the key per source symbol, (1/n) H(T) [4, p. 13 and Ch. 5], which represents the minimum number of bits/symbol that would have to be "supported" by the secure key-distribution channel. It turns out that (1/n) H(T) ≤ (1/n) log2 |𝒯|, with equality if and only if all the values of the key are equally likely [4, Theor. 2.6.4, p. 27]. Thus, key randomness directly impacts the cardinality of the key-alphabet needed.
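The inequality H(T)/n ≤ (1/n) log2 |𝒯| is easy to check numerically. The sketch below (our helper, not from the paper) computes the entropy of a uniform and of a skewed key distribution over a four-letter key alphabet:

```python
import math

def entropy_bits(p):
    """Shannon entropy, in bits, of a probability mass function p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Uniform key over 4 values attains the maximum log2(4) = 2 bits;
# any skewed key distribution over the same alphabet falls short of it.
uniform_key = [0.25, 0.25, 0.25, 0.25]
skewed_key = [0.7, 0.1, 0.1, 0.1]
```

Only the uniform key "fills" the alphabet, which is why good cryptosystems use uniformly distributed keys.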
The following theorem reveals certain important aspects
of the tradeoff between the various performance parameters
of a cryptosystem that strives to achieve maximum secrecy
with maximum efficiency for a maximum tolerable expected
distortion. It is a straightforward extension of a similar result by
Fig. 8. Shannon cryptosystem: Shannon’s cryptosystem is efficient and achieves Shannon-sense perfect secrecy with expected distortion D with the smallest
key alphabet.
Shannon [7] that assumed lossless recovery (zero distortion) of
the source at the decoder. The proof of the theorem is presented
in Appendix A and applies to both discrete and continuous
alphabet sources.
Theorem 3.3: For a sequence of rate-R cryptosystems {(𝒯^(n), F_E^(n), F_D^(n))}_{n≥1} where the key is drawn independently of the source, if lim_{n→∞} (1/n) I(X; B) = 0 and limsup_{n→∞} (1/n) E[d(X, X̂)] ≤ D, then liminf_{n→∞} (1/n) log2 |𝒯^(n)| ≥ R_X(D) and R ≥ R_X(D).
Thus, in any cryptosystem that provides perfect secrecy (in the Shannon, Wyner, or Maurer sense) with expected distortion D, the key-alphabet must grow with block length at least as fast as 2^{nR_X(D)}. Hence, there must be at least as many binary digits in the secret key as there are bits of information in the compressed message source if the cryptosystem provides perfect secrecy (in the Shannon, Wyner, or Maurer sense) with expected distortion less than or equal to D. Intuitively, it is clear that if the key is chosen independently of the message source and the decoder is able to reconstruct the source to within an expected distortion D, the encoded bitstream rate cannot be smaller than R_X(D): the smallest number of bits needed to reconstruct the source with an expected distortion no more than D. A cryptosystem is efficient if it operates at a rate close to R_X(D) bits/symbol, using a key-alphabet whose size is close to 2^{nR_X(D)}, and achieves an expected distortion less than or equal to D with almost perfect secrecy (in the Shannon, Wyner, or Maurer sense).
The question of whether there exists an efficient cryptosystem having the smallest possible key-alphabet that provides perfect secrecy with maximum expected distortion D was answered by Shannon for the case when D = 0, and his answer involved the idea of separating the performance requirements into two parts: i) efficient utilization of system resources through optimal source compression and ii) encryption of the compressed bitstream with a Vernam one-time pad [a Bernoulli(1/2) bitstream]. A slightly generalized version of Shannon's solution involving nonzero distortion is shown in Fig. 8. Shannon's
system meets all four desirable attributes of a cryptosystem discussed earlier. Clearly, the bit rate of the output bitstream B is R_X(D) bits/symbol. The expected distortion between X and X̂ is no more than D because the decoder successfully recovers the rate-distortion compressed bitstream S. Since T is assumed to be uniformly distributed over its alphabet in Fig. 8, the entropy of the key (and the size of the key-alphabet) in bits per source symbol is R_X(D). Since X and T are independent, so are S and T. Hence,
P(B = b | S = s) = P(S XOR T = b | S = s) = P(T = s XOR b) = 2^{−nR_X(D)}
which does not depend on the value s that S takes. Thus, the bitstreams B and S are independent without the key T. Since X − S − B form a Markov chain, by the data processing inequality3, I(X; B) ≤ I(S; B) = 0, i.e., the cryptosystem achieves Shannon-sense perfect secrecy for each n. We would like to note that in practice, the Vernam one-time pad would be simulated by a pseudorandom sequence, and the seed of the pseudorandom generator would play the role of the key that is shared by the sender of messages and the intended recipient.
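The independence argument above can be checked exhaustively for a short pad. The sketch below (hypothetical helper names) tabulates the distribution of B = S XOR T for a fixed compressed word s and a uniform pad T; the distribution comes out uniform and identical for every s:

```python
from itertools import product

def xor_word(a, b):
    """Componentwise XOR of two equal-length bit tuples."""
    return tuple(x ^ y for x, y in zip(a, b))

def ciphertext_distribution(s, n=2):
    """Distribution of B = S xor T for a fixed compressed word s, with the
    one-time pad T uniform over {0,1}^n."""
    pads = list(product((0, 1), repeat=n))
    dist = {}
    for t in pads:
        b = xor_word(s, t)
        dist[b] = dist.get(b, 0) + 1 / len(pads)
    return dist
```

Because the ciphertext distribution does not depend on s, an eavesdropper who sees only B learns nothing about the compressed source.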
IV. COMPRESSION OF ENCRYPTED DATA
As motivated in the introduction, an interesting question that arises in the context of Shannon's cryptosystem above is whether it is possible to swap the operations of encryption and compression in such a way that the resulting system continues to function as a good cryptosystem. To "encrypt" the source data "directly" before any compression, we need a notion of "addition" on a general alphabet similar to the XOR operation (modulo-two addition) on binary data. Let 𝒳 be endowed with a binary operator ⊕. The salient properties of the XOR operation on binary data that make things work are captured by the following requirements on the tuple (𝒳, ⊕)4: For all x, y, z ∈ 𝒳
i) x ⊕ y = y ⊕ x (commutativity);
ii) x ⊕ y = x ⊕ z implies y = z (cancellation).
Consider the system shown in Fig. 9. In this system, the secret key-word K is selected randomly from the secret-key codebook 𝒞 of size |𝒞| = 2^{nR_K} according to a uniform distribution, independent of the source sequence X. Let T be the random variable (seed or key) which indicates which
3This essentially states that successive stages of "processing" cannot increase the statistical correlation between the "processed" data and the raw data as measured by their mutual information. Specifically, if three random variables X, Y, and Z form a Markov chain X − Y − Z, then I(X; Z) ≤ I(Y; Z) and I(X; Z) ≤ I(X; Y) [4, p. 32].
4A tuple (𝒳, ⊕) satisfying these requirements is called a commutative quasigroup when 𝒳 is a finite set [11].
Fig. 9. Reversed cryptosystem: A cryptosystem where “encryption”
precedes compression.
key-word was selected. Note that H(T) = nR_K bits. The key-word K is directly "added" to the source sequence to produce the "encrypted" sequence Y = X ⊕ K, where the "addition" is componentwise. Let B = F_E(Y) denote the encoded message bitstream produced by the encoder. The decoder produces the reconstruction X̂ = F_D(B, T). The average, per-component distortion is (1/n) E[d(X, X̂)]. The "encrypted" sequence Y is compressed a la Wyner-Ziv (W-Z) [2] by exploiting the fact that the key-word, which is available to the decoder, is related to Y. This leads us to the following definition.
Definition 4.1 (Cryptographic Wyner–Ziv Source Code): A rate-(R_K, R) cryptographic Wyner–Ziv source code of block length n is a triple (𝒞, F_E, F_D) consisting of i) a secret-key codebook 𝒞 ⊂ 𝒳^n such that |𝒞| = 2^{nR_K}; ii) an encoder map F_E : 𝒳^n → {1, ..., 2^{nR}}; and iii) a decoder map F_D : {1, ..., 2^{nR}} × 𝒞 → 𝒳̂^n.
As in general cryptosystems, a sequence of cryptographic W-Z source codes {(𝒞^(n), F_E^(n), F_D^(n))}_{n≥1} is said to have Wyner-sense perfect secrecy if lim_{n→∞} (1/n) I(X; B) = 0.
Definition 4.2: The triplet (R_K, R, D) is said to be achievable with Wyner-sense perfect secrecy if there exists a sequence of rate-(R_K, R) cryptographic W-Z codes having Wyner-sense perfect secrecy such that
limsup_{n→∞} (1/n) E[d(X, X̂)] ≤ D.
The first parameter R_K in the triplet represents the encryption efficiency of the cryptosystem, i.e., the number of bits of randomness in the key per source symbol, which has a direct bearing on the size of the key codebook. The second parameter R represents the compression efficiency of the cryptosystem, i.e., the number of bits of the output bitstream per symbol of the message source generated by the cryptosystem. The following theorem tells us what sort of encryption and compression rates, R_K and R, respectively, can definitely be achieved using a cryptosystem having the structure of Fig. 9 for a source reconstruction quality D while being able to achieve Wyner-sense perfect secrecy by using progressively longer block lengths for coding. The two corollaries following the theorem show that it is possible to compress encrypted data without any loss of encryption or compression efficiency with respect to a system where compression precedes encryption. These results constitute the main theoretical contribution of this work.
Theorem 4.3: Let X and K be drawn independently and i.i.d. with the (common) distribution P_X, let Y = X ⊕ K, and let
R^{WZ}(D) = min [ I(Y; U) − I(K; U) ]   (2)
where U is an auxiliary random variable taking values in an alphabet 𝒰 and the minimization is over all conditional probability distributions p(u|y), with U − Y − K forming a Markov chain, and all functions f : 𝒰 × 𝒳 → 𝒳̂ such that E[d(X, f(U, K))] ≤ D.
Then, (H(X), R^{WZ}(D), D) is achievable with Wyner-sense perfect secrecy.
The proof of this theorem for finite alphabets is presented in
Appendix B. The proof for continuous alphabets (e.g., Gaussian
sources) and unbounded distortion criteria (e.g., mean squared
error) can be established along similar lines using the techniques
in [12], [13]. We would like to note that there is no fundamental
difficulty in carrying out this proof. The associated technical
aspects are definitely important and nontrivial but would only detract from the main concepts underlying the proof.
Theorem 4.3 tells us that the triple (H(X), R^{WZ}(D), D) is achievable with Wyner-sense perfect secrecy by cryptosystems having the structure shown in Fig. 9, but are there better encryption and compression rates that can be realized by these cryptosystems at the same distortion D while still achieving Wyner-sense perfect secrecy?
Remark 4.4: It can be shown that the achievable performance given by Theorem 4.3 is also the best possible for any cryptosystem having the structure shown in Fig. 9, i.e., any system having this structure needs a rate of at least R^{WZ}(D) bits per source symbol to achieve Wyner-sense perfect secrecy and expected distortion D. The proof (omitted here) is along the lines of the proof of the optimality of W-Z distributed source coding in [2], [4].
For general distortion criteria and source distributions, the W-Z cryptosystems can suffer from some loss of compression efficiency, i.e., R^{WZ}(D) > R_X(D) (but no loss of Wyner-sense perfect secrecy), with respect to the Shannon-type cryptosystems. However, as discussed below, in two important cases of interest, W-Z cryptosystems are efficient.
Corollary 4.5 (Zero Distortion, i.e., Lossless Recovery of Data): If 𝒳 = 𝒳̂ are countable alphabets and the distortion criterion satisfies d(x, x̂) = 0 if x = x̂ and d(x, x̂) > 0 otherwise, then (H(X), H(X), 0) is achievable with Wyner-sense perfect secrecy. Furthermore, (H(X), H(X), 0) cannot be improved upon by any cryptosystem (not necessarily having the structure of Fig. 9) when it is required that the message source be recovered losslessly, i.e., D = 0.
Proof: The achievability can be proven from Theorem 4.3 along the lines of Remark 3 in [2, p. 3], where it is shown that R^{WZ}(0) = H(Y|K); note that H(Y|K) = H(X ⊕ K | K) = H(X). Since R_X(0) = H(X) [4, Ch. 5], (H(X), H(X), 0) cannot be improved upon for lossless recovery of the source as per Theorem 3.3.
Corollary 4.6 (Gaussian Sources): When X is Gaussian, i.e., P_X = N(0, σ²), 𝒳 = 𝒳̂ = ℝ with ⊕ taken to be ordinary addition, and the distortion criterion is squared error, i.e., d(x, x̂) = (x − x̂)², then the compression rate R_X(D) at expected distortion D is achievable with Wyner-sense perfect secrecy by a W-Z cryptosystem for any target distortion D > 0. Hence, W-Z cryptosystems are optimal in every sense for Gaussian sources and squared-error distortion.
Proof: Here, R_X(D) = (1/2) log2(σ²/D), 0 < D ≤ σ², is the rate-distortion function of a Gaussian source with variance σ² [4, p. 344]. Achievability follows from Theorem 4.3 by choosing 𝒰 = ℝ, U = Y + Q with Q ~ N(0, σ_Q²) independent of Y, and f(U, K) the minimum mean-squared error estimate of X given (U, K), where σ_Q is chosen such that the resulting expected distortion equals D. With these choices it can be shown that I(Y; U) − I(K; U) = (1/2) log2(σ²/D). The optimality follows again from Theorem 3.3.
The above corollary shows that for Gaussian sources, cryp-
tographic Wyner–Ziv systems are as efficient as source-coding-
followed-by-encryption systems in terms of compression rate
and the requirements on the secret key.
Remark 4.7: For finite alphabets, it is possible to guarantee the stronger notion of Shannon-sense perfect secrecy for the system of Fig. 9 if one is willing to sacrifice key efficiency (measured by R_K). Specifically, let the variable K in Theorem 4.3 be distributed according to Uniform(𝒳) instead of P_X, and let R^{WZ,u}(D) denote the corresponding rate as in (2). Then, (log2 |𝒳|, R^{WZ,u}(D), D) is achievable with Shannon-sense perfect secrecy. The proof of this result is along the same lines as that of Theorem 4.3 (see Appendix B). The only additional condition that needs to be checked is whether Shannon-sense perfect secrecy is attainable. This is verified by an argument which parallels the one for the Shannon cryptosystem of Fig. 8.
Example: With reference to Fig. 9, let the source be uniformly distributed
over the four 3-bit strings 000, 001, 010, and 100.
Hence, the source is a correlated sequence of 3 bits where it is known
that at most 1 bit is equal to one. Clearly, the source entropy is 2
bits, and the encryption is a Vernam
one-time pad whose key is uniformly distributed over all 3-bit strings. For
the encryption system of Fig. 9, the encrypted sequence is the bitwise XOR of the
source and the key, so that the encrypted sequence
and the key differ in at most one out of their three bits. Hence, if the
coset-codebook of Fig. 4 is used for the compression box of
Fig. 9, identifying the two correlated sequences of Fig. 4, respectively, with the
encrypted sequence and the key, only 2 bits (the coset index) are needed to represent the
output of the encryption, even though the compression box does not have
access to the secret key. The compressed output is the index of the coset to
which the encrypted sequence belongs. The decoder first recovers the encrypted sequence by finding
the 3-bit codeword, in the coset indicated by the index, which is closest to
the key available to it. Finally, the source is recovered by XORing the
recovered encrypted sequence with the key. Hence, the compression and se-
crecy performance of this system matches that of the Shannon
cryptosystem of Fig. 8, where the source is first compressed to 2 bits
and then encrypted with a Vernam one-time pad. However, the
Shannon system is more efficient in terms of the length of the
Vernam one-time pad needed. The Shannon cryptosystem needs
a one-time pad of length two, whereas the system of Fig. 9 needs
a one-time pad of length three.
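The example can be checked exhaustively in a few lines. This is our sketch, reading the coset-codebook of Fig. 4 as the cosets of the length-3 repetition code {000, 111}:

```python
from itertools import product

# parity-check matrix of the length-3 repetition code {000, 111};
# the 2-bit syndrome indexes the four cosets
H = [(1, 1, 0), (0, 1, 1)]

def syndrome(v):
    # 2-bit syndrome: H v over GF(2)
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)

def decode(s, key):
    # the codeword in the coset indexed by s that is closest to the key
    coset = [v for v in product((0, 1), repeat=3) if syndrome(v) == s]
    return min(coset, key=lambda v: sum(a != b for a, b in zip(v, key)))

sources = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]  # at most one bit set
for x in sources:
    for key in product((0, 1), repeat=3):         # every possible one-time pad
        y = tuple(a ^ b for a, b in zip(x, key))  # encrypt: XOR with the pad
        s = syndrome(y)                           # compress to 2 bits, key not used
        y_hat = decode(s, key)                    # decompress using the key
        x_hat = tuple(a ^ b for a, b in zip(y_hat, key))  # decrypt
        assert x_hat == x
print("all 32 cases recovered from 2-bit syndromes")
```

Each coset contains a complementary pair of words; since the encrypted word differs from the key in at most one position, it is always the unique closest word in its coset.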
So far, we have considered cryptosystems where we had con-
trol over the design of both the encryption and the compression
components. An interesting question is: how much compres-
sion can we achieve if the encryption scheme is pre-specified
by some user? Let us look at this situation in more detail for the
case when the source is required to be reproduced at the decoder
losslessly. Let there be a pre-designed encryption
map which is parameterized by a secret key. We only require
that there be a corresponding decryption map
such that decryption composed with encryption is the identity map. As before, let us assume
that the components of the message sequence are independent,
identically distributed (i.i.d.). The
pre-specified encryption box can in general produce an output
sequence whose components are correlated across time. To
compress such a sequence, one would need to exploit the de-
pendence between the output and the key that is available to the
decoder. We know how to do this for the case when the output is
i.i.d. by using the Slepian–Wolf distributed source-coding the-
orem [1]. The results of Slepian and Wolf can be generalized to
the case where the input to the distributed compression box has
memory [14]. This would give us an encoding-decoding scheme
which would operate at the entropy rate of the unencrypted source
but still recover the encrypted sequence with high probability for sufficiently
large block lengths. But what happens to the secrecy performance if we con-
catenate a pre-specified encryption box with
a generalized Slepian–Wolf compression box
that has been tailored to provide optimal compression per-
formance? By the data processing inequality (cf. the last footnote
of Section III), the mutual information between the source and the compressed
output can be no larger than that between the source and the encrypted sequence. Very often, the
inequality is strict. Thus, compression will preserve, if not en-
hance, the information-theoretic secrecy.
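In generic notation (ours: $X^n$ the source, $Y^n$ the output of the pre-specified encryption box, $S$ the Slepian–Wolf syndrome computed from $Y^n$ alone), the data-processing argument reads:

```latex
X^n \longrightarrow Y^n \longrightarrow S
\quad\Longrightarrow\quad
I(X^n; S) \;\le\; I(X^n; Y^n),
```

so the compressed bitstream can reveal no more about the source than the encrypted sequence it was computed from.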
V. COMPUTER SIMULATION RESULTS
Up to this point, our focus has been on the theoretical as-
pects of the problem of compressing encrypted data, in partic-
ular on the performance that can be theoretically achieved. In
this section, we consider real systems that implement the com-
pression of encrypted data. We discuss the codes used to con-
struct such systems and give computer simulations of their com-
pression performance. We will describe systems for both loss-
less and lossy compression.
A. Lossless Compression of Encrypted Bilevel Images
In the following example, the bilevel image in Fig. 10 is en-
crypted and then compressed. For the purpose of illustration,
this image is treated as an i.i.d. string of 10 000 binary digits (the
image is of size 100 × 100 pixels, where filled pixels correspond
to a one and unfilled pixels correspond to a zero) disregarding
any spatial correlation or structure that is evidently present in
such a “natural” image. Thus, for the purpose of this example,
the source is not an image but is represented as such in order
to aid the readers’ understanding, and shall henceforth be re-
ferred to as a string to highlight this fact. It is possible to design
distributed source codes that can exploit the spatial correlation
structures in natural images, much like the Lempel–Ziv algo-
rithm [15] and its variants exploit context information for com-
pressing files. However, this is beyond the scope of this work.
The methods used in these examples were developed specifi-
cally in [16], but are strongly related to a significant body of
work [17]–[21].
3000 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 10, OCTOBER 2004
Fig. 10. Bilevel image used in the computer simulation: For the purpose of
display, the bit 1 is mapped to the grayscale value 0 and the bit 0 is mapped to the
grayscale value 255. “Natural” images, such as the cartoon shown here, have
considerable memory (correlation) across spatial locations, as evidenced by the
presence of significant 2-D structure that is easily recognized by humans. For the
purpose of simulation, though, the pixel values (taken in raster-scan order)
are assumed to be i.i.d. Bernoulli random variables. The image has 706 nonzero
entries, corresponding to an empirical first-order entropy of about H(X) =
0.37 bits/pixel.
The string that is depicted as an image in Fig. 10 has 706
nonzero entries, corresponding to an empirical first-order entropy
of about H(X) = 0.37 bits/pixel. The string is encrypted
by adding a unique pseudorandom Bernoulli(1/2) string of the
appropriate length. The encrypted string has an empirical first-
order entropy of about 1 bit/pixel. A traditional com-
pression approach, which treats the data as originating from an
i.i.d. binary source, would consider the encrypted string to be
incompressible.
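The entropy figures quoted above are easy to check numerically. The sketch below (ours, not from the paper) builds a 10 000-bit string with 706 ones and computes its empirical first-order entropy before and after XOR with a Bernoulli(1/2) pad:

```python
import math
import random

def binary_entropy(p):
    # first-order (i.i.d. Bernoulli) entropy in bits per symbol
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 10_000
x = [1] * 706 + [0] * (n - 706)               # a string with 706 nonzero entries
random.seed(0)
k = [random.randint(0, 1) for _ in range(n)]  # Bernoulli(1/2) pseudorandom pad
y = [a ^ b for a, b in zip(x, k)]             # encrypted string: bitwise XOR

print(round(binary_entropy(sum(x) / n), 2))   # 0.37 bits/pixel
print(round(binary_entropy(sum(y) / n), 2))   # about 1.0 bit/pixel
```

A first-order (memoryless) model thus sees the encrypted string as essentially incompressible, exactly as stated in the text.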
The encrypted string is compressed by finding its syndrome
with respect to a rate-1/2 low-density parity-check (LDPC)
channel code [16]. That is, a string of length n is multiplied by
the code's parity-check matrix, of dimension n/2 × n,
to obtain an output string of length n/2. Thus, the rate-1/2
LDPC code compresses the encrypted string to
1/2 bit per source bit. Via this multiplication, the encrypted code space is
broken into cosets. These cosets consist of all encrypted strings
with the same syndrome with respect to the chosen LDPC
code, and the cosets are indexed by that common syndrome.
By breaking the space into cosets in this manner, we ensure that
in each coset there will be only one element which is jointly
typical with the key.
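Concretely, the syndrome former is a matrix-vector product over GF(2). The sketch below uses a tiny dense 4 × 8 matrix of our own choosing in place of the large sparse LDPC parity-check matrix of [16]:

```python
from itertools import product
import random

random.seed(1)
n = 8
# toy (n/2) x n binary parity-check matrix; a real system uses a
# sparse LDPC matrix with n on the order of thousands
H = [[random.randint(0, 1) for _ in range(n)] for _ in range(n // 2)]

def compress(y):
    # syndrome s = H y over GF(2): n encrypted bits -> n/2 output bits
    return [sum(h * b for h, b in zip(row, y)) % 2 for row in H]

y = [1, 0, 1, 1, 0, 0, 1, 0]         # encrypted string
s = compress(y)
print(len(y), "->", len(s), "bits")  # 8 -> 4 bits

# the strings sharing this syndrome form the coset indexed by s;
# the decoder searches this coset for the word closest to the key
coset = [v for v in product((0, 1), repeat=n) if compress(v) == s]
assert y in [list(v) for v in coset]
```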
At the receiver, the compressed data is decoded with a
DISCUS-style Slepian–Wolf decoder [3] by using the key
bit-sequence as side information. The decoder makes use of
the fact that the encrypted source bit-sequence and the key are
correlated. That is, the key can be seen as a noisy version of
the encrypted sequence. Under this view the goal of decoding
can be seen as finding the nearest codeword to the key residing
within the coset specified by the compressed encrypted se-
quence. Knowledge of the correlation between the encrypted
string and the key (which is equivalent to knowledge of the
source statistics) and the syndrome (bin-index or coset-index)
of the encoded data is exploited by a belief propagation al-
gorithm [22], [23] to recover exactly the encrypted sequence.
Belief propagation is an iterative algorithm that operates over
graphical models and converges upon the marginal distribu-
tions for each of the unknown bits, from which the bits can
be estimated. The algorithm is exact over trees, but in practice
performs quite well on sparse graphs with loops, such as LDPC
codes. The instance of belief propagation used is nearly iden-
tical to that used for decoding standard LDPC codes, but with
some adaptations. First, the check-node update rule is modified
to incorporate the knowledge of the syndrome of the encrypted
word. Second, initial marginal distributions of the encrypted
bits are obtained based on the knowledge of the key and its
correlation to the encrypted string. Finally, with knowledge of
the key and the encrypted sequence the decryption is a trivial
matter and is considered to be a part of the decoding process.
Using this algorithm, the string in Fig. 11 is perfectly decoded
in 13 iterations. Samples of the best estimate at each stage of
the iterative algorithm are provided in Fig. 12.
B. Lossy Compression of Encrypted Real-Valued Data
In this section, we provide simulations of the compression
of an encrypted real-valued data source. In these experiments,
the data was an i.i.d. Gaussian sequence with variance 1.0. The
data was encrypted with a stream cipher. A key sequence, of
the same length as the data sequence, was added to the data
on a sample-by-sample basis. The key was an i.i.d. Gaussian
sequence, independent of the data. Our simulations show the
compression performance of the scheme as a function of the
variance of the key sequence.
Clearly, an i.i.d. Gaussian sequence is not a good model for
real world signals such as natural images. However, more com-
plex models that incorporate Gaussian variables, such as cas-
cades of Gaussian scale mixtures [24], have been shown to be
good models of natural signals. While this work focuses on the
problem of compressing encrypted data and not modeling of sig-
nals, we believe that constructing codes for an i.i.d. Gaussian
sequence is an initial step toward developing a system that can
be used with a more complicated source.
Our encoder compresses the encrypted data to a rate of 1
bit/sample. In the first stage of the encoder, each sample in the
encrypted data sequence is quantized with a scalar quantizer. We
will provide simulation results for three different values of the
step size of the scalar quantizer. The reconstruction levels of the
scalar quantizer are labeled with numbers in the set {0, 1, 2, 3},
with the labels assigned to the reconstruction levels in a cyclic
manner. Each quantized sample is then replaced with the 2-bit
binary representation of its label, resulting in a binary sequence
that is twice as long as the original real-valued data sequence.
Finally, we find the syndrome of this binary sequence with re-
spect to a rate 1/2 trellis code [6]. The syndrome is the output
of the encoder, which is transmitted to the decoder. In our sim-
ulations, we used a 64–state trellis code in the encoder. Since
we use a rate 1/2 code, the length of the syndrome is half of the
length of the binary input. Hence, the syndrome is a binary se-
quence of the same length as the encrypted data sequence. The
encrypted data has been compressed by the scalar quantizer and
trellis code to the rate of 1 bit/sample.
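The encoder's first stage can be sketched as follows (the step size and key variance are our illustrative choices): each encrypted sample is quantized, its reconstruction level is labeled cyclically from {0, 1, 2, 3}, and the label is written as 2 bits, which the rate-1/2 trellis syndrome (not shown) would then halve to 1 bit/sample:

```python
import random

step = 0.5  # illustrative quantizer step size

def encode_stage1(samples):
    bits = []
    for v in samples:
        q = round(v / step)                    # uniform scalar quantizer index
        label = q % 4                          # cyclic label in {0, 1, 2, 3}
        bits += [(label >> 1) & 1, label & 1]  # 2-bit binary representation
    return bits

random.seed(2)
data = [random.gauss(0, 1) for _ in range(1000)]  # i.i.d. Gaussian source
key = [random.gauss(0, 10) for _ in range(1000)]  # Gaussian key, large variance
encrypted = [d + k for d, k in zip(data, key)]    # stream cipher: add the key
bits = encode_stage1(encrypted)
print(len(bits) // len(data), "bits/sample before the trellis stage")  # 2
```

The cyclic labeling is what lets the quantized samples be described coarsely: the decoder resolves the label ambiguity using the key sequence as side information.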
Fig. 11. Compressing encrypted images, example: An image (at left) is first encrypted by adding a Bernoulli(1/2) bit-sequence generated by a pseudorandom
key to produce the second image. The result is then compressed by a factor of two using practical distributed source codes developed in [16] to produce the third
compressed and encrypted “image” bitstream. For the purpose of display, the encrypted and compressed bitstream has been arranged into the rectangular shape
shown here. Finally, the compressed bits are simultaneously decompressed and decrypted using an iterative decoding algorithm provided in [16] to obtain the last
image. The decoded image is identical to the original.
Fig. 12. Convergence of decoded estimate: The best estimate of the image at the end of the specified number of iterations at the decoder (cf. Fig. 11). Clearly,
the initial estimate is quite grainy, but converges rapidly towards the solution.
The decoder has access to the syndrome transmitted by the
encoder, as well as the key sequence used in the stream cipher.
The decoder considers the set of real-valued sequences which
take on values from the set of reconstruction levels of the scalar
quantizer. The decoder looks at the subset of such sequences
whose syndrome is the same as the syndrome sent by the en-
coder, and then finds the sequence in that subset which is closest
to the key sequence. At this point, the decoder has two estimates
of the encrypted data sequence. It has the output of the trellis
decoder and it has the key sequence, which can be thought of
as a noisy version of the encrypted data where the noise is the
original, unencrypted data. The decoder combines these two es-
timates to form the optimal estimate of the encrypted data. Fi-
nally, it subtracts the key sequence to obtain the optimal estimate
of the original data.
Our simulations measure the performance of our scheme by
computing the distortion and the probability of error in the trellis
decoder as a function of the variance of the key sequence. For
each value of the key variance, we ran 500 trials, where each trial
consisted of a block of 2000 symbols. We present plots of the
mean squared error distortion in Fig. 13(a) and of the probability
of error in the trellis decoder in Fig. 13(b) versus the variance
of the key sequence. On each plot there are three lines, which
represent the performance for three different scalar quantizer
step sizes. The plots show that the distortion and probability of
error do not change as we change the variance of the key. The
performance of our encoder/decoder pair depends only on the
source and not on the side information.
We note that, because the data has a variance of 1.0 and we are
compressing it to a rate of 1 bit/sample, the minimum possible
distortion is 0.25. This result follows from standard rate-distor-
tion theory [4]. The distortions that we achieved for the various
step sizes are in the range of 0.5–0.6, which is about 3–3.8 dB
above the rate-distortion minimum. In these experiments, the bit
error rate was in the range of to . The goal of these
simulations was to show that we can compress the encrypted
data with the same efficiency, regardless of the key sequence.
In particular, the variance of the key sequence can be chosen as
a function of the security requirements of the system, and the
compression gain will not be affected. The performance of our
scheme depends only on the statistics of the source, not the key.
Fig. 13. Compression of encrypted Gaussian data: An i.i.d. Gaussian data
sequence, with variance 1.0, is encrypted with an i.i.d. Gaussian key sequence,
with variance as indicated by the horizontal axis, and then compressed. The
three lines indicate three different quantizer step sizes used in the compressor.
(a) Mean-squared error distortion as a function of key variance. (b) Probability
of decoding error in the trellis as a function of key variance.
Our aim was not to compress the encrypted data to the bound
provided by the Wyner–Ziv theorem, but to demonstrate that
increasing the variance of the key sequence does not affect the
distortion or probability of decoding error. In order to compress
the source to a distortion closer to the bound, it would be neces-
sary to use a more powerful channel code in our scheme, such
as the codes described in [25].
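The figures above follow from the Gaussian distortion-rate function D(R) = σ²·2^(−2R) [4]; a quick check of the quoted numbers:

```python
import math

sigma2, R = 1.0, 1.0                 # unit-variance source, 1 bit/sample
d_min = sigma2 * 2 ** (-2 * R)       # distortion-rate bound D(R) = sigma^2 2^(-2R)
print(d_min)                         # 0.25

for d in (0.5, 0.6):                 # distortions achieved in the simulations
    print(round(10 * math.log10(d / d_min), 1))  # 3.0 and 3.8 dB above the bound
```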
VI. CONCLUDING REMARKS
In this work, we have examined the possibility of first en-
crypting a data stream and then compressing it, such that the
compressor does not have knowledge of the encryption key.
The encrypted data can be compressed using distributed source-
coding principles, because the key will be available at the de-
coder. We have shown that under some conditions the encrypted
data can be compressed to the same rate as the original, unen-
crypted data could have been compressed. We have also shown
that our system can still have perfect secrecy in some scenarios.
We have presented numerical results showing that we can in fact
recover the original source when we compress the encrypted
data.
In the future, we plan to extend our security analysis to the
area of cryptographic security. This involves the study of the
situation where the key sequence is not truly random, but instead
is generated by a pseudorandom generator from a short seed.
We are examining the relationship between the computational
resources available to an attacker and the probability that such
an attacker can break the encryption scheme.
We are also working on the related problem of image hashing.
In this problem, it is desired to map an image to a binary hash
value in a secure manner. This mapping should capture the per-
ceptual aspects of the image, so that two images which appear
nearly identical will have the same hash value. Although the
hashing problem is not the same as a communication problem,
we are also using distributed source coding to provide security
in the context of image hash functions.
APPENDIX A
PROOF OF THEOREM 3.3
The following proofs rely on standard techniques and results
in information theory. Most of these results are basic properties
of information quantities like entropy, mutual information, and
the rate-distortion function that are available in [4]. For the ben-
efit of those who are not familiar with these results, we have
attempted to explain most of the results that are used, to the ex-
tent that they can be found in [4] for further clarification.
The rate-distortion function of a source is the minimum rate,
in bits per source symbol, needed to index reconstructions of the
source to achieve an expected distortion less than or equal to .
For an i.i.d. source with distribution p(x), Shannon showed that the
rate-distortion function is given by
R(D) = min I(X; X̂) (A.1)
where the minimization is over all conditional distributions
p(x̂|x) whose induced expected distortion E[d(X, X̂)] is at most D.
The function R(D) is convex, continuous, and nonincreasing
for D ≥ 0 [12]. With the quantities as defined in the statement
of Theorem 3.3, we have
because and is nonincreasing in ;
and is nonincreasing and continuous in
; is a convex function; (A.1); chain-rule
for entropy, the source is i.i.d., and unconditioning can only
increase the entropy [4, p. 27]; form a
Markov chain and the last footnote of Section III; chain-rule
for mutual information [4, p. 22]; cryptosystem has perfect
secrecy (in the Shannon, Wyner, or Maurer sense); defini-
tion of conditional mutual information [4, p. 22]; takes
values in a finite alphabet for each , i.e., is finite for
each , and un-conditioning can only increase the entropy;
A source taking values on a finite alphabet has maximum en-
tropy if and only if it is uniformly distributed over its alphabet
and the maximum entropy in bits equals the logarithm to the
base two of the cardinality of the alphabet [4, p. 27]. Hence,
with equality if and only if all
the values of the key are equally likely.
Similarly, by interchanging the roles of and above and
reasoning in a similar manner, we obtain
because and are independent and by design,
takes values in a finite alphabet of cardinality .
APPENDIX B
PROOF OF THEOREM 4.3 FOR FINITE ALPHABETS
For the sake of simplicity, we prove the theorem only for fi-
nite alphabets (i.e., the source, key, and reconstruction alphabets
are all finite). It turns out that when working with finite alphabets,
it is sufficient to consider auxiliary random variables taking values
in a finite alphabet of bounded cardinality for the minimization in (2) [2].
The proof for continuous alphabets (for Gaussian sources for ex-
ample) and unbounded distortion criteria (such as mean squared
error) can be established along similar lines using the notion of
weakly typical sequences [4, p. 51] and the techniques in [12]
and [13].
We shall make use of the following notion of strongly typical
sequences in our proof for finite alphabets.
Definition (Strongly typical sequences [4]): A triple
of sequences is said to be
jointly, -strongly typical with respect to (w.r.t.) a distribution
on , if for all
and whenever .
Here, is the number of occurrences of the
triple in the triple of sequences . The collec-
tion of all triples of sequences
that are jointly -strongly typical w.r.t. is de-
noted by . Similarly, denotes the
collection of all pairs of jointly -strongly typical sequences of
length under , etc. From the definition, it follows
that
, etc. Loosely speaking, a tuple of sequences is
jointly strongly typical w.r.t. a specified joint probability distri-
bution if their empirical joint distributions are close to the spec-
ified distribution at all points. The phrase “ -strongly typical”
shall often be abbreviated to “typical.”
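Written out (with a generic triple of alphabets and $N(\cdot)$ the joint count defined above), the condition in the definition is:

```latex
\left|\tfrac{1}{n}\,N(x,y,z \mid x^n, y^n, z^n) - p(x,y,z)\right| < \epsilon
\quad \text{for all } (x,y,z),
```

and $N(x,y,z \mid x^n, y^n, z^n) = 0$ whenever $p(x,y,z) = 0$.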
Let be an upper bound on the values that can
take. Let be independent of and have identical distribu-
tion (i.e., ), and
as in the statement of the theorem. Fix the function
and to be the minimizers in (2). The marginal dis-
tribution of the auxiliary random variable is given by
. Fix . Note that since
form a Markov chain, .
Random codebook construction: Let
and . Generate
bins of independent codewords indexed by
by drawing i.i.d. codeword components ac-
cording to the distribution , with .
The total number of independent codewords generated in this
manner is which lie in bins. The set of all
codewords and the bins together constitute the Wyner–Ziv
(W-Z) codebook. Let . Generate independent
key codewords indexed by by drawing i.i.d.
key-word components according to the distribution .
This is the secret key (SK) codebook . The W-Z and SK
codebooks are known to the decoder. A (random) (W-Z,SK)
codebook pair shall be denoted by .
Encryption: The source sequence is “added” to the secret
key sequence to produce the encrypted sequence
.
Compression: If there exists a codeword such that
, the jointly -strongly typical set of
-length vectors, then the sequence is encoded to the
codeword . If no such codeword exists, the encoding is
said to have failed and an arbitrary codeword, say ,
is selected. If more than one codeword exists, any one of them
is chosen. In any case, the index of the bin to
which the selected codeword belongs is sent to the decoder.
Decoding: The decoder tries to find a codeword in the bin
indicated by the encoder such that .
If the decoder succeeds in finding such a unique , the en-
crypted sequence is reconstructed as where
is applied component-wise. If no such unique pair exists, is
set to an arbitrary sequence.
Key recovery: Presented with and a hypothetical key-
decoder (or an eavesdropper having access to pairs of inputs
and outputs of the encryption-compression system) can attempt
to recover using the following procedure: Find the unique
pair with and belonging to the bin of
the W-Z codebook indicated by such that
. If such a pair exists, declare the key estimate
to be . If no such pair exists, set the key estimate to an
arbitrary value. Note that the key-decoder
is not part of the original problem and is purely hypothetical.
This is only a proof technique used for showing the existence
of codes that can also meet the Wyner-sense perfect secrecy
requirements. The hypothetical key-decoder is separate from the
decoder for the source messages. The key-decoder has access to
and . In contrast, the message decoder has access to and
but not .
Existence of “good” (W-Z,SK) codebook pairs: Let
denote the event that the decoder fails to find a such that
is jointly -strongly typical. Let denote the
event that the key is not recovered by the key-decoder given the
bin index and the source sequence. Let . It shall be
shown shortly that, when averaged across the choice of random
W-Z and SK codebooks and sufficiently large , the probability
of can be made arbitrarily small. In particular, it shall be
shown that for any , for sufficiently
large. Since , there is at least one
(W-Z,SK) codebook pair for which . Note
that this probability is with respect to the source distribution
and a uniform distribution on . We call such a codebook pair
a “good” (W-Z,SK) codebook pair.
The expected distortion and the Wyner-sense perfect secrecy
properties of “good” codebooks are now analyzed under the as-
sumption that for sufficiently large.
Distortion analysis for a “good” (W-Z,SK) codebook pair:
When is jointly -strongly typical
where is the count of the number of triples
that occur in . In the third step above, we used
and
, which follows directly from the definition of
-strong typicality. Therefore, for a “good” (W-Z,SK) codebook
pair , the average distortion can be upper bounded as follows:
According to our earlier notation, the conditioning within the
expectation operator on a specific codebook pair such as
above should have been implicit. In the above derivation, we
have abused the notation and made the conditioning explicit for
clarity.
Secrecy analysis for a “good” (W-Z,SK) codebook pair:
We have
because is a function of and , hence
[4] and is independent of . According to Fano’s in-
equality [4, p. 38]
Key recovery failure
Thus, will be less than for sufficiently
large . Also, . Hence, .
Therefore, by choosing , we would obtain a sequence of
“good” codebook pairs that would asymptotically (as ε goes to
zero and n goes to infinity) achieve Wyner-sense perfect secrecy.
Analysis of the probability of error. It now remains to
show that the probability of can be made smaller than by
choosing sufficiently large. To show this, will be expressed
as the union of four subevents described below.
Note that the error sub-events describe encoding
and decoding errors while the error event describes a security
violation. It will be shown that the probability of each subevent
and can be made smaller than by choosing sufficiently
large.
and are not jointly typical w.r.t.
the joint distribution of and : By the asymptotic
equipartition theorem for strongly typical sequences [4,
Lemma 13.6.1 p. 359] the probability of this event can
be made smaller than any specified by choosing
sufficiently large.
and are jointly typical w.r.t. but
encoding fails because there is no codeword in the
W-Z codebook such that is jointly typical
w.r.t. : By the proof of the rate-distortion
theorem using strongly typical sequences and the choice of
, the probability of this event can be made smaller than
any specified by choosing sufficiently large [4, pp.
361 and 362].
is jointly typical w.r.t. , but
is not jointly typical w.r.t. :
By the Markov lemma [4, p. 436], the probability of this
event can be made smaller than any specified by
choosing sufficiently large.
is typical w.r.t. , but there is another
codeword in the same bin as , such that
: For sufficiently large ,
the probability that a single randomly and independently
chosen codeword is jointly strongly typical with
is no more than [4, Lemma 13.6.2
and Equation 14.310]. There are codewords
in each bin. Hence, the probability of this event can
be upper bounded by . Since
, this probability can be made
smaller than any specified by choosing sufficiently
large.
There is another codeword , chosen
independently of such that
: To see that the probability of this event
can be made arbitrarily small by choosing large enough,
condition over , for an arbitrary
and notice that the conditional probability of the event
can be upperbounded by for any
[4, Lemma 13.6.2 and eq. (14.310)]. Since is inde-
pendent of , . Using the chain-rule for
mutual information [4, p. 22], we have
. Furthermore,
. Hence, . We
also have
where in deriving the last equality above, we used the chain
rule for mutual information and the independence of and
. Hence,
. This implies that the
exponent in the upperbound for the conditional probability
is strictly negative. Thus, the
probability of this error event can be made smaller than any
specified by choosing a sufficiently large .
Since ,
the desired result follows.
ACKNOWLEDGMENT
The authors would like to thank Prof. S. S. Pradhan in the
Electrical Engineering and Computer Science Department, Uni-
versity of Michigan, Ann Arbor, for discussions on distributed
source coding and information-theoretic security. The authors
would also like to thank Prof. D. Wagner of the Computer Sci-
ence Department, University of California, Berkeley, for his dis-
cussions on the ongoing work on computational security. Fi-
nally, the authors also wish to thank the anonymous reviewers
for their encouraging and critical comments which helped to im-
prove the manuscript.
REFERENCES
[1] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information
sources,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 471–480, July
1973.
[2] A. Wyner and J. Ziv, “The rate-distortion function for source coding with
side information at the decoder,” IEEE Trans. Inform. Theory, vol. IT-22,
pp. 1–10, Jan. 1976.
[3] S. S. Pradhan and K. Ramchandran, “Distributed source coding using
syndromes (DISCUS): Design and construction,” IEEE Trans. Inform.
Theory, vol. 49, pp. 626–643, Mar. 2003.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory. New
York: Wiley, 1991.
[5] M. W. Marcellin and T. R. Fischer, “Trellis coded quantization of mem-
oryless and Gauss-Markov sources,” IEEE Trans. Commun., vol. 38, pp.
82–93, Jan. 1990.
[6] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE
Trans. Inform. Theory, vol. IT-28, pp. 55–67, Jan. 1982.
[7] C. E. Shannon, “Communication theory of secrecy systems,” Bell Syst.
Tech. J., vol. 28, pp. 656–715, 1949.
[8] A. D. Wyner, “The wire-tap channel,” Bell Syst. Tech. J., vol. 54, no. 8,
pp. 1355–1387, 1975.
[9] U. Maurer, “The strong secret key rate of discrete random triples,” in
Communication and Cryptography—Two Sides of One Tapestry, R.
Blahut, Ed. Norwell, MA: Kluwer, 1994, pp. 271–285.
[10] U. Maurer and S. Wolf, “Information-theoretic key agreement: From
weak to strong secrecy for free,” in Advances in Cryptology—EURO-
CRYPT , vol. 1807, Springer-Verlag Lecture Notes in Computer Science,
B. Preneel, Ed., 2000, pp. 351–368.
[11] J. H. van Lint and R. M. Wilson, A Course in Combinatorics. New
York: Cambridge Univ. Press, 1992.
[12] R. G. Gallager, Information Theory and Reliable Communication.
New York: Wiley, 1968.
[13] Y. Oohama, “The rate-distortion function for the quadratic Gaussian
CEO problem,” IEEE Trans. Inform. Theory, vol. 44, pp. 1057–1070, May
1998.
[14] T. M. Cover, “A proof of the data compression theorem of Slepian and
Wolf for ergodic sources,” IEEE Trans. Inform. Theory, vol. IT-21, pp.
226–228, Mar. 1975.
[15] J. Ziv and A. Lempel, “A universal algorithm for sequential data com-
pression,” IEEE Trans. Inform. Theory, vol. IT-23, pp. 337–343, May
1977.
[16] D. Schonberg, S. S. Pradhan, and K. Ramchandran, “Distributed code
constructions for the entire Slepian-Wolf rate region for arbitrarily cor-
related sources,” in Proc. 37th Asilomar Conf. Signals, Systems, Com-
puters, vol. 1, Nov. 9–12, 2003, pp. 835–839.
[17] R. Zamir and S. Shamai, “Nested linear/lattice codes for Wyner-Ziv en-
coding,” in Proc. IEEE Information Theory workshop, Killarney, Ire-
land, 1998.
[18] S. S. Pradhan and K. Ramchandran, “Distributed source coding: Sym-
metric rates and applications to sensor networks,” in Proc. Data Com-
pression Conf., Snowbird, UT, Mar. 2000.
[19] J. García-Frías and Y. Zhao, “Compression of correlated binary
sources using turbo codes,” IEEE Commun. Lett., vol. 5, pp. 417–419,
Oct. 2001.
[20] P. Mitran and J. Bajcsy, “Turbo source coding: A noise-robust approach
to data compression,” in IEEE Data Compression Conf., Apr. 2002, p.
465.
[21] A. Liveris, C. Lan, K. Narayanan, Z. Xiong, and C. Georghiades,
“Slepian-Wolf coding of three binary sources using LDPC codes,” in
Proc. Int. Symp. Turbo Codes and Related Topics, Brest, France, Sept.
2003.
[22] R. G. Gallager, “Low density parity check codes,” Ph.D. dissertation,
MIT, Cambridge, MA, 1963.
[23] R. J. McEliece and S. M. Aji, “The generalized distributive law,” IEEE
Trans. Inform. Theory, vol. 46, pp. 325–343, Mar. 2000.
[24] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky, “Random cas-
cades on wavelet trees and their use in analyzing and modeling nat-
ural images,” Appl. Computat. Harmon. Anal., vol. 11, pp. 89–123, July
2001.
[25] J. Chou, S. S. Pradhan, and K. Ramchandran, “Turbo and trellis-based
constructions for source coding with side information,” in Proc. Data
Compression Conf., Snowbird, UT, Mar. 2003.
Mark Johnson (S’97) received the B.S. degree (with
highest honors) from the University of Illinois at Ur-
bana-Champaign in 2000 and the M.S. degree from
the University of California, Berkeley, in 2003, both
in electrical engineering. He is currently working to-
ward the Ph.D. degree in the Department of Electrical
Engineering and Computer Sciences, University of
California, Berkeley.
His research interests include distributed signal
processing and security in sensor networks.
Mr. Johnson was awarded the Fannie and John
Hertz Foundation Fellowship in 2000.
Prakash Ishwar received the B.Tech. degree in elec-
trical engineering from the Indian Institute of Tech-
nology, Bombay, in 1996, and the M.S. and Ph.D. de-
grees in electrical and computer engineering from the
University of Illinois at Urbana-Champaign in 1998
and 2002, respectively.
Since August 2002, he has been a Postdoctoral
Researcher in the Electrical Engineering and Com-
puter Sciences Department, University of California,
Berkeley. He will be joining the faculty of the
Electrical and Computer Engineering Department,
Boston University, Boston, MA, in the Fall of 2004. His recent interests include
distributed and collaborative signal processing, multiterminal information
theory, and statistical inference with applications to sensor networks, multi-
media-over-wireless, and security.
Dr. Ishwar was awarded the 2000 Frederic T. and Edith F. Mavis College of
Engineering Fellowship of the University of Illinois.
Vinod Prabhakaran (S’97) was born in Trivan-
drum, India. He received the B.Tech. degree in
electronics and communication from the University
of Kerala, Trivandrum, India, in 1999, and the
M.E. degree in signal processing from the Indian
Institute of Science, Bangalore, India, in 2001. He
is currently working toward the Ph.D. degree in
electrical engineering at the University of California,
Berkeley.
His research interests include multi-user informa-
tion theory, sensor networks, and information-theo-
retic cryptography.
Daniel Schonberg (S’01) received the B.S.E. degree
in 2001 and the M.S.E. degree in 2001, both in elec-
trical engineering, from the University of Michigan,
Ann Arbor. He is currently working toward the Ph.D.
degree in the Department of Electrical Engineering
and Computer Sciences, University of California,
Berkeley.
During the summer of 2003, he was with Microsoft
Research in Redmond, WA. His research interests in-
clude coding theory and data security.
Kannan Ramchandran (S’92–M’93–SM’03)
received the M.S. and Ph.D. degrees in electrical
engineering from Columbia University, New York,
in 1984 and 1993, respectively.
From 1984 to 1990, he was a Member of the
Technical Staff at AT&amp;T Bell Labs in the Telecom-
munications Research and Development area. From
1993 to 1999, he was on the faculty of the Electrical
and Computer Engineering Department, University
of Illinois at Urbana-Champaign, and a Research
Assistant Professor at the Beckman Institute and the
Coordinated Science Laboratory. Since late 1999, he has been an Associate
Professor in the Electrical Engineering and Computer Sciences Department,
University of California, Berkeley. His current research interests include
distributed algorithms for signal processing and communications, multi-user
information theory, wavelet theory and multiresolution signal processing, and
unified algorithms for multimedia signal processing, communications, and
networking.
Dr. Ramchandran was a recipient of the 1993 Eliahu I. Jury Award at Co-
lumbia University for the best doctoral thesis in the area of systems, signal pro-
cessing, or communications. His research awards include the National Science
Foundation (NSF) CAREER Award in 1997, ONR and ARO Young Inves-
tigator Awards in 1996 and 1997, and the Okawa Foundation Award from the
University of California, Berkeley, in 2000. In 1998, he was selected as a Henry
Magnusky Scholar by the Electrical and Computer Engineering Department at
the University of Illinois, chosen to recognize excellence among junior faculty.
He is the co-recipient of two Best Paper Awards from the IEEE Signal Pro-
cessing Society. He has been a member of the IEEE Image and Multidimen-
sional Signal Processing Technical Committee and the IEEE Multimedia Signal
Processing Technical Committee, and has served as an Associate Editor for the
IEEE TRANSACTIONS ON IMAGE PROCESSING.