UNEQUAL-COST PREFIX-FREE CODES
DANIEL BULHOSA
Abstract. We demonstrate that the average word length of a binary unequal-
cost prefix-free code obeys a fundamental lower bound analogous to that of
equal-cost prefix-free D-ary codes. The costs of the characters are taken to
be 1 and 2 respectively. Furthermore, we show that prefix-free codes of this
type can always be created whose average word length is within 2 units of the
fundamental bound.
Introduction
Prefix-free codes are very important in the context of information theory as the
uniqueness of their words allows for a bijective correspondence between these and
the symbols they represent. From a practical standpoint, these codes are attractive
since their prefix-free property allows for the decoding of an incoming message as it
arrives. By comparison, although the more general uniquely decodable codes yield
messages that can always be decoded uniquely, they do not possess this property:
one would often need to wait for the full message to arrive before beginning to
interpret it, wasting time and resources.
The prefix-free codes usually considered utilize an alphabet in which every
character costs the same amount to transmit. Though an understanding of this case
is sufficient for most practical applications, it is by no means the only possible
one. One can consider the variation
of this problem in which different characters cost different amounts to transmit.
In this situation the standard results proven in the literature do not apply, and it
is necessary to generalize them. The goal of this paper is to extend some central
results of equal-cost prefix-free codes to unequal-cost prefix-free codes with binary
alphabets.
Generalization of Fundamental Bound
In this section we determine a fundamental lower bound for the average code-
word length of a prefix-free code with binary alphabet {., −} with length costs 1
and 2 respectively for each of these characters. The proof will not assume that the
characters have equal cost, leading to a more general result that will be applicable
to unequal-cost prefix-free codes. First we generalize the Kraft Inequality:
Date: March 25, 2015.
Theorem 1 (Kraft Inequality): Suppose that C is a finite set of binary
codewords that form a prefix-free code, with symbols . and − costing 1 and 2 units
of length respectively. Then for φ equal to the golden ratio:

$$\sum_{\text{words in } C} \varphi^{-l(\text{word})} \leq 1$$
Proof: First note that the only code of maximal length 1 that fits this description
is {.}, and it obeys the inequality. The only codes of maximal length 2 fitting this
description are {..}, {., −}, {.., −}, and {−}, and they also obey the inequality.
Now we induct over the maximal length of the code. Let lmax be the length of the
longest word in the code C. Assume that the statement to be proved is true for all
codes with maximal length l < lmax. We can create two codes with maximal lengths
less than lmax as follows: take all the words starting with ., remove the leading .,
and let the set of these words form a code. This code C. inherits the prefix-free
property from C, and its maximal length is at most lmax − 1. If we do the same for
all words starting with − we end up with a code C− with maximal length at most
lmax − 2.
Now, by the inductive hypothesis:
wordsinC.
φ−l.(word)
≤ 1 and
wordsinC−
φ−l−(word)
≤ 1
And by construction,
words in C.
φ−l.(word)−1
+
words in C−
φ−l−(word)−2
=
words inC
φ−l(word)
since . and − have lenghts 1 and 2 respectively. Combining this equation and the
inequalities we find that:
∞
words inC
φ−l(word)
=
words in C.
φ−l.(word)−1
+
words in C−
φ−l−(word)−2
≤ φ−1
+ φ−2
= 1
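The inequality is easy to check numerically. The sketch below is mine, not the paper's (the helper names `cost` and `kraft_sum` are hypothetical); it computes the cost-weighted Kraft sum for a few prefix-free codes over the alphabet {., −}, with . costing 1 and − costing 2:

```python
# Kraft sum for unequal-cost binary codes: '.' costs 1, '-' costs 2.
PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def cost(word):
    """Total length of a word: 1 unit per '.', 2 units per '-'."""
    return word.count(".") + 2 * word.count("-")

def kraft_sum(code):
    """Sum of phi^(-l(word)) over all words in the code."""
    return sum(PHI ** -cost(w) for w in code)

# A few prefix-free codes, including those from the base case of the proof.
codes = [{"."}, {"-"}, {".", "-"}, {"..", "-"}, {".-", "-."}]
sums = [kraft_sum(c) for c in codes]
```

Each sum comes out at most 1, and the code {., −} saturates the bound, since φ⁻¹ + φ⁻² = 1 is exactly the golden-ratio identity.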
Now we prove that the fundamental bound holds for codes of this type.
Theorem 2: Suppose that C is a code as described in Theorem 1, describing
the outcomes of some random variable X with probability distribution p. Then the
average length L of the codewords obeys L ≥ Hφ(p), where Hφ(p) = −Σx p(x) logφ p(x)
is the entropy computed with base-φ logarithms.
Proof: We follow the template of Theorem 5.3.1 of Cover and Thomas. Let

$$r(x) = \varphi^{-l(x)} \Big/ \sum_x \varphi^{-l(x)}
\quad \text{and} \quad
c = \sum_x \varphi^{-l(x)}.$$

The difference of the average length L and the entropy can be written as:

$$\begin{aligned}
L - H_\varphi(p) &= \sum_x p(x)\, l(x) + \sum_x p(x) \log_\varphi p(x) \\
&= -\sum_x p(x) \log_\varphi \varphi^{-l(x)} + \sum_x p(x) \log_\varphi p(x) \\
&= -\sum_x p(x) \log_\varphi \frac{\varphi^{-l(x)}}{\sum_x \varphi^{-l(x)}}
   + \sum_x p(x) \log_\varphi p(x)
   + \sum_x p(x) \log_\varphi \frac{1}{\sum_x \varphi^{-l(x)}} \\
&= \sum_x p(x) \log_\varphi \frac{p(x)}{r(x)} + \log_\varphi \frac{1}{c} \\
&= D(p\,\|\,r) + \log_\varphi \frac{1}{c} \\
&\geq 0
\end{aligned}$$

Here the inequality follows from the fact that the relative entropy is nonnegative,
and the fact that c ≤ 1 by the Kraft inequality, so that logφ(1/c) ≥ 0. Note that
the bound is saturated if and only if the Kraft inequality is saturated (c = 1)
and p = r.
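Both the bound and its saturation at p = r can be checked numerically. The sketch below is mine (the helper names are hypothetical, and the full-tree example code {.., .−, −} is chosen for illustration because its Kraft sum is exactly 1):

```python
import math

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def cost(word):
    """Length of a word: '.' costs 1, '-' costs 2."""
    return word.count(".") + 2 * word.count("-")

def entropy_phi(p):
    """Entropy computed with base-phi logarithms."""
    return -sum(px * math.log(px, PHI) for px in p)

code = ["..", ".-", "-"]           # full tree: Kraft sum is exactly 1
lengths = [cost(w) for w in code]  # [2, 3, 2]

# Generic distribution: the bound holds with slack.
p_uniform = [1 / 3] * 3
L_uniform = sum(px * l for px, l in zip(p_uniform, lengths))

# Saturating distribution p(x) = phi^(-l(x)) = r(x): L equals the entropy.
p_star = [PHI ** -l for l in lengths]
L_star = sum(px * l for px, l in zip(p_star, lengths))
```

For the uniform distribution the average length strictly exceeds Hφ(p), while for p = r the two agree to floating-point precision.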
Achieving the Fundamental Bound
Our goal now is to demonstrate the existence of prefix-free codes of this type
whose average word length is within 2 units of the fundamental bound. Our
approach will be to prove a converse of the Kraft inequality and then use a set
of lengths motivated by Theorem 2 to generate the desired code.
Before doing this, however, we will prove a useful lemma. First we make a few
useful definitions. Let C be a prefix-free code formed with the alphabet {., −};
then we define T(C) as the tree representation of C. In this representation, every
childless node of T(C) corresponds to a unique word in C. The word corresponding
to a given childless node can be constructed by following the path to the node
starting from the root, adding a . for every left branch (child) that is taken and a
− for every right branch that is taken. Once we arrive at the childless node all
necessary characters will have been added and the word will be complete. Now, for
the lemma:
Lemma: Let C be a prefix-free code such that T(C) is a full binary tree. Let .
have length 1 and − have length 2. Then:

$$\sum_{\text{words in } C} \varphi^{-l(\text{word})} = 1$$
Proof: First note that given a generic complete and full tree T we can create
a prefix-free code C by considering all of the words with N letters (independent
of cost). Here N is the depth of the childless nodes of T. A simple combinatorial
argument based on counting the number n of −'s in a word shows that:

$$\sum_{\text{words in } C} \varphi^{-l(\text{word})}
= \sum_{0 \leq n \leq N} \binom{N}{n} \varphi^{-(N+n)}$$
Now we show that this sum is equal to 1. First note that when N = 1 the sum
reduces to φ−1 + φ−2 = 1, so the base case holds. Now, assume that the postulated
equality holds for N. Then:

$$\begin{aligned}
1 = \varphi^{-1} + \varphi^{-2}
&= \varphi^{-1} \sum_{0 \leq n \leq N} \binom{N}{n} \varphi^{-(N+n)}
 + \varphi^{-2} \sum_{0 \leq n \leq N} \binom{N}{n} \varphi^{-(N+n)} \\
&= \sum_{0 \leq n \leq N} \binom{N}{n} \varphi^{-[(N+1)+n]}
 + \sum_{0 \leq n \leq N} \binom{N}{n} \varphi^{-[(N+1)+(n+1)]} \\
&= \sum_{0 \leq n \leq N} \binom{N}{n} \varphi^{-[(N+1)+n]}
 + \sum_{1 \leq n \leq N+1} \binom{N}{n-1} \varphi^{-[(N+1)+n]} \\
&= \varphi^{-(N+1)}
 + \sum_{1 \leq n \leq N} \left[ \binom{N}{n} + \binom{N}{n-1} \right]
   \varphi^{-[(N+1)+n]}
 + \varphi^{-2(N+1)} \\
&= \varphi^{-(N+1)}
 + \sum_{1 \leq n \leq N} \binom{N+1}{n} \varphi^{-[(N+1)+n]}
 + \varphi^{-2(N+1)} \\
&= \sum_{0 \leq n \leq N+1} \binom{N+1}{n} \varphi^{-[(N+1)+n]}
\end{aligned}$$

Thus by induction the sum is equal to 1 for all N ≥ 1. So for a code with a complete
and full tree the Kraft inequality is saturated.
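The binomial identity at the heart of this induction is easy to verify directly. A quick numerical check (my own sketch; the function name is hypothetical):

```python
import math

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def full_tree_kraft_sum(N):
    """Kraft sum over all N-letter words, grouped by the number n of dashes."""
    return sum(math.comb(N, n) * PHI ** -(N + n) for n in range(N + 1))

# By the lemma, every one of these sums should equal 1.
sums = [full_tree_kraft_sum(N) for N in range(1, 11)]
```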
Now consider the tree T(C). Let N be the depth of the deepest childless node
of T(C). We can create a complete and full tree T′(C) from T(C) by appending
complete and full subtrees to all the childless nodes that have depth less than N.
Note that if w is some word in C and D(w) is the set of its childless descendants
in T′(C), then:

$$\sum_{d \in D(w)} \varphi^{-l(d)}
= \varphi^{-l(w)} \sum_{d \in D(w)} \varphi^{-(l(d)-l(w))}
= \varphi^{-l(w)}$$

This follows from the fact that the subtree of T′(C) rooted at w is complete and
full (by construction), that its childless nodes have lengths l − l(w) relative to
w, and the fact that complete and full trees saturate the Kraft inequality as shown
above. The implication of this equation is that:
$$\sum_{w \in C} \varphi^{-l(w)}
= \sum_{w \in C} \sum_{d \in D(w)} \varphi^{-l(d)}
= \sum_{\text{childless nodes } v \text{ of } T'(C)} \varphi^{-l(v)}
= 1$$

The second equality follows from the fact that the sets of childless descendants
D(w) are disjoint and their union encompasses all childless nodes of the tree T′(C).
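As a cross-check of the lemma, one can enumerate the leaves of a complete and full tree directly rather than counting dashes. This sketch is mine (helper names hypothetical); it lists every word with exactly N characters, i.e. the leaves at depth N:

```python
from itertools import product

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def cost(word):
    """Length of a word: '.' costs 1, '-' costs 2."""
    return word.count(".") + 2 * word.count("-")

def full_tree_code(N):
    """All words with exactly N characters: the leaves of a complete full tree."""
    return ["".join(w) for w in product(".-", repeat=N)]

# Enumerating leaves reproduces the saturated Kraft sum of the lemma.
sums = [sum(PHI ** -cost(w) for w in full_tree_code(N)) for N in range(1, 9)]
```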
Now we prove the converse of the Kraft inequality:
Theorem 3 (Converse Kraft Inequality): Suppose that L = {l1, . . . , ln} is a
set of lengths (which may repeat) satisfying the Kraft inequality:

$$\sum_{l \in L} \varphi^{-l} \leq 1$$

Then there exists some prefix-free code C with alphabet {., −}, where l(.) = 1 and
l(−) = 2, such that the length of each word is equal to either li or li + 1 for a
unique 1 ≤ i ≤ n.
Proof: We proceed by induction on the size of the set of lengths. Note that the
statement is trivial when there is only one length l1 in L, as then there is only
one word, and a code with one word is trivially prefix-free. Note also that this
length can be arbitrary. This covers the base case.
Now suppose that L contains n lengths, some of which may be equal to one
another. Suppose that these lengths satisfy the Kraft inequality, and assume that
the theorem holds for all length sets of size n − 1. Then the set L′ = L − {ln}
that we create by removing the last length must obey the Kraft inequality strictly:

$$\sum_{l \in L'} \varphi^{-l} < \sum_{l \in L} \varphi^{-l} \leq 1$$

The set L′ contains n − 1 lengths, so by the inductive hypothesis there exists some
prefix-free code C′ such that the length of each word is equal to either li or li + 1
for a unique 1 ≤ i ≤ n − 1.
Now, note that since the new set of lengths L′ obeys the Kraft inequality strictly,
the contrapositive of the Lemma implies that the tree T(C′) is not full. This
means some node has a child to which no codeword has been assigned. Furthermore,
the other child of this node must lead to codewords, so the path of characters
leading to this node must have length at most l_{n−1}. Thus if ln > l_{n−1} we
can create the code C from the code C′ by simply creating a path of the appropriate
length starting from this node and going through its unassigned child.
The remaining concern is the case ln = l_{n−1}: if the path of characters leading
to the node in question has length ln, the word created through the unassigned
child has length at least ln + 1, and if that child happens to be the − child it
leads to a word of length ln + 2, which is larger than our goal.
However, if in fact ln = l_{n−1} and the length of the path to the node is
l_{n−1}, we can do the following. Imagine that we do create a codeword w1 of
length ln + 2 for this node, and create a code C′′ by adding this word to the code
C′. Since ln + 2 > ln, the set of lengths of C′′ must obey the Kraft inequality
strictly, so there is some node in the tree T(C′′) with an unassigned child.
Furthermore, the characters leading to this node must have length at most l_{n−2}.
We can remove the word w1 from C′′ and then add a new word w2 equal to the path
through this unassigned child. Again we must check whether ln = l_{n−2} and whether
the length of the path leading to this new node is l_{n−2}. If this is not the
case, adding w2 to C′ gives the desired code C.
If it is the case, however, we can repeat the process, adding words w1, w2, . . .
to the code C′ to get a new code C(n) strictly obeying the Kraft inequality. We
are guaranteed that ln > lk for some 0 < k < n, for otherwise the Kraft inequality
would be contradicted, so the process terminates.
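A constructive reading of this proof can be sketched as a greedy algorithm: keep a pool of unassigned tree nodes, and for each target length extend the best-fitting free node with . characters, releasing each bypassed − sibling back into the pool. This is my own illustrative sketch under those assumptions, not the paper's exact recursive construction, and the helper names are hypothetical:

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def cost(word):
    """Length of a word: '.' costs 1, '-' costs 2."""
    return word.count(".") + 2 * word.count("-")

def build_code(lengths):
    """Greedily build a prefix-free code whose word lengths are l or l + 1."""
    free = [""]  # unassigned nodes, stored as prefixes; the root starts free
    code = []
    for target in sorted(lengths):
        fitting = [p for p in free if cost(p) <= target]
        if fitting:
            node = max(fitting, key=cost)  # best fit not exceeding the target
        else:
            node = min(free, key=cost)     # forced overshoot: length l + 1
        free.remove(node)
        while cost(node) < target:
            free.append(node + "-")        # the bypassed sibling becomes free
            node += "."
        code.append(node)
    return code

code = build_code([3, 3, 3, 3])
```

On the length set {3, 3, 3, 3} discussed below Theorem 3, the sketch produces three words of length 3 and one of length 4, matching the li-or-li+1 guarantee.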
The converse must be stated in this manner, allowing lengths li or li + 1, because
for some choices of lengths one may obey the Kraft inequality yet not have enough
words of a given length to create a code. For example, consider the set of lengths
L = {3, 3, 3, 3}. This set obeys the Kraft inequality, but there are only three
words of length three that can be created from this alphabet: namely ..., −., and
.−. There are still words of length four available after we use these three, and
increasing any of the lengths does not affect compliance with the Kraft inequality.
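The counting fact behind this example generalizes: the number of words of a given length follows a Fibonacci pattern, since a word of length l ends either in . (leaving length l − 1) or in − (leaving length l − 2). A quick enumeration (my sketch; the helper name is hypothetical):

```python
from itertools import product

def words_of_length(l):
    """All strings over {'.', '-'} whose cost (. = 1, - = 2) is exactly l."""
    found = []
    for n in range(l + 1):  # n = number of characters in the word
        for w in product(".-", repeat=n):
            if w.count(".") + 2 * w.count("-") == l:
                found.append("".join(w))
    return found

counts = [len(words_of_length(l)) for l in range(1, 6)]  # Fibonacci numbers
```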
Having proved the converse Kraft inequality, we can now prove the principal result
of this section. Returning to the result of Theorem 2, we found that if r = p and
the Kraft inequality is saturated then the entropy bound is saturated. Imposing
these conditions, namely r = p along with c = 1, implies that the lengths must be:

$$p(x) = \varphi^{-l(x)} \Big/ \sum_x \varphi^{-l(x)} = \varphi^{-l(x)}
\implies l(x) = \log_\varphi \frac{1}{p(x)}$$

These may not be valid lengths because they may be non-integer. However, the
lengths

$$l'(x) = \left\lceil \log_\varphi \frac{1}{p(x)} \right\rceil$$

are integers and still obey the Kraft inequality, since φ raised to −l′(x) is at
most p(x). We use these lengths to prove the main result.
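That the rounded-up lengths obey the Kraft inequality can be confirmed numerically. A small sketch of my own (helper names hypothetical), for an arbitrary distribution:

```python
import math

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def ceil_lengths(p):
    """The integer lengths l'(x) = ceil(log_phi(1 / p(x)))."""
    return [math.ceil(math.log(1 / px, PHI)) for px in p]

p = [0.5, 0.25, 0.15, 0.1]
lengths = ceil_lengths(p)
# Rounding up only shrinks each term phi^(-l), so the sum stays at most 1.
kraft = sum(PHI ** -l for l in lengths)
```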
Theorem 4: Let p be the distribution of a random variable X, and let l′(x) be
the length defined above for the outcome X = x. Then there exists a prefix-free
code C with alphabet {., −} such that:

$$H_\varphi(p) \leq L \leq H_\varphi(p) + 2$$

It follows that the optimal code C∗ for this probability distribution must be at
least as good as C.
Proof: Since the lengths l′(x) obey the Kraft inequality, Theorem 3 implies
that there is a code C such that the length of the word corresponding to x is
either l′(x) or l′(x) + 1. Let lC(x) denote the lengths of the words in this code.
Then the average word length L satisfies:

$$\begin{aligned}
L &= \sum_x p(x)\, l_C(x) \\
&\leq \sum_x p(x)\,[\,l'(x) + 1\,]
  && \text{worst case for Theorem 3} \\
&\leq \sum_x p(x) \left[ \log_\varphi \frac{1}{p(x)} + 2 \right]
  && \text{property of the ceiling function} \\
&= H_\varphi(p) + 2 \sum_x p(x) = H_\varphi(p) + 2
\end{aligned}$$

L must also obey the fundamental bound from Theorem 2, so the result follows.
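The two-unit guarantee can be checked end to end for a concrete distribution by bounding the average length with the worst case l′(x) + 1 from Theorem 3. A sketch of my own (helper names hypothetical):

```python
import math

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def entropy_phi(p):
    """Entropy computed with base-phi logarithms."""
    return -sum(px * math.log(px, PHI) for px in p)

p = [0.4, 0.3, 0.2, 0.1]
ceil_lengths = [math.ceil(math.log(1 / px, PHI)) for px in p]

# Worst-case average length: every word costs l'(x) + 1 (Theorem 3).
L_worst = sum(px * (l + 1) for px, l in zip(p, ceil_lengths))
H = entropy_phi(p)
```

Even this pessimistic average sits inside the sandwich Hφ(p) ≤ L ≤ Hφ(p) + 2.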
Conclusion
We have seen how the principal results for equal-cost prefix-free codes can be
generalized to the class of unequal-cost prefix-free codes with a binary alphabet
and costs 1 and 2. Although the results may not have any obvious practical
applications, they do provide a roadmap for the interesting, more general case of
non-binary codes with a different set of costs for the characters.
To illustrate the point, the key observation in proving Theorem 1 was to note
that:

$$\varphi^{-2} + \varphi^{-1} = 1$$

It is simple to see, however, that this condition is equivalent to φ² − φ − 1 = 0,
which is the equation that defines the golden ratio. Thus it is apparent that for
a general unequal-cost code with costs c1, c2, . . . , cn the kind of relationship
we would exploit to generalize the Kraft inequality would take the form:
$$\xi^{-c_1} + \xi^{-c_2} + \cdots + \xi^{-c_n} = 1$$

This equation defines an algebraic number ξ which, if real, we can use to
generalize the Kraft inequality in the form:

$$\sum_{\text{words in } C} \xi^{-l(\text{word})} \leq 1$$
The generalization would likely follow a prescription identical to that of
Theorem 1.
It is worth noting that every polynomial of odd degree with real coefficients
has at least one real root, which means that if the highest cost cmax is an odd
number such a ξ is guaranteed to exist. Thus at least certain classes of general
unequal-cost codes are promising for this type of generalization, which is
interesting from the standpoint of pure mathematics.
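For concrete costs, ξ can be found numerically. As an illustration (my own sketch, assuming the example costs {1, 2, 3}; the solver is a plain bisection, not anything from the paper), the defining equation is monotone in ξ, and for costs {1, 2, 3} its root is the tribonacci constant:

```python
def xi_for_costs(costs, lo=1.0, hi=4.0):
    """Solve sum(xi**-c for c in costs) == 1 by bisection.

    f(xi) = sum(xi**-c) - 1 is strictly decreasing for xi > 0, with
    f(1) = len(costs) - 1 >= 0, so a real root xi >= 1 exists.
    """
    f = lambda x: sum(x ** -c for c in costs) - 1
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xi = xi_for_costs([1, 2, 3])  # tribonacci constant, roughly 1.8393
```

As a sanity check, costs {1, 2} recover the golden ratio of Theorem 1.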