UNEQUAL-COST PREFIX-FREE CODES
DANIEL BULHOSA
Abstract. We demonstrate that the average word length of a binary unequal-
cost prefix-free code obeys a fundamental lower bound analogous to that of
equal-cost prefix-free D-ary codes. The costs of the characters are taken to
be 1 and 2 respectively. Furthermore, we show that prefix-free codes of this
type can always be created whose average word length is within 2 units of the
fundamental bound.
Introduction
Prefix-free codes are very important in the context of information theory as the
uniqueness of their words allows for a bijective correspondence between these and
the symbols they represent. From a practical standpoint, these codes are attractive
since their prefix-free property allows for the decoding of an incoming message as it
arrives. By contrast, although the more general uniquely decodable codes guarantee
that every message can be decoded uniquely, they do not possess this property:
often one must wait for the full message to arrive before beginning to interpret
it, wasting time and resources.
The prefix-free codes that are usually considered utilize an alphabet in which
every character costs the same amount to transmit. Though an understanding of
this case is sufficient for most practical applications, it is by no means the
only possible case. One can consider the variation
of this problem in which different characters cost different amounts to transmit.
In this situation the standard results proven in the literature do not apply, and it
is necessary to generalize them. The goal of this paper is to extend some central
results of equal-cost prefix-free codes to unequal-cost prefix-free codes with binary
alphabets.
Generalization of Fundamental Bound
In this section we determine a fundamental lower bound for the average code-
word length of a prefix-free code with binary alphabet {., −} with length costs 1
and 2 respectively for each of these characters. The proof will not assume that the
characters have equal cost, leading to a more general result that will be applicable
to unequal-cost prefix-free codes. First we generalize the Kraft Inequality:
Date: March 25, 2015.
Theorem 1 (Kraft Inequality): Suppose that C is a binary finite set of
codewords that form a prefix-free code, with symbols . and − costing 1 and 2 units
of length respectively. Then for φ equal to the golden ratio:
\[ \sum_{\text{word} \in C} \varphi^{-l(\text{word})} \le 1 \]
Proof: First note that the only code of maximal length 1 that fits this description
is {.}, and it obeys the inequality. The only codes of maximal length 2 fitting this
description are {..}, {., −}, {.., −}, and {−}, and they also obey the inequality.
Now we induct over the maximal length of the code. Let lmax be the length of the
longest word in the code C. Assume that the statement to be proved is true for all
codes with maximal length l &lt; lmax. We can create two codes with maximal lengths
less than lmax as follows: Take all the words starting with ., remove the ., and let the
set of these words form a code. This code C. inherits the prefix-free property from
C and its maximal length is at most lmax −1. If we do the same thing for all words
starting with − we end up with a code C− with maximal length of at most lmax −2.
Now, by the inductive hypothesis:
\[ \sum_{\text{word} \in C_.} \varphi^{-l_.(\text{word})} \le 1
\quad \text{and} \quad
\sum_{\text{word} \in C_-} \varphi^{-l_-(\text{word})} \le 1 \]
And by construction,
\[ \sum_{\text{word} \in C_.} \varphi^{-l_.(\text{word})-1}
 + \sum_{\text{word} \in C_-} \varphi^{-l_-(\text{word})-2}
 = \sum_{\text{word} \in C} \varphi^{-l(\text{word})} \]
since . and − have lengths 1 and 2 respectively. Combining this equation with the
inequalities above, we find that:
\[ \sum_{\text{word} \in C} \varphi^{-l(\text{word})}
 = \sum_{\text{word} \in C_.} \varphi^{-l_.(\text{word})-1}
 + \sum_{\text{word} \in C_-} \varphi^{-l_-(\text{word})-2}
 \le \varphi^{-1} + \varphi^{-2} = 1 \]
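As an illustrative numerical check (ours, not part of the original argument; the helper names are assumptions), one can verify the inequality for the small codes discussed in the base case:

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # golden ratio, satisfies PHI**-1 + PHI**-2 == 1


def cost_length(word: str) -> int:
    """Cost-weighted length of a word over {'.', '-'}: '.' costs 1, '-' costs 2."""
    return word.count('.') + 2 * word.count('-')


def kraft_sum(code) -> float:
    """Left-hand side of the generalized Kraft inequality."""
    return sum(PHI ** -cost_length(w) for w in code)


# The small prefix-free codes from the base case all satisfy the bound,
# and {'.', '-'} saturates it exactly.
for code in [{'.'}, {'..'}, {'-'}, {'..', '-'}, {'.', '-'}]:
    assert kraft_sum(code) <= 1 + 1e-12
```

Note that the code {., −} gives a Kraft sum of exactly \(\varphi^{-1} + \varphi^{-2} = 1\), which is the identity driving the whole proof.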
Now we prove that the fundamental bound holds for this type of code.
Theorem 2: Suppose that C is a code as described in Theorem 1, describing
the outcomes of some random variable X with probability distribution p. Then the
average length L of the codewords obeys \(L \ge H_\varphi(p)\), where
\(H_\varphi(p) = \sum_x p(x) \log_\varphi(1/p(x))\).
Proof: We follow the template of Theorem 5.3.1 of Cover and Thomas. Let
\(r(x) = \varphi^{-l(x)} / \sum_x \varphi^{-l(x)}\) and \(c = \sum_x \varphi^{-l(x)}\).
The difference of the average length L and the entropy can be written as:
\[
\begin{aligned}
L - H_\varphi(p) &= \sum_x p(x) l(x) + \sum_x p(x) \log_\varphi p(x) \\
&= -\sum_x p(x) \log_\varphi \varphi^{-l(x)} + \sum_x p(x) \log_\varphi p(x) \\
&= -\sum_x p(x) \log_\varphi \frac{\varphi^{-l(x)}}{\sum_{x'} \varphi^{-l(x')}}
 + \sum_x p(x) \log_\varphi p(x)
 + \sum_x p(x) \log_\varphi \frac{1}{\sum_{x'} \varphi^{-l(x')}} \\
&= \sum_x p(x) \log_\varphi \frac{p(x)}{r(x)} + \log_\varphi \frac{1}{c} \\
&= D(p\,\|\,r) + \log_\varphi \frac{1}{c} \\
&\ge 0
\end{aligned}
\]
Here the inequality follows from the fact that the relative entropy is nonnegative,
and the fact that \(c \le 1\) by the Kraft inequality, so that \(\log_\varphi(1/c) \ge 0\).
Note that the bound is saturated if and only if c = 1 and p = r.
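A small numerical illustration of Theorem 2 (our own sketch; `h_phi` is an assumed helper name): choosing \(p(x) = \varphi^{-l(x)}\) saturates the bound, while other distributions on the same code give strict inequality.

```python
from math import log, sqrt

PHI = (1 + sqrt(5)) / 2


def h_phi(p):
    """Entropy in base phi: H_phi(p) = sum_x p(x) * log_phi(1 / p(x))."""
    return sum(px * log(1 / px, PHI) for px in p)


# The code {'.', '-'} has lengths (1, 2). With p = (phi**-1, phi**-2) we get
# p(x) = phi**-l(x) and c = 1, so the average length equals H_phi(p) exactly.
p = (PHI ** -1, PHI ** -2)
L = 1 * p[0] + 2 * p[1]
assert abs(L - h_phi(p)) < 1e-12

# Any other distribution on the same code lies strictly above the bound.
q = (0.5, 0.5)
assert 1 * q[0] + 2 * q[1] > h_phi(q)
```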
Achieving the Fundamental Bound
Our goal now is to demonstrate the existence of prefix-free codes of this type
whose average word length is within 2 units of the fundamental bound. Our
approach will be to prove a version of the converse Kraft inequality and then
use a set of lengths motivated by Theorem 2 to generate the desired code.
Before doing this, however, we will prove a useful lemma. First we make a few
useful definitions. Let C be a prefix-free code formed with the alphabet {., −};
then we define T(C) as the tree representation of C. In this representation, every
childless node of T(C) corresponds to a unique word in C. The word corresponding
to a given childless node can be constructed by following the path to the node
starting from the root, adding a . for every left path (child) that is taken and a
− for every right path that is taken. Once we arrive at the childless node, all
necessary characters will have been added and the word is complete. Now, for the lemma:
Lemma: Let C be a prefix-free code such that T(C) is a full binary tree. Let .
have length 1 and − have length 2. Then:
\[ \sum_{\text{word} \in C} \varphi^{-l(\text{word})} = 1 \]
Proof: First note that given a generic complete and full tree T we can create
a prefix-free code C by considering all of the words with N letters (independent
of cost), where N is the depth of the childless nodes of T. A simple combinatorial
argument based on counting the number n of −'s in a word (such a word has
cost-weighted length N + n) shows that:
\[ \sum_{\text{word} \in C} \varphi^{-l(\text{word})}
 = \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-(N+n)} \]
Now we show that this sum is equal to 1. First note that when N = 1 the sum
reduces to \(\varphi^{-1} + \varphi^{-2} = 1\), so the base case holds. Now, assume
that the postulated equality holds for N; then:
\[
\begin{aligned}
1 = \varphi^{-1} + \varphi^{-2}
&= \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-[(N+1)+n]}
 + \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-[(N+1)+(n+1)]} \\
&= \sum_{0 \le n \le N} \binom{N}{n} \varphi^{-[(N+1)+n]}
 + \sum_{1 \le n \le N+1} \binom{N}{n-1} \varphi^{-[(N+1)+n]} \\
&= \varphi^{-(N+1)}
 + \sum_{1 \le n \le N} \left[ \binom{N}{n} + \binom{N}{n-1} \right] \varphi^{-[(N+1)+n]}
 + \varphi^{-2(N+1)} \\
&= \varphi^{-(N+1)}
 + \sum_{1 \le n \le N} \binom{N+1}{n} \varphi^{-[(N+1)+n]}
 + \varphi^{-2(N+1)} \\
&= \sum_{0 \le n \le N+1} \binom{N+1}{n} \varphi^{-[(N+1)+n]}
\end{aligned}
\]
Thus by induction the sum is equal to 1 for all N ≥ 1, so for a code with a complete
and full tree the Kraft inequality is saturated.
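The binomial identity at the heart of this induction can also be checked numerically (a sketch of our own, with an assumed helper name):

```python
from math import comb, sqrt

PHI = (1 + sqrt(5)) / 2


def full_tree_kraft_sum(N: int) -> float:
    """Kraft sum for the code of all N-letter words: a word with n dashes
    has cost-weighted length N + n, and there are C(N, n) such words."""
    return sum(comb(N, n) * PHI ** -(N + n) for n in range(N + 1))


# The sum equals 1 for every depth N, i.e. complete and full trees
# saturate the generalized Kraft inequality.
for N in range(1, 16):
    assert abs(full_tree_kraft_sum(N) - 1) < 1e-9
```

The identity collapses because \(1 + \varphi^{-1} = \varphi\), so \(\sum_n \binom{N}{n} \varphi^{-(N+n)} = \varphi^{-N}(1 + \varphi^{-1})^N = 1\).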
Now consider the tree T(C). Let N be the depth of the deepest childless node
of T(C). We can create a complete and full tree T'(C) from T(C) by appending
complete and full subtrees to all the childless nodes that have depth less than N.
Note that if w is some word in C and D(w) is the set of its childless descendants
in T'(C), then:
\[ \sum_{d \in D(w)} \varphi^{-l(d)}
 = \varphi^{-l(w)} \cdot \sum_{d \in D(w)} \varphi^{-(l(d) - l(w))}
 = \varphi^{-l(w)} \]
This follows from the fact that the subtree of T'(C) rooted at w is complete and
full (by construction), that its childless nodes have lengths l(d) − l(w) relative
to w, and the fact that complete and full trees saturate the Kraft inequality as
shown above. The implication of this equation is that:
\[ \sum_{w \in C} \varphi^{-l(w)}
 = \sum_{w \in C} \sum_{d \in D(w)} \varphi^{-l(d)}
 = \sum_{\text{childless nodes } v \text{ of } T'(C)} \varphi^{-l(v)} = 1 \]
The second equality follows from the fact that the sets of childless descendants
D(w) are disjoint and their union encompasses all childless nodes of the tree T'(C).
Now we prove the converse of the Kraft inequality:
Theorem 3 (Converse Kraft Inequality): Suppose that L = {l1, ..., ln} is a
set of lengths (which may repeat) satisfying the Kraft inequality:
\[ \sum_{l \in L} \varphi^{-l} \le 1 \]
Then there exists some prefix-free code C with alphabet {., −}, where l(.) = 1 and
l(−) = 2, such that the length of each word is equal to either li or li + 1 for a
unique 1 ≤ i ≤ n.
Proof: We proceed by induction on the size of the set of lengths. Note that the
statement is trivial when there is only one length l1 in L, as then there is only one
word and a code with one word is trivially prefix-free. Note also that this length
can be arbitrary. This covers the base case.
Now suppose that L contains n lengths, some of which may be equal to one
another. Suppose that these lengths satisfy the Kraft inequality, and assume that
the theorem holds for all length sets of size n − 1. Then the set L' = L − {ln}
that we create by removing the last length must obey the Kraft inequality strictly:
\[ \sum_{l \in L'} \varphi^{-l} < \sum_{l \in L} \varphi^{-l} \le 1 \]
The set L' contains n − 1 lengths, so by the inductive hypothesis there exists some
prefix-free code C' such that the length of each word is equal to either li or li + 1
for a unique 1 ≤ i ≤ n − 1.
Now, note that since the new set of lengths L' obeys the Kraft inequality strictly,
the contrapositive of the Lemma implies that the tree T(C') is not full. This
means some node has a child to which no codeword has been assigned.
Furthermore, the other child of this node must lead to codewords, so the path of
characters leading to this node must have length at most ln−1. Thus if ln > ln−1
we can create the code C from the code C' by simply creating a path of the
appropriate length starting from this node and going through its unassigned child.
The remaining concern is the case ln = ln−1, for if this holds and the characters
leading to the node in question have length ln, then the word corresponding
to ln must have length ln + 1. In turn, this means that the unassigned
child of the node in question leads to a word of length ln + 2, which is larger
than our goal.
However, if in fact ln = ln−1 and the length of the path to the node is ln−1, we
can do the following. Imagine that we do create a codeword w1 of length ln + 2
for this node, and create a code C'' by adding this word to the code C'. Since
\(\varphi^{-(l_n+2)} < \varphi^{-l_n}\), the set of lengths for C'' must obey the
Kraft inequality strictly, so there is some node in the tree T(C'') with an
unassigned child. Furthermore, the characters leading to this node must have
length at most ln−2. We can take out the word w1 from C'' and then add a new
word w2 equal to the path to this unassigned child. Again we must determine
whether ln = ln−2 and whether the length of the path leading to this new node
is ln−2. If this is not the case, adding w2 to C' gives the desired code C.
If this is the case, however, we can repeat the process, adding words w1, w2, ...
to the code C' to obtain new codes C'', C''', ..., each strictly obeying the Kraft
inequality. We are guaranteed that ln > lk for some 0 < k < n, for otherwise the
Kraft inequality would be contradicted, so the process terminates.
Theorem 3 must allow the lengths li + 1 because for some choices of lengths one
may obey the Kraft inequality yet not have enough words of a given length to
create a code. For example, consider the set of lengths L = {3, 3, 3, 3}.
This set obeys the Kraft inequality, but there are only three words over this
alphabet that have length three: namely ..., −., and .−. There are still
words of length four available after we use these three, and increasing any of the
lengths does not affect compliance with the Kraft inequality.
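The counting behind this example can be made concrete. The sketch below (our own; the recursive enumeration is an assumption, not from the paper) lists all words of a given cost-weighted length:

```python
def words_of_length(target: int):
    """All words over {'.', '-'} whose cost-weighted length ('.' = 1, '-' = 2)
    equals target, built by choosing the first character and recursing."""
    if target == 0:
        return ['']
    out = ['.' + w for w in words_of_length(target - 1)]
    if target >= 2:
        out += ['-' + w for w in words_of_length(target - 2)]
    return out


# Only three words have length exactly 3, so the length set {3, 3, 3, 3}
# cannot be realized verbatim even though it obeys the Kraft inequality.
assert sorted(words_of_length(3)) == sorted(['...', '.-', '-.'])
```

The number of words of length n follows the Fibonacci sequence (1, 2, 3, 5, ...), which is another way of seeing why the golden ratio governs the Kraft inequality for these costs.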
Having proved the converse Kraft inequality we can now prove the principal result
of this section. Returning to the result of Theorem 2, we found that if r = p and
c = 1 then the entropy bound is saturated. Imposing these conditions implies that
the lengths must be:
\[ p(x) = \varphi^{-l(x)} \Big/ \sum_{x'} \varphi^{-l(x')} = \varphi^{-l(x)}
 \implies l(x) = \log_\varphi \frac{1}{p(x)} \]
These may not be valid lengths because they may be non-integer. However, the
lengths
\[ l'(x) = \left\lceil \log_\varphi \frac{1}{p(x)} \right\rceil \]
are integers and still obey the Kraft inequality, since
\(\varphi^{-l'(x)} \le p(x)\) and the probabilities sum to 1. We use these lengths
to prove the main result.
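A quick check of the ceiling-length construction (our own helper names): each term \(\varphi^{-l'(x)}\) is at most p(x), so the Kraft sum cannot exceed 1.

```python
from math import ceil, log, sqrt

PHI = (1 + sqrt(5)) / 2


def ceil_lengths(p):
    """Integer lengths l'(x) = ceil(log_phi(1 / p(x)))."""
    return [ceil(log(1 / px, PHI)) for px in p]


p = [0.5, 0.25, 0.15, 0.1]
lengths = ceil_lengths(p)
# Each term phi**-l'(x) is at most p(x), so the Kraft sum is at most 1.
assert all(PHI ** -l <= px for l, px in zip(lengths, p))
assert sum(PHI ** -l for l in lengths) <= 1
```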
Theorem 4: Let p be the distribution for a random variable X, and let l'(x) be
the length associated with the outcome X = x. Then there exists a prefix-free code
C with alphabet {., −} such that:
\[ H_\varphi(p) \le L \le H_\varphi(p) + 2 \]
It follows that the optimal code C* for this probability distribution must be at
least as good as C.
Proof: Since the lengths l'(x) obey the Kraft inequality, Theorem 3 implies
that there is a code C such that the length of the word corresponding to x is
either l'(x) or l'(x) + 1. Let lC(x) denote the lengths of the words in this code.
Then the average word length L satisfies:
\[
\begin{aligned}
L = \sum_x p(x)\, l_C(x)
&\le \sum_x p(x)\,[l'(x) + 1] && \text{(worst case for Theorem 3)} \\
&\le \sum_x p(x) \left[ \log_\varphi \frac{1}{p(x)} + 2 \right] && \text{(property of the ceiling function)} \\
&= H_\varphi(p) + 2 \sum_x p(x) = H_\varphi(p) + 2
\end{aligned}
\]
L must also obey the fundamental bound from Theorem 2, so the result follows.
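A numerical spot check of the chain of inequalities in Theorem 4, using the worst case allowed by Theorem 3 (every word receives length l'(x) + 1; the helper names are ours):

```python
from math import ceil, log, sqrt

PHI = (1 + sqrt(5)) / 2


def h_phi(p):
    """Entropy in base phi."""
    return sum(px * log(1 / px, PHI) for px in p)


def worst_case_avg_length(p):
    """Average length if every word gets ceil(log_phi(1/p(x))) + 1."""
    return sum(px * (ceil(log(1 / px, PHI)) + 1) for px in p)


# The bound H_phi(p) <= L <= H_phi(p) + 2 holds for several test distributions.
for p in ([0.4, 0.3, 0.2, 0.1], [0.7, 0.2, 0.1], [0.25] * 4):
    assert h_phi(p) <= worst_case_avg_length(p) <= h_phi(p) + 2
```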
Conclusion
We have seen how the principal results for equal-cost prefix-free codes can be
generalized to the class of unequal-cost prefix-free codes with a binary alphabet
and costs 1 and 2. Although the results may not have any obvious practical
applications, they do provide a roadmap for the interesting, more general case of
non-binary codes with a different set of costs for the characters.
To illustrate the point, the key observation in proving Theorem 1 was to note
that:
\[ \varphi^{-1} + \varphi^{-2} = 1 \]
It is simple to see, however, that this condition is equivalent to the condition
\(\varphi^2 - \varphi - 1 = 0\), which is the equation that defines the golden ratio.
Thus it is apparent that for a general unequal-cost code with costs c1, c2, ..., cn
the kind of relationship that we would exploit to generalize the Kraft inequality
would take the form:
\[ \xi^{-c_1} + \xi^{-c_2} + \cdots + \xi^{-c_n} = 1 \]
This equation defines an algebraic number ξ which, if real, we can use to generalize
the Kraft inequality in the form:
\[ \sum_{\text{word} \in C} \xi^{-l(\text{word})} \le 1 \]
The generalization would likely follow a prescription identical to that of Theorem 1.
It is worth noting that every polynomial of odd degree with real coefficients has
at least one real root, which means that if the highest cost cmax is an odd number
such a ξ is guaranteed to exist. Thus at least certain classes of general unequal-cost
codes are promising for this type of generalization. This is an interesting
generalization from the standpoint of pure mathematics.
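For concreteness, the root ξ can be found numerically. The sketch below (our own, using simple bisection; not from the paper) recovers the golden ratio for costs (1, 2):

```python
from math import sqrt


def xi_for_costs(costs, lo=1.0, hi=16.0, iters=100):
    """Bisection for the real xi > 1 solving sum(xi**-c for c in costs) == 1.
    f(x) = sum(x**-c) - 1 is strictly decreasing in x, positive near x = 1
    (for two or more costs) and negative for large x."""
    f = lambda x: sum(x ** -c for c in costs) - 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:      # sum still too large: the root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Costs (1, 2) recover the golden ratio phi.
assert abs(xi_for_costs([1, 2]) - (1 + sqrt(5)) / 2) < 1e-9
```

For equal costs (c, c, ..., c) with D characters this reduces to \(\xi = D^{1/c}\), recovering the familiar equal-cost Kraft inequality.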

More Related Content

What's hot

On the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particlesOn the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particles
Cemal Ardil
 
Optimization
OptimizationOptimization
Optimization
Springer
 
RSA final notation change2
RSA final notation change2RSA final notation change2
RSA final notation change2
Coleman Gorham
 
CiE 2010 talk
CiE 2010 talkCiE 2010 talk
CiE 2010 talk
ilyaraz
 
2014 vulnerability assesment of spatial network - models and solutions
2014   vulnerability assesment of spatial network - models and solutions2014   vulnerability assesment of spatial network - models and solutions
2014 vulnerability assesment of spatial network - models and solutions
Francisco Pérez
 
Analysis Of Algorithms Ii
Analysis Of Algorithms IiAnalysis Of Algorithms Ii
Analysis Of Algorithms Ii
Sri Prasanna
 

What's hot (20)

On the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particlesOn the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particles
 
Optimization
OptimizationOptimization
Optimization
 
Mid term
Mid termMid term
Mid term
 
Stochastic Processes Assignment Help
Stochastic Processes Assignment HelpStochastic Processes Assignment Help
Stochastic Processes Assignment Help
 
Lesson 29
Lesson 29Lesson 29
Lesson 29
 
RSA final notation change2
RSA final notation change2RSA final notation change2
RSA final notation change2
 
CiE 2010 talk
CiE 2010 talkCiE 2010 talk
CiE 2010 talk
 
paper
paperpaper
paper
 
Jensen's inequality, EM 알고리즘
Jensen's inequality, EM 알고리즘 Jensen's inequality, EM 알고리즘
Jensen's inequality, EM 알고리즘
 
2014 vulnerability assesment of spatial network - models and solutions
2014   vulnerability assesment of spatial network - models and solutions2014   vulnerability assesment of spatial network - models and solutions
2014 vulnerability assesment of spatial network - models and solutions
 
Quiz 1 solution
Quiz 1 solutionQuiz 1 solution
Quiz 1 solution
 
Huffman Code Decoding
Huffman Code DecodingHuffman Code Decoding
Huffman Code Decoding
 
1 6
1 61 6
1 6
 
Signal Processing Assignment Help
Signal Processing Assignment HelpSignal Processing Assignment Help
Signal Processing Assignment Help
 
Indexing Text with Approximate q-grams
Indexing Text with Approximate q-gramsIndexing Text with Approximate q-grams
Indexing Text with Approximate q-grams
 
The Complexity Of Primality Testing
The Complexity Of Primality TestingThe Complexity Of Primality Testing
The Complexity Of Primality Testing
 
Algorithms
AlgorithmsAlgorithms
Algorithms
 
Analysis Of Algorithms Ii
Analysis Of Algorithms IiAnalysis Of Algorithms Ii
Analysis Of Algorithms Ii
 
Huffman codes
Huffman codesHuffman codes
Huffman codes
 
Lesson 28
Lesson 28Lesson 28
Lesson 28
 

Viewers also liked

NATIONAL INSTITUTE OF TECHNOLOGY
NATIONAL INSTITUTE OF TECHNOLOGYNATIONAL INSTITUTE OF TECHNOLOGY
NATIONAL INSTITUTE OF TECHNOLOGY
Nishant Singh
 
JYC GUINNESS AND COMMUNITIES PRSENTATION 2
JYC GUINNESS AND COMMUNITIES PRSENTATION 2JYC GUINNESS AND COMMUNITIES PRSENTATION 2
JYC GUINNESS AND COMMUNITIES PRSENTATION 2
Josephine Yelang
 

Viewers also liked (20)

nUDC_presentation
nUDC_presentationnUDC_presentation
nUDC_presentation
 
Secreto del éxito
Secreto del éxitoSecreto del éxito
Secreto del éxito
 
Интернет-маркетинг и бизнес
Интернет-маркетинг и бизнесИнтернет-маркетинг и бизнес
Интернет-маркетинг и бизнес
 
Alles wat je moet weten over de zorgverzekering
Alles wat je moet weten over de zorgverzekeringAlles wat je moet weten over de zorgverzekering
Alles wat je moet weten over de zorgverzekering
 
In Toto Marketing Services
In Toto Marketing ServicesIn Toto Marketing Services
In Toto Marketing Services
 
NATIONAL INSTITUTE OF TECHNOLOGY
NATIONAL INSTITUTE OF TECHNOLOGYNATIONAL INSTITUTE OF TECHNOLOGY
NATIONAL INSTITUTE OF TECHNOLOGY
 
Mom Resume 1
Mom Resume 1Mom Resume 1
Mom Resume 1
 
Презентация участникам строительной выставки Обнинск Строй Экспо 2016
Презентация участникам строительной выставки Обнинск Строй Экспо 2016Презентация участникам строительной выставки Обнинск Строй Экспо 2016
Презентация участникам строительной выставки Обнинск Строй Экспо 2016
 
Интегрированные коммуникации в интернет-пространстве для ВШЭ
Интегрированные коммуникации в интернет-пространстве для ВШЭИнтегрированные коммуникации в интернет-пространстве для ВШЭ
Интегрированные коммуникации в интернет-пространстве для ВШЭ
 
Sibo CV
Sibo CVSibo CV
Sibo CV
 
Sizzle properties pvt ltd
Sizzle properties pvt ltdSizzle properties pvt ltd
Sizzle properties pvt ltd
 
JYC GUINNESS AND COMMUNITIES PRSENTATION 2
JYC GUINNESS AND COMMUNITIES PRSENTATION 2JYC GUINNESS AND COMMUNITIES PRSENTATION 2
JYC GUINNESS AND COMMUNITIES PRSENTATION 2
 
Introduction to business
Introduction to businessIntroduction to business
Introduction to business
 
letest my cv0509
letest my cv0509letest my cv0509
letest my cv0509
 
Innovatieproces Kaal Masten B.V.
Innovatieproces Kaal Masten B.V.Innovatieproces Kaal Masten B.V.
Innovatieproces Kaal Masten B.V.
 
Sizzle properties pvt ltd
Sizzle properties pvt ltdSizzle properties pvt ltd
Sizzle properties pvt ltd
 
M.Alamrawy's CV
M.Alamrawy's CVM.Alamrawy's CV
M.Alamrawy's CV
 
этнография ансамбль Истоки 2009 Устюг
этнография ансамбль Истоки 2009 Устюгэтнография ансамбль Истоки 2009 Устюг
этнография ансамбль Истоки 2009 Устюг
 
Practica 7
Practica 7Practica 7
Practica 7
 
Aparato reproductor
Aparato reproductorAparato reproductor
Aparato reproductor
 

Similar to Unequal-Cost Prefix-Free Codes

Simple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimizationSimple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimization
Attaporn Ninsuwan
 
Data Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description LogicsData Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description Logics
Adila Krisnadhi
 
Huffman coding
Huffman codingHuffman coding
Huffman coding
George Ang
 

Similar to Unequal-Cost Prefix-Free Codes (20)

Losseless
LosselessLosseless
Losseless
 
Huffman Encoding Pr
Huffman Encoding PrHuffman Encoding Pr
Huffman Encoding Pr
 
Huffman analysis
Huffman analysisHuffman analysis
Huffman analysis
 
Proof of Kraft Mc-Millan theorem - nguyen vu hung
Proof of Kraft Mc-Millan theorem - nguyen vu hungProof of Kraft Mc-Millan theorem - nguyen vu hung
Proof of Kraft Mc-Millan theorem - nguyen vu hung
 
Iswc 2016 completeness correctude
Iswc 2016 completeness correctudeIswc 2016 completeness correctude
Iswc 2016 completeness correctude
 
Simple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimizationSimple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimization
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
 
Lecture 3.pptx
Lecture 3.pptxLecture 3.pptx
Lecture 3.pptx
 
Lecture 3.pptx
Lecture 3.pptxLecture 3.pptx
Lecture 3.pptx
 
Flat unit 1
Flat unit 1Flat unit 1
Flat unit 1
 
Data Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description LogicsData Complexity in EL Family of Description Logics
Data Complexity in EL Family of Description Logics
 
Lec1
Lec1Lec1
Lec1
 
Context free grammer.ppt
Context free grammer.pptContext free grammer.ppt
Context free grammer.ppt
 
Basics of coding theory
Basics of coding theoryBasics of coding theory
Basics of coding theory
 
Unit ii
Unit iiUnit ii
Unit ii
 
Computer Science Exam Help
Computer Science Exam HelpComputer Science Exam Help
Computer Science Exam Help
 
Sums (Sumatorias)
Sums (Sumatorias)Sums (Sumatorias)
Sums (Sumatorias)
 
Lec5 Compression
Lec5 CompressionLec5 Compression
Lec5 Compression
 
Weight enumerators of block codes and the mc williams
Weight  enumerators of block codes and  the mc williamsWeight  enumerators of block codes and  the mc williams
Weight enumerators of block codes and the mc williams
 
Huffman coding
Huffman codingHuffman coding
Huffman coding
 

Unequal-Cost Prefix-Free Codes

  • 1. UNEQUAL-COST PREFIX-FREE CODES DANIEL BULHOSA Abstract. We demonstrate that the average word length of a binary unequal- cost prefix-free code obeys a fundamental lower bound analogous to that of equal-cost prefix-free D-ary codes. The costs of the characters are taken to be 1 and 2 respectively. Furthermore, we show that prefix-free codes of this type can always be created whose average word length is within 2 units of the fundamental bound. Introduction Prefix-free codes are very important in the context of information theory as the uniqueness of their words allows for a bijective correspondence between these and the symbols they represent. From a practical standpoint, these codes are attractive since their prefix-free property allows for the decoding of an incoming message as it arrives. For comparison, although the more general uniquely decodable codes yield messages that can always be decoded uniquely they do not possess this property. Often one would need to wait for the full message to arrive to begin interpreting it, leading to time and resources being wasted. The prefix-free codes that are usually considered utilize an alphabet for which the cost associated with transmitting any particular character is the same for any character. Though an understanding of this case is sufficient for most practical ap- plications, it is by no means the only possible case. One can consider the variation of this problem in which different characters cost different amounts to transmit. In this situation the standard results proven in the literature do not apply, and it is necessary to generalize them. The goal of this paper is to extend some central results of equal-cost prefix-free codes to unequal-cost prefix-free codes with binary alphabets. 
Generalization of Fundamental Bound In this section we determine a fundamental lower bound for the average code- word length of a prefix-free code with binary alphabet {., −} with length costs 1 and 2 respectively for each of these characters. The proof will not assume that the characters have equal cost, leading to a more general result that will be applicable to unequal-cost prefix-free codes. First we generalize the Kraft Inequality: Date: March 25, 2015. 1
  • 2. 2 DANIEL BULHOSA Theorem 1 (Kraft Inequality): Suppose that C is a binary finite set of codewords that form a prefix-free code, with symbols . and − costing 1 and 2 units of length respectively. Then for φ equal to the golden ratio: words inC φ−l(word) ≤ 1 Proof: First note that the only code of maximal length 1 that fits this description is {.}, and it obeys the inequality. The only codes of maximal length 2 fitting this description are {.}, {., −}, {.., −}, and {−} and they also obey the inequality. Now we induct over the maximal length of the code. Let lmax be the length of the longest word in the code C. Assume that the statement to be proved is true for all codes with maximal legth l < lmax. We can create two codes with maximal lengths less than lmax as follows: Take all the words starting with ., remove the ., and let the set of these words form a code. This code C. inherits the prefix-free property from C and its maximal length is at most lmax −1. If we do the same thing for all words starting with − we end up with a code C− with maximal length of at most lmax −2. Now, by the inductive hypothesis: wordsinC. φ−l.(word) ≤ 1 and wordsinC− φ−l−(word) ≤ 1 And by construction, words in C. φ−l.(word)−1 + words in C− φ−l−(word)−2 = words inC φ−l(word) since . and − have lenghts 1 and 2 respectively. Combining this equation and the inequalities we find that: ∞ words inC φ−l(word) = words in C. φ−l.(word)−1 + words in C− φ−l−(word)−2 ≤ φ−1 + φ−2 = 1 Now we prove the fundamental bound holds for these type of codes. Theorem 2: Suppose that C is a code as described in Theorem 1, describing the outcomes of some random variable X with probability distribution p. Then the average length L of the codewords obeys L ≥ Hφ(p).
  • 3. UNEQUAL-COST PREFIX-FREE CODES 3 Proof: We follow the template of Theorem 5.3.1 of Cover and Thomas. Let r(x) = φ−l(x) / x φ−l(x) and c = x φ−l(x) . The difference of the average length L and the entropy can be written as: L − Hφ(p) = x p(x)l(x) + X p(x) logφ p(x) = − x p(x) logφ φ−l(x) + X p(x) logφ p(x) = − x p(x) logφ φ−l(x) x φ−l(x) + X p(x) logφ p(x) + x p(x) logφ 1 x φ−l(x) = x p(x) logφ p(x) r(x) + log c = D(p||r) + log c ≥ 0 Here the inequality follows from the fact that the relative entropy is positive definite, and the fact that c ≥ 1 by the Kraft inequality. Note that the inequality is saturated if and only if the inequality is saturated and if p = r. Achieving the Fundamental Bound Our goal now is to demonstrate the existence of prefix-free codes of this type whose average word length is within 2 units of the fundamental bound. Our ap- proach will be to prove a version of the converse Kraft inequality and then using a set of lengths motivated by Theorem 2 to generate the desired code. Before doing this however, we will prove a useful lemma. First we make a few useful definitions. Let C be a prefix-free code formed with the alphabet {., −}, then we define T(C) as the tree representation of C. In this representation, every childless node of T(C) corresponds to a unique word in C. The word correspond- ing to a given childless node can be constructed by following the path to the node starting from the root, and adding a . for every left path (child) that is taken and a − for every right path that is taken. Once we arrive to the childless node all neces- sasry characters will be added and the word will be completed. Now, for the lemma: Lemma: Let C be a prefix-free code such that T(C) is a full binary tree. Let . have length 1 and − have length 2. Then: words in C φ−l(word) = 1
  • 4. 4 DANIEL BULHOSA Proof: First note that given a generic complete and full tree T we can create a prefix-free code C by considering all of the words with N letters (independent of cost). Here N is the depth of the childless nodes of T . A simple combinatoric argument based on counting the number n of −’s in a word shows that: words in C φ−l(word) = 0≤n≤N N n φ−(N+n) Now we show that this sum is equal to 1. First note that when N = 1 this sum reduces to φ−1 +φ−2 = 1, so the base case holds. Now, assume that the postulated equality holds for N, then: 1 = φ−1 + φ−2 = 0≤n≤N N n φ−[(N+1)+n] + 0≤n≤N N n φ−[(N+1)+(n+1)] = 0≤n≤N N n φ−[(N+1)+n] + 1≤n≤N+1 N n − 1 φ−[(N+1)+(n)] = φ−(N+1) + 1≤n≤N N n + N n − 1 φ−[(N+1)+n] + φ−2(N+1) = φ−(N+1) + 1≤n≤N N + 1 n φ−[(N+1)+n] + φ−2(N+1) = 0≤n≤N+1 N + 1 n φ−[(N+1)+n] Thus by induction the sum is equal to 1 for all N ≥ 1. So for a code with a complete and full tree the Kraft inequality is saturated. Now consider the tree T(C). Let N be the depth of the deepest childless node of T(C). We can create a complete and full tree T (C) from T(C) by appending complete and full subtrees to all the childless nodes that have depth less than N. Note that if w is some word in C and D(w) is the set of its childless descendents then: D(w) in T (C) φ−l(descendant) = φ−l(w) · D(w) in T (C) φ−(l(descendant)−l(w)) = φ−l(w) This follows from the fact that the subtree of T (C) rooted at w is complete and full (by construction), that its childless nodes have lenghts l − l(w), and the fact that complete and full trees saturate the Kraft inequality as shown above. The implication of this equation is that:
  • 5. UNEQUAL-COST PREFIX-FREE CODES 5 w∈C φ−l(w) = w∈C D(w) in T (C) φ−l(descendant) = cn of T (C) φ−l(cn) = 1 Here cn stands for childness node. The second equality follows from the fact that the sets of childless descendents D(w) are disjoint, and their union encompases all childless nodes of the tree T (C). Now we prove the converse of the Kraft inequality: Theorem 3 (Converse Kraft Inequality): Suppose that L = {l1, ..., ln} is a set of lengths (which may repeat) satisfying the Kraft inequality: l∈L φ−l ≤ 1 Then there exists some prefix-free code C with alphabet {., −}, where l(.) = 1 and l(−) = 2, such that the lenght of each word is either equal to li or li +1 for a unique 1 ≤ i ≤ n. Proof: We proceed by induction on the size of the set of lengths. Note that the statement is trivial when there is only one length l1 in L, as then there is only one word and a code with one word is trivially prefix-free. Note also that this length can be arbitrary. This covers the base case. Now suppose that L contains n lengths, some of which may be equal to one another. Suppose that these lengths satisfy the Kraft inequality, and assume that the theorem holds for all L of size n−1. Then the set L = L−{ln} that we create by removing the last length must also obey the Kraft inequality: l∈L φ−l < l∈L φ−l ≤ 1 The set L contains n − 1 lengths, so by the inductive hypothesis there exists some prefix-free code C such that the length of each word is either equal to li or li + 1 for a unique 1 ≤ i ≤ n − 1. Now, note that since the new set of lengths L obeys the Kraft inequality strictly the contrapositive of the Lemma implies that the tree T(C ) is not full. This means some node has a child available for whom a codeword has not been assigned. Furthermore, the other child of this node must lead to codewords so the set of characters leading to this node must have length of at most ln−1. 
Thus if $l_n > l_{n-1}$ we can create the code $C$ from the code $C'$ by simply creating a path of the appropriate length starting from this node and going through its unassigned child.
The remaining concern is the case $l_n = l_{n-1}$, for if this holds and the characters leading to the node in question have length $l_n$, this implies that the word corresponding to $l_n$ must have length $l_n + 1$. In turn, this means that the unassigned child of the node in question leads to a word of length $l_n + 2$, which is larger than our goal. However, if in fact $l_n = l_{n-1}$ and the length of the path to the node is $l_{n-1}$, we can do the following. Imagine that we do create a codeword $w_1$ of length $l_n + 2$ for this node, and create a code $C''$ by adding this word to the code $C'$. Since $\phi^{-(l_n + 2)} < \phi^{-l_n}$, the set of lengths for $C''$ must obey the Kraft inequality strictly, so there is some node in the tree $T(C'')$ with an unassigned child. Furthermore, the characters leading to this node must have length at most $l_{n-2}$. We can take out the word $w_1$ from $C''$ and add a new word $w_2$ equal to the path to this unassigned child. Again we must determine whether $l_n = l_{n-2}$ and whether the length of the path leading to this new node is $l_{n-2}$. If this is not the case, adding $w_2$ to $C'$ gives the desired code $C$. If it is the case, however, we can repeat the process, adding words $w_1, w_2, \ldots$ to obtain new codes $C^{(k)}$ strictly obeying the Kraft inequality. We are guaranteed that $l_n > l_k$ for some $0 < k < n$, for otherwise the Kraft inequality would be contradicted, so this process must terminate.

The converse must allow the lengths $l_i + 1$ in this manner because for some choices of lengths one may obey the Kraft inequality yet not have enough words of a given length to create a code. For example, consider the set of lengths $L = \{3, 3, 3, 3\}$. This set obeys the Kraft inequality, but there are only three words over this alphabet that have length three: namely $...$, $.-$, and $-.$. There are still words of length four available after we use these three, and increasing any of the lengths does not affect compliance with the Kraft inequality.
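The counting example above can be checked numerically. The following is a minimal Python sketch (the helper names `words_of_cost` and `greedy_code` are ours, not from the paper): it enumerates all words of a given cost over $\{., -\}$ and then greedily assigns prefix-free codewords of cost $l_i$ or $l_i + 1$. The greedy strategy is only an illustration of Theorem 3's conclusion, not the inductive construction used in the proof.

```python
def words_of_cost(c):
    """All words over {'.', '-'} with l('.') = 1, l('-') = 2 and total cost c."""
    if c < 0:
        return []
    if c == 0:
        return ['']
    return ['.' + w for w in words_of_cost(c - 1)] + \
           ['-' + w for w in words_of_cost(c - 2)]

def greedy_code(lengths):
    """Greedily build a prefix-free code whose word costs are l or l + 1
    for each requested length l, processed in ascending order."""
    chosen = []
    for l in sorted(lengths):
        # Try cost l first, then cost l + 1; skip words with a chosen prefix.
        candidate = next(
            (w for c in (l, l + 1) for w in words_of_cost(c)
             if not any(w.startswith(p) for p in chosen)),
            None)
        if candidate is None:
            raise ValueError("greedy assignment failed")
        chosen.append(candidate)
    return chosen

# Only three words of cost 3 exist, as claimed in the text.
assert words_of_cost(3) == ['...', '.-', '-.']

# For L = {3, 3, 3, 3} the fourth word must be pushed to cost 4.
code = greedy_code([3, 3, 3, 3])
costs = sorted(len(w) + w.count('-') for w in code)
assert costs == [3, 3, 3, 4]
```

Note that because words are assigned in ascending cost order, checking that no previously chosen word is a prefix of the candidate suffices: a proper prefix always has strictly smaller cost.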
Having proved the converse Kraft inequality we can now prove the principal result of this section. Returning to the result of Theorem 2, we found that if $r = p$ and the Kraft inequality is saturated then the entropy bound is saturated. Imposing these conditions, namely $r = p$ along with $c = 1$, implies that the lengths must be:

\[
p(x) = \frac{\phi^{-l(x)}}{\sum_x \phi^{-l(x)}} = \phi^{-l(x)} \implies l(x) = \log_\phi \frac{1}{p(x)}
\]

These may not be valid lengths because they may be non-integer. However, the lengths

\[
l'(x) = \left\lceil \log_\phi \frac{1}{p(x)} \right\rceil
\]

are integers and still obey the Kraft inequality. We use these lengths to prove the main result.
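As a quick numerical illustration (the distribution below is an arbitrary example, not one from the text), the rounded-up lengths $l'(x)$ are integers, still satisfy the Kraft inequality, and their average sits within one unit of the entropy $H_\phi(p)$, since rounding up can only shrink each term $\phi^{-l}$:

```python
from math import ceil, log

# Golden ratio: the positive root of x**2 - x - 1 = 0, so PHI**-1 + PHI**-2 == 1.
PHI = (1 + 5 ** 0.5) / 2

def rounded_lengths(p):
    """l'(x) = ceil(log_phi(1 / p(x))) for each probability p(x)."""
    return [ceil(log(1 / px, PHI)) for px in p]

p = [0.5, 0.25, 0.15, 0.1]           # example distribution (an assumption)
lengths = rounded_lengths(p)

# Rounding up only decreases phi**-l, so the Kraft sum stays at most 1.
assert sum(PHI ** -l for l in lengths) <= 1

# Each l'(x) is within 1 of log_phi(1/p(x)), so the average length
# lies between H_phi(p) and H_phi(p) + 1.
H = sum(px * log(1 / px, PHI) for px in p)
avg = sum(px * l for px, l in zip(p, lengths))
assert H <= avg <= H + 1
```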
Theorem 4: Let $p$ be the distribution of a random variable $X$, and let $l'(x)$ be the length associated with the outcome $X = x$. Then there exists a prefix-free code $C$ with alphabet $\{., -\}$ such that:

\[
H_\phi(p) \le L \le H_\phi(p) + 2
\]

It follows that the optimal code $C^*$ for this probability distribution must be at least as good as $C$.

Proof: Since the lengths $l'(x)$ obey the Kraft inequality, Theorem 3 implies that there is a code $C$ such that the length of the word corresponding to $x$ is either $l'(x)$ or $l'(x) + 1$. Let $l_C(x)$ denote the lengths of the words in this code. Then the average word length $L$ satisfies:

\begin{align*}
L = \sum_x p(x)\, l_C(x) &\le \sum_x p(x) \left[ l'(x) + 1 \right] && \text{worst case from Theorem 3} \\
&\le \sum_x p(x) \left[ \log_\phi \frac{1}{p(x)} + 2 \right] && \text{property of the ceiling function} \\
&= H_\phi(p) + 2 \sum_x p(x) = H_\phi(p) + 2
\end{align*}

$L$ must also obey the fundamental bound from Theorem 2, so the result follows.

Conclusion

We have seen how the principal results for equal-cost prefix-free codes can be generalized to the class of unequal-cost prefix-free codes with a binary alphabet and costs 1 and 2. Although these results may not have obvious practical applications, they motivate a roadmap for the interesting, more general case of non-binary codes with a different set of character costs. To illustrate the point, the key observation in proving Theorem 1 was to note that:

\[
\phi^{-2} + \phi^{-1} = 1
\]

It is simple to see that this condition is equivalent to the condition $\phi^2 - \phi - 1 = 0$, which is the equation that defines the golden ratio. Thus it is apparent that for a general unequal-cost code with costs $c_1, c_2, \ldots, c_n$ the kind of relationship that we would exploit to generalize the Kraft inequality would take the form:
\[
\xi^{-c_1} + \xi^{-c_2} + \cdots + \xi^{-c_n} = 1
\]

This equation defines an algebraic number $\xi$, which, if real, we can use to generalize the Kraft inequality in the form:

\[
\sum_{\text{words in } C} \xi^{-l(\text{word})} \le 1
\]

The generalization would likely follow an identical prescription to that of Theorem 1. It is worth noting that every polynomial of odd degree with real coefficients has at least one real root, which means that if the highest cost $c_{\max}$ is an odd number such a $\xi$ is guaranteed to exist. Thus at least certain classes of general unequal-cost codes are promising for this type of generalization. This is an interesting generalization from the standpoint of pure mathematics.
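The root $\xi$ can also be located numerically. Below is a minimal bisection sketch (the function name `cost_root` is ours): for positive costs the sum $\sum_i \xi^{-c_i}$ is strictly decreasing in $\xi$, so once the root is bracketed, bisection converges; for the costs $\{1, 2\}$ studied in this paper the method recovers the golden ratio.

```python
def cost_root(costs, iters=200):
    """Bisect for xi > 1 with sum(xi**-c for c in costs) == 1.

    For positive costs, f(xi) = sum(xi**-c) - 1 is strictly decreasing,
    f(1) = len(costs) - 1 >= 1 for two or more characters, and
    f(xi) -> -1 as xi -> infinity, so a sign change brackets the root."""
    f = lambda x: sum(x ** -c for c in costs) - 1
    lo, hi = 1.0, 2.0
    while f(hi) > 0:            # grow the bracket until f changes sign
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

# Costs {1, 2} reproduce the golden ratio phi.
assert abs(cost_root([1, 2]) - (1 + 5 ** 0.5) / 2) < 1e-9

# The root for costs {1, 2, 3} satisfies the defining equation.
xi = cost_root([1, 2, 3])
assert abs(sum(xi ** -c for c in [1, 2, 3]) - 1) < 1e-9
```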