Data Compression
Introduction
Compression is used to:
• reduce the volume of information to be stored on storage devices
• reduce the communication bandwidth required for its transmission over networks
Compression Principles
Entropy Encoding
1. Run-length encoding
• Lossless and independent of the type of source information
• Used when the source information comprises long substrings of the same character or binary digit
• Encoded as (string or bit pattern, # of occurrences) pairs, as in FAX
e.g.) 000000011111111110000011……
⇒ 0,7 1,10 0,5 1,2…… ⇒ 7,10,5,2……
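As a minimal sketch of this idea, the encoder below turns a bit string into (bit, run-length) pairs; the function name rle_encode is illustrative, not from the slides:

```python
def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Encode a bit string as (bit, run-length) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                      # extend the current run
        runs.append((bits[i], j - i))   # record (bit value, run length)
        i = j
    return runs

# The slide's example: runs of 7, 10, 5, and 2
print(rle_encode("000000011111111110000011"))
# [('0', 7), ('1', 10), ('0', 5), ('1', 2)]
```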
Entropy Encoding
2. Statistical encoding
• Based on the probability of occurrence of a pattern
• The more probable the pattern, the shorter its codeword
Compression Principles
• Huffman Encoding
• Entropy H: the theoretical minimum average number of bits required to transmit a particular stream:
  H = -Σ_{i=1}^{n} P_i log2(P_i)
  where n is the number of symbols and P_i is the probability of symbol i
• Efficiency E = H/H'
  where H' = average number of bits per codeword = Σ_{i=1}^{n} N_i P_i
  and N_i is the number of bits in the codeword for symbol i
• E.g.) symbols M(10), F(11), Y(010), N(011), 0(000), 1(001) with probabilities 0.25, 0.25, 0.125, 0.125, 0.125, 0.125
• H' = Σ_{i=1}^{6} N_i P_i = 2(2×0.25) + 4(3×0.125) = 2.5 bits/codeword
• H = -Σ_{i=1}^{6} P_i log2(P_i) = -(2(0.25 log2 0.25) + 4(0.125 log2 0.125)) = 2.5
• E = H/H' = 100%
• A fixed-length code for six symbols would need 3 bits/codeword
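This calculation can be reproduced directly (a small sketch; the symbol table is from the slide, the variable names are mine):

```python
import math

# (codeword length, probability) for the six symbols above
symbols = {"M": (2, 0.25), "F": (2, 0.25), "Y": (3, 0.125),
           "N": (3, 0.125), "0": (3, 0.125), "1": (3, 0.125)}

H_prime = sum(n * p for n, p in symbols.values())        # avg. bits per codeword
H = -sum(p * math.log2(p) for _, p in symbols.values())  # entropy
print(H_prime, H, f"E = {H / H_prime:.0%}")              # 2.5 2.5 E = 100%
```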
Huffman Algorithm (Variable-Length Encoding)
Method for constructing the encoding tree:
• Full binary tree representation
• Each edge of the tree has a value (0 for the left child, 1 for the right child)
• Data is at the leaves, not at internal nodes
• Result: an encoding tree
Huffman Algorithm
1. Maintain a forest of trees
2. Weight of a tree = sum of the frequencies of its leaves
3. Repeat N−1 times (for N symbols):
   – Select the two trees with the smallest weights
   – Form a new tree from them
• Huffman coding
• A variable-length code: the more frequent a character, the shorter its codeword
• Must satisfy the prefix property to be uniquely decodable
• A two-pass algorithm:
  – the first pass accumulates the character frequencies and generates the codebook
  – the second pass does the compression using the codebook
Huffman coding
• Create the codes by constructing a binary tree (a runnable sketch follows the list):
1. Consider all characters as free nodes
2. Assign the two free nodes with the lowest frequencies to a parent node whose weight equals the sum of their frequencies
3. Remove the two free nodes and add the newly created parent node to the list of free nodes
4. Repeat steps 2 and 3 until only one free node is left; it becomes the root of the tree
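A minimal sketch of these steps in Python, using the standard heap-based construction; the function name huffman_codes and the tie-breaking counter are my own choices, not something the slides prescribe:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman codebook following steps 1-4 above."""
    freq = Counter(text)  # pass 1: character frequencies (the free nodes)
    # Heap entries are (weight, tie-breaker, tree); a tree is either a leaf
    # character or a (left, right) pair. The tie-breaker keeps the heap from
    # ever comparing two trees when weights are equal.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two lowest-frequency free nodes
        w2, _, t2 = heapq.heappop(heap)
        counter += 1
        heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))  # new parent node
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, str):        # leaf: record the accumulated bits
            codes[tree] = prefix or "0"  # degenerate one-symbol input
        else:                            # branch: left edge = 0, right edge = 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2])
    return codes

# For "AAAABBCD" the code lengths come out as A:1, B:2, C:3, D:3 bits; the
# exact 0/1 assignment depends on how ties between equal weights are broken.
print(huffman_codes("AAAABBCD"))
```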
• Right edge of the binary tree: 1
• Left edge of the binary tree: 0
• Prefix (counter-example)
  – e: "01", b: "010"
  – "01" is a prefix of "010", so the bits "010" could decode as b or as "e" followed by a 0 ⇒ ambiguous
• For symbols with the same frequency, a consistent left/right convention is needed
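A small helper makes the prefix property concrete (a sketch; the function name and sample codebooks are mine):

```python
def is_prefix_free(codes: list[str]) -> bool:
    """True if no codeword is a prefix of another, i.e. uniquely decodable."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

print(is_prefix_free(["01", "010"]))              # False: "01" prefixes "010"
print(is_prefix_free(["1", "01", "001", "000"]))  # True: the codebook below
```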
Static Huffman Coding
• Huffman (Code) Tree
• Count the symbols or characters and their prior relative probabilities
• The codes must hold the "prefix property"

Symbol   Occurrence   Code
A        4/8          1
B        2/8          01
C        1/8          001
D        1/8          000

4×1 + 2×2 + 1×3 + 1×3 = 14 bits are required to transmit "AAAABBCD"
[Figure: the Huffman code tree for this table — a root node (weight 8), branch nodes (weights 4 and 2), and leaf nodes A, B, C, D; left edges are labeled 0 and right edges 1, illustrating the prefix property.]
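The 14-bit figure can be checked with a one-liner (the codebook is the one in the table):

```python
codebook = {"A": "1", "B": "01", "C": "001", "D": "000"}
encoded = "".join(codebook[ch] for ch in "AAAABBCD")
print(encoded, len(encoded))  # 11110101001000 14
```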
• Example (data with 64 characters):
  R K K K K K K K
  K K K R R K K K
  K K R R R R G G
  K K B C C C R R
  G G G M C B R R
  B B B M Y B B R
  G G G G G G G R
  G R R R R G R R
Character   Frequency   Huffman code
=====================================
R           19          00
K           17          01
G           14          10
B            7          110
C            4          1110
M            2          11110
Y            1          11111
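The payoff versus a fixed-length code can be checked directly (a small sketch; frequencies and codes are from the table above):

```python
table = {"R": (19, "00"), "K": (17, "01"), "G": (14, "10"), "B": (7, "110"),
         "C": (4, "1110"), "M": (2, "11110"), "Y": (1, "11111")}
huffman_bits = sum(freq * len(code) for freq, code in table.values())
fixed_bits = 64 * 3   # 7 distinct symbols need 3 bits each at fixed length
print(huffman_bits, "vs", fixed_bits)  # 152 vs 192
```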
The goal of data compression is to represent digital data with as few bits as possible.

Exercise:
Using Huffman coding, determine the code for each character of the following text:

Komdat-Kompresi Data