4. Video: 30 pictures per second
Each picture = 200,000 dots or pixels
8 bits to represent each primary color
For RGB = 2^8 x 2^8 x 2^8 = 2^24 colors, i.e. 24 bits per pixel
Bits required for one second of movie = 30 x 200,000 x 24 = 144,000,000 bits
Two-hour movie requires = 2 x 60 x 60 x 144,000,000 = 1,036,800,000,000 bits
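As a rough check of this arithmetic, the sketch below multiplies out frames, pixels, and bits per pixel. It assumes 24 bits per pixel (8 bits for each of R, G, and B); the constant names are illustrative only.

```python
# Figures from the slide: 30 frames per second, 200,000 pixels per frame.
FRAMES_PER_SECOND = 30
PIXELS_PER_FRAME = 200_000
BITS_PER_PIXEL = 3 * 8  # assumption: R, G and B at 8 bits each

bits_per_second = FRAMES_PER_SECOND * PIXELS_PER_FRAME * BITS_PER_PIXEL
bits_two_hours = 2 * 60 * 60 * bits_per_second

print(f"{bits_per_second:,} bits per second")
print(f"{bits_two_hours:,} bits for a two-hour movie")
```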
6. Introduction
Compression is a way to reduce the number of bits in a
frame while retaining its meaning.
It decreases storage space, transmission time, and cost.
The technique is to identify redundancy and eliminate it.
If a file contains only capital letters, we may encode all
26 letters using 5-bit numbers instead of 8-bit
ASCII codes.
If the file has n characters, then the savings = (8n - 5n)/8n
= 37.5%
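A minimal sketch of the 5-bit idea, assuming the input contains only the capital letters A to Z. The helper names `pack_5bit` and `unpack_5bit` are hypothetical, not part of any library.

```python
def pack_5bit(text):
    """Pack capital letters A-Z into 5 bits each (hypothetical helper)."""
    bits = "".join(format(ord(ch) - ord("A"), "05b") for ch in text)
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def unpack_5bit(data, n):
    """Recover the first n letters from a 5-bit packed byte string."""
    bits = "".join(format(b, "08b") for b in data)
    return "".join(chr(int(bits[i:i + 5], 2) + ord("A"))
                   for i in range(0, 5 * n, 5))

text = "COMPRESSION" * 8  # 88 capital letters
packed = pack_5bit(text)
saving = (8 * len(text) - 5 * len(text)) / (8 * len(text))
print(len(text), "bytes ->", len(packed), "bytes, saving", f"{saving:.1%}")
```

The decoder needs the character count n because padding bits at the end would otherwise be read as extra letters.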
8. Lossless Compression
In lossless data compression:
o The integrity of the data is preserved.
o The original data and the data after compression and
decompression are exactly the same.
o No data loss.
o Redundant data is removed in compression and added
back during decompression.
o Lossless compression methods are normally used
when we cannot afford to lose any data.
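The round-trip property can be checked with any lossless codec. The sketch below uses Python's standard `zlib` module as one example; the sample data is made up for illustration.

```python
import zlib

original = b"AAAAABBBBBCCCCC" * 100  # highly redundant sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
print("identical after round trip:", restored == original)
```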
17. Now if we read this aloud, it is not
so weird:
“Three apples, two pears, one banana, two oranges
and one apple”
...and it saves SPACE
18. Now to translate into
computer terms...
A scan line contains a run of numbers...
55556987444425555611111988888222222222
...Using run-length Encoding
(4,5) (1,6) (1,9) (1,8) (1,7)
(4,4) (1,2) (4,5) (1,6) (5,1)
(1,9) (5,8) (9,2)
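The scan-line example can be reproduced with a few lines of Python. This is a minimal sketch that stores each run as a (count, value) pair, in the same order as the slide; the function names are illustrative.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of equal symbols into a (count, value) pair."""
    return [(len(list(group)), value) for value, group in groupby(data)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original string."""
    return "".join(value * count for count, value in pairs)

scan_line = "55556987444425555611111988888222222222"
pairs = rle_encode(scan_line)
print(" ".join(f"({count},{value})" for count, value in pairs))
```

Run-length encoding only pays off when runs are long; a pair such as (1,6) takes more room than the single digit it replaces.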
19. To Sum it up.....
In Wikipedia terms.....
Run-length encoding (RLE) is a very simple
form of data compression in which runs of data
(that is, sequences in which the same data
value occurs in many consecutive data
elements) are stored as a single data value
and count, rather than as the original run
20. Huffman Coding
Huffman coding is credited to David Albert Huffman
Huffman coding is an entropy encoding algorithm used
for lossless data compression.
Huffman coding is a method of storing strings of data as
binary code in an efficient manner.
Huffman coding uses variable-length coding, which
means that each symbol in the data being encoded is
converted into a binary code word whose length depends on how often that
symbol is used: frequent symbols get shorter code words.
A tree is used to decide which binary code to give to each
character.
21. The (Real) Basic Algorithm
Scan text to be compressed and tally occurrence of all
characters.
Sort or prioritize characters based on number of
occurrences in text.
Build Huffman code tree based on prioritized list.
Perform a traversal of tree to determine all code words.
Scan text again and create new file using the Huffman
codes.
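The five steps can be sketched in Python with the standard `heapq` module acting as the priority queue. This is an illustrative sketch that assumes a non-empty input string; ties between equal frequencies may be broken differently than in a hand-drawn tree, so individual code words can differ while the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Return {symbol: bit string} for the symbols in a non-empty text."""
    tie = count()  # tie-breaker so the heap never has to compare subtrees
    heap = [(freq, next(tie), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two lowest-frequency nodes and join them.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a single character
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes

text = "Eerie eyes seen near lake."
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(encoded), "bits")
```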
22. Building a Tree
Scan the original text
Consider the following short text:
Eerie eyes seen near lake.
Count up the occurrences of all characters in the text
CS 102
23. Building a Tree
Scan the original text
Eerie eyes seen near lake.
What characters are present?
E e r i space
y s n a l k .
24. Building a Tree
Scan the original text
Eerie eyes seen near lake.
What is the frequency of each character in the
text?
Char   Freq
E      1
e      8
r      2
i      1
space  4
y      1
s      2
n      2
a      2
l      1
k      1
.      1
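The frequency count can be reproduced with `collections.Counter` from the Python standard library. A short sketch:

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)

# Print characters from least to most frequent.
for ch, n in sorted(freq.items(), key=lambda item: item[1]):
    print(repr(ch), n)
```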
25. Building a Tree
The queue after inserting all nodes (null pointers are not shown):
E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8
27. Building a Tree
E (1) and i (1) are dequeued and joined under a new node of weight 2, which is then enqueued.
29. Building a Tree
y (1) and l (1) are dequeued and joined under a new node of weight 2, which is then enqueued.
31. Building a Tree
k (1) and . (1) are dequeued and joined under a new node of weight 2, which is then enqueued.
33. Building a Tree
r (2) and s (2) are dequeued and joined under a new node of weight 4, which is then enqueued.
35. Building a Tree
n (2) and a (2) are dequeued and joined under a new node of weight 4, which is then enqueued.
37. Building a Tree
The E/i node (2) and the y/l node (2) are dequeued and joined under a new node of weight 4, which is then enqueued.
39. Building a Tree
The k/. node (2) and sp (4) are dequeued and joined under a new node of weight 6, which is then enqueued.
What is happening to the characters with a low number of occurrences?
41. Building a Tree
The r/s node (4) and the n/a node (4) are dequeued and joined under a new node of weight 8, which is then enqueued.
43. Building a Tree
The nodes of weight 4 (E, i, y, l) and weight 6 (k, ., sp) are dequeued and joined under a new node of weight 10, which is then enqueued.
45. Building a Tree
e (8) and the node of weight 8 (r, s, n, a) are dequeued and joined under a new node of weight 16, which is then enqueued.
47. Building a Tree
The nodes of weight 10 and 16 are dequeued and joined under a new root node of weight 26.
After enqueueing this node there is only one node left in the priority queue.
49. Encoding the File: Traverse Tree for Codes
Perform a traversal of the tree to obtain the new code words.
Going left is a 0; going right is a 1.
A code word is only completed when a leaf node is reached.
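The traversal can be sketched as a short recursive function. The nested-tuple tree below is a hand-written stand-in for the finished tree of the example (each pair is a left and right child, each string a leaf); the name `assign_codes` is hypothetical.

```python
# Finished tree for "Eerie eyes seen near lake.": (left, right) pairs.
tree = (
    ((("E", "i"), ("y", "l")), (("k", "."), " ")),  # left subtree, weight 10
    ("e", (("r", "s"), ("n", "a"))),                # right subtree, weight 16
)

def assign_codes(node, prefix="", table=None):
    """Append 0 when going left and 1 when going right; stop at leaves."""
    table = {} if table is None else table
    if isinstance(node, str):
        table[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

codes = assign_codes(tree)
for ch, code in codes.items():
    print(repr(ch), code)
```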
50. Encoding the File: Traverse Tree for Codes
Char   Code
E      0000
i      0001
y      0010
l      0011
k      0100
.      0101
space  011
e      10
r      1100
s      1101
n      1110
a      1111
51. Encoding the File
Rescan the text and encode the file using the new code words.
Eerie eyes seen near lake.
000010110000011001110001010110101111011010111
00111110101111110001100111111010010
0101
Why is there no need for a separator character?
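One way to see why no separator is needed: no code word in the table is a prefix of another, so a decoder can read bits one at a time and emit a character as soon as the bits collected so far match a code word. A minimal sketch using the code table of this example (the function names are illustrative):

```python
CODES = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011", "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}

def encode(text):
    return "".join(CODES[ch] for ch in text)

def decode(bits):
    """Emit a character whenever the collected bits match a code word."""
    lookup = {code: ch for ch, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)

text = "Eerie eyes seen near lake."
bits = encode(text)
print(decode(bits) == text)
```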
52. Encoding the File: Results
Have we made things any better?
84 bits to encode the text
ASCII would take 8 * 26 = 208 bits
000010110000011001110001010110101111011010111
00111110101111110001100111111010010
0101
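The size comparison can be checked by summing frequency times code length over the characters of the text. With the code lengths from the table of this example (2 bits for e, 3 for space, 4 for every other character), the sketch below arrives at 84 bits against 208 bits for 8-bit ASCII.

```python
from collections import Counter

# Code-word lengths taken from the code table of the example.
CODE_LENGTHS = {
    "E": 4, "i": 4, "y": 4, "l": 4, "k": 4, ".": 4,
    " ": 3, "e": 2, "r": 4, "s": 4, "n": 4, "a": 4,
}

text = "Eerie eyes seen near lake."
huffman_bits = sum(n * CODE_LENGTHS[ch] for ch, n in Counter(text).items())
ascii_bits = 8 * len(text)
saving = 1 - huffman_bits / ascii_bits

print(huffman_bits, "bits vs", ascii_bits, "bits, saving", f"{saving:.1%}")
```

The comparison ignores the cost of storing the code table itself, which a real file format would also have to carry.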
53. Lempel-Ziv (LZ) Encoding
Data compression up until the late 1970s was mainly directed
towards creating better methodologies for Huffman coding.
An innovative, radically different method was introduced
in 1977 by Abraham Lempel and Jacob Ziv.
This technique (called Lempel-Ziv) actually consists of two
considerably different algorithms, LZ77 and LZ78.
Due to patents, LZ77 and LZ78 led to many variants:
LZ77 variants: LZR, LZSS, LZB, LZH
LZ78 variants: LZW, LZC, LZT, LZMW, LZJ, LZFG
The zip and unzip utilities use the LZH technique, while UNIX's
compress methods belong to the LZW and LZC classes.
54. EXAMPLE : LZ78 COMPRESSION
Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm.
The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)
Note: The above is just a representation; the commas and parentheses are not transmitted.
The actual form of the compressed message is discussed later on, in slide 12.
55. EXAMPLE : LZ78 COMPRESSION (CONT’D)
1. A is not in the Dictionary; insert it
2. B is not in the Dictionary; insert it
3. B is in the Dictionary.
BC is not in the Dictionary; insert it.
4. B is in the Dictionary.
BC is in the Dictionary.
BCA is not in the Dictionary; insert it.
5. B is in the Dictionary.
BA is not in the Dictionary; insert it.
6. B is in the Dictionary.
BC is in the Dictionary.
BCA is in the Dictionary.
BCAA is not in the Dictionary; insert it.
7. B is in the Dictionary.
BC is in the Dictionary.
BCA is in the Dictionary.
BCAA is in the Dictionary.
BCAAB is not in the Dictionary; insert it.
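The dictionary steps above can be sketched in Python as follows. Each output pair is (dictionary index of the longest known prefix, next character), with index 0 standing for the empty prefix. The function names are illustrative, and the handling of input that ends inside a known phrase (a final pair with an empty character) is an assumption of this sketch.

```python
def lz78_compress(text):
    dictionary = {}  # phrase -> index, starting at 1
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the match
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:  # input ended inside a known phrase
        output.append((dictionary[phrase], ""))
    return output

def lz78_decompress(pairs):
    phrases = [""]  # index 0 is the empty prefix
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

message = "ABBCBCABABCAABCAAB"
pairs = lz78_compress(message)
print("".join(f"({i},{c})" for i, c in pairs))
```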
56. Lossy Compression Methods
Used for compressing images and video files
(our eyes cannot distinguish subtle changes, so
some loss of data is acceptable).
These methods are cheaper: they take less time and
space.
Several methods:
JPEG: compresses pictures and graphics
MPEG: compresses video
MP3: compresses audio
57. JPEG Encoding
Used to compress pictures and graphics.
In JPEG, a grayscale picture is divided into 8x8
pixel blocks to decrease the number of
calculations.
Basic idea:
Change the picture into a linear (vector) set of numbers that
reveals the redundancies.
The redundancies are then removed by one of the lossless
compression methods.
58. JPEG Encoding - DCT
DCT: Discrete Cosine Transform
DCT transforms the 64 values in 8x8 pixel block
in a way that the relative relationships between
pixels are kept but the redundancies are
revealed.
Example:
A gradient grayscale
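To make the gradient example concrete, here is a plain-Python sketch of the two-dimensional DCT (type II) on an 8x8 block. The block values are made up for illustration, and the level shift that JPEG applies to pixel values before the transform is left out. For a block that changes only from left to right, every row of the result except the first comes out as zero, which is the kind of redundancy the transform exposes.

```python
import math

N = 8

def dct_8x8(block):
    """Two-dimensional DCT-II of an 8x8 block of numbers."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    result = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            result[u][v] = 0.25 * c(u) * c(v) * total
    return result

# Hypothetical gradient block: brightness grows from left to right.
gradient = [[20 + 10 * y for y in range(N)] for _ in range(N)]
T = dct_8x8(gradient)
print([round(value, 1) for value in T[0]])
```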
59. Quantization & Compression
Quantization:
After the T table is created, the values are quantized to reduce the
number of bits needed for encoding.
Quantization divides each value by a constant, then
drops the fraction. This is done to optimize the number of bits
and the number of 0s for each particular application.
Compression:
Quantized values are read from the table and redundant 0s are
removed.
To cluster the 0s together, the table is read diagonally in a
zigzag fashion. The reason is that if the picture does not have fine
changes, the bottom right corner of the table is all 0s.
JPEG usually uses lossless run-length encoding at the
compression phase.
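The quantize-then-zigzag idea can be sketched on a small table. The 4x4 values and the divisor 10 are made up for illustration. Real JPEG works on 8x8 tables, divides each entry by its own value from a quantization table, and rounds rather than truncates, so this is a simplification that follows the description above.

```python
def quantize(table, divisor):
    """Divide every value by a constant and drop the fraction."""
    return [[int(value / divisor) for value in row] for row in table]

def zigzag(table):
    """Read a square table along its anti-diagonals, alternating direction."""
    n = len(table)
    cells = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return [table[r][c] for r, c in cells]

def strip_trailing_zeros(values):
    """Drop the run of zeros at the end of the zigzag sequence."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return values[:end]

# Hypothetical transformed table: large values sit in the top left corner.
t_table = [
    [160, 45, 8, 2],
    [40, 12, 3, 1],
    [9, 4, 1, 0],
    [2, 1, 0, 0],
]
sequence = zigzag(quantize(t_table, 10))
print(sequence)
print(strip_trailing_zeros(sequence))
```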
61. MPEG Encoding
Used to compress video.
Basic idea:
Each video is a rapid sequence of a set of
frames. Each frame is a spatial combination
of pixels, or a picture.
Compressing video =
spatially compressing each frame
+
temporally compressing a set of
frames.
62. MPEG Encoding
• Spatial Compression
• Each frame is spatially compressed by JPEG.
• Temporal Compression
• Redundant frames are removed.
• For example, in a static scene in which someone is talking,
most frames are the same except for the segment around the
speaker’s lips, which changes from one frame to the next.
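A much simplified sketch of the temporal idea: store the first frame in full, then for each later frame store only the pixels that differ from the previous one. Real MPEG is considerably more elaborate (it uses I-, P-, and B-frames with motion compensation); the frames and function names below are made up for illustration.

```python
def frame_delta(previous, current):
    """Return {(row, col): new_value} for the pixels that changed."""
    return {
        (r, c): value
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if previous[r][c] != value
    }

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame and a delta."""
    frame = [row[:] for row in previous]
    for (r, c), value in delta.items():
        frame[r][c] = value
    return frame

# Hypothetical 4x4 frames: only two pixels change between them.
frame1 = [[0] * 4 for _ in range(4)]
frame2 = [row[:] for row in frame1]
frame2[2][1] = 9
frame2[2][2] = 9

delta = frame_delta(frame1, frame2)
print(len(delta), "of 16 pixels stored for the second frame")
```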
63. Audio Compression
Used for speech or music
Speech: compress a 64 kbps digitized signal
Music: compress a 1.411 Mbps signal
Two categories of techniques:
Predictive encoding
Perceptual encoding
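The two rates can be derived from sampling parameters. The parameters below are assumptions not stated on the slide: telephone-quality speech at 8,000 samples per second with 8 bits per sample, and CD-quality music at 44,100 samples per second with 16 bits per sample on 2 channels.

```python
def bit_rate(samples_per_second, bits_per_sample, channels=1):
    """Uncompressed bit rate in bits per second."""
    return samples_per_second * bits_per_sample * channels

speech_bps = bit_rate(8_000, 8)                # assumed telephone speech
music_bps = bit_rate(44_100, 16, channels=2)   # assumed CD-quality stereo

print(f"speech: {speech_bps / 1e3:.0f} kbps")
print(f"music:  {music_bps / 1e6:.3f} Mbps")
```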
64. Audio Encoding
•Predictive Encoding
•Only the differences between samples are encoded, not
the whole sample values.
•Several standards: GSM (13 kbps), G.729 (8 kbps), and
G.723.1 (6.3 or 5.3 kbps)
•Perceptual Encoding: MP3
•CD-quality audio needs at least 1.411 Mbps and cannot
be sent over the Internet without compression.
•MP3 (MPEG audio layer 3) uses a perceptual encoding
technique to compress audio.
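The simplest form of predictive encoding predicts that each sample equals the previous one and stores only the difference. The sketch below shows that idea on made-up sample values. The speech standards listed above use far more sophisticated predictors, so this is an illustration of the principle only.

```python
def delta_encode(samples):
    """Store each sample as its difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas

def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples, value = [], 0
    for delta in deltas:
        value += delta
        samples.append(value)
    return samples

samples = [100, 102, 105, 104, 104]  # hypothetical audio samples
deltas = delta_encode(samples)
print(deltas)
```

After the first value the differences are small numbers, which can be stored in fewer bits than the full sample values.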
65. Conclusion
Compression is used on all types of data
to save space and time. There are two
types of data compression: lossy and
lossless. Lossy techniques are used for
images, video and audio, where we
can tolerate some data loss. Lossless techniques
are used for textual data, which can be
encoded through run-length, Huffman
and Lempel-Ziv coding.
66. References
http://www.csie.kuas.edu.tw/course/cs/english/ch-15.ppt
CS157B-Lecture 19 by Professor Lee
http://cs.sjsu.edu/~lee/cs157b/cs157b.html
“The Essentials of Computer Organization
and Architecture” by Linda Null and Julia
Lobur
http://www.wekipedia.com