# Python & Perl: Lecture 05

Published in: Technology

1. Python & Perl, Lecture 05. Vladimir Kulyukin, Department of Computer Science, Utah State University. www.youtube.com/vkedco, www.vkedco.blogspot.com
2. Outline
    - Data Abstraction: Building Huffman Trees with Lists and Tuples
    - List Manipulation with Built-In Methods: append(), extend(), reverse(), remove(), index(), count(), and sort()
3. Data Abstraction: Building Huffman Trees with Lists and Tuples
4. Background
    - In information theory, coding refers to methods that represent data in terms of bit sequences (sequences of 0s and 1s)
    - Encoding is a method of taking data structures and mapping them to bit sequences
    - Decoding is a method of taking bit sequences and outputting the corresponding data structure
5. Example: Standard ASCII & Unicode
    - Standard ASCII encodes each character as a 7-bit sequence
    - Using 7 bits allows us to encode 2^7 = 128 possible characters
    - Unicode has three encoding forms: UTF-8 (8-bit code units, one to four per character), UTF-16 (16-bit code units, one or two per character), and UTF-32 (one 32-bit code unit per character)
    - UTF stands for Unicode Transformation Format
    - Python 2.x's Unicode support: “Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled.”
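The arithmetic on this slide can be checked with a short sketch. It assumes Python 3 (the slides use Python 2), and the helper name `ascii_bits` is hypothetical, not part of the lecture.

```python
# Number of distinct characters a 7-bit code can represent.
NUM_ASCII_CODES = 2 ** 7

def ascii_bits(ch):
    """Return the 7-bit standard ASCII code of ch as a string of 0s and 1s."""
    code = ord(ch)
    if code >= NUM_ASCII_CODES:
        raise ValueError("not a standard ASCII character: %r" % ch)
    return format(code, "07b")

print(NUM_ASCII_CODES)   # 128
print(ascii_bits("A"))   # 1000001

# Bytes needed for the euro sign (U+20AC) in each Unicode encoding form.
euro = "\u20ac"
utf8_len = len(euro.encode("utf-8"))       # 3 bytes
utf16_len = len(euro.encode("utf-16-le"))  # 2 bytes
utf32_len = len(euro.encode("utf-32-le"))  # 4 bytes
print(utf8_len, utf16_len, utf32_len)
```

The `-le` codec variants are used so that no byte-order mark is counted in the lengths.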
6. Two Types of Codes
    - There are two types of codes: fixed-length and variable-length
    - Fixed-length codes (e.g., ASCII, UTF-32) encode every character with the same number of bits
    - Variable-length codes (e.g., Morse, Huffman) encode characters with varying numbers of bits: more frequent symbols are encoded with fewer bits
7. Example: Fixed-Length Code
    - A – 000, C – 010, E – 100, G – 110
    - B – 001, D – 011, F – 101, H – 111
    - AADF = 000000011101
    - The encoding of AADF is 12 bits
8. Example: Variable-Length Code
    - A – 0, C – 1010, E – 1100, G – 1110
    - B – 100, D – 1011, F – 1101, H – 1111
    - AADF = 0010111101
    - The encoding of AADF is 10 bits
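A minimal sketch that reproduces the two AADF encodings from the code tables on slides 7 and 8. It assumes Python 3; the names `FIXED`, `VARIABLE`, and `encode` are hypothetical and chosen only for this illustration.

```python
# Code tables copied from slides 7 (fixed-length) and 8 (variable-length).
FIXED = {"A": "000", "B": "001", "C": "010", "D": "011",
         "E": "100", "F": "101", "G": "110", "H": "111"}
VARIABLE = {"A": "0", "B": "100", "C": "1010", "D": "1011",
            "E": "1100", "F": "1101", "G": "1110", "H": "1111"}

def encode(message, code_table):
    """Concatenate the code of every symbol in message."""
    return "".join(code_table[symbol] for symbol in message)

fixed_bits = encode("AADF", FIXED)
variable_bits = encode("AADF", VARIABLE)
print(fixed_bits, len(fixed_bits))        # 000000011101 12
print(variable_bits, len(variable_bits))  # 0010111101 10
```

The variable-length code wins here because A, the most frequent symbol, costs a single bit.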
9. Example: Variable-Length Code
    - There are two types of codes: fixed-length and variable-length
    - Fixed-length codes (e.g., ASCII, UTF-32) encode every character with the same number of bits
    - Variable-length codes (e.g., Morse, Huffman) encode characters with varying numbers of bits: more frequent symbols are encoded with fewer bits
10. End of Character in Variable-Length Code
    - One of the challenges in variable-length codes is knowing where one character ends and the next one begins
    - Morse uses a special character (separator code)
    - Prefix coding is another solution: no character's code is a prefix of any other character's code, so a complete code can be recognized as soon as its last bit is read
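The prefix property can be tested mechanically. The following is a sketch assuming Python 3; `is_prefix_free` is a hypothetical helper, not a function from the lecture.

```python
def is_prefix_free(codes):
    """Return True if no code in the list is a prefix of a different code."""
    for i, first in enumerate(codes):
        for j, second in enumerate(codes):
            if i != j and second.startswith(first):
                return False
    return True

# The variable-length codes for A through H used in this lecture.
lecture_codes = ["0", "100", "1010", "1011", "1100", "1101", "1110", "1111"]
print(is_prefix_free(lecture_codes))       # True

# "0" is a prefix of "01", so this set would be ambiguous without separators.
print(is_prefix_free(["0", "01", "11"]))   # False
```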
11. Huffman Code
    - Huffman code is a variable-length code that takes advantage of the relative frequencies of characters
    - Huffman code is named after David Huffman, the researcher who discovered it
    - Huffman code is represented as a binary tree whose leaves are individual characters and their frequencies
    - Each non-leaf node holds the set of characters in all of its subnodes and the sum of their relative frequencies
12. Huffman Tree Example (each branch is labeled with its bit, 0 or 1)

        {A, B, C, D, E, F, G, H}: 17
            0 -> A: 8
            1 -> {B, C, D, E, F, G, H}: 9
                0 -> {B, C, D}: 5
                    0 -> B: 3
                    1 -> {C, D}: 2
                        0 -> C: 1
                        1 -> D: 1
                1 -> {E, F, G, H}: 4
                    0 -> {E, F}: 2
                        0 -> E: 1
                        1 -> F: 1
                    1 -> {G, H}: 2
                        0 -> G: 1
                        1 -> H: 1
13. Using Huffman Tree to Encode/Decode Characters
    - For the tree on the previous slide, the encodings are:
        - A is encoded as 0
        - B is encoded as 100
        - C is encoded as 1010
        - D is encoded as 1011
        - E is encoded as 1100
        - F is encoded as 1101
        - G is encoded as 1110
        - H is encoded as 1111
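Because the encodings above form a prefix code, a bit string can be decoded left to right with no separators. The sketch below assumes Python 3 and uses a lookup table instead of walking the tree; `CODES`, `DECODE`, and `decode` are hypothetical names.

```python
# Encodings listed on slide 13.
CODES = {"A": "0", "B": "100", "C": "1010", "D": "1011",
         "E": "1100", "F": "1101", "G": "1110", "H": "1111"}

# Reverse table: bit sequence -> symbol.
DECODE = {bits: symbol for symbol, bits in CODES.items()}

def decode(bit_string):
    """Decode a string of 0s and 1s by matching codes from left to right."""
    symbols = []
    current = ""
    for bit in bit_string:
        current += bit
        if current in DECODE:   # a complete code has been read
            symbols.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code: " + current)
    return "".join(symbols)

print(decode("0010111101"))  # AADF
```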
14. Building The Tree Bottom Up
15. Constructing Leaves

        ### a leaf is a tuple whose first element is a symbol
        ### represented as a string and whose second element is
        ### the symbol's frequency
        def make_leaf(symbol, freq):
            return (symbol, freq)

        def is_leaf(x):
            return (isinstance(x, tuple) and
                    len(x) == 2 and
                    isinstance(x[0], str) and
                    isinstance(x[1], int))

16. Constructing Leaves

        ### return the character (symbol) of the leaf
        def get_leaf_symbol(leaf):
            return leaf[0]

        ### return the frequency of the leaf's character
        def get_leaf_freq(leaf):
            return leaf[1]
17. Constructing Huffman Trees

        ### A non-leaf node (internal node) is represented as
        ### a list of four elements:
        ### 1. left branch
        ### 2. right branch
        ### 3. list of symbols
        ### 4. combined frequency of symbols
        [left_branch, right_branch, symbols, frequency]
18. Accessing Huffman Trees

        def get_leaf_symbol(leaf):
            return leaf[0]

        def get_leaf_freq(leaf):
            return leaf[1]

        def get_left_branch(huff_tree):
            return huff_tree[0]

        def get_right_branch(huff_tree):
            return huff_tree[1]

19. Accessing Huffman Trees

        def get_symbols(huff_tree):
            if is_leaf(huff_tree):
                return [get_leaf_symbol(huff_tree)]
            else:
                return huff_tree[2]

        def get_freq(huff_tree):
            if is_leaf(huff_tree):
                return get_leaf_freq(huff_tree)
            else:
                return huff_tree[3]
20. Constructing Huffman Trees

        ### A Huffman tree is constructed from its left branch, which can
        ### be a Huffman tree or a leaf, and its right branch, another
        ### Huffman tree or a leaf. The new tree has the symbols of the
        ### left branch followed by those of the right branch, and the sum
        ### of the frequencies of the left branch and the right branch
        def make_huffman_tree(left_branch, right_branch):
            return [left_branch,
                    right_branch,
                    get_symbols(left_branch) + get_symbols(right_branch),
                    get_freq(left_branch) + get_freq(right_branch)]
21. MAKE_HUFFMAN_TREE Example

        ht01 = make_huffman_tree(make_leaf('A', 4),
                                 make_huffman_tree(make_leaf('B', 2),
                                                   make_huffman_tree(make_leaf('D', 1),
                                                                     make_leaf('C', 1))))

    The resulting tree:

        {A, B, D, C}: 8
            A: 4
            {B, D, C}: 4
                B: 2
                {D, C}: 2
                    D: 1
                    C: 1

22. MAKE_HUFFMAN_TREE Example

    Python data structure that represents the Huffman tree below:

        [('A', 4),
         [('B', 2),
          [('D', 1), ('C', 1), ['D', 'C'], 2],
          ['B', 'D', 'C'], 4],
         ['A', 'B', 'D', 'C'], 8]

    The tree:

        {A, B, D, C}: 8
            A: 4
            {B, D, C}: 4
                B: 2
                {D, C}: 2
                    D: 1
                    C: 1
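One way to use this representation is to read a symbol's code off the tree by walking from the root to the leaf. The sketch below assumes Python 3 and repeats the lecture's constructors so that it runs on its own. The function `encode_symbol` and the convention that a left branch contributes 0 and a right branch contributes 1 are assumptions made for illustration; the slides do not define them.

```python
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    return (isinstance(x, tuple) and len(x) == 2 and
            isinstance(x[0], str) and isinstance(x[1], int))

def get_symbols(huff_tree):
    return [huff_tree[0]] if is_leaf(huff_tree) else huff_tree[2]

def get_freq(huff_tree):
    return huff_tree[1] if is_leaf(huff_tree) else huff_tree[3]

def make_huffman_tree(left_branch, right_branch):
    return [left_branch, right_branch,
            get_symbols(left_branch) + get_symbols(right_branch),
            get_freq(left_branch) + get_freq(right_branch)]

def encode_symbol(symbol, huff_tree):
    """Walk from the root to the symbol's leaf, collecting one bit per branch.

    Assumed convention (not from the slides): left = '0', right = '1'.
    """
    if symbol not in get_symbols(huff_tree):
        raise ValueError("symbol not in tree: %r" % symbol)
    bits = ""
    node = huff_tree
    while not is_leaf(node):
        left, right = node[0], node[1]
        if symbol in get_symbols(left):
            bits += "0"
            node = left
        else:
            bits += "1"
            node = right
    return bits

ht01 = make_huffman_tree(make_leaf('A', 4),
                         make_huffman_tree(make_leaf('B', 2),
                                           make_huffman_tree(make_leaf('D', 1),
                                                             make_leaf('C', 1))))

for sym in "ABDC":
    print(sym, encode_symbol(sym, ht01))
# A 0
# B 10
# D 110
# C 111
```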
23. List Manipulation with Built-In Methods: append(), extend(), reverse(), remove(), index(), count(), sort()
24. list.append()
    - The method append() adds an object at the end of the list

          >>> lst1 = [1, 2, 3]
          >>> lst1.append('a')
          >>> lst1
          [1, 2, 3, 'a']
          >>> lst1.append(['b', 'c'])
          >>> lst1
          [1, 2, 3, 'a', ['b', 'c']]
25. list.extend()
    - The method extend() also destructively adds to the end of the list but, unlike append(), it requires an iterable argument and adds that iterable's elements one by one

          >>> lst1 = [1, 2, 3]
          >>> lst1.extend(4)    # error: 4 is not iterable
          >>> lst1.extend("abc")
          >>> lst1
          [1, 2, 3, 'a', 'b', 'c']

26. list.append() vs. list.extend()
    - Here is another difference between extend() and append():

          >>> lst1 = [1, 2, 3]
          >>> lst1.append(['a', 'b'])
          >>> lst1
          [1, 2, 3, ['a', 'b']]    ### the last element is a list
          >>> lst1 = [1, 2, 3]
          >>> lst1.extend(['a', 'b'])
          >>> lst1
          [1, 2, 3, 'a', 'b']      ### ['a', 'b'] is added at the end of
                                   ### lst1 element by element
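The contrast between the two methods can be run as a script. This sketch assumes Python 3; the variable names are illustrative only.

```python
lst_a = [1, 2, 3]
lst_a.append(['a', 'b'])     # the whole list becomes one new element
print(lst_a, len(lst_a))     # [1, 2, 3, ['a', 'b']] 4

lst_b = [1, 2, 3]
lst_b.extend(['a', 'b'])     # each element is added separately
print(lst_b, len(lst_b))     # [1, 2, 3, 'a', 'b'] 5

# extend() rejects a non-iterable argument and leaves the list unchanged.
lst_c = [1, 2, 3]
extend_failed = False
try:
    lst_c.extend(4)
except TypeError:
    extend_failed = True
print(extend_failed, lst_c)  # True [1, 2, 3]

# A string is iterable, so extend() adds its characters one by one.
lst_c.extend("abc")
print(lst_c)                 # [1, 2, 3, 'a', 'b', 'c']
```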
27. len(), list.count(), list.index()
    - Let lst be a list
    - len(lst) returns the length of lst (len() is a built-in function, not a list method)
    - lst.count(x) returns the number of indices i such that lst[i] == x
    - lst.index(x[, i[, j]]) returns the smallest k for which lst[k] == x and i <= k < j
    - Python documentation note: the notation [, i[, j]] means that parameters i and j are optional: both can be absent, j alone can be absent, or both can be present
28. len(), list.count(), list.index()

        >>> lst = ['a', 'a', 1, 1, 1, 'b', 2, 2, 3]
        >>> len(lst)
        9
        >>> lst.count('a')
        2
        >>> lst.count(1)
        3
        >>> lst.index('a', 0, 2)
        0
        >>> lst.index('a', 1, 2)
        1
        >>> lst.index('a', 2)
        ValueError: list.index(x): x not in list
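Since index() raises ValueError when the value is missing from the searched range, calling code often wraps it. The helper `find` below is hypothetical (it is not a list method), and the sketch assumes Python 3.

```python
def find(lst, x, start=0, stop=None):
    """Return the smallest k with lst[k] == x and start <= k < stop, or -1."""
    if stop is None:
        stop = len(lst)
    try:
        return lst.index(x, start, stop)
    except ValueError:
        return -1

lst = ['a', 'a', 1, 1, 1, 'b', 2, 2, 3]
print(find(lst, 'a'))        # 0
print(find(lst, 'a', 1))     # 1
print(find(lst, 'a', 2))     # -1, no 'a' at index 2 or later
print(find(lst, 'b', 0, 5))  # -1, 'b' is at index 5 and stop is exclusive
print(lst.count(1))          # 3
```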
29. list.remove(), list.reverse()
    - Let lst be a list
    - lst.remove(x) is the same as del lst[lst.index(x)]
    - lst.reverse() reverses lst in place
    - lst.reverse() does not return a value
30. list.remove(), list.reverse()

        >>> lst = [1, 2, 3, 4, 5]
        >>> lst.remove(1)
        >>> lst
        [2, 3, 4, 5]
        >>> lst = [1, 2, 3, 4, 5]
        >>> del lst[lst.index(1)]
        >>> lst
        [2, 3, 4, 5]
        >>> lst.reverse()
        >>> lst
        [5, 4, 3, 2]
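Two consequences of these definitions are easy to miss: remove() deletes only the first matching element, and reverse() evaluates to None because it works in place. A short sketch, assuming Python 3:

```python
lst = [1, 2, 1, 3]
lst.remove(1)             # only the first 1 is removed
print(lst)                # [2, 1, 3]

result = lst.reverse()    # reverses in place; the call itself yields None
print(result)             # None
print(lst)                # [3, 1, 2]

# A slice with step -1 builds a reversed copy and leaves lst untouched.
reversed_copy = lst[::-1]
print(reversed_copy)      # [2, 1, 3]
print(lst)                # [3, 1, 2]

# Like index(), remove() raises ValueError when the value is absent.
remove_failed = False
try:
    lst.remove(99)
except ValueError:
    remove_failed = True
print(remove_failed)      # True
```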
31. Function Calls and Function Objects
    - Functions are objects in Python
    - If f is a function, then f() is a function call while f is a function object
    - Function objects can be passed as arguments to functions and methods
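A small sketch of the distinction between a function object and a function call, assuming Python 3. The functions `square` and `apply_twice` are hypothetical examples.

```python
def square(x):
    return x * x

def apply_twice(f, x):
    """Call the function object f on x, then on the result."""
    return f(f(x))

print(callable(square))        # True: square is an object that can be called
print(square(3))               # 9: the parentheses perform the call

g = square                     # bind another name to the same function object
print(g is square)             # True
print(apply_twice(square, 3))  # 81: square is passed without parentheses
```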
32. List Sorting
    - Let lst be a list
    - lst.sort([cmp[, key[, reverse]]])
        - sorts lst in place
        - destructively modifies lst
        - does not return any value (the call evaluates to None)
    - If you want to have access to the original ordering, make a copy of lst before sorting
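The in-place behavior and the copy-first advice can be sketched as follows, assuming Python 3, where sort() accepts only the keyword arguments key and reverse and no longer takes a cmp argument.

```python
lst = [3, 1, 2]

original = lst[:]        # slice copy keeps the original ordering
result = lst.sort()      # sorts in place; the call yields None
print(result)            # None
print(lst)               # [1, 2, 3]
print(original)          # [3, 1, 2]

# The built-in sorted() returns a new list and leaves its argument unchanged.
descending = sorted(original, reverse=True)
print(descending)        # [3, 2, 1]
print(original)          # [3, 1, 2]
```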
33. cmp(x, y)
    - cmp(x, y) is a built-in function in Python 2 (it was removed in Python 3)
    - cmp(x, y) returns
        - -1 if x < y
        - 1 if x > y
        - 0 if x == y
    - cmp(x, y) provides the default comparison function for sort()
34. cmp(x, y)

        >>> cmp(1, 2)
        -1
        >>> cmp(2, 1)
        1
        >>> cmp(1, 1)
        0
        >>> cmp('a', 'b')
        -1
        >>> cmp('b', 'a')
        1
35. Customizing sort()
    - It is possible to customize sort()
    - Customization requires two steps:
        - Define a new comparison function that takes two arguments and returns one of three integers (-1, 0, or 1) according to the cmp() convention
        - Pass the function object (just the function name with no parentheses) to sort()
36. Example 01: Customizing sort()
    - Define a comparator function:

          def neg_cmp(x, y):
              return -cmp(x, y)

    - Pass the comparator function to sort():

          >>> z = [1, 2, 3, 4, 5]
          >>> z.sort(neg_cmp)
          >>> z
          [5, 4, 3, 2, 1]
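The example above is Python 2 code. In Python 3 there is no built-in cmp() and sort() takes no comparator argument, so a comparator has to be wrapped with `functools.cmp_to_key`. The sketch below assumes Python 3; the local `cmp` is a stand-in written for this example, not a built-in.

```python
from functools import cmp_to_key

def cmp(x, y):
    """Stand-in for Python 2's built-in cmp(): returns -1, 0, or 1."""
    return (x > y) - (x < y)

def neg_cmp(x, y):
    return -cmp(x, y)

z = [1, 2, 3, 4, 5]
z.sort(key=cmp_to_key(neg_cmp))   # wrap the comparator as a key function
print(z)                          # [5, 4, 3, 2, 1]

# For a plain descending sort, the reverse flag gives the same result.
w = [1, 2, 3, 4, 5]
w.sort(reverse=True)
print(w)                          # [5, 4, 3, 2, 1]
```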
37. Example 02: Customizing sort()

        def leaf_freq_comp(leaf1, leaf2):
            return cmp(get_leaf_freq(leaf1), get_leaf_freq(leaf2))

        huff_leaves = [make_leaf('A', 8), make_leaf('C', 1),
                       make_leaf('B', 3), make_leaf('D', 1),
                       make_leaf('F', 1), make_leaf('E', 1),
                       make_leaf('H', 1), make_leaf('G', 1)]
        print huff_leaves
        huff_leaves.sort(leaf_freq_comp)
        print huff_leaves

    OUTPUT:

        [('A', 8), ('C', 1), ('B', 3), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1)]
        [('C', 1), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1), ('B', 3), ('A', 8)]
38. Example 03: Customizing sort()

        def leaf_symbol_comp(leaf1, leaf2):
            return cmp(get_leaf_symbol(leaf1), get_leaf_symbol(leaf2))

        huff_leaves2 = [make_leaf('A', 8), make_leaf('C', 1),
                        make_leaf('B', 3), make_leaf('D', 1),
                        make_leaf('F', 1), make_leaf('E', 1),
                        make_leaf('H', 1), make_leaf('G', 1)]
        print huff_leaves2
        huff_leaves2.sort(leaf_symbol_comp)
        print huff_leaves2

    OUTPUT:

        [('A', 8), ('C', 1), ('B', 3), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1)]
        [('A', 8), ('B', 3), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1), ('H', 1)]
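Sorting by frequency is the step that drives bottom-up construction. The slides do not show the construction loop, so the sketch below uses the standard Huffman procedure (repeatedly merge the two lowest-frequency nodes) under stated assumptions: Python 3, a `key=` function in place of a comparator, and a hypothetical function name `build_huffman_tree`. The lecture's constructors are repeated so the block runs on its own.

```python
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    return (isinstance(x, tuple) and len(x) == 2 and
            isinstance(x[0], str) and isinstance(x[1], int))

def get_symbols(huff_tree):
    return [huff_tree[0]] if is_leaf(huff_tree) else huff_tree[2]

def get_freq(huff_tree):
    return huff_tree[1] if is_leaf(huff_tree) else huff_tree[3]

def make_huffman_tree(left_branch, right_branch):
    return [left_branch, right_branch,
            get_symbols(left_branch) + get_symbols(right_branch),
            get_freq(left_branch) + get_freq(right_branch)]

def build_huffman_tree(leaves):
    """Merge the two lowest-frequency nodes until one tree remains."""
    nodes = list(leaves)              # work on a copy of the caller's list
    if not nodes:
        raise ValueError("at least one leaf is required")
    while len(nodes) > 1:
        nodes.sort(key=get_freq)      # stable sort, lowest frequency first
        first, second = nodes[0], nodes[1]
        nodes = nodes[2:] + [make_huffman_tree(first, second)]
    return nodes[0]

small = build_huffman_tree([make_leaf('A', 4), make_leaf('B', 2),
                            make_leaf('D', 1), make_leaf('C', 1)])
print(small)
# [('A', 4), [('B', 2), [('D', 1), ('C', 1), ['D', 'C'], 2], ['B', 'D', 'C'], 4], ['A', 'B', 'D', 'C'], 8]

huff_leaves = [make_leaf('A', 8), make_leaf('C', 1), make_leaf('B', 3),
               make_leaf('D', 1), make_leaf('F', 1), make_leaf('E', 1),
               make_leaf('H', 1), make_leaf('G', 1)]
big = build_huffman_tree(huff_leaves)
print(get_freq(big))                  # 17
print(sorted(get_symbols(big)))
```

With the four leaves A, B, D, C, this tie-breaking happens to reproduce the structure shown on slide 22. For the eight-leaf example the root frequency is 17, as on slide 12, but the shape of the tree depends on how ties between equal frequencies are broken and may differ from the slide.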
39. Reading & References
    - www.python.org
    - http://docs.python.org/library/stdtypes.html#typesseq
    - doc.python.org/howto/unicode.html
    - Ch 02, M. L. Hetland. Beginning Python: From Novice to Professional, 2nd Ed., APRESS
    - Ch 02, H. Abelson and G. Sussman. Structure and Interpretation of Computer Programs, MIT Press
    - S. Roman, Coding and Information Theory, Springer-Verlag