Python lecture 06

Published in: Education

1. Python & Perl Lecture 06
   Department of Computer Science
   Utah State University
2. Outline
   ● Data Abstraction: Building Huffman Trees with Lists and Tuples
   ● List Comprehension
3. Data Abstraction: Building Huffman Trees with Lists and Tuples
4. Background
   ● In information theory, coding refers to methods that represent data in terms of bit sequences (sequences of 0's and 1's)
   ● Encoding is a method of taking data structures and mapping them to bit sequences
   ● Decoding is a method of taking bit sequences and outputting the corresponding data structure
5. Example: Standard ASCII & Unicode
   ● Standard ASCII encodes each character as a 7-bit sequence
   ● Using 7 bits allows us to encode 2^7 = 128 possible characters
   ● Unicode has three encoding forms: UTF-8 (8-bit code units, one to four per character), UTF-16 (16-bit code units, one or two per character), and UTF-32 (one 32-bit code unit per character)
   ● UTF stands for Unicode Transformation Format
   ● Python 2.X's Unicode support: “Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled.”
6. Two Types of Codes
   ● There are two types of codes: fixed-length and variable-length
   ● Fixed-length codes (e.g., ASCII, UTF-32) encode every character with the same number of bits
   ● Variable-length codes (e.g., Morse, Huffman) encode characters with varying numbers of bits: more frequent symbols are encoded with fewer bits
7. Example: Fixed-Length Code
   ● A – 000   C – 010   E – 100   G – 110
   ● B – 001   D – 011   F – 101   H – 111
   ● AADF = 000000011101
   ● The encoding of AADF is 12 bits
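The fixed-length table above can be tried out with a small sketch. It is written for Python 3 (the lecture itself uses Python 2), and the names `FIXED_CODE`, `encode_fixed`, and `decode_fixed` are illustrative, not part of the lecture code.

```python
# Fixed-length code from the slide: every symbol gets exactly 3 bits.
FIXED_CODE = {
    'A': '000', 'B': '001', 'C': '010', 'D': '011',
    'E': '100', 'F': '101', 'G': '110', 'H': '111',
}

def encode_fixed(message, code=FIXED_CODE):
    """Concatenate the code word of each symbol in message."""
    return ''.join(code[symbol] for symbol in message)

def decode_fixed(bits, code=FIXED_CODE, width=3):
    """Cut bits into width-sized chunks and look each chunk up."""
    reverse = {word: symbol for symbol, word in code.items()}
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return ''.join(reverse[chunk] for chunk in chunks)

encoded = encode_fixed('AADF')
print(encoded, len(encoded))   # 000000011101 12
print(decode_fixed(encoded))   # AADF
```

Because every code word has the same width, decoding needs no separators: the bit string is simply cut every 3 bits.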
8. Example: Variable-Length Code
   ● A – 0     C – 1010   E – 1100   G – 1110
   ● B – 100   D – 1011   F – 1101   H – 1111
   ● AADF = 0010111101
   ● The encoding of AADF is 10 bits
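A minimal Python 3 sketch of the same variable-length table, assuming the illustrative names `VAR_CODE` and `encode` (neither appears in the lecture):

```python
# Variable-length code from the slide: the frequent symbol A gets 1 bit.
VAR_CODE = {
    'A': '0', 'B': '100', 'C': '1010', 'D': '1011',
    'E': '1100', 'F': '1101', 'G': '1110', 'H': '1111',
}

def encode(message, code):
    """Concatenate the code word of each symbol in message."""
    return ''.join(code[symbol] for symbol in message)

encoded = encode('AADF', VAR_CODE)
print(encoded, len(encoded))   # 0010111101 10
```

The message AADF costs 10 bits here because A, which occurs twice, takes a single bit each time.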
9. End of Character in Variable-Length Code
   ● One of the challenges in variable-length codes is knowing where one character ends and the next one begins
   ● Morse uses a special character (separator code)
   ● Prefix coding is another solution: no character's code is a prefix of any other character's code, so the end of each character can be recognized without a separator
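The prefix property can be checked mechanically, and it is what makes separator-free decoding work. The sketch below is Python 3 and uses the variable-length table from the previous slide; `is_prefix_free` and `decode_prefix` are hypothetical helper names introduced only for this illustration.

```python
VAR_CODE = {
    'A': '0', 'B': '100', 'C': '1010', 'D': '1011',
    'E': '1100', 'F': '1101', 'G': '1110', 'H': '1111',
}

def is_prefix_free(code):
    """Return True if no code word is a prefix of a different entry's code word."""
    words = list(code.values())
    for i, first in enumerate(words):
        for j, second in enumerate(words):
            if i != j and second.startswith(first):
                return False
    return True

def decode_prefix(bits, code):
    """Read bits left to right and emit a symbol as soon as a code word matches."""
    reverse = {word: symbol for symbol, word in code.items()}
    decoded, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            decoded.append(reverse[buffer])
            buffer = ''
    if buffer:
        raise ValueError('trailing bits do not form a complete code word')
    return ''.join(decoded)

print(is_prefix_free(VAR_CODE))                 # True
print(is_prefix_free({'A': '0', 'B': '01'}))    # False: '0' starts '01'
print(decode_prefix('0010111101', VAR_CODE))    # AADF
```

Emitting a symbol at the first match is safe only because the code is prefix-free; with the second table above, the bit 0 would be ambiguous.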
10. Huffman Code
    ● Huffman code is a variable-length code that takes advantage of relative frequencies of characters
    ● Huffman code is named after David Huffman, the researcher who discovered it
    ● Huffman code is represented as a binary tree whose leaves are individual characters and their frequencies
    ● Each non-leaf node holds the set of characters in all of its subnodes and the sum of their relative frequencies
11. Huffman Tree Example (each branch is labeled with its bit, 0 or 1)
    {A, B, C, D, E, F, G, H}: 17
      0 -> A: 8
      1 -> {B, C, D, E, F, G, H}: 9
             0 -> {B, C, D}: 5
                    0 -> B: 3
                    1 -> {C, D}: 2
                           0 -> C: 1
                           1 -> D: 1
             1 -> {E, F, G, H}: 4
                    0 -> {E, F}: 2
                           0 -> E: 1
                           1 -> F: 1
                    1 -> {G, H}: 2
                           0 -> G: 1
                           1 -> H: 1
12. Using Huffman Tree to Encode/Decode Characters
    ● For the tree on the previous slide, the encodings are:
      ● A is encoded as 0
      ● B is encoded as 100
      ● C is encoded as 1010
      ● D is encoded as 1011
      ● E is encoded as 1100
      ● F is encoded as 1101
      ● G is encoded as 1110
      ● H is encoded as 1111
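To see what these encodings buy, the following Python 3 sketch compares the total number of bits needed for a text with the frequencies shown in the example tree (A: 8, B: 3, all others: 1) under the Huffman code and under a 3-bit fixed-length code. The names `FREQ` and `HUFFMAN_CODE` are illustrative.

```python
# Frequencies from the leaves of the example tree.
FREQ = {'A': 8, 'B': 3, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 1, 'H': 1}

# Encodings listed on the slide.
HUFFMAN_CODE = {
    'A': '0', 'B': '100', 'C': '1010', 'D': '1011',
    'E': '1100', 'F': '1101', 'G': '1110', 'H': '1111',
}

total_symbols = sum(FREQ.values())                                   # 17
huffman_bits = sum(FREQ[s] * len(HUFFMAN_CODE[s]) for s in FREQ)     # 41
fixed_bits = 3 * total_symbols                                       # 51

print(total_symbols, huffman_bits, fixed_bits)
```

For this distribution the Huffman code needs 41 bits where the fixed-length code needs 51.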
13. Building The Huffman Tree
14. Simple Huffman Tree
    {A, B, D, C}: 8
      A: 4
      {B, D, C}: 4
        B: 2
        {D, C}: 2
          D: 1
          C: 1
15. Constructing Leaves
    ### a leaf is a tuple whose first element is the symbol
    ### represented as a string and whose second element is
    ### the symbol's frequency
    def make_leaf(symbol, freq):
        return (symbol, freq)

    def is_leaf(x):
        return isinstance(x, tuple) and \
               len(x) == 2 and \
               isinstance(x[0], str) and \
               isinstance(x[1], int)
16. Constructing Leaves
    ### return the character (symbol) of the leaf
    def get_leaf_symbol(leaf):
        return leaf[0]

    ### return the frequency of the leaf's character
    def get_leaf_freq(leaf):
        return leaf[1]
17. Constructing Huffman Trees
    ### A non-leaf node (internal node) is represented as
    ### a list of four elements:
    ### 1. left branch
    ### 2. right branch
    ### 3. list of symbols
    ### 4. combined frequency of symbols
    [left_branch, right_branch, symbols, frequency]
18. Accessing Huffman Trees
    def get_leaf_symbol(leaf):
        return leaf[0]

    def get_leaf_freq(leaf):
        return leaf[1]

    def get_left_branch(huff_tree):
        return huff_tree[0]

    def get_right_branch(huff_tree):
        return huff_tree[1]
19. Accessing Huffman Trees
    def get_symbols(huff_tree):
        if is_leaf(huff_tree):
            return [get_leaf_symbol(huff_tree)]
        else:
            return huff_tree[2]

    def get_freq(huff_tree):
        if is_leaf(huff_tree):
            return get_leaf_freq(huff_tree)
        else:
            return huff_tree[3]
20. Constructing Huffman Trees
    ### A Huffman tree is constructed from its left branch, which can
    ### be a Huffman tree or a leaf, and its right branch, another
    ### Huffman tree or a leaf. The new tree has the symbols of the
    ### left branch followed by those of the right branch, and the sum
    ### of the frequencies of the left branch and the right branch
    def make_huffman_tree(left_branch, right_branch):
        return [left_branch,
                right_branch,
                get_symbols(left_branch) + get_symbols(right_branch),
                get_freq(left_branch) + get_freq(right_branch)]
21. MAKE_HUFFMAN_TREE Example
    ht01 = make_huffman_tree(make_leaf('A', 4),
                             make_huffman_tree(make_leaf('B', 2),
                                               make_huffman_tree(make_leaf('D', 1),
                                                                 make_leaf('C', 1))))

    {A, B, D, C}: 8
      A: 4
      {B, D, C}: 4
        B: 2
        {D, C}: 2
          D: 1
          C: 1
22. MAKE_HUFFMAN_TREE Example
    Python data structure that represents the Huffman tree below:
    [('A', 4),
     [('B', 2),
      [('D', 1), ('C', 1), ['D', 'C'], 2],
      ['B', 'D', 'C'], 4],
     ['A', 'B', 'D', 'C'], 8]

    {A, B, D, C}: 8
      A: 4
      {B, D, C}: 4
        B: 2
        {D, C}: 2
          D: 1
          C: 1
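Given the list-and-tuple representation above, the code word of every symbol can be read off by walking the tree. The following is a Python 3 sketch with condensed copies of the lecture's helpers (`is_leaf` is simplified here); `code_table` is a hypothetical function that does not appear in the lecture, and it assumes the convention used later in the lecture that a left branch contributes 0 and a right branch contributes 1.

```python
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    # Condensed check: leaves are 2-tuples, internal nodes are lists.
    return isinstance(x, tuple) and len(x) == 2

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

def code_table(tree, prefix=''):
    """Map each symbol to its bit string: left adds '0', right adds '1'."""
    if is_leaf(tree):
        # A tree that is a single leaf still needs one bit per symbol.
        return {tree[0]: prefix or '0'}
    table = {}
    table.update(code_table(tree[0], prefix + '0'))
    table.update(code_table(tree[1], prefix + '1'))
    return table

ht01 = make_huffman_tree(make_leaf('A', 4),
                         make_huffman_tree(make_leaf('B', 2),
                                           make_huffman_tree(make_leaf('D', 1),
                                                             make_leaf('C', 1))))
print(ht01)
print(code_table(ht01))
```

For `ht01` the table assigns A the code 0, B the code 10, D the code 110, and C the code 111.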
23. Customizing sort()
    def leaf_freq_comp(leaf1, leaf2):
        return cmp(get_leaf_freq(leaf1), get_leaf_freq(leaf2))

    huff_leaves = [make_leaf('A', 8), make_leaf('C', 1),
                   make_leaf('B', 3), make_leaf('D', 1),
                   make_leaf('F', 1), make_leaf('E', 1),
                   make_leaf('H', 1), make_leaf('G', 1)]
    print huff_leaves
    huff_leaves.sort(leaf_freq_comp)
    print huff_leaves

    OUTPUT:
    [('A', 8), ('C', 1), ('B', 3), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1)]
    [('C', 1), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1), ('B', 3), ('A', 8)]
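Sorting by frequency is the building block of the standard Huffman construction: repeatedly merge the two lowest-frequency nodes until one tree remains. The slides do not spell this procedure out, so the sketch below is the textbook algorithm rather than the lecture's own code. It is Python 3, uses condensed copies of the lecture's helpers, and `build_huffman_tree` and `weighted_length` are hypothetical names. Ties between equal frequencies can be broken in any order, so the resulting tree may be shaped differently from the one on the slides while costing the same number of bits.

```python
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    return isinstance(x, tuple) and len(x) == 2

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

def build_huffman_tree(leaves):
    """Merge the two lowest-frequency nodes until a single tree is left."""
    nodes = list(leaves)
    if not nodes:
        raise ValueError('at least one leaf is required')
    while len(nodes) > 1:
        nodes.sort(key=get_freq)              # cheapest nodes first
        merged = make_huffman_tree(nodes[0], nodes[1])
        nodes = [merged] + nodes[2:]
    return nodes[0]

def weighted_length(tree, depth=0):
    """Total bits needed: sum of frequency * code length over all leaves."""
    if is_leaf(tree):
        return get_freq(tree) * depth
    return (weighted_length(tree[0], depth + 1) +
            weighted_length(tree[1], depth + 1))

huff_leaves = [make_leaf('A', 8), make_leaf('C', 1), make_leaf('B', 3),
               make_leaf('D', 1), make_leaf('F', 1), make_leaf('E', 1),
               make_leaf('H', 1), make_leaf('G', 1)]
tree = build_huffman_tree(huff_leaves)
print(get_freq(tree), weighted_length(tree))   # 17 41
```

Re-sorting the whole list on every pass keeps the sketch short; a priority queue such as `heapq` would be the usual choice for large alphabets.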
24. Customizing sort()
    def leaf_symbol_comp(leaf1, leaf2):
        return cmp(get_leaf_symbol(leaf1), get_leaf_symbol(leaf2))

    huff_leaves2 = [make_leaf('A', 8), make_leaf('C', 1),
                    make_leaf('B', 3), make_leaf('D', 1),
                    make_leaf('F', 1), make_leaf('E', 1),
                    make_leaf('H', 1), make_leaf('G', 1)]
    print huff_leaves2
    huff_leaves2.sort(leaf_symbol_comp)
    print huff_leaves2

    OUTPUT:
    [('A', 8), ('C', 1), ('B', 3), ('D', 1), ('F', 1), ('E', 1), ('H', 1), ('G', 1)]
    [('A', 8), ('B', 3), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1), ('H', 1)]
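The two comparison-function examples rely on `cmp()` and on passing a comparison function to `sort()`, both of which exist only in Python 2. A rough Python 3 equivalent, shown here as a sketch, sorts with a `key` function instead; `functools.cmp_to_key` adapts an old-style comparison function when one is still wanted.

```python
from functools import cmp_to_key

def get_leaf_symbol(leaf):
    return leaf[0]

def get_leaf_freq(leaf):
    return leaf[1]

huff_leaves = [('A', 8), ('C', 1), ('B', 3), ('D', 1),
               ('F', 1), ('E', 1), ('H', 1), ('G', 1)]

# key-based sorting: the accessor itself is the sort key
by_freq = sorted(huff_leaves, key=get_leaf_freq)
by_symbol = sorted(huff_leaves, key=get_leaf_symbol)

# old-style comparison function, written without the removed cmp() built-in
def leaf_freq_comp(leaf1, leaf2):
    f1, f2 = get_leaf_freq(leaf1), get_leaf_freq(leaf2)
    return (f1 > f2) - (f1 < f2)

by_freq_cmp = sorted(huff_leaves, key=cmp_to_key(leaf_freq_comp))

print(by_freq)
print(by_symbol)
```

Python's sort is stable, so leaves with equal frequency keep their original relative order, which is why the frequency-sorted list matches the output shown on the slide.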
25. Encoding & Decoding Messages with Huffman Trees
26. Sample Huffman Tree (each branch is labeled with its bit, 0 or 1)
    {A, B, C, D, E, F, G, H}: 17
      0 -> A: 8
      1 -> {B, C, D, E, F, G, H}: 9
             0 -> {B, C, D}: 5
                    0 -> B: 3
                    1 -> {C, D}: 2
                           0 -> C: 1
                           1 -> D: 1
             1 -> {E, F, G, H}: 4
                    0 -> {E, F}: 2
                           0 -> E: 1
                           1 -> F: 1
                    1 -> {G, H}: 2
                           0 -> G: 1
                           1 -> H: 1
27. Symbol Encoding
    1. Given a symbol s and a Huffman tree ht, set current_node to the root node and encoding to an empty list (you can also check if s is in the root node's symbol list and, if not, signal error)
    2. If current_node is a leaf, return encoding
    3. Check if s is in current_node's left branch or right branch
    4. If in the left, add 0 to encoding, set current_node to the root of the left branch, and go to step 2
    5. If in the right, add 1 to encoding, set current_node to the root of the right branch, and go to step 2
    6. If in neither branch, signal error
28. Example
    ● Encode B with the sample Huffman tree
    ● Set current_node to the root node
    ● B is in current_node's right branch, so add 1 to encoding and descend into the right branch (current_node is set to the root of the right branch, {B, C, D, E, F, G, H}: 9)
    ● B is in current_node's left branch, so add 0 to encoding and descend into the left branch (current_node is {B, C, D}: 5)
    ● B is in current_node's left branch, so add 0 to encoding and descend into the left branch (current_node is B: 3)
    ● current_node is a leaf, so return 100 (value of encoding)
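The symbol-encoding steps can be sketched in Python 3 as follows. The helpers are condensed copies of the lecture's constructors and accessors, while `encode_symbol`, `sample_tree`, and the short aliases `L` and `T` are names introduced only for this illustration.

```python
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    return isinstance(x, tuple) and len(x) == 2

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

# The sample tree, built bottom-up (L and T are just short aliases).
L, T = make_leaf, make_huffman_tree
sample_tree = T(L('A', 8),
                T(T(L('B', 3), T(L('C', 1), L('D', 1))),
                  T(T(L('E', 1), L('F', 1)), T(L('G', 1), L('H', 1)))))

def encode_symbol(symbol, tree):
    """Follow the branch containing symbol: left adds 0, right adds 1."""
    if symbol not in get_symbols(tree):
        raise ValueError('symbol %r is not in the tree' % symbol)
    encoding = []
    current_node = tree
    while not is_leaf(current_node):
        left, right = current_node[0], current_node[1]
        if symbol in get_symbols(left):
            encoding.append(0)
            current_node = left
        elif symbol in get_symbols(right):
            encoding.append(1)
            current_node = right
        else:
            raise ValueError('symbol %r is in neither branch' % symbol)
    return encoding

print(encode_symbol('B', sample_tree))   # [1, 0, 0]
```

The loop replaces the "go to step 2" jumps; the up-front membership test is the optional check mentioned in step 1.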
29. Message Encoding
    ● Given a sequence of symbols message and a Huffman tree ht
    ● Concatenate the encoding of each symbol in message from left to right
    ● Return the concatenation of encodings
30. Example
    ● Encode ABBA with the sample Huffman tree
    ● Encoding for A is 0
    ● Encoding for B is 100
    ● Encoding for B is 100
    ● Encoding for A is 0
    ● Concatenation of encodings is 01001000
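Message encoding is then a loop over symbol encodings. This Python 3 sketch repeats the condensed helpers so that it runs on its own; `encode_symbol`, `encode_message`, and `sample_tree` are illustrative names, not the lecture's code.

```python
def is_leaf(x):
    return isinstance(x, tuple) and len(x) == 2

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

T = make_huffman_tree
sample_tree = T(('A', 8),
                T(T(('B', 3), T(('C', 1), ('D', 1))),
                  T(T(('E', 1), ('F', 1)), T(('G', 1), ('H', 1)))))

def encode_symbol(symbol, tree):
    """Return the list of bits on the path from the root to symbol's leaf."""
    if symbol not in get_symbols(tree):
        raise ValueError('symbol %r is not in the tree' % symbol)
    encoding, node = [], tree
    while not is_leaf(node):
        if symbol in get_symbols(node[0]):
            encoding.append(0)
            node = node[0]
        else:
            encoding.append(1)
            node = node[1]
    return encoding

def encode_message(message, tree):
    """Concatenate the encodings of the symbols from left to right."""
    bits = []
    for symbol in message:
        bits.extend(encode_symbol(symbol, tree))
    return bits

bits = encode_message('ABBA', sample_tree)
print(''.join(str(b) for b in bits))   # 01001000
```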
31. Message Decoding
    1. Given a sequence of bits message and a Huffman tree ht, set current_node to the root and decoding to an empty list
    2. If current_node is a leaf, add its symbol to decoding and set current_node to ht's root
    3. If current_node is ht's root and message has no more bits, return decoding
    4. If no more bits in message and current_node is not a leaf, signal error
    5. If message's current bit is 0, set current_node to its left child, read the bit, and go to step 2
    6. If message's current bit is 1, set current_node to its right child, read the bit, and go to step 2
32. Example
    ● Decode 0100 with the sample Huffman tree
    ● Read 0, go left to A: 8, add A to decoding, and reset current_node to the root
    ● Read 1, go right to {B, C, D, E, F, G, H}: 9
    ● Read 0, go left to {B, C, D}: 5
    ● Read 0, go left to B: 3
    ● Add B to decoding and reset current_node to the root
    ● No more bits and current_node is the root, so return AB
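The decoding steps translate into a single pass over the bits. The Python 3 sketch below assumes the tree has at least two symbols (so the root is an internal node); `decode_message` and `sample_tree` are illustrative names, and the helpers are condensed copies of the lecture's.

```python
def is_leaf(x):
    return isinstance(x, tuple) and len(x) == 2

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

T = make_huffman_tree
sample_tree = T(('A', 8),
                T(T(('B', 3), T(('C', 1), ('D', 1))),
                  T(T(('E', 1), ('F', 1)), T(('G', 1), ('H', 1)))))

def decode_message(bits, tree):
    """Walk down the tree bit by bit; emit a symbol at each leaf and restart."""
    decoding = []
    current_node = tree
    for bit in bits:
        if bit == 0:
            current_node = current_node[0]    # left child
        elif bit == 1:
            current_node = current_node[1]    # right child
        else:
            raise ValueError('bits must be 0 or 1, got %r' % (bit,))
        if is_leaf(current_node):
            decoding.append(current_node[0])
            current_node = tree
    if current_node is not tree:
        raise ValueError('message ends in the middle of a code word')
    return decoding

print(decode_message([0, 1, 0, 0], sample_tree))   # ['A', 'B']
```

Running out of bits anywhere other than the root corresponds to the error case in step 4.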
33. List Comprehension
34. List Comprehension
    ● List comprehension is a syntactic construct in some programming languages for building lists from list specifications
    ● List comprehension derives its conceptual roots from the set-former (set-builder) notation in mathematics
      [Y for X in LIST]
    ● List comprehension is available in other programming languages such as Common Lisp, Haskell, and OCaml
35. Set-Former Notation Example
    $\{4x \mid x \in \mathbb{N},\ x^2 < 100\}$
    ● $4x$ is the output function
    ● $x$ is the variable
    ● $\mathbb{N}$ is the input set
    ● $x^2 < 100$ is the predicate
36. Set-Former Notation Examples
    ● $\{x \in \{a, b\}^* \mid |x| \le 3\}$ is the set of all strings over $\{a, b\}$ whose length is 0, 1, 2, or 3.
    ● $\{a^n b^n \mid n \ge 1\}$ is the set of non-empty strings over $\{a, b\}$ such that a's precede b's and the number of a's is equal to the number of b's.
    ● $\{xy \mid x \in \{a, b\},\ y \in \{aa, cc\}\}$ is the set of strings where a or b is followed by aa or cc.
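Each of these three sets has a direct list-comprehension counterpart. The Python 3 sketch below is illustrative: the variable names are made up, `itertools.product` is used to enumerate strings of a given length, and the infinite set of strings a^n b^n is cut off at n = 3.

```python
from itertools import product

alphabet = ['a', 'b']

# all strings over {a, b} of length 0, 1, 2, or 3
short_strings = [''.join(letters)
                 for n in range(4)
                 for letters in product(alphabet, repeat=n)]

# a^n b^n for n >= 1; the set is infinite, so only n = 1, 2, 3 are built
equal_ab = ['a' * n + 'b' * n for n in range(1, 4)]

# a or b followed by aa or cc
followed = [x + y for x in ['a', 'b'] for y in ['aa', 'cc']]

print(len(short_strings))   # 15
print(equal_ab)             # ['ab', 'aabb', 'aaabbb']
print(followed)             # ['aaa', 'acc', 'baa', 'bcc']
```

The count of 15 is 1 + 2 + 4 + 8, one term per allowed length, with the empty string as the only string of length 0.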
37. For-Loop Implementation
    ### building the list of the set-former example with for-loop
    >>> rslt = []
    >>> for x in xrange(201):
    ...     if x ** 2 < 100:
    ...         rslt.append(4 * x)
    ...
    >>> rslt
    [0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
38. List Comprehension Equivalent
    ### building the same list with list comprehension
    >>> s = [4 * x for x in xrange(201) if x ** 2 < 100]
    >>> s
    [0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
39. For-Loop
    ### building list of squares of even numbers in [0, 10]
    ### with for-loop
    >>> rslt = []
    >>> for x in xrange(11):
    ...     if x % 2 == 0:
    ...         rslt.append(x ** 2)
    ...
    >>> rslt
    [0, 4, 16, 36, 64, 100]
40. List Comprehension Equivalent
    ### building the same list with list comprehension
    >>> [x ** 2 for x in xrange(11) if x % 2 == 0]
    [0, 4, 16, 36, 64, 100]
41. For-Loop
    ### building list of squares of odd numbers in [0, 10]
    >>> rslt = []
    >>> for x in xrange(11):
    ...     if x % 2 != 0:
    ...         rslt.append(x ** 2)
    ...
    >>> rslt
    [1, 9, 25, 49, 81]
42. List Comprehension Equivalent
    ### building list of squares of odd numbers in [0, 10]
    ### with list comprehension
    >>> [x ** 2 for x in xrange(11) if x % 2 != 0]
    [1, 9, 25, 49, 81]
43. List Comprehension with For-Loops
44. For-Loop
    >>> rslt = []
    >>> for x in xrange(6):
    ...     if x % 2 == 0:
    ...         for y in xrange(6):
    ...             if y % 2 != 0:
    ...                 rslt.append((x, y))
    ...
    >>> rslt
    [(0, 1), (0, 3), (0, 5), (2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)]
45. List Comprehension Equivalent
    >>> [(x, y) for x in xrange(6) if x % 2 == 0
    ...         for y in xrange(6) if y % 2 != 0]
    [(0, 1), (0, 3), (0, 5), (2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)]
46. List Comprehension with Matrices
47. List Comprehension with Matrices
    ● List comprehension can be used to scan rows and columns in matrices
    >>> matrix = [[10, 20, 30],
    ...           [40, 50, 60],
    ...           [70, 80, 90]]
    ### extract all rows
    >>> [r for r in matrix]
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
48. List Comprehension with Matrices
    >>> matrix = [[10, 20, 30],
    ...           [40, 50, 60],
    ...           [70, 80, 90]]
    ### extract column 0
    >>> [r[0] for r in matrix]
    [10, 40, 70]
49. List Comprehension with Matrices
    >>> matrix = [[10, 20, 30],
    ...           [40, 50, 60],
    ...           [70, 80, 90]]
    ### extract column 1
    >>> [r[1] for r in matrix]
    [20, 50, 80]
50. List Comprehension with Matrices
    >>> matrix = [[10, 20, 30],
    ...           [40, 50, 60],
    ...           [70, 80, 90]]
    ### extract column 2
    >>> [r[2] for r in matrix]
    [30, 60, 90]
51. List Comprehension with Matrices
    ### turn matrix columns into rows
    >>> rslt = []
    >>> for c in xrange(len(matrix)):
    ...     rslt.append([matrix[r][c] for r in xrange(len(matrix))])
    ...
    >>> rslt
    [[10, 40, 70], [20, 50, 80], [30, 60, 90]]
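The column-to-row conversion can also be written as a single nested list comprehension, or with the built-in `zip`. This is a Python 3 sketch (`range` instead of `xrange`), and the variable names `transposed` and `zipped` are illustrative.

```python
matrix = [[10, 20, 30],
          [40, 50, 60],
          [70, 80, 90]]

# outer comprehension walks the column indices, inner one walks the rows
transposed = [[row[c] for row in matrix] for c in range(len(matrix[0]))]

# zip(*matrix) pairs up the i-th elements of every row
zipped = [list(column) for column in zip(*matrix)]

print(transposed)   # [[10, 40, 70], [20, 50, 80], [30, 60, 90]]
```

Using `len(matrix[0])` for the column count keeps the first form correct for non-square matrices, provided all rows have the same length.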
52. List Comprehension with Matrices
    ● List comprehension can work with iterables (e.g., dictionaries)
    ### the order of items in a Python 2 dictionary is arbitrary,
    ### so the order of the output tuples may differ
    >>> dict = {'a' : 'A', 'bb' : 'BB', 'ccc' : 'CCC'}
    >>> [(item[0], item[1], len(item[0] + item[1])) for item in dict.items()]
    [('a', 'A', 2), ('ccc', 'CCC', 6), ('bb', 'BB', 4)]
53. List Comprehension
    ● If the expression inside [ ] is a tuple, parentheses are a must
    >>> cubes = [(x, x ** 3) for x in xrange(5)]
    >>> cubes
    [(0, 0), (1, 1), (2, 8), (3, 27), (4, 64)]
    ● Sequences can be unpacked in list comprehension
    >>> sums = [x + y for x, y in cubes]
    >>> sums
    [0, 2, 10, 30, 68]
54. List Comprehension
    ● for-clauses in list comprehensions can iterate over any sequences:
    >>> rslt = [c * n for c in 'math' for n in (1, 2, 3)]
    >>> rslt
    ['m', 'mm', 'mmm', 'a', 'aa', 'aaa', 't', 'tt', 'ttt', 'h', 'hh', 'hhh']
55. List Comprehension & Loop Variables
    ● In Python 2, the loop variables used in list comprehension for-clauses (and in regular for-loops) remain bound after execution. In Python 3, only regular for-loop variables remain bound; list comprehension variables are local to the comprehension.
    >>> for i in [1, 2, 3]:
    ...     print i
    ...
    1
    2
    3
    >>> i + 4
    7
    >>> [j for j in xrange(10) if j % 2 == 0]
    [0, 2, 4, 6, 8]
    >>> j * 2
    18
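The difference between interpreter versions can be observed directly. The sketch below runs on both Python 2 and Python 3 and records whether the comprehension variable is still visible afterwards; `after_loop`, `evens`, and `leaked` are illustrative names, and the check assumes no variable named `j` was defined earlier.

```python
import sys

for i in [1, 2, 3]:
    pass
after_loop = i + 4          # regular for-loop variables stay bound: 3 + 4

evens = [j for j in range(10) if j % 2 == 0]

# In Python 2 the name j is still bound here (to 8);
# in Python 3 looking it up raises NameError.
try:
    j
    leaked = True
except NameError:
    leaked = False

print(sys.version_info[0], after_loop, leaked)
```

On Python 3 this reports `leaked` as False, so code that relied on `j * 2` after the comprehension has to be rewritten when ported.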
56. When To Use List Comprehension
    ● For-loops are easier to understand and debug
    ● List comprehensions may be harder to understand
    ● List comprehensions are usually faster in the interpreter than equivalent for-loops that build a list with append()
    ● List comprehensions are worth using to speed up simpler tasks
    ● For-loops are worth using when logic gets complex
57. Reading & References
    ● www.python.org
    ● http://docs.python.org/library/stdtypes.html#typesseq
    ● doc.python.org/howto/unicode.html
    ● Ch 02, M. L. Hetland. Beginning Python From Novice to Professional, 2nd Ed., APRESS
    ● Ch 02, H. Abelson and G. Sussman. Structure and Interpretation of Computer Programs, MIT Press
    ● S. Roman, Coding and Information Theory, Springer-Verlag