Processing data with Pythonusing standard library modules you(probably) never knew about                               PyC...
This is an introduction to…•   Pythons data containers•   Choosing the best container for the job•   Iterators and compreh...
I am assuming…• You know how to program (at least a bit!)• Some familiarity with Python basics• Python 2.7 (but not Python...
Along the way, we will…•   Check out the AFL ladder•   Count yearly winners of the Sheffield Shield•   Parlez-vous Françai...
The examples and these notes• We will use a number of demo scripts  – This is an interactive tutorial  – If you have a lap...
ipython• A very flexible Python shell• Works on all platforms• Lots of really nice features:  – Command line editing with ...
Interlude 0
Simple Python data types• Numbers  answer = 42  pi = 3.141592654• Boolean  exit_flag = True  winning = False• Strings  my_...
Python data containers• Can hold more than one object• Have a consistent interface between  different types of containers•...
Python sequences•   A container that is ordered by position•   Store and access objects by position•   Python has 7 fundam...
Fundamental sequence opreationsName             Purposex in s           True if the object x is in the sequence sx not in ...
The string type• A string is a sequence of characters• Strings are delimited by quotes ( or ")  my_name = "Buffy"  her_nam...
string positions• The sequence starts at position 0• Use location[n] to access the character at  position n from the start...
string position example• A string with 26 characters• Positions index from 0 to 25  location = "Arkham Asylum, Gotham City...
string position slices• You can return a substring by slicing with  two positions• For example, return the string from pos...
Useful string functions & methodsName              Purposelen(s)            Calculate the length of the string s+         ...
Some string demosExample: string1.py
Iterating through a string• You can visit each character in a string• This process is called iteration• The for keyword is...
From here…• We didnt mention:  – Unicode  – The "is" methods, such as islower()  – Splitting and joining strings – thats c...
Interlude 1
list• A simple sequence of objects• Just like a string…  – All the methods you have already learnt work• Except…  – It is ...
Useful list functions & methods Name             Purpose len(x)           Calculate the length of the list x x.append(y)  ...
Some list demosExamples: list1.py → list4.py
list iteration• Same concept as string iteration• Example: list5.py
list of lists• A list can hold any other object• This includes other lists• Example: lol.py
list comprehensions• A compact way to create a list• Build the list by applying an expression to  each element in a sequen...
list comprehension advantages• Focus is on the logic, not loop mechanics• Less code, more readable code• Can nest comprehe...
list comprehension examples• Some simple examples: single_digits = [n for n in range(10)] squares = [x*x for x in range(10...
list traps• Copying → actually copying references• Passing lists in to functions/methods
Interlude 2
dictionary• A container that maps a key to a value• The key: any object that is hashable    – Its the hash that is the “re...
Working with dictionaries• Getting values and testing for keys:  – my_dict[key]  – my_dict.get(key)  – my_dict.get(key, de...
Iterating over dictionaries• Can iterate via keys, values or pairs of (key,  value)• Simple example: dict1.py• Detailed ex...
From here with dictionaries• Lots that we didnt mention!• Python docs have a good overview of:  – Methods  – Views• “Learn...
Interlude 3
tuple• Immutable, ordered sequence• Tuples are basically immutable lists• Delimited by round brackets and separated  by co...
tuple packing• Tuples can be packed and unpacked  to/from other objects:  x, y, z = (640, 480, 1.2)  my_point = x, y, z
tuple immutability• The tuple may be immutable• But objects inside may not!• Example: tuple1.py
tuples: why bother?• A limited, read-only kind of list• Less methods than lists• But:  – can pass data around with (more) ...
collections.namedtuple• Remembering the order of data in a tuple  can be tricky• collections.namedtuple names each field  ...
set• An unordered, mutable collection• No duplicate entries• Perfect for logic and set queries  – eg. union and intersecti...
frozenset• Same as set except it is immutable• So can be used as a key for dictionaries
Some less used collections• array         Fast, single type lists  – bytearray Create an array of bytes  – If you are usin...
Interlude 4
Useful inbuilt functions• Lets play with these sequence functions:  – enumerate()  – min() and max()  – range() and xrange...
Functional functions• A number of very useful functions to  process sequences:  – all()  – any()  – filter()  – map()  – r...
The collections module• Counter  – More elegant than using a dictionary and    manually counting  – Example: counter1.py• ...
The itertools module• Very useful standard library module• Ideal for fast, memory efficient iteration• http://www.doughell...
The pprint module•   Pretty-print complex data structures•   Flexible and customisable•   Uses __repr()__•   pprint()•   p...
Interlude 5
Containers redux•   list: for simple position-based storage•   tuple: read-only lists•   dict: when you need access by key...
For more information...• The Python tutorial  – http://python.org/• Python Module of the Week  – http://www.doughellmann.c...
Some good books…• “Learning Python”, Mark Lutz• “Hello Python”, Anthony Briggs
Upcoming SlideShare
Loading in …5
×

Processing data with Python, using standard library modules you (probably) never knew about

1,329 views

Published on

Tutorial #2 from PyCon AU 2012

You have data.
You have Python.
You also have a lot of choices about the best way to work with that data...

Ever wondered when you would use a tuple, list, dictionary, set, ordered dictionary, bucket, queue, counter or named tuple? Phew!
Do you know when to use a loop, iterator or generator to work through a data container?

Why are there so many different "containers" to hold data?
What are the best ways to work with these data containers?

This tutorial will give you all the basics to effectively working with data containers and iterators in Python. Along the way we will cover some very useful modules from the standard library that you may not have used before and will end up wondering how you ever did without them.

This tutorial is aimed at Python beginners. Bring along your laptop so you can interactively work through some of the examples in the tutorial. If you can, install ipython (http://ipython.org/) as we will use it for the demonstrations.

Published in: Technology
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,329
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
28
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Processing data with Python, using standard library modules you (probably) never knew about

  1. 1. Processing data with Pythonusing standard library modules you(probably) never knew about PyCon AU 2012 Tutorial Graeme Cross
  2. 2. This is an introduction to…• Pythons data containers• Choosing the best container for the job• Iterators and comprehensions• Useful modules for working with data
  3. 3. I am assuming…• You know how to program (at least a bit!)• Some familiarity with Python basics• Python 2.7 (but not Python 3) – Note: the examples work with Python 2.7 – Some wont work with Python 2.6
  4. 4. Along the way, we will…• Check out the AFL ladder• Count yearly winners of the Sheffield Shield• Parlez-vous Français?• Find the suburb that uses the most water• Use set theory on the X-Men• Analyse Olympic medal results
  5. 5. The examples and these notes• We will use a number of demo scripts – This is an interactive tutorial – If you have a laptop, join in! – We will use ipython for the demos• Code is on USB sticks being passed around• Code & these notes are also at: http://www.curiousvenn.com/
  6. 6. ipython• A very flexible Python shell• Works on all platforms• Lots of really nice features: – Command line editing with history – Coloured syntax – Edit and run within the shell – Web notebooks, Visual Studio, Qt, … – %lsmagic• http://ipython.org/ http://ipython.org/presentation.html
  7. 7. Interlude 0
  8. 8. Simple Python data types• Numbers answer = 42 pi = 3.141592654• Boolean exit_flag = True winning = False• Strings my_name = "Malcolm Reynolds"
  9. 9. Python data containers• Can hold more than one object• Have a consistent interface between different types of containers• Fundamental to working in Python
  10. 10. Python sequences• A container that is ordered by position• Store and access objects by position• Python has 7 fundamental sequence types• Examples include: – string – list – tuple
  11. 11. Fundamental sequence opreationsName Purposex in s True if the object x is in the sequence sx not in s True if the object x is NOT in the sequence ss + t Concatenate two sequencess * n Concatentate sequence s n timess[i] Return the object in sequence s at position is[i:j] Return a slice of objects in the sequence s from position is[i:j:k] to j Can have an optional step value, klen(s) Return the number of objects in the sequence ss.index(x) Return the index of the first occurrence of x in the sequence ss.count(x) Return the number of occurrences of x in the sequence smin(s), max(s) Return smallest or largest element in sequence s
  12. 12. The string type• A string is a sequence of characters• Strings are delimited by quotes ( or ") my_name = "Buffy" her_name = Dawn• Strings are immutable – You cant assign to a position
  13. 13. string positions• The sequence starts at position 0• Use location[n] to access the character at position n from the start, first = 0• Use location[-n] to access the character that is at position n from the end, last = -1
  14. 14. string position example• A string with 26 characters• Positions index from 0 to 25 location = "Arkham Asylum, Gotham City" location[0]  A location[15]  G location[25] or location[-1]  y location[-4]  C
  15. 15. string position slices• You can return a substring by slicing with two positions• For example, return the string from position 15 up to (but not including) position 21 location = "Arkham Asylum, Gotham City" location[15]  G location[20]  m location[15:21]  Gotham
  16. 16. Useful string functions & methodsName Purposelen(s) Calculate the length of the string s+ Add two strings together* Repeat a strings.find(x) Find the first position of x in the string ss.count(x) Count the number of times x is in the string ss.upper() Return a new string that is all uppercase ors.lower() lowercases.replace(x, y) Return a new string that has replaced the substring x with the new substring ys.strip() Return a new string with whitespace stripped from the endss.format() Format a strings contents
  17. 17. Some string demosExample: string1.py
  18. 18. Iterating through a string• You can visit each character in a string• This process is called iteration• The for keyword is used to step or iterate through a sequence• Example: string2.py
  19. 19. From here…• We didnt mention: – Unicode – The "is" methods, such as islower() – Splitting and joining strings – thats coming! – Different ways of formatting strings – Strings from files, internet data, XML, JSON,…• The string module• The re module – Regular expression pattern matching
  20. 20. Interlude 1
  21. 21. list• A simple sequence of objects• Just like a string… – All the methods you have already learnt work• Except… – It is mutable (you can change the contents) – Not just for characters• Objects separated by commas• Wrapped inside square brackets
  22. 22. Useful list functions & methods Name Purpose len(x) Calculate the length of the list x x.append(y) Add the object y to the end of the list x x.extend(y) Extend the list x with the contents of the list y x.insert(n, y) Insert object y into the list x before position n x.pop() Remove and return the first object in the list x.pop(n) Remove and return the object at position n x.count(y) Count the number of times object y is in the list x.sort() Sorts the list x in-place sorted(x) Returns sorted version of x (does not change x) x.reverse() Reverses the list x in-place
  23. 23. Some list demosExamples: list1.py → list4.py
  24. 24. list iteration• Same concept as string iteration• Example: list5.py
  25. 25. list of lists• A list can hold any other object• This includes other lists• Example: lol.py
  26. 26. list comprehensions• A compact way to create a list• Build the list by applying an expression to each element in a sequence• Can contain logic and function calls• Uses square brackets (list)• Syntax:[output_func(var) for var in input_list if predicate]
  27. 27. list comprehension advantages• Focus is on the logic, not loop mechanics• Less code, more readable code• Can nest comprehensions, keeping logic in a single place• Easier to map algorithm to working code• Widely used in Python (and many other languages, especially functional languages)
  28. 28. list comprehension examples• Some simple examples: single_digits = [n for n in range(10)] squares = [x*x for x in range(10)] even_sq = [i*i for i in squares if i%2 == 0]• Example: comp1.py
  29. 29. list traps• Copying → actually copying references• Passing lists in to functions/methods
  30. 30. Interlude 2
  31. 31. dictionary• A container that maps a key to a value• The key: any object that is hashable – Its the hash that is the “real” key• The value: any object – Numbers, strings, lists, other dictionaries, ...• Perfect for look-up tables• It is not a sequence!• It is not ordered• Fundamental Python concept and type
  32. 32. Working with dictionaries• Getting values and testing for keys: – my_dict[key] – my_dict.get(key) – my_dict.get(key, default) – my_dict.has_key(key) – key in my_dict – my_dict.keys() – my_dict.values() – my_dict.items()
  33. 33. Iterating over dictionaries• Can iterate via keys, values or pairs of (key, value)• Simple example: dict1.py• Detailed example: dict2.py
  34. 34. From here with dictionaries• Lots that we didnt mention!• Python docs have a good overview of: – Methods – Views• “Learning Python” and “Hello Python” – Detailed coverage of dictionaries
  35. 35. Interlude 3
  36. 36. tuple• Immutable, ordered sequence• Tuples are basically immutable lists• Delimited by round brackets and separated by commas years = (1940, 1945, 1967, 1968, 1970)• Remember to use a comma with a tuple containing a single object numbers = (1.23,)
  37. 37. tuple packing• Tuples can be packed and unpacked to/from other objects: x, y, z = (640, 480, 1.2) my_point = x, y, z
  38. 38. tuple immutability• The tuple may be immutable• But objects inside may not!• Example: tuple1.py
  39. 39. tuples: why bother?• A limited, read-only kind of list• Less methods than lists• But: – can pass data around with (more) confidence – can use (hashable) tuples as dictionary keys
  40. 40. collections.namedtuple• Remembering the order of data in a tuple can be tricky• collections.namedtuple names each field in a tuple• More readable than a normal tuple• Less setup than a class for just data• Similar to a C struct• Example: namedtuple1.py
  41. 41. set• An unordered, mutable collection• No duplicate entries• Perfect for logic and set queries – eg. union and intersection tests• Is not a sequence – No indexing, slices, etc• Can convert to/from other sequences• Example: set1.py
  42. 42. frozenset• Same as set except it is immutable• So can be used as a key for dictionaries
  43. 43. Some less used collections• array Fast, single type lists – bytearray Create an array of bytes – If you are using array, check out numpy – http://numpy.scipy.org/• buffer Working with raw bytes• Queue Sharing data between threads• struct Convert between Python & C
  44. 44. Interlude 4
  45. 45. Useful inbuilt functions• Lets play with these sequence functions: – enumerate() – min() and max() – range() and xrange() – reversed() – sorted() – sum()
  46. 46. Functional functions• A number of very useful functions to process sequences: – all() – any() – filter() – map() – reduce() – zip()• Example: func1.py
  47. 47. The collections module• Counter – More elegant than using a dictionary and manually counting – Example: counter1.py• namedtuple – See the previous example
  48. 48. The itertools module• Very useful standard library module• Ideal for fast, memory efficient iteration• http://www.doughellmann.com/PyMOTW/itertools/• Highlights: – izip() → memory efficient version of zip() – imap() & starmap() → memory efficient and flexible versions of map() – count() → iterator for counting – dropwhile() & takewhile() → processing lists until condition is reached
  49. 49. The pprint module• Pretty-print complex data structures• Flexible and customisable• Uses __repr()__• pprint()• pformat()• Example: pprint1.py
  50. 50. Interlude 5
  51. 51. Containers redux• list: for simple position-based storage• tuple: read-only lists• dict: when you need access by key• Counter: for counting keys• namedtuple: tuple < namedtuple < class• array: lists with speed• set: for sets! :)
  52. 52. For more information...• The Python tutorial – http://python.org/• Python Module of the Week – http://www.doughellmann.com/PyMOTW/
  53. 53. Some good books…• “Learning Python”, Mark Lutz• “Hello Python”, Anthony Briggs

×