Python & Stuff

All the interesting things I like about Python, plus a bit more.

Published in: Technology, Education

Transcript of "Python & Stuff"

  1. Python & Stuff
      All the things I like about Python, plus a bit more.
      Friday, November 4, 11
  2. Jacob Perkins
      • Python Text Processing with NLTK 2.0 Cookbook
      • Co-Founder & CTO @weotta
      • Blog: http://streamhacker.com
      • NLTK Demos: http://text-processing.com
      • @japerk
      • Python user for > 6 years
  3. What I use Python for
      • web development with Django
      • web crawling with Scrapy
      • NLP with NLTK
      • argparse based scripts
      • processing data in Redis & MongoDB
  4. Topics
      • functional programming
      • I/O
      • Object Oriented programming
      • scripting
      • testing
      • remoting
      • parsing
      • package management
      • data storage
      • performance
  5. Functional Programming
      • list comprehensions
      • slicing
      • iterators
      • generators
      • higher order functions
      • decorators
      • default & optional arguments
      • switch/case emulation
  6. List Comprehensions
      >>> [i for i in range(10) if i % 2]
      [1, 3, 5, 7, 9]
      >>> dict([(i, i*2) for i in range(5)])
      {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
      >>> s = set(range(5))
      >>> [i for i in range(10) if i in s]
      [0, 1, 2, 3, 4]
  7. Slicing
      >>> range(10)[:5]
      [0, 1, 2, 3, 4]
      >>> range(10)[3:5]
      [3, 4]
      >>> range(10)[1:5]
      [1, 2, 3, 4]
      >>> range(10)[::2]
      [0, 2, 4, 6, 8]
      >>> range(10)[-5:-1]
      [5, 6, 7, 8]
  8. Iterators
      >>> i = iter([1, 2, 3])
      >>> i.next()
      1
      >>> i.next()
      2
      >>> i.next()
      3
      >>> i.next()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      StopIteration
  9. Generators
      >>> def gen_ints(n):
      ...     for i in range(n):
      ...         yield i
      ...
      >>> g = gen_ints(2)
      >>> g.next()
      0
      >>> g.next()
      1
      >>> g.next()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      StopIteration
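The `.next()` method used on the iterator and generator slides is Python 2 syntax. As a minimal sketch, the same generator can be driven with the built-in `next()` function, which works on Python 2.6+ and Python 3. The default-value argument in the last call is an addition for illustration and does not appear in the slides.

```python
def gen_ints(n):
    """Yield the integers 0 .. n-1 one at a time."""
    for i in range(n):
        yield i

g = gen_ints(2)
first = next(g)         # 0
second = next(g)        # 1
# Passing a default avoids the StopIteration raised on exhaustion.
third = next(g, None)   # None
```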
  10. Higher Order Functions
      >>> def hof(n):
      ...     def addn(i):
      ...         return i + n
      ...     return addn
      ...
      >>> f = hof(5)
      >>> f(3)
      8
  11. Decorators
      >>> def print_args(f):
      ...     def g(*args, **kwargs):
      ...         print args, kwargs
      ...         return f(*args, **kwargs)
      ...     return g
      ...
      >>> @print_args
      ... def add2(n):
      ...     return n+2
      ...
      >>> add2(5)
      (5,) {}
      7
      >>> add2(3)
      (3,) {}
      5
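One detail the decorator slide leaves out: the wrapper `g` replaces the decorated function's name and docstring. A hedged sketch of the same `print_args` decorator, written in Python 3 syntax and using `functools.wraps` from the standard library to preserve that metadata (the docstring on `add2` is added here for illustration):

```python
import functools

def print_args(f):
    @functools.wraps(f)  # copy f's __name__ and __doc__ onto the wrapper
    def g(*args, **kwargs):
        print(args, kwargs)
        return f(*args, **kwargs)
    return g

@print_args
def add2(n):
    """Return n plus two."""
    return n + 2

result = add2(5)  # prints "(5,) {}" and returns 7
```

Without `functools.wraps`, `add2.__name__` would report `g`, which makes tracebacks and doctests harder to read.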
  12. Default & Optional Args
      >>> def special_arg(special=None, *args, **kwargs):
      ...     print 'special:', special
      ...     print args
      ...     print kwargs
      ...
      >>> special_arg(special='hi')
      special: hi
      ()
      {}
      >>>
      >>> special_arg('hi')
      special: hi
      ()
      {}
  13. switch/case emulation
      OPTS = {
          "a": all,
          "b": any
      }

      def all_or_any(lst, opt):
          return OPTS[opt](lst)
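A short usage sketch for the dictionary dispatch on the switch/case slide. The `default` parameter and the use of `dict.get` are assumptions added here to mimic a switch statement's default branch; the slide's version raises `KeyError` for an unknown option.

```python
OPTS = {"a": all, "b": any}

def all_or_any(lst, opt, default=any):
    # dict.get plays the role of the "default" branch of a switch statement.
    # The `default` parameter is hypothetical, not part of the slide's code.
    return OPTS.get(opt, default)(lst)

r_all = all_or_any([True, False], "a")      # all([True, False]) -> False
r_any = all_or_any([True, False], "b")      # any([True, False]) -> True
r_unknown = all_or_any([True, False], "z")  # falls back to any -> True
```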
  14. Object Oriented
      • classes
      • multiple inheritance
      • special methods
      • collections
      • defaultdict
  15. Classes
      >>> class A(object):
      ...     def __init__(self):
      ...         self.value = 'a'
      ...
      >>> class B(A):
      ...     def __init__(self):
      ...         super(B, self).__init__()
      ...         self.value = 'b'
      ...
      >>> a = A()
      >>> a.value
      'a'
      >>> b = B()
      >>> b.value
      'b'
  16. Multiple Inheritance
      >>> class B(object):
      ...     def __init__(self):
      ...         self.value = 'b'
      ...
      >>> class C(A, B): pass
      ...
      >>> C().value
      'a'
      >>> class C(B, A): pass
      ...
      >>> C().value
      'b'
  17. Special Methods
      • __init__
      • __len__
      • __iter__
      • __contains__
      • __getitem__
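To illustrate the special methods listed above, here is a minimal sketch of a container class that implements all five. The `Playlist` class and its contents are hypothetical and exist only for this example.

```python
class Playlist(object):
    """Hypothetical container wrapping a list of song titles."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):              # len(p)
        return len(self._songs)

    def __iter__(self):             # for song in p
        return iter(self._songs)

    def __contains__(self, title):  # "verse" in p
        return title in self._songs

    def __getitem__(self, index):   # p[0]
        return self._songs[index]

p = Playlist(["intro", "verse", "outro"])
size = len(p)
has_verse = "verse" in p
first_song = p[0]
all_songs = list(p)
```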
  18. collections
      • high performance containers
      • Abstract Base Classes
      • Iterable, Sized, Sequence, Set, Mapping
      • multi-inherit from ABC to mix & match
      • implement only a few special methods, get rest for free
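The "get rest for free" point can be sketched with the `Sequence` ABC: defining only `__len__` and `__getitem__` provides iteration, membership tests, `index`, `count`, and `reversed`. This example assumes Python 3, where the ABCs live in `collections.abc` (in Python 2.6+ they were importable from `collections` directly). The `Countdown` class is hypothetical.

```python
from collections.abc import Sequence

class Countdown(Sequence):
    """Hypothetical read-only sequence n, n-1, ..., 1."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        # Only non-negative indexes are supported in this sketch.
        if not 0 <= index < self._n:
            raise IndexError(index)
        return self._n - index

c = Countdown(3)
items = list(c)               # __iter__ comes from the Sequence mixin
contains_two = 2 in c         # so does __contains__
position = c.index(1)         # and index()
backwards = list(reversed(c)) # and __reversed__
```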
  19. defaultdict
      >>> d = {}
      >>> d['a'] += 2
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      KeyError: 'a'
      >>> import collections
      >>> d = collections.defaultdict(int)
      >>> d['a'] += 2
      >>> d['a']
      2
      >>> l = collections.defaultdict(list)
      >>> l['a'].append(1)
      >>> l['a']
      [1]
  20. I/O
      • context managers
      • file iteration
      • gevent / eventlet
  21. Context Managers
      >>> with open('myfile', 'w') as f:
      ...     f.write('hello\nworld')
      ...
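The slide shows a built-in context manager (a file object). As a sketch of how one might write a custom context manager, the standard library's `contextlib.contextmanager` decorator turns a generator into one. The `tracked` function and the `events` list are hypothetical names for this example.

```python
import contextlib

events = []

@contextlib.contextmanager
def tracked(name):
    events.append("enter " + name)      # runs when the with block starts
    try:
        yield name                      # value bound by "as"
    finally:
        events.append("exit " + name)   # runs even if the block raises

with tracked("job") as n:
    events.append("working on " + n)
```

The `try`/`finally` is what gives the same guarantee as a file object closing itself at the end of a `with` block.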
  22. File Iteration
      >>> with open('myfile') as f:
      ...     for line in f:
      ...         print line.strip()
      ...
      hello
      world
  23. gevent / eventlet
      • coroutine networking libraries
      • greenlets: “micro-threads”
      • fast event loop
      • monkey-patch standard library
      • http://www.gevent.org/
      • http://www.eventlet.net/
  24. Scripting
      • argparse
      • __main__
      • atexit
  25. argparse
      import argparse

      parser = argparse.ArgumentParser(description='Train a NLTK Classifier')
      parser.add_argument('corpus', help='corpus name/path')
      parser.add_argument('--no-pickle', action='store_true', default=False,
          help="don't pickle")
      parser.add_argument('--trace', default=1, type=int,
          help='How much trace output you want')
      args = parser.parse_args()

      if args.trace:
          print 'have args'
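To try the parser from the argparse slide without a command line, `parse_args` accepts an explicit list of strings. A minimal sketch in Python 3 syntax; the corpus name `movie_reviews` is a placeholder chosen for illustration, not taken from the slides.

```python
import argparse

parser = argparse.ArgumentParser(description='Train a NLTK Classifier')
parser.add_argument('corpus', help='corpus name/path')
parser.add_argument('--no-pickle', action='store_true', default=False,
                    help="don't pickle")
parser.add_argument('--trace', default=1, type=int,
                    help='How much trace output you want')

# 'movie_reviews' is a placeholder argument for illustration only.
args = parser.parse_args(['movie_reviews', '--no-pickle', '--trace', '0'])

# Dashes in option names become underscores on the namespace.
corpus, no_pickle, trace = args.corpus, args.no_pickle, args.trace
```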
  26. __main__
      if __name__ == '__main__':
          do_main_function()
  27. atexit
      def goodbye(name, adjective):
          print 'Goodbye, %s, it was %s to meet you.' % (name, adjective)

      import atexit
      atexit.register(goodbye, 'Donny', 'nice')
  28. Testing
      • doctest
      • unittest
      • nose
      • fudge
      • py.test
  29. doctest
      def fib(n):
          '''Return the nth fibonacci number.
          >>> fib(0)
          0
          >>> fib(1)
          1
          >>> fib(2)
          1
          >>> fib(3)
          2
          >>> fib(4)
          3
          '''
          if n == 0:
              return 0
          elif n == 1:
              return 1
          else:
              return fib(n - 1) + fib(n - 2)
  30. doctesting modules
      if __name__ == '__main__':
          import doctest
          doctest.testmod()
  31. unittest
      • anything more complicated than function I/O
      • clean state for each test
      • test interactions between components
      • can use mock objects
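The "clean state for each test" point refers to `setUp`, which `unittest` runs before every test method. A minimal sketch, with a hypothetical `StackTest` case, showing that state changed in one test does not leak into another:

```python
import unittest

class StackTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh list.
        self.stack = []

    def test_push(self):
        self.stack.append(1)
        self.assertEqual(self.stack, [1])

    def test_starts_empty(self):
        # Passes regardless of test order, because setUp rebuilt the list.
        self.assertEqual(self.stack, [])

suite = unittest.TestLoader().loadTestsFromTestCase(StackTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test module the last two lines would usually be replaced by `unittest.main()` or by a test runner such as nose or py.test.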
  32. nose
      • http://readthedocs.org/docs/nose/en/latest/
      • test runner
      • auto-discovery of tests
      • easy plugin system
      • plugins can generate XML for CI (Jenkins)
  33. fudge
      • http://farmdev.com/projects/fudge/
      • make fake objects
      • mock thru monkey-patching
  34. py.test
      • http://pytest.org/latest/
      • similar to nose
      • distributed multi-platform testing
  35. Remoting Libraries
      • Fabric
      • execnet
  36. Fabric
      • http://fabfile.org
      • run commands over ssh
      • great for “push” deployment
      • not parallel yet
  37. fabfile.py
      from fabric.api import run

      def host_type():
          run('uname -s')

      fab command

      $ fab -H localhost,linuxbox host_type
      [localhost] run: uname -s
      [localhost] out: Darwin
      [linuxbox] run: uname -s
      [linuxbox] out: Linux
  38. execnet
      • http://codespeak.net/execnet/
      • open python interpreters over ssh
      • spawn local python interpreters
      • shared-nothing model
      • send code & data over channels
      • interact with CPython, Jython, PyPy
      • py.test distributed testing
  39. execnet example
      >>> import execnet, os
      >>> gw = execnet.makegateway("ssh=codespeak.net")
      >>> channel = gw.remote_exec("""
      ...     import sys, os
      ...     channel.send((sys.platform, sys.version_info, os.getpid()))
      ... """)
      >>> platform, version_info, remote_pid = channel.receive()
      >>> platform
      'linux2'
      >>> version_info
      (2, 4, 2, 'final', 0)
  40. Parsing
      • regular expressions
      • NLTK
      • SimpleParse
  41. NLTK Tokenization
      >>> from nltk import tokenize
      >>> tokenize.word_tokenize("Jacob's presentation")
      ['Jacob', "'s", 'presentation']
      >>> tokenize.wordpunct_tokenize("Jacob's presentation")
      ['Jacob', "'", 's', 'presentation']
  42. nltk.grammar
      • CFGs
      • Chapter 9 of NLTK Book: http://nltk.googlecode.com/svn/trunk/doc/book/ch09.html
  43. more NLTK
      • stemming
      • part-of-speech tagging
      • chunking
      • classification
  44. SimpleParse
      • http://simpleparse.sourceforge.net/
      • Parser generator
      • EBNF grammars
      • Based on mxTextTools: http://www.egenix.com/products/python/mxBase/mxTextTools/ (C extensions)
  45. Package Management
      • import
      • pip
      • virtualenv
      • mercurial
  46. import
      import module
      from module import function, ClassName
      from module import function as f

      always make sure package directories have __init__.py
  47. pip
      • http://www.pip-installer.org/en/latest/
      • easy_install replacement
      • install from requirements files

      $ pip install simplejson
      [... progress report ...]
      Successfully installed simplejson
  48. virtualenv
      • http://www.virtualenv.org/en/latest/
      • create self-contained python installations
      • dependency silos
      • works great with pip (same author)
  49. mercurial
      • http://mercurial.selenic.com/
      • Python based DVCS
      • simple & fast
      • easy cloning
      • works with Bitbucket, Github, Googlecode
  50. Flexible Data Storage
      • Redis
      • MongoDB
  51. Redis
      • in-memory key-value storage server
      • most operations O(1)
      • lists
      • sets
      • sorted sets
      • hash objects
  52. MongoDB
      • memory mapped document storage
      • arbitrary document fields
      • nested documents
      • index on multiple fields
      • easier (for programmers) than SQL
      • capped collections (good for logging)
  53. Python Performance
      • CPU
      • RAM
  54. CPU
      • probably fast enough if I/O or DB bound
      • try PyPy: http://pypy.org/
      • use CPython optimized libraries like numpy
      • write a CPython extension
  55. RAM
      • don’t keep references longer than needed
      • iterate over data
      • aggregate to an optimized DB
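As a hedged illustration of "iterate over data", the sketch below streams words through a generator and keeps only running counts, so the full input never has to sit in memory at once. The function names and sample data are hypothetical; in practice `lines` could be an open file object, and the counts could be aggregated into a store such as Redis instead of a dict.

```python
import collections

def read_words(lines):
    """Yield words one at a time instead of building a full list."""
    for line in lines:
        for word in line.split():
            yield word

def count_words(lines):
    # Only the running totals are kept; each word is discarded after counting.
    counts = collections.defaultdict(int)
    for word in read_words(lines):
        counts[word] += 1
    return counts

sample = ["a b a", "b a"]            # stand-in for a large file
counts = count_words(iter(sample))
```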
  56. import this
      >>> import this
      The Zen of Python, by Tim Peters

      Beautiful is better than ugly.
      Explicit is better than implicit.
      Simple is better than complex.
      Complex is better than complicated.
      Flat is better than nested.
      Sparse is better than dense.
      Readability counts.
      Special cases aren't special enough to break the rules.
      Although practicality beats purity.
      Errors should never pass silently.
      Unless explicitly silenced.
      In the face of ambiguity, refuse the temptation to guess.
      There should be one-- and preferably only one --obvious way to do it.
      Although that way may not be obvious at first unless you're Dutch.
      Now is better than never.
      Although never is often better than *right* now.
      If the implementation is hard to explain, it's a bad idea.
      If the implementation is easy to explain, it may be a good idea.
      Namespaces are one honking great idea -- let's do more of those!