Python & Stuff

  1. Python & Stuff
      All the things I like about Python, plus a bit more.
      Friday, November 4, 11
  2. Jacob Perkins
      Python Text Processing with NLTK 2.0 Cookbook
      Co-Founder & CTO @weotta
      Blog: http://streamhacker.com
      NLTK Demos: http://text-processing.com
      @japerk
      Python user for > 6 years
  3. What I use Python for
      web development with Django
      web crawling with Scrapy
      NLP with NLTK
      argparse-based scripts
      processing data in Redis & MongoDB
  4. Topics
      functional programming
      I/O
      object-oriented programming
      scripting
      testing
      remoting
      parsing
      package management
      data storage
      performance
  5. Functional Programming
      list comprehensions
      slicing
      iterators
      generators
      higher-order functions
      decorators
      default & optional arguments
      switch/case emulation
  6. List Comprehensions
      >>> [i for i in range(10) if i % 2]
      [1, 3, 5, 7, 9]
      >>> dict([(i, i*2) for i in range(5)])
      {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
      >>> s = set(range(5))
      >>> [i for i in range(10) if i in s]
      [0, 1, 2, 3, 4]
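Since Python 2.7 (and in Python 3), the `dict([...])` idiom above can be written directly as a dict comprehension, with matching set comprehensions and lazy generator expressions. A small Python 3 sketch; the variable names are only illustrative:

```python
# Dict comprehension: same result as dict([(i, i*2) for i in range(5)])
doubles = {i: i * 2 for i in range(5)}

# Set comprehension: duplicate values collapse automatically
remainders = {i % 3 for i in range(10)}

# Generator expression: values are produced lazily, no list is built
odd_total = sum(i for i in range(10) if i % 2)

print(doubles)     # {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
print(remainders)  # {0, 1, 2}
print(odd_total)   # 25
```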
  7. Slicing
      >>> range(10)[:5]
      [0, 1, 2, 3, 4]
      >>> range(10)[3:5]
      [3, 4]
      >>> range(10)[1:5]
      [1, 2, 3, 4]
      >>> range(10)[::2]
      [0, 2, 4, 6, 8]
      >>> range(10)[-5:-1]
      [5, 6, 7, 8]
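In Python 3, `range()` returns a lazy range object rather than a list, so the session above needs `list(range(10))` to show list output (slicing a range object gives back another range). Slices also accept a negative step, can be stored as `slice` objects, and can appear on the left of an assignment. A short Python 3 sketch:

```python
nums = list(range(10))

print(nums[-3:])       # [7, 8, 9]
print(nums[::-1])      # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(range(10)[:5])   # range(0, 5)

# A slice object can be named and reused
evens = slice(None, None, 2)
print(nums[evens])     # [0, 2, 4, 6, 8]

# Slice assignment replaces a span in place; the lengths may differ
nums[1:4] = ['a', 'b']
print(nums)            # [0, 'a', 'b', 4, 5, 6, 7, 8, 9]
```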
  8. Iterators
      >>> i = iter([1, 2, 3])
      >>> i.next()
      1
      >>> i.next()
      2
      >>> i.next()
      3
      >>> i.next()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      StopIteration
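Any object can take part in this protocol by defining `__iter__` and `__next__` (Python 3's name for the `next()` method used above; the built-in `next()` function calls it). A minimal Python 3 sketch with a hypothetical `Countdown` class:

```python
class Countdown:
    """Hypothetical iterator yielding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal exhaustion, just like the list iterator above
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(3)
print(next(it))   # 3
print(list(it))   # [2, 1]  (only the remaining items)
```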
  9. Generators
      >>> def gen_ints(n):
      ...     for i in range(n):
      ...         yield i
      ...
      >>> g = gen_ints(2)
      >>> g.next()
      0
      >>> g.next()
      1
      >>> g.next()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      StopIteration
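Generators compose into lazy pipelines: each stage pulls one item at a time from the stage before it, so even an infinite generator is fine as long as something downstream stops asking. A Python 3 sketch using `itertools.islice` to take a fixed number of items; the function names are illustrative:

```python
import itertools


def fibonacci():
    """Infinite generator of Fibonacci numbers."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def evens(numbers):
    """Generator stage that passes through only even values."""
    for n in numbers:
        if n % 2 == 0:
            yield n


# Nothing is computed until islice asks for values
first_even_fibs = list(itertools.islice(evens(fibonacci()), 5))
print(first_even_fibs)  # [0, 2, 8, 34, 144]
```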
  10. Higher Order Functions
      >>> def hof(n):
      ...     def addn(i):
      ...         return i + n
      ...     return addn
      ...
      >>> f = hof(5)
      >>> f(3)
      8
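Higher-order functions also work in the other direction, taking functions as arguments. The standard library's `functools.partial` builds the same kind of adder as `hof(5)`, and built-ins such as `map()` and `sorted()` accept a function to apply or to sort by. A short Python 3 sketch:

```python
import functools
import operator

# partial fixes the first argument of operator.add, like hof(5) above
add5 = functools.partial(operator.add, 5)
print(add5(3))  # 8

# Passing functions in: map applies one, sorted uses one as a sort key
words = ['banana', 'Apple', 'cherry']
print(list(map(len, words)))         # [6, 5, 6]
print(sorted(words, key=str.lower))  # ['Apple', 'banana', 'cherry']
```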
  11. Decorators
      >>> def print_args(f):
      ...     def g(*args, **kwargs):
      ...         print args, kwargs
      ...         return f(*args, **kwargs)
      ...     return g
      ...
      >>> @print_args
      ... def add2(n):
      ...     return n+2
      ...
      >>> add2(5)
      (5,) {}
      7
      >>> add2(3)
      (3,) {}
      5
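One catch with a wrapper like `print_args` is that it replaces the decorated function's metadata: after decoration, `add2.__name__` is `'g'`. `functools.wraps` copies the name and docstring across to the wrapper. A Python 3 sketch with a hypothetical call-counting decorator:

```python
import functools


def count_calls(f):
    """Decorator that records how many times f has been called."""
    @functools.wraps(f)  # keep f's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return f(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def add2(n):
    """Add two to n."""
    return n + 2


add2(5)
add2(3)
print(add2.calls)     # 2
print(add2.__name__)  # add2 (it would be 'wrapper' without functools.wraps)
```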
  12. Default & Optional Args
      >>> def special_arg(special=None, *args, **kwargs):
      ...     print 'special:', special
      ...     print args
      ...     print kwargs
      ...
      >>> special_arg(special='hi')
      special: hi
      ()
      {}
      >>>
      >>> special_arg('hi')
      special: hi
      ()
      {}
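A related pitfall: default values are evaluated once, when the `def` statement runs, so a mutable default such as a list is shared by every call that omits the argument. The usual fix is a `None` default, as in `special_arg` above. A Python 3 sketch with hypothetical function names:

```python
def append_bad(item, target=[]):
    # The same list object is reused on every call that omits target
    target.append(item)
    return target


def append_good(item, target=None):
    # A fresh list is created on each call that omits target
    if target is None:
        target = []
    target.append(item)
    return target


print(append_bad(1), append_bad(2))    # [1, 2] [1, 2]  (one shared list)
print(append_good(1), append_good(2))  # [1] [2]
```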
  13. switch/case emulation
      OPTS = {
          "a": all,
          "b": any
      }

      def all_or_any(lst, opt):
          return OPTS[opt](lst)
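Indexing `OPTS[opt]` raises `KeyError` for an option that is not in the table; `dict.get` with a fallback function gives the equivalent of a `default:` branch. A Python 3 sketch; the operator table and function names are hypothetical:

```python
import operator

# Map each "case" label to the function that handles it
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
}


def unsupported(a, b):
    # Plays the role of a switch statement's default branch
    raise ValueError('unsupported operator')


def calculate(a, op, b):
    handler = OPERATIONS.get(op, unsupported)
    return handler(a, b)


print(calculate(6, '*', 7))  # 42
```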
  14. Object Oriented
      classes
      multiple inheritance
      special methods
      collections
      defaultdict
  15. Classes
      >>> class A(object):
      ...     def __init__(self):
      ...         self.value = 'a'
      ...
      >>> class B(A):
      ...     def __init__(self):
      ...         super(B, self).__init__()
      ...         self.value = 'b'
      ...
      >>> a = A()
      >>> a.value
      'a'
      >>> b = B()
      >>> b.value
      'b'
  16. Multiple Inheritance
      >>> class B(object):
      ...     def __init__(self):
      ...         self.value = 'b'
      ...
      >>> class C(A, B): pass
      ...
      >>> C().value
      'a'
      >>> class C(B, A): pass
      ...
      >>> C().value
      'b'
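Which parent wins is decided by the method resolution order (MRO), which a class exposes as `__mro__`. When every class calls `super().__init__()`, each initializer along the MRO runs exactly once. A Python 3 sketch with hypothetical classes that record the order in which their initializers finish:

```python
class Base:
    def __init__(self):
        self.trail = ['Base']


class Left(Base):
    def __init__(self):
        super().__init__()  # continue along the MRO, not just to Base
        self.trail.append('Left')


class Right(Base):
    def __init__(self):
        super().__init__()
        self.trail.append('Right')


class Child(Left, Right):
    def __init__(self):
        super().__init__()
        self.trail.append('Child')


print([cls.__name__ for cls in Child.__mro__])
# ['Child', 'Left', 'Right', 'Base', 'object']
print(Child().trail)
# ['Base', 'Right', 'Left', 'Child']
```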
  17. Special Methods
      __init__
      __len__
      __iter__
      __contains__
      __getitem__
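A class that defines these methods works with `len()`, `for` loops, the `in` operator and indexing. A Python 3 sketch with a hypothetical `Playlist` container that delegates to an internal list:

```python
class Playlist:
    """Hypothetical container wrapping a list of song titles."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):             # len(playlist)
        return len(self._songs)

    def __iter__(self):            # for song in playlist
        return iter(self._songs)

    def __contains__(self, song):  # 'verse' in playlist
        return song in self._songs

    def __getitem__(self, index):  # playlist[0], playlist[1:3]
        return self._songs[index]


p = Playlist(['intro', 'verse', 'outro'])
print(len(p), 'verse' in p, p[0], list(p))
# 3 True intro ['intro', 'verse', 'outro']
```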
  18. collections
      high performance containers
      Abstract Base Classes: Iterable, Sized, Sequence, Set, Mapping
      multi-inherit from ABCs to mix & match
      implement only a few special methods, get the rest for free
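For example, subclassing `collections.abc.Sequence` (in Python 3.3+ the ABCs live in `collections.abc`) and implementing only `__getitem__` and `__len__` provides `__contains__`, `__iter__`, `__reversed__`, `index` and `count` as inherited mixin methods. A Python 3 sketch with a hypothetical `Squares` class:

```python
from collections.abc import Sequence


class Squares(Sequence):
    """Hypothetical read-only sequence of the first n square numbers."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Build the requested slice from the individual indices
            return [self[i] for i in range(*index.indices(self.n))]
        if index < 0:
            index += self.n
        if not 0 <= index < self.n:
            raise IndexError(index)  # also ends the inherited __iter__
        return index * index


sq = Squares(5)
print(list(sq))            # [0, 1, 4, 9, 16]  (__iter__ for free)
print(9 in sq)             # True              (__contains__ for free)
print(sq.index(16))        # 4                 (index() for free)
print(list(reversed(sq)))  # [16, 9, 4, 1, 0]  (__reversed__ for free)
```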
  19. defaultdict
      >>> d = {}
      >>> d['a'] += 2
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      KeyError: 'a'
      >>> import collections
      >>> d = collections.defaultdict(int)
      >>> d['a'] += 2
      >>> d['a']
      2
      >>> l = collections.defaultdict(list)
      >>> l['a'].append(1)
      >>> l['a']
      [1]
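A common use of `defaultdict(list)` is grouping: the default factory creates each group's list on first access, so the loop needs no key-existence check. A Python 3 sketch with a made-up word list:

```python
import collections

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = collections.defaultdict(list)
for word in words:
    # A missing key gets a fresh empty list automatically
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```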
  20. I/O
      context managers
      file iteration
      gevent / eventlet
  21. Context Managers
      >>> with open('myfile', 'w') as f:
      ...     f.write('hello\nworld')
      ...
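You can write your own context manager as a generator decorated with `contextlib.contextmanager`: code before `yield` runs on entry, the yielded value is what `as` binds, and code in a `finally` after the `yield` runs on exit even if the block raises. A Python 3 sketch in which a hypothetical `log` list stands in for real setup and teardown:

```python
import contextlib

log = []  # hypothetical record of what the context manager did


@contextlib.contextmanager
def tracked(name):
    log.append('enter ' + name)     # setup, like opening a file
    try:
        yield name.upper()          # the value bound by "as"
    finally:
        log.append('exit ' + name)  # teardown, runs even on exceptions


with tracked('job') as value:
    log.append('inside with ' + value)

print(log)  # ['enter job', 'inside with JOB', 'exit job']
```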
  22. File Iteration
      >>> with open('myfile') as f:
      ...     for line in f:
      ...         print line.strip()
      ...
      hello
      world
  23. gevent / eventlet
      coroutine networking libraries
      greenlets: “micro-threads”
      fast event loop
      monkey-patch standard library
      http://www.gevent.org/
      http://www.eventlet.net/
  24. Scripting
      argparse
      __main__
      atexit
  25. argparse
      import argparse

      parser = argparse.ArgumentParser(description='Train a NLTK Classifier')
      parser.add_argument('corpus', help='corpus name/path')
      parser.add_argument('--no-pickle', action='store_true', default=False,
                          help="don't pickle")
      parser.add_argument('--trace', default=1, type=int,
                          help='How much trace output you want')
      args = parser.parse_args()

      if args.trace:
          print 'have args'
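`parse_args()` reads `sys.argv` by default, but it also accepts an explicit list of strings, which is handy for trying a parser out or testing it. Dashes in option names become underscores in attribute names, so `--no-pickle` is read back as `args.no_pickle`. A Python 3 sketch reusing the parser definition above; the corpus name `movie_reviews` is just an example value:

```python
import argparse

parser = argparse.ArgumentParser(description='Train a NLTK Classifier')
parser.add_argument('corpus', help='corpus name/path')
parser.add_argument('--no-pickle', action='store_true', default=False,
                    help="don't pickle")
parser.add_argument('--trace', default=1, type=int,
                    help='How much trace output you want')

# Pass an explicit argument list instead of reading sys.argv
args = parser.parse_args(['movie_reviews', '--no-pickle', '--trace', '0'])

print(args.corpus)     # movie_reviews
print(args.no_pickle)  # True  (the dash became an underscore)
print(args.trace)      # 0     (converted by type=int)
```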
  26. __main__
      if __name__ == '__main__':
          do_main_function()
  27. atexit
      def goodbye(name, adjective):
          print 'Goodbye, %s, it was %s to meet you.' % (name, adjective)

      import atexit
      atexit.register(goodbye, 'Donny', 'nice')
  28. Testing
      doctest
      unittest
      nose
      fudge
      py.test
  29. doctest
      def fib(n):
          """Return the nth fibonacci number.

          >>> fib(0)
          0
          >>> fib(1)
          1
          >>> fib(2)
          1
          >>> fib(3)
          2
          >>> fib(4)
          3
          """
          if n == 0:
              return 0
          elif n == 1:
              return 1
          else:
              return fib(n - 1) + fib(n - 2)
  30. doctesting modules
      if __name__ == '__main__':
          import doctest
          doctest.testmod()
  31. unittest
      anything more complicated than function I/O
      clean state for each test
      test interactions between components
      can use mock objects
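A minimal Python 3 sketch of these points: `setUp` runs before every test method, so each test starts from clean state, and `unittest.mock.Mock` (standard library since Python 3.3) stands in for a collaborator so the test can check the interaction. The `notify` function and its `client.send` call are hypothetical:

```python
import io
import unittest
from unittest import mock


def notify(client, user):
    """Hypothetical function under test: sends a welcome message."""
    client.send(user, 'welcome')


class NotifyTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a new mock
        self.client = mock.Mock()

    def test_sends_welcome(self):
        notify(self.client, 'bob')
        self.client.send.assert_called_once_with('bob', 'welcome')

    def test_starts_clean(self):
        # Calls made in other tests do not leak into this one
        self.assertFalse(self.client.send.called)


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NotifyTest)
    # Send the runner's report to a buffer instead of stderr
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_tests()
print(result.testsRun, result.wasSuccessful())  # 2 True
```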
  32. nose
      http://readthedocs.org/docs/nose/en/latest/
      test runner
      auto-discovery of tests
      easy plugin system
      plugins can generate XML for CI (Jenkins)
  33. fudge
      http://farmdev.com/projects/fudge/
      make fake objects
      mock thru monkey-patching
  34. py.test
      http://pytest.org/latest/
      similar to nose
      distributed multi-platform testing
  35. Remoting Libraries
      Fabric
      execnet
  36. Fabric
      http://fabfile.org
      run commands over ssh
      great for “push” deployment
      not parallel yet
  37. fabfile.py
      from fabric.api import run

      def host_type():
          run('uname -s')

      fab command
      $ fab -H localhost,linuxbox host_type
      [localhost] run: uname -s
      [localhost] out: Darwin
      [linuxbox] run: uname -s
      [linuxbox] out: Linux
  38. execnet
      http://codespeak.net/execnet/
      open python interpreters over ssh
      spawn local python interpreters
      shared-nothing model
      send code & data over channels
      interact with CPython, Jython, PyPy
      py.test distributed testing
  39. execnet example
      >>> import execnet, os
      >>> gw = execnet.makegateway("ssh=codespeak.net")
      >>> channel = gw.remote_exec("""
      ...     import sys, os
      ...     channel.send((sys.platform, sys.version_info, os.getpid()))
      ... """)
      >>> platform, version_info, remote_pid = channel.receive()
      >>> platform
      'linux2'
      >>> version_info
      (2, 4, 2, 'final', 0)
  40. Parsing
      regular expressions
      NLTK
      SimpleParse
  41. NLTK Tokenization
      >>> from nltk import tokenize
      >>> tokenize.word_tokenize("Jacob's presentation")
      ['Jacob', "'s", 'presentation']
      >>> tokenize.wordpunct_tokenize("Jacob's presentation")
      ['Jacob', "'", 's', 'presentation']
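Regular expressions go a long way here. As far as I know, NLTK's `wordpunct_tokenize` is a regexp tokenizer built on the pattern `\w+|[^\w\s]+`, so the standard `re` module reproduces the `wordpunct_tokenize` output above without NLTK installed. A Python 3 sketch; the `wordpunct` helper is hypothetical and approximates NLTK's behavior rather than reproducing its code:

```python
import re

# Runs of word characters, or runs of characters that are neither
# word characters nor whitespace (i.e. punctuation)
WORDPUNCT = re.compile(r'\w+|[^\w\s]+')


def wordpunct(text):
    """Hypothetical helper approximating NLTK's wordpunct_tokenize."""
    return WORDPUNCT.findall(text)


print(wordpunct("Jacob's presentation"))
# ['Jacob', "'", 's', 'presentation']
```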
  42. nltk.grammar
      CFGs
      Chapter 9 of NLTK Book: http://nltk.googlecode.com/svn/trunk/doc/book/ch09.html
  43. more NLTK
      stemming
      part-of-speech tagging
      chunking
      classification
  44. SimpleParse
      http://simpleparse.sourceforge.net/
      Parser generator
      EBNF grammars
      Based on mxTextTools: http://www.egenix.com/products/python/mxBase/mxTextTools/ (C extensions)
  45. Package Management
      import
      pip
      virtualenv
      mercurial
  46. import
      import module
      from module import function, ClassName
      from module import function as f
      always make sure package directories have __init__.py
  47. pip
      http://www.pip-installer.org/en/latest/
      easy_install replacement
      install from requirements files
      $ pip install simplejson
      [... progress report ...]
      Successfully installed simplejson
  48. virtualenv
      http://www.virtualenv.org/en/latest/
      create self-contained python installations
      dependency silos
      works great with pip (same author)
  49. mercurial
      http://mercurial.selenic.com/
      Python based DVCS
      simple & fast
      easy cloning
      works with Bitbucket, Github, Googlecode
  50. Flexible Data Storage
      Redis
      MongoDB
  51. Redis
      in-memory key-value storage server
      most operations O(1)
      lists
      sets
      sorted sets
      hash objects
  52. MongoDB
      memory mapped document storage
      arbitrary document fields
      nested documents
      index on multiple fields
      easier (for programmers) than SQL
      capped collections (good for logging)
  53. Python Performance
      CPU
      RAM
  54. CPU
      probably fast enough if I/O or DB bound
      try PyPy: http://pypy.org/
      use CPython optimized libraries like numpy
      write a CPython extension
  55. RAM
      don’t keep references longer than needed
      iterate over data
      aggregate to an optimized DB
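Iterating rather than building whole lists is what keeps memory flat: a list holds a reference to every element at once, while a generator holds only its current state. A Python 3 sketch comparing the two with `sys.getsizeof`, which measures only the container object itself (not the items it refers to) and whose exact numbers vary by Python version and platform:

```python
import sys

N = 1_000_000

as_list = [i * 2 for i in range(N)]  # all N results stored at once
as_gen = (i * 2 for i in range(N))   # results produced on demand

print(sys.getsizeof(as_list))  # grows with N: several megabytes here
print(sys.getsizeof(as_gen))   # small, and the same for any N

# Both yield the same aggregate; the generator never holds the full list
print(sum(as_gen) == sum(as_list))  # True
```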
  56. import this
      >>> import this
      The Zen of Python, by Tim Peters

      Beautiful is better than ugly.
      Explicit is better than implicit.
      Simple is better than complex.
      Complex is better than complicated.
      Flat is better than nested.
      Sparse is better than dense.
      Readability counts.
      Special cases aren't special enough to break the rules.
      Although practicality beats purity.
      Errors should never pass silently.
      Unless explicitly silenced.
      In the face of ambiguity, refuse the temptation to guess.
      There should be one-- and preferably only one --obvious way to do it.
      Although that way may not be obvious at first unless you're Dutch.
      Now is better than never.
      Although never is often better than *right* now.
      If the implementation is hard to explain, it's a bad idea.
      If the implementation is easy to explain, it may be a good idea.
      Namespaces are one honking great idea -- let's do more of those!