a taste of Python

Presented by Jordan Baker
    October 23, 2009
    DevDays Toronto
About Me

• Open Source Developer
• Founder of Open Source Web Application
  and CMS service provider: Scryent -
  www.scr...
Agenda

• About Python
• Show me your CODE
• A Spell Checker in 21 lines of code
• Why Python ROCKS
• Resources for furthe...
About Python




http://www.flickr.com/photos/schoffer/196079076/
About Python


• Gotta love a language named after Monty
  Python’s Flying Circus
• Used in more places than you might know
Significant Whitespace
C-like

if(x == 2) {
    do_something();
}
do_something_else();

Python

if x == 2:
    do_something...
Significant Whitespace

• less code clutter
• eliminates many common syntax errors
• proper code layout
• use an indentatio...
Python is Interactive

Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type ...
FIZZ BUZZ
1
2
FIZZ
4
BUZZ
...
14
FIZZ BUZZ
FIZZ BUZZ
    def fizzbuzz(n):
      for i in range(1, n + 1):
          if not i % 3:
              print "Fizz",
          ...
FIZZ BUZZ
    def fizzbuzz(n):
      for i in range(1, n + 1):
          if not i % 3:
              print "Fizz",
          ...
FIZZ BUZZ (OO)
    class FizzBuzzWriter(object):
        def __init__(self, limit):
            self.limit = limit

        def ...
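The FizzBuzz slides target Python 2.6, where `print` is a statement. A minimal sketch of the same logic for Python 3 follows; the helper name `fizzbuzz_lines` is hypothetical (it is not in the slides) and it returns the lines instead of printing them, which makes the result easy to check.

```python
def fizzbuzz_lines(limit):
    """Return the FizzBuzz lines for 1..limit (hypothetical helper, not from the slides)."""
    lines = []
    for n in range(1, limit + 1):
        parts = []
        if n % 3 == 0:
            parts.append("Fizz")
        if n % 5 == 0:
            parts.append("Buzz")
        # Fall back to the number itself when neither word applies.
        lines.append(" ".join(parts) or str(n))
    return lines


if __name__ == "__main__":
    print("\n".join(fizzbuzz_lines(15)))
```

Starting the range at 1 matters: 0 is divisible by both 3 and 5, so a loop that starts at 0 would emit "Fizz Buzz" before the 1.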
A Spell Checker in 21
   Lines of Code
• Written by Peter Norvig
• Duplicated in many languages
• Simple Spellchecking alg...
The Approach
•   Census by frequency

•   Morph the word (werd)

    •   Insertions: waerd, wberd, werzd

    •   Deletion...
Norvig Spellchecker
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(word...
Regular Expressions

def words(text):
    return re.findall('[a-z]+', text.lower())

>>> words("The cat in the hat!")
['th...
Dictionaries
>>> d = {'cat':1}
>>> d
{'cat': 1}
>>> d['cat']
1

>>> d['cat'] += 1
>>> d
{'cat': 2}

>>> d['dog'] += 1
Trac...
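The `KeyError` on this slide is what motivates `defaultdict`. As a point of comparison, here is a small sketch of counting with a plain dictionary in Python 3, using `dict.get` with a default of 0 so that a missing key never raises. The function name `count_with_plain_dict` is made up for this illustration.

```python
def count_with_plain_dict(words):
    """Count occurrences using only a plain dict (illustrative helper)."""
    counts = {}
    for w in words:
        # get() returns 0 for a word not seen yet, avoiding the KeyError.
        counts[w] = counts.get(w, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_with_plain_dict(["the", "cat", "in", "the", "hat"]))
```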
defaultdict
# Has a factory for missing keys
>>> d = collections.defaultdict(int)
>>> d['dog'] += 1
>>> d
{'dog': 1}

>>> ...
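One detail of `defaultdict` is worth seeing in running code: a membership test with `in` leaves the dictionary untouched, while indexing a missing key calls the factory and stores the result. The sketch below is a Python 3 rendering of the slide's `words` and `train` functions (the parameter name `features` is my choice); the slides themselves use Python 2.6.

```python
import collections
import re


def words(text):
    return re.findall('[a-z]+', text.lower())


def train(features):
    model = collections.defaultdict(int)  # int() returns 0 for missing keys
    for f in features:
        model[f] += 1
    return model


model = train(words("The cat in the hat!"))
before = 'dog' in model   # membership test does not create the key
value = model['dog']      # indexing calls int() and stores the result
after = 'dog' in model    # the key now exists
print(before, value, after)
```

This is why the spell checker tests candidates with `w in NWORDS` rather than indexing: the membership test does not grow the model.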
Reading the File
     >>> text = file('big.txt').read()
     >>> NWORDS = train(words(text))
     >>> NWORDS
     {'nunner...
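The `file()` built-in used on this slide exists only in Python 2. A hedged Python 3 sketch of the same step is shown here, with two substitutions that are mine rather than the presenter's: `open()` inside a `with` block so the file is closed promptly, and `collections.Counter` in place of the `defaultdict(int)` model. The helper name `load_counts` and the UTF-8 default are assumptions.

```python
import collections
import re


def words(text):
    return re.findall('[a-z]+', text.lower())


def load_counts(path, encoding="utf-8"):
    """Hypothetical helper: read a corpus file and count its words."""
    with open(path, encoding=encoding) as f:
        return collections.Counter(words(f.read()))
```

A `Counter` returns 0 for a word it has never seen, much like the `NWORDS['cephalopod']` lookup on the slide, but without inserting the key.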
Training the Probability
         Model
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lowe...
List Comprehensions
# These two are equivalent:

result = []
for v in iter:
    if cond:
        result.append(expr)


[ e...
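To make the equivalence concrete, here is a small runnable sketch with invented sample data (`words_seen` and the filter conditions are placeholders for `iter`, `cond`, and `expr` on the slide). It should behave the same in Python 2.6 and Python 3.

```python
words_seen = ["spam", "eggs", "ham", "spam"]

# Explicit loop form.
result = []
for w in words_seen:
    if len(w) > 3:
        result.append(w.upper())

# Equivalent list comprehension.
same = [w.upper() for w in words_seen if len(w) > 3]

# Nested loops read left to right, outermost first.
pairs = [(a, b) for a in "ab" for b in "xy" if a + b != "ay"]

print(result == same, pairs)
```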
String Slicing
>>> word = "spam"
>>> word[:1]
's'
>>> word[1:]
'pam'

>>> (word[:1], word[1:])
('s', 'pam')

>>> range(len...
Deletions
>>> word = "spam"
>>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)]

>>> deletes = [a + b[1:] for a, ...
Transpositions

For example: teh => the

>>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]

>>> transpo...
Replacements

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"

>>> replaces = [a + c + b[1:]  for a, b in s for c in alphabet ...
Insertion

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"

>>> inserts = [a + c + b  for a, b in s for c in alphabet]
>>> ins...
Find all Edits
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len...
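A quick way to sanity-check `edits1` is to count what each comprehension produces. For a word of length n there are n deletions, n - 1 transpositions, 26n replacements, and 26(n + 1) insertions, or 54n + 25 strings before duplicates are removed. The sketch below is a Python 3 adaptation; the helper `edit_lists` is hypothetical and exists only to expose the four lists separately.

```python
import string

ALPHABET = string.ascii_lowercase


def edit_lists(word):
    """Return (deletes, transposes, replaces, inserts) as separate lists."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return deletes, transposes, replaces, inserts


def edits1(word):
    # Concatenate the four lists and drop duplicates.
    return set(sum(edit_lists(word), []))


counts = [len(group) for group in edit_lists("spam")]
print(counts, sum(counts), len(edits1("spam")))
```

The set is smaller than the raw total because some edits coincide; for instance, replacing a letter with itself yields the original word once per position.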
Known Words
def known(words):
       """ Return the known words from `words`. """
       return set(w for w in words if w ...
Correct
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)...
Edit Distance 2
def known_edits2(word):
    return set(
        e2
            for e1 in edits1(word)
                for ...
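The pieces can be tried without `big.txt`. The following Python 3 sketch wires the slide's functions together around a tiny in-memory corpus; the `make_corrector` wrapper, the sample sentence, and the use of `collections.Counter` are assumptions made for illustration and do not come from the presentation.

```python
import collections
import re
import string

ALPHABET = string.ascii_lowercase


def words(text):
    return re.findall('[a-z]+', text.lower())


def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in ALPHABET if b]
    inserts = [a + c + b for a, b in s for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def make_corrector(corpus_text):
    """Build a correct() function bound to word counts from corpus_text."""
    counts = collections.Counter(words(corpus_text))

    def known(candidates):
        return set(w for w in candidates if w in counts)

    def known_edits2(word):
        return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in counts)

    def correct(word):
        # Each stage runs only if the previous one produced an empty set.
        candidates = (known([word]) or known(edits1(word))
                      or known_edits2(word) or [word])
        return max(candidates, key=lambda w: counts[w])

    return correct


correct = make_corrector("the computer ran the computer program quickly")
print(correct("computr"))   # one edit away from a known word
print(correct("cmpuler"))   # two edits away from a known word
print(correct("zzzzzz"))    # nothing close, so the input comes back
```

Parenthesizing the `or` chain lets it span two lines without a backslash continuation.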
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = coll...
Comparing Python &
    Java Versions

• http://raelcunha.com/spell-correct.php
• 35 lines of Java
import java.io.*;
import java.util.*;
import java.util.regex.*;


class Spelling {

    private final HashMap<String, Inte...
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = coll...
IDE for Python

• IDEs for Python include:
 • PyDev for Eclipse
 • WingIDE
 • IDLE for Windows/ Linux/ Mac
 • there’s more
Why Python ROCKS
• Elegant and readable language - “Executable
  Pseudocode”
• Standard Libraries - “Batteries Included”
•...
An Open Source
       Community

• Projects: Plone, Zope, Grok, BFG, Django,
  SciPy & NumPy, Google App Engine,
  PyGame
...
Resources
• PyGTA
• Toronto Plone Users
• Toronto Django Users
• Stackoverflow
• Dive into Python
• Python Tutorial
Thanks

• I’d love to hear your questions or
  comments on this presentation. Reach me
  at:
  • jbb@scryent.com
  • http:...

A Taste of Python - Devdays Toronto 2009

Explores Peter Norvig's spell corrector written in Python as an example of the language's elegance and readability

Published in: Technology, Education

Transcript of "A Taste of Python - Devdays Toronto 2009"

  1. a taste of Python. Presented by Jordan Baker, October 23, 2009, DevDays Toronto
  2. About Me • Open Source Developer • Founder of Open Source Web Application and CMS service provider: Scryent - www.scryent.com • Founder of Toronto Plone Users Group - www.torontoplone.ca
  3. Agenda • About Python • Show me your CODE • A Spell Checker in 21 lines of code • Why Python ROCKS • Resources for further exploration
  4. About Python http://www.flickr.com/photos/schoffer/196079076/
  5. About Python • Gotta love a language named after Monty Python’s Flying Circus • Used in more places than you might know
  6. Significant Whitespace

      C-like

          if(x == 2) {
              do_something();
          }
          do_something_else();

      Python

          if x == 2:
              do_something()
          do_something_else()

  7. Significant Whitespace • less code clutter • eliminates many common syntax errors • proper code layout • use an indentation aware editor or IDE • Get over it!
  8. Python is Interactive

          Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
          [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
          Type "help", "copyright", "credits" or "license" for more information.
          >>>

  9. FIZZ BUZZ

          1
          2
          FIZZ
          4
          BUZZ
          ...
          14
          FIZZ BUZZ

  10. FIZZ BUZZ

          def fizzbuzz(n):
              for i in range(1, n + 1):
                  if not i % 3:
                      print "Fizz",
                  if not i % 5:
                      print "Buzz",
                  if i % 3 and i % 5:
                      print i,
                  print

          fizzbuzz(50)

  11. FIZZ BUZZ

          def fizzbuzz(n):
              for i in range(1, n + 1):
                  if not i % 3:
                      print "Fizz",
                  if not i % 5:
                      print "Buzz",
                  if i % 3 and i % 5:
                      print i,
                  print

          fizzbuzz(50)

  12. FIZZ BUZZ (OO)

          class FizzBuzzWriter(object):
              def __init__(self, limit):
                  self.limit = limit

              def run(self):
                  for n in range(1, self.limit + 1):
                      self.write_number(n)

              def write_number(self, n):
                  if not n % 3:
                      print "Fizz",
                  if not n % 5:
                      print "Buzz",
                  if n % 3 and n % 5:
                      print n,
                  print

          fizzbuzz = FizzBuzzWriter(50)
          fizzbuzz.run()

  13. A Spell Checker in 21 Lines of Code • Written by Peter Norvig • Duplicated in many languages • Simple Spellchecking algorithm based on probability • http://norvig.com/spell-correct.html
  14. The Approach • Census by frequency • Morph the word (werd) • Insertions: waerd, wberd, werzd • Deletions: wrd, wed, wer • Transpositions: ewrd, wred, wedr • Replacements: aerd, ward, wbrd, word, wzrd, werz • Find the one with the highest frequency: were
  15. Norvig Spellchecker

          import re, collections

          def words(text):
              return re.findall('[a-z]+', text.lower())

          def train(words):
              model = collections.defaultdict(int)
              for w in words:
                  model[w] += 1
              return model

          NWORDS = train(words(file('big.txt').read()))
          alphabet = 'abcdefghijklmnopqrstuvwxyz'

          def edits1(word):
              s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
              deletes    = [a + b[1:] for a, b in s if b]
              transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
              replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
              inserts    = [a + c + b     for a, b in s for c in alphabet]
              return set(deletes + transposes + replaces + inserts)

          def known_edits2(word):
              return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

          def known(words):
              return set(w for w in words if w in NWORDS)

          def correct(word):
              candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
              return max(candidates, key=NWORDS.get)

  16. Regular Expressions

          def words(text):
              return re.findall('[a-z]+', text.lower())

          >>> words("The cat in the hat!")
          ['the', 'cat', 'in', 'the', 'hat']

  17. Dictionaries

          >>> d = {'cat':1}
          >>> d
          {'cat': 1}
          >>> d['cat']
          1

          >>> d['cat'] += 1
          >>> d
          {'cat': 2}

          >>> d['dog'] += 1
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          KeyError: 'dog'

  18. defaultdict

          # Has a factory for missing keys
          >>> d = collections.defaultdict(int)
          >>> d['dog'] += 1
          >>> d
          {'dog': 1}

          >>> int
          <type 'int'>
          >>> int()
          0

          def train(words):
              model = collections.defaultdict(int)
              for w in words:
                  model[w] += 1
              return model

          >>> train(words("The cat in the hat!"))
          {'cat': 1, 'the': 2, 'hat': 1, 'in': 1}

  19. Reading the File

          >>> text = file('big.txt').read()
          >>> NWORDS = train(words(text))
          >>> NWORDS
          {'nunnery': 3, 'presnya': 1, 'woods': 22, 'clotted': 1, 'spiders': 1,
          'hanging': 42, 'disobeying': 2, 'scold': 3, 'originality': 6,
          'grenadiers': 8, 'pigment': 16, 'appropriation': 6, 'strictest': 1,
          'bringing': 48, 'revelers': 1, 'wooded': 8, 'wooden': 37,
          'wednesday': 13, 'shows': 50, 'immunities': 3, 'guardsmen': 4,
          'sooty': 1, 'inevitably': 32, 'clavicular': 9, 'sustaining': 5,
          'consenting': 1, 'scraped': 21, 'errors': 16, 'semicircular': 1,
          'cooking': 6, 'spiroch': 25, 'designing': 1, 'pawed': 1,
          'succumb': 12, 'shocks': 1, 'crouch': 2, 'chins': 1, 'awistocwacy': 1,
          'sunbeams': 1, 'perforations': 6, 'china': 43, 'affiliated': 4,
          'chunk': 22, 'natured': 34, 'uplifting': 1, 'slaveholders': 2,
          'climbed': 13, 'controversy': 33, 'natures': 2, 'climber': 1,
          'lency': 2, 'joyousness': 1, 'reproaching': 3, 'insecurity': 1,
          'abbreviations': 1, 'definiteness': 1, 'music': 56, 'therefore': 186,
          'expeditionary': 3, 'primeval': 1, 'unpack': 1, 'circumstances': 107,
          ... (about 6500 more lines) ...
          >>> NWORDS['the']
          80030
          >>> NWORDS['unusual']
          32
          >>> NWORDS['cephalopod']
          0

  20. Training the Probability Model

          import re, collections

          def words(text):
              return re.findall('[a-z]+', text.lower())

          def train(words):
              model = collections.defaultdict(int)
              for w in words:
                  model[w] += 1
              return model

          NWORDS = train(words(file('big.txt').read()))

  21. List Comprehensions

          # These two are equivalent:

          result = []
          for v in iter:
              if cond:
                  result.append(expr)

          [ expr for v in iter if cond ]

          # You can nest loops also:

          result = []
          for v1 in iter1:
              for v2 in iter2:
                  if cond:
                      result.append(expr)

          [ expr for v1 in iter1 for v2 in iter2 if cond ]

  22. String Slicing

          >>> word = "spam"
          >>> word[:1]
          's'
          >>> word[1:]
          'pam'

          >>> (word[:1], word[1:])
          ('s', 'pam')

          >>> range(len(word) + 1)
          [0, 1, 2, 3, 4]

          >>> [(word[:i], word[i:]) for i in range(len(word) + 1)]
          [('', 'spam'), ('s', 'pam'), ('sp', 'am'), ('spa', 'm'), ('spam', '')]

  23. Deletions

          >>> word = "spam"
          >>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)]

          >>> deletes = [a + b[1:] for a, b in s if b]
          >>> deletes
          ['pam', 'sam', 'spm', 'spa']

          >>> a, b = ('s', 'pam')
          >>> a
          's'
          >>> b
          'pam'

          >>> bool('pam')
          True
          >>> bool('')
          False

  24. Transpositions

      For example: teh => the

          >>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
          >>> transposes
          ['psam', 'sapm', 'spma']

  25. Replacements

          >>> alphabet = "abcdefghijklmnopqrstuvwxyz"
          >>> replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
          >>> replaces
          ['apam', 'bpam', ..., 'zpam', 'saam', ..., 'szam', ..., 'spaz']

  26. Insertion

          >>> alphabet = "abcdefghijklmnopqrstuvwxyz"
          >>> inserts = [a + c + b for a, b in s for c in alphabet]
          >>> inserts
          ['aspam', ..., 'zspam', 'sapam', ..., 'szpam', 'spaam', ..., 'spamz']

  27. Find all Edits

          alphabet = 'abcdefghijklmnopqrstuvwxyz'

          def edits1(word):
              s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
              deletes = [a + b[1:] for a, b in s if b]
              transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
              replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
              inserts = [a + c + b for a, b in s for c in alphabet]
              return set(deletes + transposes + replaces + inserts)

          >>> edits1("spam")
          set(['sptm', 'skam', 'spzam', 'vspam', 'spamj', 'zpam', 'sbam', 'spham',
          'snam', 'sjpam', 'spma', 'swam', 'spaem', 'tspam', 'spmm', 'slpam',
          'upam', 'spaim', 'sppm', 'spnam', 'spem', 'sparm', 'spamr', 'lspam',
          'sdpam', 'spams', 'spaml', 'spamm', 'spamn', 'spum', 'spamh', 'spami',
          'spatm', 'spamk', 'spamd', ..., 'spcam', 'spamy'])

  28. Known Words

          def known(words):
              """ Return the known words from `words`. """
              return set(w for w in words if w in NWORDS)

  29. Correct

          def known(words):
              """ Return the known words from `words`. """
              return set(w for w in words if w in NWORDS)

          def correct(word):
              candidates = known([word]) or known(edits1(word)) or [word]
              return max(candidates, key=NWORDS.get)

          >>> bool(set([]))
          False

          >>> correct("computr")
          'computer'
          >>> correct("computor")
          'computer'
          >>> correct("computerr")
          'computer'

  30. Edit Distance 2

          def known_edits2(word):
              return set(
                  e2
                      for e1 in edits1(word)
                          for e2 in edits1(e1)
                              if e2 in NWORDS
                  )

          def correct(word):
              candidates = known([word]) or known(edits1(word)) or \
                  known_edits2(word) or [word]
              return max(candidates, key=NWORDS.get)

          >>> correct("conpuler")
          'computer'
          >>> correct("cmpuler")
          'computer'

  31. (untitled slide: the complete spell checker)

          import re, collections

          def words(text):
              return re.findall('[a-z]+', text.lower())

          def train(words):
              model = collections.defaultdict(int)
              for w in words:
                  model[w] += 1
              return model

          NWORDS = train(words(file('big.txt').read()))
          alphabet = 'abcdefghijklmnopqrstuvwxyz'

          def edits1(word):
              s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
              deletes    = [a + b[1:] for a, b in s if b]
              transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
              replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
              inserts    = [a + c + b     for a, b in s for c in alphabet]
              return set(deletes + transposes + replaces + inserts)

          def known_edits2(word):
              return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

          def known(words):
              return set(w for w in words if w in NWORDS)

          def correct(word):
              candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
              return max(candidates, key=NWORDS.get)

  32. Comparing Python & Java Versions • http://raelcunha.com/spell-correct.php • 35 lines of Java
  33. (untitled slide: the Java version)

          import java.io.*;
          import java.util.*;
          import java.util.regex.*;

          class Spelling {
              private final HashMap<String, Integer> nWords = new HashMap<String, Integer>();
              public Spelling(String file) throws IOException {
                  BufferedReader in = new BufferedReader(new FileReader(file));
                  Pattern p = Pattern.compile("\\w+");
                  for(String temp = ""; temp != null; temp = in.readLine()){
                      Matcher m = p.matcher(temp.toLowerCase());
                      while(m.find()) nWords.put((temp = m.group()), nWords.containsKey(temp) ? nWords.get(temp) + 1 : 1);
                  }
                  in.close();
              }
              private final ArrayList<String> edits(String word) {
                  ArrayList<String> result = new ArrayList<String>();
                  for(int i=0; i < word.length(); ++i) result.add(word.substring(0, i) + word.substring(i+1));
                  for(int i=0; i < word.length()-1; ++i) result.add(word.substring(0, i) + word.substring(i+1, i+2) + word.substring(i, i+1) + word.substring(i+2));
                  for(int i=0; i < word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i+1));
                  for(int i=0; i <= word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i));
                  return result;
              }
              public final String correct(String word) {
                  if(nWords.containsKey(word)) return word;
                  ArrayList<String> list = edits(word);
                  HashMap<Integer, String> candidates = new HashMap<Integer, String>();
                  for(String s : list) if(nWords.containsKey(s)) candidates.put(nWords.get(s),s);
                  if(candidates.size() > 0) return candidates.get(Collections.max(candidates.keySet()));
                  for(String s : list) for(String w : edits(s)) if(nWords.containsKey(w)) candidates.put(nWords.get(w),w);
                  return candidates.size() > 0 ? candidates.get(Collections.max(candidates.keySet())) : word;
              }
              public static void main(String args[]) throws IOException {
                  if(args.length > 0) System.out.println((new Spelling("big.txt")).correct(args[0]));
              }
          }

  34. (untitled slide: the Python version again, for comparison)

          import re, collections

          def words(text):
              return re.findall('[a-z]+', text.lower())

          def train(words):
              model = collections.defaultdict(int)
              for w in words:
                  model[w] += 1
              return model

          NWORDS = train(words(file('big.txt').read()))
          alphabet = 'abcdefghijklmnopqrstuvwxyz'

          def edits1(word):
              s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
              deletes    = [a + b[1:] for a, b in s if b]
              transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
              replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
              inserts    = [a + c + b     for a, b in s for c in alphabet]
              return set(deletes + transposes + replaces + inserts)

          def known_edits2(word):
              return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

          def known(words):
              return set(w for w in words if w in NWORDS)

          def correct(word):
              candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
              return max(candidates, key=NWORDS.get)

  35. IDE for Python • IDEs for Python include: • PyDev for Eclipse • WingIDE • IDLE for Windows/Linux/Mac • there’s more
  36. Why Python ROCKS • Elegant and readable language - “Executable Pseudocode” • Standard Libraries - “Batteries Included” • Very High level Datatypes • Dynamically Typed • It’s FUN!
  37. An Open Source Community • Projects: Plone, Zope, Grok, BFG, Django, SciPy & NumPy, Google App Engine, PyGame • PyCon
  38. Resources • PyGTA • Toronto Plone Users • Toronto Django Users • Stackoverflow • Dive into Python • Python Tutorial
  39. Thanks • I’d love to hear your questions or comments on this presentation. Reach me at: • jbb@scryent.com • http://twitter.com/hexsprite