A Taste of Python - Devdays Toronto 2009

    A Taste of Python - Devdays Toronto 2009 - Presentation Transcript

    1. A Taste of Python • Presented by Jordan Baker • October 23, 2009 • DevDays Toronto
    2. About Me • Open Source Developer • Founder of Open Source Web Application and CMS service provider: Scryent - www.scryent.com • Founder of Toronto Plone Users Group - www.torontoplone.ca
    3. Agenda • About Python • Show me your CODE • A Spell Checker in 21 lines of code • Why Python ROCKS • Resources for further exploration
    4. About Python http://www.flickr.com/photos/schoffer/196079076/
    5. About Python • Gotta love a language named after Monty Python’s Flying Circus • Used in more places than you might know
    6. Significant Whitespace
        C-like:
            if(x == 2) {
                do_something();
            }
            do_something_else();
        Python:
            if x == 2:
                do_something()
            do_something_else()
    7. Significant Whitespace • less code clutter • eliminates many common syntax errors • proper code layout • use an indentation-aware editor or IDE • Get over it!
    8. Python is Interactive
        Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
        [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>>
    9. FIZZ BUZZ 1 2 FIZZ 4 BUZZ ... 14 FIZZ BUZZ
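    10. FIZZ BUZZ
        def fizzbuzz(n):
            # start at 1 so that 0 is not reported as "Fizz Buzz"
            for i in range(1, n + 1):
                if not i % 3:
                    print "Fizz",    # trailing comma: stay on the same line
                if not i % 5:
                    print "Buzz",
                if i % 3 and i % 5:
                    print i,
                print                # end the line for this number

        fizzbuzz(50)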
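The slide uses Python 2's `print` statement, where a trailing comma keeps the output on the same line. In Python 3 `print` is a function, so this sketch builds each line as a string first. `fizzbuzz_line` is a hypothetical helper name, not from the talk:

```python
def fizzbuzz_line(i):
    """Return the text the slide's code prints for the number i."""
    parts = []
    if not i % 3:
        parts.append("Fizz")
    if not i % 5:
        parts.append("Buzz")
    # Neither divisor matched: fall back to the number itself.
    return " ".join(parts) or str(i)


def fizzbuzz(n):
    """Print one line per number from 1 to n."""
    for i in range(1, n + 1):
        print(fizzbuzz_line(i))
```

Calling `fizzbuzz(50)` prints one line per number, from `1` up to `Buzz` for 50.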
    12. FIZZ BUZZ (OO)
        class FizzBuzzWriter(object):
            def __init__(self, limit):
                self.limit = limit

            def run(self):
                for n in range(1, self.limit + 1):
                    self.write_number(n)

            def write_number(self, n):
                if not n % 3:
                    print "Fizz",
                if not n % 5:
                    print "Buzz",
                if n % 3 and n % 5:
                    print n,
                print

        fizzbuzz = FizzBuzzWriter(50)
        fizzbuzz.run()
    13. A Spell Checker in 21 Lines of Code • Written by Peter Norvig • Ported to many languages • A simple spell-checking algorithm based on probability • http://norvig.com/spell-correct.html
    14. The Approach • Census by frequency • Morph the word (werd) • Insertions: waerd, wberd, werzd • Deletions: wrd, wed, wer • Transpositions: ewrd, wred, wedr • Replacements: aerd, ward, wbrd, word, wzrd, werz • Find the one with the highest frequency: were
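The selection rule on this slide can be written compactly. The notation is ours, not the talk's. V is the set of words seen in the training text and N(c) is the count of word c, which stands in for the probability of candidate c. E1(w) and E2(w) are the strings one and two edits away from w. The case order mirrors `correct()` in the code that follows.

```latex
\[
\operatorname{correct}(w) = \operatorname*{arg\,max}_{c \in C(w)} N(c),
\qquad
C(w) =
\begin{cases}
\{w\} & \text{if } w \in V,\\
E_1(w) \cap V & \text{else, if this set is non-empty},\\
E_2(w) \cap V & \text{else, if this set is non-empty},\\
\{w\} & \text{otherwise}.
\end{cases}
\]
```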
    15. Norvig Spellchecker
        import re, collections

        def words(text):
            return re.findall('[a-z]+', text.lower())

        def train(words):
            model = collections.defaultdict(int)
            for w in words:
                model[w] += 1
            return model

        NWORDS = train(words(file('big.txt').read()))

        alphabet = 'abcdefghijklmnopqrstuvwxyz'

        def edits1(word):
            s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
            deletes    = [a + b[1:] for a, b in s if b]
            transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
            replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
            inserts    = [a + c + b     for a, b in s for c in alphabet]
            return set(deletes + transposes + replaces + inserts)

        def known_edits2(word):
            return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

        def known(words):
            return set(w for w in words if w in NWORDS)

        def correct(word):
            candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
            return max(candidates, key=NWORDS.get)
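The listing above is Python 2 (note the `file()` builtin) and needs Norvig's `big.txt`. The version below is a rough Python 3 sketch, not the talk's code. It differs in two ways:

- The word-count model is passed in as a parameter instead of living in the global `NWORDS`.
- It trains on a tiny made-up sentence so it runs without the file.

The names `SAMPLE` and `MODEL` and the sample text are assumptions for illustration:

```python
import collections
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """Lowercase the text and return its runs of a-z letters."""
    return re.findall("[a-z]+", text.lower())


def train(features):
    """Count word occurrences, as the slide's train() does."""
    model = collections.defaultdict(int)
    for f in features:
        model[f] += 1
    return model


def edits1(word):
    """Every string one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(candidates, model):
    """Keep only candidates that occur in the model (uses `in`, never inserts)."""
    return {w for w in candidates if w in model}


def known_edits2(word, model):
    """Known words reachable from word by two successive edits."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in model}


def correct(word, model):
    """Pick the most frequent candidate from the first non-empty tier."""
    candidates = (known([word], model)
                  or known(edits1(word), model)
                  or known_edits2(word, model)
                  or [word])
    # .get(w, 0) avoids inserting unknown words into the defaultdict.
    return max(candidates, key=lambda w: model.get(w, 0))


# Hypothetical stand-in for big.txt; with the real file you would use
#   with open("big.txt") as f: MODEL = train(words(f.read()))
SAMPLE = "the computer is the best computer; compute it, the spam was spam"
MODEL = train(words(SAMPLE))
```

On this sample, `correct("computr", MODEL)` finds both `computer` and `compute` one edit away. It picks `computer` because that word occurs twice.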
    16. Regular Expressions
        def words(text): return re.findall('[a-z]+', text.lower())

        >>> words("The cat in the hat!")
        ['the', 'cat', 'in', 'the', 'hat']
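The pattern `[a-z]+` treats everything outside the 26 lowercase ASCII letters as a separator, including apostrophes, digits and accented letters. That is probably why truncated entries such as 'spiroch' appear in the word counts later on. A quick Python 3 check of this behaviour, using made-up input strings:

```python
import re


def words(text):
    """Same pattern as the slide: runs of lowercase ASCII letters."""
    return re.findall("[a-z]+", text.lower())


# The apostrophe splits "don't" in two, and the digit is dropped entirely.
contraction = words("Don't stop: 2 cats!")   # ['don', 't', 'stop', 'cats']
# The accented letter is not in [a-z], so the word is cut short.
accented = words("Café au lait")             # ['caf', 'au', 'lait']
```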
    17. Dictionaries
        >>> d = {'cat':1}
        >>> d
        {'cat': 1}
        >>> d['cat']
        1
        >>> d['cat'] += 1
        >>> d
        {'cat': 2}
        >>> d['dog'] += 1
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        KeyError: 'dog'
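A plain `dict` can also count without raising `KeyError` by using `dict.get(key, default)`. It returns the default for a missing key instead of raising. A small Python 3 sketch with arbitrary animal names:

```python
counts = {"cat": 1}
for animal in ["cat", "dog", "dog"]:
    # get() returns 0 for a key that is not there yet, so no KeyError.
    counts[animal] = counts.get(animal, 0) + 1
```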
    18. defaultdict
        # Has a factory for missing keys
        >>> d = collections.defaultdict(int)
        >>> d['dog'] += 1
        >>> d
        defaultdict(<type 'int'>, {'dog': 1})
        >>> int
        <type 'int'>
        >>> int()
        0

        def train(words):
            model = collections.defaultdict(int)
            for w in words:
                model[w] += 1
            return model

        >>> train(words("The cat in the hat!"))
        defaultdict(<type 'int'>, {'cat': 1, 'the': 2, 'hat': 1, 'in': 1})
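Python 2.7 and 3.1 added `collections.Counter`, a dict subclass made for this kind of tally. It was not yet available in the Python 2.6 shown in the talk. Here is a sketch of `train()` that swaps `Counter` in for `defaultdict(int)`:

```python
import collections
import re


def words(text):
    return re.findall("[a-z]+", text.lower())


def train(features):
    """Like the slide's train(), but built on collections.Counter."""
    return collections.Counter(features)


model = train(words("The cat in the hat!"))
# A missing key reads as 0, and unlike defaultdict it is not stored.
dog_count = model["dog"]
```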
    19. Reading the File
        >>> text = file('big.txt').read()
        >>> NWORDS = train(words(text))
        >>> NWORDS
        defaultdict(<type 'int'>, {'nunnery': 3, 'presnya': 1, 'woods': 22, 'clotted': 1, 'spiders': 1,
        'hanging': 42, 'disobeying': 2, 'scold': 3, 'originality': 6,
        'grenadiers': 8, 'pigment': 16, 'appropriation': 6, 'strictest': 1,
        'bringing': 48, 'revelers': 1, 'wooded': 8, 'wooden': 37,
        'wednesday': 13, 'shows': 50, 'immunities': 3, 'guardsmen': 4,
        'sooty': 1, 'inevitably': 32, 'clavicular': 9, 'sustaining': 5,
        'consenting': 1, 'scraped': 21, 'errors': 16, 'semicircular': 1,
        'cooking': 6, 'spiroch': 25, 'designing': 1, 'pawed': 1,
        'succumb': 12, 'shocks': 1, 'crouch': 2, 'chins': 1, 'awistocwacy': 1,
        'sunbeams': 1, 'perforations': 6, 'china': 43, 'affiliated': 4,
        'chunk': 22, 'natured': 34, 'uplifting': 1, 'slaveholders': 2,
        'climbed': 13, 'controversy': 33, 'natures': 2, 'climber': 1,
        'lency': 2, 'joyousness': 1, 'reproaching': 3, 'insecurity': 1,
        'abbreviations': 1, 'definiteness': 1, 'music': 56, 'therefore': 186,
        'expeditionary': 3, 'primeval': 1, 'unpack': 1, 'circumstances': 107,
        ... (about 6500 more lines) ...
        >>> NWORDS['the']
        80030
        >>> NWORDS['unusual']
        32
        >>> NWORDS['cephalopod']
        0
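The `NWORDS['cephalopod']` lookup has an easy-to-miss side effect: indexing a `defaultdict` with a missing key inserts that key. `known()` tests membership with `in`, so a word looked up this way would afterwards count as "known", with frequency 0. A small Python 3 sketch with made-up keys shows how indexing differs from `.get()`:

```python
import collections

nwords = collections.defaultdict(int)
nwords["the"] += 1

# Indexing a missing key calls the factory int() and stores the 0 it returns.
looked_up = nwords["cephalopod"]

# .get() and `in` only inspect the dict; they never add keys.
peeked = nwords.get("octopus")
octopus_present = "octopus" in nwords
```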
    20. Training the Probability Model
        import re, collections

        def words(text): return re.findall('[a-z]+', text.lower())

        def train(words):
            model = collections.defaultdict(int)
            for w in words:
                model[w] += 1
            return model

        NWORDS = train(words(file('big.txt').read()))
    21. List Comprehensions
        # These two are equivalent:
        result = []
        for v in iter:
            if cond:
                result.append(expr)

        [ expr for v in iter if cond ]

        # You can nest loops also:
        result = []
        for v1 in iter1:
            for v2 in iter2:
                if cond:
                    result.append(expr)

        [ expr for v1 in iter1 for v2 in iter2 if cond ]
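The slide's templates use placeholders (`iter`, `cond`, `expr`). Here is a concrete Python 3 instance, with arbitrary numbers and letters, that shows both equivalences:

```python
numbers = range(10)

# Loop version: squares of the even numbers.
squares_of_evens = []
for n in numbers:
    if n % 2 == 0:
        squares_of_evens.append(n * n)

# The same result as a list comprehension.
comprehension = [n * n for n in numbers if n % 2 == 0]

# Nested loops read left to right, outer loop first.
pairs = [(a, b) for a in "ab" for b in "xy"]
```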
    22. String Slicing
        >>> word = "spam"
        >>> word[:1]
        's'
        >>> word[1:]
        'pam'
        >>> (word[:1], word[1:])
        ('s', 'pam')
        >>> range(len(word) + 1)
        [0, 1, 2, 3, 4]
        >>> [(word[:i], word[i:]) for i in range(len(word) + 1)]
        [('', 'spam'), ('s', 'pam'), ('sp', 'am'), ('spa', 'm'), ('spam', '')]
    23. Deletions
        >>> word = "spam"
        >>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        >>> deletes = [a + b[1:] for a, b in s if b]
        >>> deletes
        ['pam', 'sam', 'spm', 'spa']
        >>> a, b = ('s', 'pam')
        >>> a
        's'
        >>> b
        'pam'
        >>> bool('pam')
        True
        >>> bool('')
        False
    24. Transpositions
        For example: teh => the
        >>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
        >>> transposes
        ['psam', 'sapm', 'spma']
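The slide's example pairs `teh` with `the`, but the session runs on `spam`. This sketch wraps the same comprehension in a hypothetical helper, `transposes()`, written for Python 3. It shows that swapping adjacent letters of `teh` does produce `the`:

```python
def transposes(word):
    """Swap each pair of adjacent letters (same comprehension as the slide)."""
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]


typo_fixes = transposes("teh")   # ['eth', 'the']
```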
    25. Replacements
        >>> alphabet = "abcdefghijklmnopqrstuvwxyz"
        >>> replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
        >>> replaces
        ['apam', 'bpam', ..., 'zpam', 'saam', ..., 'szam', ..., 'spaz']
    26. Insertion
        >>> alphabet = "abcdefghijklmnopqrstuvwxyz"
        >>> inserts = [a + c + b for a, b in s for c in alphabet]
        >>> inserts
        ['aspam', ..., 'zspam', 'sapam', ..., 'szpam', 'spaam', ..., 'spamz']
    27. Find all Edits
        alphabet = 'abcdefghijklmnopqrstuvwxyz'

        def edits1(word):
            s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
            deletes = [a + b[1:] for a, b in s if b]
            transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
            replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
            inserts = [a + c + b for a, b in s for c in alphabet]
            return set(deletes + transposes + replaces + inserts)

        >>> edits1("spam")
        set(['sptm', 'skam', 'spzam', 'vspam', 'spamj', 'zpam', 'sbam', 'spham', 'snam', 'sjpam', 'spma', 'swam', 'spaem', 'tspam', 'spmm', 'slpam', 'upam', 'spaim', 'sppm', 'spnam', 'spem', 'sparm', 'spamr', 'lspam', 'sdpam', 'spams', 'spaml', 'spamm', 'spamn', 'spum', 'spamh', 'spami', 'spatm', 'spamk', 'spamd', ..., 'spcam', 'spamy'])
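Before `set()` removes duplicates, the four lists have a predictable total length. For a word of length n (n at least 1) there are:

- n deletions
- n - 1 transpositions
- 26n replacements
- 26(n + 1) insertions

That is 54n + 25 strings in all. This Python 3 sketch checks the count. `raw_edits` and `expected_count` are hypothetical helper names:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def raw_edits(word):
    """The four lists from edits1(), concatenated but not deduplicated."""
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in ALPHABET if b]
    inserts = [a + c + b for a, b in s for c in ALPHABET]
    return deletes + transposes + replaces + inserts


def expected_count(n):
    """n + (n - 1) + 26n + 26(n + 1); only valid for n >= 1."""
    return 54 * n + 25
```

The set is smaller than the raw list. For example, replacing a letter with itself gives back the original word once per position.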
    28. Known Words
        def known(words):
            """ Return the known words from `words`. """
            return set(w for w in words if w in NWORDS)
    29. Correct
        def known(words):
            """ Return the known words from `words`. """
            return set(w for w in words if w in NWORDS)

        def correct(word):
            candidates = known([word]) or known(edits1(word)) or [word]
            return max(candidates, key=NWORDS.get)

        >>> bool(set([]))
        False
        >>> correct("computr")
        'computer'
        >>> correct("computor")
        'computer'
        >>> correct("computerr")
        'computer'
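`correct()` relies on two Python idioms. First, `or` returns its first truthy operand, or the last operand if none is truthy. Because an empty set is falsy, the expression only falls through to the next tier when a tier finds nothing. Second, `max(..., key=...)` ranks the surviving words by count. A small Python 3 illustration with made-up counts:

```python
# `or` stops at the first truthy operand; empty sets are falsy.
first_nonempty = set() or {"computer", "compute"} or {"commuter"}
fallback = set() or set() or ["computr"]

# max() with key= compares the key values (counts), not the words themselves.
counts = {"computer": 2, "compute": 1}
best = max(first_nonempty, key=counts.get)
```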
    30. Edit Distance 2
        def known_edits2(word):
            return set(
                e2
                for e1 in edits1(word)
                for e2 in edits1(e1)
                if e2 in NWORDS
            )

        def correct(word):
            # the parentheses let the expression continue onto a second line
            candidates = (known([word]) or known(edits1(word)) or
                          known_edits2(word) or [word])
            return max(candidates, key=NWORDS.get)

        >>> correct("conpuler")
        'computer'
        >>> correct("cmpuler")
        'computer'
    32. Comparing Python & Java Versions • http://raelcunha.com/spell-correct.php • 35 lines of Java
    33.
        import java.io.*;
        import java.util.*;
        import java.util.regex.*;

        class Spelling {
            private final HashMap<String, Integer> nWords = new HashMap<String, Integer>();

            public Spelling(String file) throws IOException {
                BufferedReader in = new BufferedReader(new FileReader(file));
                Pattern p = Pattern.compile("\\w+");
                for(String temp = ""; temp != null; temp = in.readLine()){
                    Matcher m = p.matcher(temp.toLowerCase());
                    while(m.find()) nWords.put((temp = m.group()), nWords.containsKey(temp) ? nWords.get(temp) + 1 : 1);
                }
                in.close();
            }

            private final ArrayList<String> edits(String word) {
                ArrayList<String> result = new ArrayList<String>();
                for(int i=0; i < word.length(); ++i) result.add(word.substring(0, i) + word.substring(i+1));
                for(int i=0; i < word.length()-1; ++i) result.add(word.substring(0, i) + word.substring(i+1, i+2) + word.substring(i, i+1) + word.substring(i+2));
                for(int i=0; i < word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i+1));
                for(int i=0; i <= word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i));
                return result;
            }

            public final String correct(String word) {
                if(nWords.containsKey(word)) return word;
                ArrayList<String> list = edits(word);
                HashMap<Integer, String> candidates = new HashMap<Integer, String>();
                for(String s : list) if(nWords.containsKey(s)) candidates.put(nWords.get(s),s);
                if(candidates.size() > 0) return candidates.get(Collections.max(candidates.keySet()));
                for(String s : list) for(String w : edits(s)) if(nWords.containsKey(w)) candidates.put(nWords.get(w),w);
                return candidates.size() > 0 ? candidates.get(Collections.max(candidates.keySet())) : word;
            }

            public static void main(String args[]) throws IOException {
                if(args.length > 0) System.out.println((new Spelling("big.txt")).correct(args[0]));
            }
        }
    35. IDEs for Python • IDEs for Python include: • PyDev for Eclipse • WingIDE • IDLE for Windows/Linux/Mac • and more
    36. Why Python ROCKS • Elegant and readable language - “Executable Pseudocode” • Standard Libraries - “Batteries Included” • Very high-level datatypes • Dynamically typed • It’s FUN!
    37. An Open Source Community • Projects: Plone, Zope, Grok, BFG, Django, SciPy & NumPy, Google App Engine, PyGame • PyCon
    38. Resources • PyGTA • Toronto Plone Users • Toronto Django Users • Stackoverflow • Dive into Python • Python Tutorial
    39. Thanks • I’d love to hear your questions or comments on this presentation. Reach me at: • jbb@scryent.com • http://twitter.com/hexsprite
    © All Rights Reserved

    Go to text version

    • Total Views 537
      • 484 on SlideShare
      • 53 from embeds
    • Comments 0
    • Favorites 2
    • Downloads 11
    Most viewed embeds
    • 29 views on http://www.globalnerdy.com
    • 24 views on http://blogs.msdn.com

    more

    All embeds
    • 29 views on http://www.globalnerdy.com
    • 24 views on http://blogs.msdn.com

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories