cs1120 Fall 2011
Class 31: Deanonymizing
David Evans
4 November 2011
Plan
Dictionaries in Python
History of Object-Oriented Programming




Python Dictionary
Dictionary abstraction provides a lookup table.
  Each entry in a dictionary is a
           <key, value>
  pair. The key must be an immutable object.
  The value can be anything.
dictionary[key] evaluates to the value associated
  with key. Running time is approximately
  constant!
Recall: Class 9 on consistent hashing. Python
  dictionaries use (inconsistent) hashing to make
  lookups nearly constant time.
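To see why hashing makes lookups nearly constant time, here is a minimal sketch of a hash-bucket lookup table. The class ToyTable and its method names are made up for illustration, and this is not how Python's built-in dictionaries are actually laid out; it only shows the idea that hash(key) picks one small bucket to search instead of scanning every entry.

```python
class ToyTable:
    """Hypothetical toy lookup table: a fixed list of buckets indexed by hash(key)."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # hash() needs a hashable key, which is why keys must be immutable
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update an existing entry
                return
        bucket.append((key, value))        # add a new entry

    def get(self, key):
        # Only one bucket is searched, however many entries the table holds
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```

With few entries per bucket, get and set each look at only a handful of pairs regardless of the table's total size.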
Dictionary Example
>>> d = {}             # Create a new, empty dictionary
>>> d['UVa'] = 1818    # Add an entry: key ‘UVa’, value 1818
>>> d['UVa'] = 1819    # Update the value: key ‘UVa’, value 1819
>>> d['Cambridge'] = 1209
>>> d['UVa']
1819
>>> d['Oxford']

Traceback (most recent call last):
 File "<pyshell#93>", line 1, in <module>
  d['Oxford']
KeyError: 'Oxford'
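The KeyError can be avoided by checking for the key first. A small sketch using the same dictionary; the variable names has_oxford, oxford, and oxford_default are only for illustration.

```python
d = {}
d['UVa'] = 1818
d['UVa'] = 1819
d['Cambridge'] = 1209

has_oxford = 'Oxford' in d           # membership test, no exception
oxford = d.get('Oxford')             # get returns None when the key is missing
oxford_default = d.get('Oxford', 0)  # or a default value supplied by the caller

try:
    d['Oxford']                      # indexing a missing key raises KeyError
    missing = False
except KeyError:
    missing = True
```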
Histogramming
 Define a procedure histogram that takes a text
 string as its input, and returns a dictionary that
 maps each word in the input text to the
 number of occurrences in the text.

Useful string method: split([separator])
        outputs a list of the words in the string

>>> 'here we go'.split()
['here', 'we', 'go']
>>> "Simula, Nygaard and Dahl, Norway, 1962".split(",")
['Simula', ' Nygaard and Dahl', ' Norway', ' 1962']
def histogram(text):
  d = {}
  words = text.split()

>>> histogram("""
"Mathematicians stand on each
others' shoulders and computer
scientists stand on each others' toes."
Richard Hamming""")
{'and': 1, 'on': 2, 'shoulders': 1, 'computer': 1, 'Richard': 1,
 'scientists': 1, "others'": 2, 'stand': 2, 'Hamming': 1, 'each': 2,
 '"Mathematicians': 1, 'toes."': 1}
def histogram(text):
  d = {}
  words = text.split()
  for w in words:
    if w in d:
       d[w] = d[w] + 1
    else:
       d[w] = 1
  return d

>>> import urllib
>>> declaration = urllib.urlopen('http://www.cs.virginia.edu/cs1120/readings/declaration.html').read()
>>> histogram(declaration)
{'government,': 1, 'all': 11, 'forbidden': 1, '</title>': 1, '1776</b>': 1,
 'hath': 1, 'Caesar': 1, 'invariably': 1, 'settlement': 1, 'Lee,': 2,
 'causes': 1, 'whose': 2, 'hold': 3, 'duty,': 1, 'ages,': 2, 'Object': 1,
 'suspending': 1, 'to': 66, 'present': 1, 'Providence,': 1, 'under': 1,
 '<dd>For': 9, 'should.': 1, 'sent': 1, 'Stone,': 1, 'paralleled': 1, …
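Because split() only breaks on whitespace, keys such as 'government,' and 'duty,' keep their punctuation. The sketch below (written for Python 3) restates histogram with d.get and adds a hypothetical normalized_histogram that lowercases each word and strips surrounding punctuation. That variant is an assumption about what one might want, not something from the slides, and it does not remove HTML tags.

```python
import string

def histogram(text):
    """Map each whitespace-separated word in text to its number of occurrences."""
    d = {}
    for w in text.split():
        d[w] = d.get(w, 0) + 1   # get returns 0 for a word not seen before
    return d

def normalized_histogram(text):
    """Hypothetical variant: lowercase words and strip surrounding punctuation."""
    d = {}
    for w in text.split():
        w = w.strip(string.punctuation).lower()
        if w:                    # skip tokens that were only punctuation
            d[w] = d.get(w, 0) + 1
    return d
```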
Sorting the Histogram
Expression ::= lambda Parameters : Expression
     Makes a procedure, just like Scheme’s
     lambda (instead of listing parameters in
     (), separate with :)

sorted(collection, cmp)
     Returns a new sorted list of the
     elements in collection ordered by cmp.
     cmp specifies a comparison function of
     two arguments which should return a
     negative, zero or positive number
     depending on whether the first
     argument is considered smaller
     than, equal to, or larger than the
     second argument.

  >>> sorted([1,5,3,2,4],<)
  SyntaxError: invalid syntax
  >>> <
  SyntaxError: invalid syntax
  >>> sorted([1,5,3,2,4], lambda a, b: a > b)
  [1, 5, 3, 2, 4]
  >>> sorted([1,5,3,2,4], lambda a, b: a - b)
  [1, 2, 3, 4, 5]
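In Python 3, sorted no longer accepts a cmp argument. A comparison procedure like the ones above can be wrapped with functools.cmp_to_key, or the ordering can be expressed with a key function. A short sketch; the variable names are only for illustration.

```python
from functools import cmp_to_key

numbers = [1, 5, 3, 2, 4]

# Wrap a two-argument comparison procedure so sorted can use it as a key
ascending = sorted(numbers, key=cmp_to_key(lambda a, b: a - b))
descending = sorted(numbers, key=cmp_to_key(lambda a, b: b - a))

# The same descending order using a one-argument key procedure instead
by_key = sorted(numbers, key=lambda a: -a)
```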
Showing the Histogram

def show_histogram(d):
  keys = d.keys()
  okeys = sorted(keys,
                 lambda k1, k2: d[k2] - d[k1])
  for k in okeys:
    print str(k) + ": " + str(d[k])
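A Python 3 sketch of the same idea, assuming a key function in place of the comparison lambda. The helper histogram_lines is hypothetical; it is split out here only so the ordering can be checked without printing.

```python
def histogram_lines(d):
    """Return 'word: count' strings ordered from most to least frequent."""
    okeys = sorted(d.keys(), key=lambda k: d[k], reverse=True)
    return [str(k) + ": " + str(d[k]) for k in okeys]

def show_histogram(d):
    for line in histogram_lines(d):
        print(line)
```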
Author Fingerprinting
 (aka Plagiarism Detection)
“The program identifies phrases of three words
or more in an author’s known work and searches
for them in unattributed plays. In tests where
authors are known to be different, there are up
to 20 matches because some phrases are in
common usage. When Edward III was tested
against Shakespeare’s works published before
1596 there were 200 matches.”
                      The Times, 12 October 2009
def histogram(text):
  d = {}
  words = text.split()
  for w in words:
    if w in d:
       d[w] = d[w] + 1
    else:
       d[w] = 1
  return d

def phrase_collector(text, plen):
  d = {}
  words = text.split()
  words = map(lambda s: s.lower(), words)
  # + 1 so the final phrase (the last plen words) is counted too
  for windex in range(0, len(words) - plen + 1):
    # Dictionary keys must be immutable: convert the
    # (mutable) list to an immutable tuple.
    phrase = tuple(words[windex:windex+plen])
    if phrase in d:
       d[phrase] = d[phrase] + 1
    else:
       d[phrase] = 1
  return d
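A small demonstration of why the slice is converted to a tuple: using a list as a dictionary key raises TypeError, while a tuple of the same words works. The word list here is just an example.

```python
words = ['we', 'hold', 'these', 'truths']
d = {}

try:
    d[words[0:3]] = 1          # a list slice is mutable, so it cannot be a key
    list_key_ok = True
except TypeError:
    list_key_ok = False

d[tuple(words[0:3])] = 1       # the tuple holds the same words but is immutable
```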
def common_phrases(d1, d2):
  keys = d1.keys()
  common = {}
  for k in keys:
     if k in d2:
        common[k] = (d1[k], d2[k])
  return common
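The two procedures can be tried on short strings before running them on whole web pages. This is a Python 3 sketch, so map is replaced by a list comprehension (in Python 3, map returns an iterator that has no len). The loop bound is len(words) - plen + 1 so that the final plen words are counted as a phrase. The sample sentences are made up for illustration.

```python
def phrase_collector(text, plen):
    """Map each plen-word phrase (as a tuple of lowercased words) to its count."""
    d = {}
    words = [w.lower() for w in text.split()]
    for windex in range(len(words) - plen + 1):
        phrase = tuple(words[windex:windex + plen])
        d[phrase] = d.get(phrase, 0) + 1
    return d

def common_phrases(d1, d2):
    """Map each phrase found in both dictionaries to its pair of counts."""
    common = {}
    for k in d1:
        if k in d2:
            common[k] = (d1[k], d2[k])
    return common

a = phrase_collector("To be or not to be", 2)
b = phrase_collector("not to be outdone", 2)
shared = common_phrases(a, b)
```

Here shared holds the two-word phrases that appear in both texts, each mapped to its count in the first and second text.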
myhomepage = urllib.urlopen('http://www.cs.virginia.edu/evans/index.html').read()
declaration = urllib.urlopen('http://www.cs.virginia.edu/cs1120/readings/declaration.html').read()

>>> ptj = phrase_collector(declaration, 3)
>>> ptj
{('samuel', 'adams,', 'john'): 1, ('to', 'pass', 'others'): 1,
 ('absolute', 'despotism,', 'it'): 1, ('a', 'firm', 'reliance'): 1,
 ('with', 'his', 'measures.'): 1, ('are', 'his.', '<p>'): 1,
 ('the', 'ruler', 'of'): 1, …
>>> pde = phrase_collector(myhomepage, 3)
>>> common_phrases(ptj, pde)
{('from', 'the', '<a'): (1, 1)}
>>> pde = phrase_collector(myhomepage, 2)
>>> ptj = phrase_collector(declaration, 2)
>>> show_phrases(common_phrases(ptj, pde))
('of', 'the'): (12, 5)
('in', 'the'): (7, 7)
('the', '<a'): (1, 10)
('for', 'the'): (6, 5)
('to', 'the'): (7, 3)
('from', 'the'): (3, 5)
('to', 'be'): (6, 1)
('<p>', 'i'): (1, 6)
('on', 'the'): (5, 2)
('we', 'have'): (5, 1)
('of', 'their'): (4, 1)
('and', 'the'): (3, 1)
('to', 'provide'): (1, 3)
('of', 'a'): (2, 2)
('the', 'state'): (2, 2)
('by', 'their'): (3, 1)
('the', 'same'): (2, 1)
…
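show_phrases is not defined on the slides. The output above is consistent with ordering phrases by the sum of their two counts, largest first, so the Python 3 sketch below assumes that ordering. The helper phrase_lines is hypothetical and exists only so the ordering can be checked without printing.

```python
def phrase_lines(common):
    """Return 'phrase: (count1, count2)' strings, largest combined count first."""
    okeys = sorted(common.keys(),
                   key=lambda k: common[k][0] + common[k][1],
                   reverse=True)
    return [str(k) + ": " + str(common[k]) for k in okeys]

def show_phrases(common):
    # Assumed behavior: one phrase per line, as in the output above
    for line in phrase_lines(common):
        print(line)
```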
History of Object-Oriented Programming
Computing in World War II
Cryptanalysis (Lorenz: Colossus at Bletchley
  Park, Enigma: Bombes at Bletchley, NCR in US)
Ballistics Tables, calculations for Hydrogen
  bomb (ENIAC at U. Pennsylvania)
Batch processing: submit a program and its
  data, wait your turn, get a result

Building a flight simulator required a different type of computing:
  interactive computing
Pre-History: MIT’s Project Whirlwind (1947-1960s)
Jay Forrester
Whirlwind Innovations
Magnetic Core Memory
  (first version used vacuum tubes)
IBM 704 (used by John McCarthy to
  create LISP) commercialized this
August 29, 1949:
  First Soviet
  Atomic Test
Short or Endless Golden Age of Nuclear Weapons?
[Chart: weapon yield in kilotons (0 to 60000) by year (1940 to 2020). Labeled points:
  Hiroshima (12kt), Nagasaki (20kt)
  First H-Bomb (10Mt)
  Tsar Bomba (50 Mt, largest ever = 10x all of WWII)
  B83 (1.2Mt), largest in currently active arsenal]
Semi-Automatic Ground
Environment (SAGE)
MIT/IBM, 1950-1982
Coordinate radar
  stations in real-time to
  track incoming
  bombers
Total cost: $55B
  (more than Manhattan
  Project)
First intercontinental ballistic missile
R-7 Semyorka
  First successful test: August 21, 1957
  Sputnik: launched by R-7, October 4, 1957
Sketchpad
Ivan Sutherland’s 1963 PhD thesis
  (supervised by Claude Shannon)
Interactive drawing program
Light pen
Components in Sketchpad
Objects in Sketchpad
In the process of making the Sketchpad system operate, a few very general
functions were developed which make no reference at all to the specific types
of entities on which they operate. These general functions give the Sketchpad
system the ability to operate on a wide range of problems. The motivation for
making the functions as general as possible came from the desire to get as much
result as possible from the programming effort involved. For example, the general
function for expanding instances makes it possible for Sketchpad to handle any
fixed geometry subpicture. The rewards that come from implementing general
functions are so great that the author has become reluctant to write any
programs for specific jobs. Each of the general functions implemented in the
Sketchpad system abstracts, in some sense, some common property of pictures
independent of the specific subject matter of the pictures themselves.
  Ivan Sutherland, Sketchpad: a Man-Machine Graphical Communication System, 1963
Simula
Considered the first
  “object-oriented”
  programming language
Language designed for
  simulation by Kristen
  Nygaard and Ole-Johan
  Dahl (Norway, 1962)
Had special syntax for
  defining classes that
  package state and
  procedures together
Counter in Simula
class counter;
  integer count;
  begin
   procedure reset(); count := 0; end;
   procedure next();
       count := count + 1; end;
   integer procedure current();
      current := count; end;
  end
                Does this have everything we need for
                “object-oriented programming”?
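For comparison, a rough Python analogue of the Simula counter, assuming the same three operations. The class name Counter and the choice to start count at 0 in __init__ are decisions made for this sketch; the Simula code above leaves the initial value to reset.

```python
class Counter:
    """Packages state (count) together with the procedures that use it."""

    def __init__(self):
        self.count = 0

    def reset(self):
        self.count = 0

    def next(self):
        self.count = self.count + 1

    def current(self):
        return self.count
```

Each Counter object carries its own count, so two counters advance independently.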
Object-Oriented Programming
Object-Oriented Programming is a state of mind
   where you program by thinking about objects
It is difficult to reach that state of mind if your
   language doesn’t have mechanisms for
   packaging state and procedures (Python has
   class, Scheme has lambda expressions)
Other things can help: dynamic
   dispatch, inheritance, automatic memory
   management, mixins, good donuts, etc.
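Packaging state and procedures does not require a class construct. The sketch below does it with nested procedures that share a variable, roughly what a Scheme program would do with lambda expressions. The name make_counter and the dictionary of procedures are hypothetical choices for illustration (Python 3, which provides nonlocal).

```python
def make_counter():
    count = 0                    # state shared by the procedures below

    def reset():
        nonlocal count
        count = 0

    def next_value():
        nonlocal count
        count = count + 1

    def current():
        return count

    # Return the procedures; the state stays hidden inside the closure
    return {'reset': reset, 'next': next_value, 'current': current}
```

Calling make_counter() twice yields two independent counters, since each call creates its own count.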
Charge
Monday: continue OOP history
PS6 Due Monday
Next week:
  PS7 Due next Monday: only one week!
  building a (mini) Scheme interpreter
      (in Python and Java)
Reminder: Peter has office hours now! (going over to Rice)
