Class 31: Deanonymizing

Python Dictionaries
Plagiarism Detection
(Pre-)History of Object-Oriented Programming

Published in: Education, Technology

  1. cs1120 Fall 2011, Class 31: Deanonymizing. David Evans, 4 November 2011
  2. Plan: Dictionaries in Python; History of Object-Oriented Programming
  3. Python Dictionary
     The dictionary abstraction provides a lookup table. Each entry in a dictionary is a <key, value> pair. The key must be an immutable object. The value can be anything.
     dictionary[key] evaluates to the value associated with key. Running time is approximately constant!
     Recall: Class 9 on consistent hashing. Python dictionaries use (inconsistent) hashing to make lookups nearly constant time.
  4. Dictionary Example
         >>> d = {}                     # create a new, empty dictionary
         >>> d['UVa'] = 1818            # add an entry: key 'UVa', value 1818
         >>> d['UVa'] = 1819            # update the value: key 'UVa', value 1819
         >>> d['Cambridge'] = 1209
         >>> d['UVa']
         1819
         >>> d['Oxford']
         Traceback (most recent call last):
           File "<pyshell#93>", line 1, in <module>
             d['Oxford']
         KeyError: 'Oxford'
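The KeyError above is what indexing with a missing key raises. As a minimal sketch (written in Python 3 syntax; the name `founded` is ours, not from the slides), a membership test or `dict.get` looks up a key without risking the exception:

```python
# Same entries as the slide, under a hypothetical name.
founded = {}
founded['UVa'] = 1818         # add an entry
founded['UVa'] = 1819         # update the value for an existing key
founded['Cambridge'] = 1209

# Membership test: `in` checks the keys, not the values.
has_oxford = 'Oxford' in founded

# dict.get returns a default (None unless one is given) instead of raising KeyError.
oxford_year = founded.get('Oxford')
uva_year = founded.get('UVa', 0)

print(has_oxford, oxford_year, uva_year)   # False None 1819
```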
  5. (No text on this slide.)
  6. Histogramming
     Define a procedure histogram that takes a text string as its input, and returns a dictionary that maps each word in the input text to the number of occurrences in the text.
     Useful string method: split([separator]) outputs a list of the words in the string.
         >>> 'here we go'.split()
         ['here', 'we', 'go']
         >>> "Simula, Nygaard and Dahl, Norway, 1962".split(",")
         ['Simula', ' Nygaard and Dahl', ' Norway', ' 1962']
  7.
         def histogram(text):
             d = {}
             words = text.split()
             # (the definition is completed on the next slide)

         >>> histogram(""" "Mathematicians stand on each other's shoulders and computer scientists stand on each other's toes." Richard Hamming""")
         {'and': 1, 'on': 2, 'shoulders': 1, 'computer': 1, 'Richard': 1, 'scientists': 1, "other's": 2, 'stand': 2, 'Hamming': 1, 'each': 2, '"Mathematicians': 1, 'toes."': 1}
  8.
         def histogram(text):
             d = {}
             words = text.split()
             for w in words:
                 if w in d:
                     d[w] = d[w] + 1
                 else:
                     d[w] = 1
             return d

         >>> import urllib
         >>> declaration = urllib.urlopen('http://www.cs.virginia.edu/cs1120/readings/declaration.html').read()
         >>> histogram(declaration)
         {'government,': 1, 'all': 11, 'forbidden': 1, '</title>': 1, '1776</b>': 1, 'hath': 1, 'Caesar': 1, 'invariably': 1, 'settlement': 1, 'Lee,': 2, 'causes': 1, 'whose': 2, 'hold': 3, 'duty,': 1, 'ages,': 2, 'Object': 1, 'suspending': 1, 'to': 66, 'present': 1, 'Providence,': 1, 'under': 1, '<dd>For': 9, 'should.': 1, 'sent': 1, 'Stone,': 1, 'paralleled': 1, …
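The same counting loop can be written more compactly. The sketch below uses Python 3 syntax and is not from the slides: `histogram` uses `dict.get` in place of the if/else, and `histogram_counter` swaps in the standard library's `collections.Counter`, which is a different technique that produces the same counts.

```python
from collections import Counter

def histogram(text):
    """Map each whitespace-separated word in text to its number of occurrences."""
    d = {}
    for w in text.split():
        d[w] = d.get(w, 0) + 1    # d.get(w, 0) is 0 when w has not been seen yet
    return d

def histogram_counter(text):
    """Same counts using collections.Counter (a dict subclass)."""
    return Counter(text.split())

sample = "to be or not to be"
print(histogram(sample))
print(dict(histogram_counter(sample)))
```

As in the slide's output, punctuation and HTML tags stay attached to words ('government,' and '<dd>For' are counted as words), because split() only breaks on whitespace.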
  9. Sorting the Histogram
     Expression ::= lambda Parameters : Expression
     Makes a procedure, just like Scheme’s lambda (instead of listing parameters in (), separate with :)
     sorted(collection, cmp)
     Returns a new sorted list of the elements in collection ordered by cmp. cmp specifies a comparison function of two arguments which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument.
         >>> sorted([1,5,3,2,4],<)
         SyntaxError: invalid syntax
         >>> <
         SyntaxError: invalid syntax
         >>> sorted([1,5,3,2,4], lambda a, b: a > b)
         [1, 5, 3, 2, 4]
         >>> sorted([1,5,3,2,4], lambda a, b: a - b)
         [1, 2, 3, 4, 5]
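The `a > b` attempt above leaves the list unsorted because it returns True or False, which act as 1 and 0: the comparison can never return a negative number, so it never reports "smaller than". The slide's `sorted(collection, cmp)` form is Python 2; Python 3 removed the cmp argument. A minimal sketch of the same idea in Python 3, wrapping a two-argument comparison with `functools.cmp_to_key` (the name `compare_numbers` is ours):

```python
from functools import cmp_to_key

def compare_numbers(a, b):
    """Old-style comparison: negative if a < b, zero if equal, positive if a > b."""
    return a - b

numbers = [1, 5, 3, 2, 4]

# cmp_to_key wraps the two-argument comparison so it can be passed as key=.
ascending = sorted(numbers, key=cmp_to_key(compare_numbers))

# Swapping the operands reverses the order.
descending = sorted(numbers, key=cmp_to_key(lambda a, b: b - a))

print(ascending)     # [1, 2, 3, 4, 5]
print(descending)    # [5, 4, 3, 2, 1]
```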
  10. Showing the Histogram
          def show_histogram(d):
              keys = d.keys()
              okeys = sorted(keys,              # comparison left blank on this slide
              for k in okeys:
                  print str(k) + ": " + str(d[k])
  11. Showing the Histogram
          def show_histogram(d):
              keys = d.keys()
              okeys = sorted(keys, lambda k1, k2: d[k2] - d[k1])
              for k in okeys:
                  print str(k) + ": " + str(d[k])
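The comparison `d[k2] - d[k1]` puts the key with the larger count first. A sketch of the same ordering in Python 3 syntax, where a key function replaces the comparison function; the helper `histogram_lines` is a hypothetical addition that returns the lines instead of printing them, which makes the ordering easy to check:

```python
def histogram_lines(d):
    """Return 'word: count' strings, most frequent word first."""
    # key=d.get sorts the keys by their counts; reverse=True puts the largest first.
    ordered = sorted(d, key=d.get, reverse=True)
    return [str(k) + ": " + str(d[k]) for k in ordered]

def show_histogram(d):
    """Print one 'word: count' line per entry, most frequent first."""
    for line in histogram_lines(d):
        print(line)

show_histogram({'and': 1, 'stand': 2, 'each': 2, 'Hamming': 1})
```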
  12. Author Fingerprinting (aka Plagiarism Detection)
      “The program identifies phrases of three words or more in an author’s known work and searches for them in unattributed plays. In tests where authors are known to be different, there are up to 20 matches because some phrases are in common usage. When Edward III was tested against Shakespeare’s works published before 1596 there were 200 matches.”
      The Times, 12 October 2009
  13.
          def histogram(text):
              d = {}
              words = text.split()
              for w in words:
                  if w in d:
                      d[w] = d[w] + 1
                  else:
                      d[w] = 1
              return d

          def phrase_collector(text, plen):
              d = {}
              words = text.split()
              words = map(lambda s: s.lower(), words)
              for windex in range(0, len(words) - plen):
                  phrase = tuple(words[windex:windex+plen])
                  if phrase in d:
                      d[phrase] = d[phrase] + 1
                  else:
                      d[phrase] = 1
              return d

      Dictionary keys must be immutable: convert the (mutable) list to an immutable tuple.
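A sketch of phrase_collector in Python 3 syntax, with two deliberate differences from the slide that are our assumptions, not the author's code. The words are lowercased with a list comprehension, since Python 3's map returns an iterator that supports neither len() nor slicing. The range ends at len(words) - plen + 1, so the window starting at the last possible position is also counted.

```python
def phrase_collector(text, plen):
    """Map each run of plen consecutive (lowercased) words to its count."""
    words = [w.lower() for w in text.split()]   # a list, so len() and slicing work
    d = {}
    # The + 1 includes the final window, words[len(words) - plen:].
    for windex in range(len(words) - plen + 1):
        # A list slice is mutable and cannot be a dictionary key; a tuple can.
        phrase = tuple(words[windex:windex + plen])
        d[phrase] = d.get(phrase, 0) + 1
    return d

print(phrase_collector("To be or not to be", 2))
```

Here ('to', 'be') is counted twice: once at the start and once as the final window, which the slide's range would skip.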
  14.
          def common_phrases(d1, d2):
              keys = d1.keys()
              common = {}
              for k in keys:
                  if k in d2:
                      common[k] = (d1[k], d2[k])
              return common

          import urllib
          myhomepage = urllib.urlopen('http://www.cs.virginia.edu/evans/index.html').read()
          declaration = urllib.urlopen('http://www.cs.virginia.edu/cs1120/readings/declaration.html').read()

          >>> ptj = phrase_collector(declaration, 3)
          >>> ptj
          {('samuel', 'adams,', 'john'): 1, ('to', 'pass', 'others'): 1, ('absolute', 'despotism,', 'it'): 1, ('a', 'firm', 'reliance'): 1, ('with', 'his', 'measures.'): 1, ('are', 'his.', '<p>'): 1, ('the', 'ruler', 'of'): 1, …
          >>> pde = phrase_collector(myhomepage, 3)
          >>> common_phrases(ptj, pde)
          {('from', 'the', '<a'): (1, 1)}
  15.
          >>> pde = phrase_collector(myhomepage, 2)
          >>> ptj = phrase_collector(declaration, 2)
          >>> show_phrases(common_phrases(ptj, pde))
          ('of', 'the'): (12, 5)
          ('in', 'the'): (7, 7)
          ('the', '<a'): (1, 10)
          ('for', 'the'): (6, 5)
          ('to', 'the'): (7, 3)
          ('from', 'the'): (3, 5)
          ('to', 'be'): (6, 1)
          ('<p>', 'i'): (1, 6)
          ('on', 'the'): (5, 2)
          ('we', 'have'): (5, 1)
          ('of', 'their'): (4, 1)
          ('and', 'the'): (3, 1)
          ('to', 'provide'): (1, 3)
          ('of', 'a'): (2, 2)
          ('the', 'state'): (2, 2)
          ('by', 'their'): (3, 1)
          ('the', 'same'): (2, 1)
          …
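The slides call show_phrases but never define it. The output above is consistent with ordering by the combined count of the two documents, largest first, so the sketch below (Python 3 syntax) assumes that ordering. Both `phrase_lines` and this `show_phrases` are guesses at the behavior, not the author's definitions, and the two small dictionaries are made-up toy data.

```python
def common_phrases(d1, d2):
    """Phrases present in both dictionaries, mapped to (count in d1, count in d2)."""
    return {k: (d1[k], d2[k]) for k in d1 if k in d2}

def phrase_lines(common):
    """Hypothetical formatter: 'phrase: (n1, n2)' strings, largest combined count first."""
    ordered = sorted(common, key=lambda k: sum(common[k]), reverse=True)
    return [str(k) + ": " + str(common[k]) for k in ordered]

def show_phrases(common):
    """Print one line per common phrase (assumed behavior, see the note above)."""
    for line in phrase_lines(common):
        print(line)

# Toy phrase dictionaries, invented for illustration.
ptj = {('to', 'be'): 6, ('of', 'the'): 12, ('we', 'hold'): 1}
pde = {('of', 'the'): 5, ('to', 'be'): 1, ('my', 'research'): 2}
show_phrases(common_phrases(ptj, pde))
```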
  16. History of Object-Oriented Programming
  17. Computing in World War II
      Cryptanalysis (Lorenz: Colossus at Bletchley Park; Enigma: Bombes at Bletchley, NCR in US)
      Ballistics tables, calculations for the hydrogen bomb (ENIAC at U. Pennsylvania)
      Batch processing: submit a program and its data, wait your turn, get a result
      Building a flight simulator required a different type of computing: interactive computing
  18. Pre-History: MIT’s Project Whirlwind (1947-1960s)
      Jay Forrester
  19. Whirlwind Innovations
      Magnetic Core Memory (first version used vacuum tubes)
      IBM 704 (used by John McCarthy to create LISP) commercialized this
  20. August 29, 1949: First Soviet Atomic Test
  21. Short or Endless Golden Age of Nuclear Weapons?
      [Chart: weapon yield in kilotons (0 to 60000) by year (1940 to 2020). Labeled points: Hiroshima (12kt), Nagasaki (20kt); First H-Bomb (10Mt); Tsar Bomba (50 Mt, largest ever = 10x all of WWII); B83 (1.2Mt), largest in currently active arsenal.]
  22. Semi-Automatic Ground Environment (SAGE)
      MIT/IBM, 1950-1982
      Coordinate radar stations in real-time to track incoming bombers
      Total cost: $55B (more than Manhattan Project)
  23. First intercontinental ballistic missile: R-7 Semyorka
      First successful test: August 21, 1957
      Sputnik: launched by R-7, October 4, 1957
  24. Sketchpad
      Ivan Sutherland’s 1963 PhD thesis (supervised by Claude Shannon)
      Interactive drawing program
      Light pen
  25. Components in Sketchpad
  26. Objects in Sketchpad
      In the process of making the Sketchpad system operate, a few very general functions were developed which make no reference at all to the specific types of entities on which they operate. These general functions give the Sketchpad system the ability to operate on a wide range of problems. The motivation for making the functions as general as possible came from the desire to get as much result as possible from the programming effort involved. For example, the general function for expanding instances makes it possible for Sketchpad to handle any fixed geometry subpicture. The rewards that come from implementing general functions are so great that the author has become reluctant to write any programs for specific jobs. Each of the general functions implemented in the Sketchpad system abstracts, in some sense, some common property of pictures independent of the specific subject matter of the pictures themselves.
      Ivan Sutherland, Sketchpad: a Man-Machine Graphical Communication System, 1963
  27. Simula
      Considered the first “object-oriented” programming language
      Language designed for simulation by Kristen Nygaard and Ole-Johan Dahl (Norway, 1962)
      Had special syntax for defining classes that package state and procedures together
  28. Counter in Simula
          class counter;
            integer count;
          begin
            procedure reset(); count := 0; end;
            procedure next(); count := count + 1; end;
            integer procedure current(); current := count; end;
          end
      Does this have everything we need for “object-oriented programming”?
  29. Object-Oriented Programming
      Object-Oriented Programming is a state of mind where you program by thinking about objects
      It is difficult to reach that state of mind if your language doesn’t have mechanisms for packaging state and procedures (Python has class, Scheme has lambda expressions)
      Other things can help: dynamic dispatch, inheritance, automatic memory management, mixins, good donuts, etc.
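To make "packaging state and procedures" concrete, here is a sketch of the Simula counter in Python 3, written two ways. Both are illustrations of ours, not from the slides. `Counter` uses Python's class mechanism. `make_counter` uses a closure over a local variable, roughly how one would build the same object from lambda expressions in Scheme, with a dictionary of procedures standing in for message dispatch.

```python
class Counter:
    """State (count) and the procedures that use it, packaged in one class."""

    def __init__(self):
        self.count = 0

    def reset(self):
        self.count = 0

    def next(self):
        self.count = self.count + 1

    def current(self):
        return self.count

def make_counter():
    """The same packaging with a closure: count lives in the enclosing scope."""
    count = 0

    def reset():
        nonlocal count
        count = 0

    def next_():
        nonlocal count
        count = count + 1

    def current():
        return count

    # Look up a procedure by message name, in the style of a message-passing object.
    return {'reset': reset, 'next': next_, 'current': current}

c = Counter()
c.next()
c.next()
print(c.current())       # 2

k = make_counter()
k['next']()
print(k['current']())    # 1
```

Each call to make_counter() creates a separate count, just as each Counter() instance has its own.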
  30. Charge
      Monday: continue OOP history
      PS6 due Monday
      Next week: PS7, due next Monday: only one week! Building a (mini) Scheme interpreter (in Python and Java)
      Reminder: Peter has office hours now! (going over to Rice)
