This document provides an overview of classical distributed computing studies and discusses Gene Amdahl and his contributions. It summarizes Amdahl's Law, which states that the overall speedup gained by parallelizing a program is limited by the fraction of the program that must still run serially. The document also mentions Rear Admiral Grace Hopper's work and Karen Spärck Jones' invention of inverse document frequency. It concludes with a discussion of SparkSQL and strategies for optimizing SQL queries.
When he filed a patent on floating point, he found out that
von Neumann had already done so.
http://pages.cs.wisc.edu/~bezenek/Stuff/amdahl_thesis.pdf
Worked on STRETCH,
IBM's first transistorized supercomputer
via https://en.wikipedia.org/wiki/IBM_7030_Stretch
photo CC by https://www.flickr.com/photos/jurvetson/
Then founded Amdahl Corporation in partnership with Fujitsu
Air-cooled Amdahl 470
The first clone of the IBM S/370!
Memo written while still at IBM:
"Validity of the single processor approach to achieving large scale computing capabilities"
This is the origin of what is now known as Amdahl's Law
The memo contains no equation, which has led to the law
being written in many different ways.
But it's easiest to understand graphically.
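Because the memo has no equation, the version below is the formulation most often quoted today, not Amdahl's own notation: if a fraction p of the original runtime can be spread over n processors and the rest stays serial, the overall speedup is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p) no matter how large n grows. A minimal sketch in Python; the function names are illustrative.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Overall speedup when a fraction p of the runtime is spread over n workers.

    p: fraction of the original runtime that can be parallelized (0 <= p <= 1)
    n: number of processors working on the parallel part
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1.0 - p   # this part takes the same time no matter how many workers
    parallel = p / n   # this part shrinks as workers are added
    return 1.0 / (serial + parallel)


def amdahl_limit(p: float) -> float:
    """Upper bound on the speedup as n grows without bound: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Even with 95% of the work parallelizable, the serial 5% caps the speedup near 20x.
    for n in (1, 2, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
    print("limit", amdahl_limit(0.95))
```

Plotting `amdahl_speedup` against n for a few values of p reproduces the familiar flattening curves: each added processor helps less than the one before.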
If you're familiar with the Critical Path Method from
business or operations research,
or if you've ever worked in a restaurant
or on an assembly line,
Amdahl's law should be common sense.
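To make the Critical Path Method connection concrete, here is a minimal sketch under assumed data: the restaurant order below (task names, durations, dependencies) is entirely hypothetical. With unlimited workers, the job still cannot finish before the longest chain of dependent tasks (the critical path), so total work divided by critical-path length bounds the speedup, just as the serial fraction does in Amdahl's law.

```python
from functools import lru_cache

# Hypothetical restaurant order: task -> (duration in minutes, prerequisite tasks)
TASKS = {
    "take_order": (2, []),
    "grill_steak": (12, ["take_order"]),
    "make_salad": (4, ["take_order"]),
    "bake_potato": (10, ["take_order"]),
    "plate": (3, ["grill_steak", "make_salad", "bake_potato"]),
    "serve": (1, ["plate"]),
}


def critical_path_length(tasks):
    """Earliest finish time of the whole job with unlimited workers.

    This is the longest path through the dependency DAG (assumed acyclic).
    """
    @lru_cache(maxsize=None)
    def earliest_finish(name):
        duration, deps = tasks[name]
        # A task can start only after all of its prerequisites have finished.
        return duration + max((earliest_finish(d) for d in deps), default=0)

    return max(earliest_finish(name) for name in tasks)


def total_work(tasks):
    """Time for a single worker doing every task one after another."""
    return sum(duration for duration, _ in tasks.values())


if __name__ == "__main__":
    cp, work = critical_path_length(TASKS), total_work(TASKS)
    print(f"critical path {cp} min, total work {work} min, max speedup {work / cp:.2f}")
```

Here take_order, grill_steak, plate and serve form an 18-minute critical path out of 32 minutes of total work, so no number of cooks gets dinner out faster than 18 minutes, a best-case speedup of about 1.78.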
Karen Spärck Jones FBA
(1935-2007)
Invented Inverse Document Frequency
http://nlp.cs.swarthmore.edu/~richardw/papers/sparckjones1972-statistica
“The specificity of a term can be
quantified as an inverse function of
the number of documents in which it
occurs.”
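The quoted idea translates directly into code. The sketch below uses one common modern formulation, idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number of documents containing term t; it is not necessarily the exact weighting in the 1972 paper, and the toy corpus is made up for illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df is how many documents contain it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document (a simple whitespace tokenizer).
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
idf = inverse_document_frequency(docs)
```

A word like "the" that occurs in every document gets weight log(1) = 0, while "mat", found in only one document, gets the highest weight: the rarer the term, the more specific it is.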
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDa
FROM Orders
JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
But what if Customers is on your local HDFS and Orders is
in a data center at your warehouse?
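The query above is an inner equi-join on CustomerID. When the two tables live in different places, a common strategy is to move the smaller table to where the larger one is and join there; Spark calls this a broadcast join. A minimal pure-Python sketch of the in-memory hash join this relies on, assuming rows are dicts and using made-up sample rows; `hash_join` is a hypothetical helper, not a library API.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner equi-join of two lists of dicts on `key`.

    Builds a hash table on the smaller input (the side you would rather ship
    across the network) and streams the larger input past it.
    """
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    joined = []
    for row in probe:
        # Rows with no match on the build side are dropped, as in an inner JOIN.
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample rows.
customers = [
    {"CustomerID": 1, "CustomerName": "Customer A"},
    {"CustomerID": 2, "CustomerName": "Customer B"},
]
orders = [
    {"OrderID": 10308, "CustomerID": 2},
    {"OrderID": 10309, "CustomerID": 1},
    {"OrderID": 10310, "CustomerID": 3},
]
rows = hash_join(orders, customers, "CustomerID")
```

In PySpark the same intent can be expressed with the `broadcast` hint from `pyspark.sql.functions`, for example `orders.join(broadcast(customers), "CustomerID")`, which asks Spark to ship the small table to every executor instead of shuffling the large one.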
Computerized query planning is the future, but for now
you, the user, are going to have to recognize your own
latency issues.
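Recognizing latency issues usually starts with a back-of-the-envelope estimate of what each data movement costs. The sketch below assumes a deliberately simple model (one fixed latency plus size divided by bandwidth, ignoring protocol overhead and compression), and the table sizes and link speed are hypothetical numbers chosen only for illustration.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Rough time to move a table: one fixed latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def cheaper_side_to_ship(sizes, bandwidth_bytes_per_s, latency_s):
    """Return (table name, estimated seconds) for the table cheapest to move."""
    costs = {
        name: transfer_seconds(size, bandwidth_bytes_per_s, latency_s)
        for name, size in sizes.items()
    }
    name = min(costs, key=costs.get)
    return name, costs[name]


# Hypothetical sizes in bytes and a hypothetical 100 Mbit/s (12.5 MB/s) link.
sizes = {"Customers": 50_000_000, "Orders": 20_000_000_000}
table, seconds = cheaper_side_to_ship(sizes, bandwidth_bytes_per_s=12_500_000, latency_s=0.05)
print(f"ship {table}: about {seconds:.2f} s")
```

Under these assumed numbers, shipping Customers takes a few seconds while shipping Orders would take close to half an hour, which is the kind of gap a query planner may not see if it does not know where the data lives.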
We should forget about small efficiencies, say about
97% of the time: premature optimization is the root of
all evil.
Yet we should not pass up our opportunities in that
critical 3%.
A good programmer will not be lulled into
complacency by such reasoning, he will be wise to
look carefully at the critical code; but only after that
code has been identified.
Donald Knuth
ACM Computing Surveys, Vol. 6, No. 4, Dec. 1974
Structured Programming with go to Statements
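Knuth's advice is to identify the critical code before optimizing it, and a profiler is the usual way to do that. A minimal sketch using Python's standard-library `cProfile` and `pstats` modules; the pipeline functions are hypothetical stand-ins for real work, and `hottest_function` is an illustrative helper that reads the per-function statistics table `pstats.Stats` builds.

```python
import cProfile
import pstats


def parse_records(n):
    # Cheap step: builds a list of small tuples.
    return [(i, i % 7) for i in range(n)]


def score_records(records):
    # Deliberately expensive step: an inner loop per record.
    total = 0
    for i, k in records:
        for j in range(50):
            total += (i * j + k) % 13
    return total


def pipeline(n=20_000):
    return score_records(parse_records(n))


def hottest_function(profiler):
    """Name of the profiled function with the most time spent in its own body."""
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    key = max(stats.stats, key=lambda k: stats.stats[k][2])
    return key[2]


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    pipeline()
    profiler.disable()
    # Show the five entries with the largest own time, then the single hottest one.
    pstats.Stats(profiler).sort_stats("tottime").print_stats(5)
    print("hottest:", hottest_function(profiler))
```

Only once the report points at a function like `score_records` does it make sense to spend effort on it; the cheap parsing step is one of Knuth's "small efficiencies".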