Life and Work of Jim Gray | Turing100@Persistent

Dr. Anand Deshpande, Chairman, Managing Director & CEO, Persistent Systems Ltd., talks about the life and work of Jim Gray (1998 Turing Award recipient) during the 6th Turing Session.

  • Jim Gray refined the notion of a Database Transaction. He explained that application initiated data manipulation actions can be classified as “unprotected”, “protected”, and “real” actions [Gray 1981b]. Unprotected actions involve transient and internal state, such as temporary files. Protected actions, on the other hand, are grouped into transactions and are reflected in the state of the transaction outcome. The outcome of a transaction must be to either commit the effects of its protected actions to the system state, or to abort and remove the protected actions’ effects from the system state. This means that protected actions must be undone on transaction failure or abort and their effects must be ensured in the case of transaction commit. Real actions involve sensors, actuators, and messages outside the DBMS. While real actions cannot be “undone”, they can be compensated. For example, if the missile is fired, the compensation could be “debit quantity on hand and send apologies”.

    In order to achieve durable transaction atomicity (all or nothing for protected actions) in the presence of processor, memory, storage, communication, or environmental failures, multiple copies of the stored data must be maintained and a record of the protected action sequence is needed to complete or undo transactions interrupted by system failures. To achieve durable transaction atomicity, the transition to the “committed” state must be accomplished by a single write to nonvolatile storage. To these ends Jim Gray defined the Write Ahead Log (WAL) protocol [Gray 1978, Gray 1981a] while at IBM Research. The WAL protocol records the old and new states induced by protected actions separately from the actual state changes. The logged changes are written to stable storage before the actual changes are written back to stable storage (that’s the “Write Ahead” part). Transactions are committed by simply appending and writing a ‘commit’ record to the recovery log. Logged changes are used to undo protected actions of aborted transactions and of transactions in progress at the time of a system failure. Log records are also used to redo committed actions whose actual changes have not been written back to stable storage at the time of a system failure. The WAL protocol allows changed data to be written to their stable storage home at any time after the log records describing the changes have been written into the stable log. This gives the Database Manager great flexibility in managing the contents of its volatile data buffer pools.

    The recovery techniques developed by Jim Gray and the System R team have been instrumental to the deployment of on-line transaction processing applications. With the ability to recover from equipment and environmental failures, without loss of committed, protected actions, along with atomic (all-or-nothing) transaction completion, on-line business critical applications become reliable enough to replace batch and paper-based transaction processing. The impact of Dr. Gray’s recovery technologies for transaction reliability cannot be overstated: without adequate reliability and durability for transactional applications, the transition to on-line transaction processing would not have been possible.

Transcript

  • 1. Life and Work of Jim Gray. January 5, 2013
  • 2. 2
  • 3. JAMES ("JIM") NICHOLAS GRAY. United States – 1998. CITATION: For fundamental contributions to database and transaction processing research and technical leadership in system implementation from research prototypes to commercial products. The transaction is the fundamental abstraction underlying database system concurrency and failure recovery. Gray’s work [defined] the key transaction properties: atomicity, consistency, isolation and durability, and his locking and recovery work demonstrated how to build … systems that exhibit these properties.
  • 4. E. F. Codd invented the relational database model in 1970 and created what is today a $100+ billion per year industry.
  • 5. Codd’s Relational Model● Simple model● Data stored in relational tables● Data Independence – separation of data storage and data access● Declarative Queries● Algebra to mathematically reason about data objects – made query optimization possible● Ad-hoc queries through SQL.● Embedded in operational systems. 5
  • 6. ACID properties are fundamental to relational systems and necessary for on-line transaction processing (OLTP) systems● Jim Gray defined ACID properties to guarantee database transactions are processed reliably: Atomicity, Consistency, Isolation, Durability.
  • 7. From Transactions to Transaction Processing Systems - II. [Figure: Reality mapped to its Abstraction, the DB; a Change maps to a Transaction, and a Query maps to a DB Answer.] The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.
  • 8. Gray defined Data Manipulation Actions as● Unprotected: transient and internal state● Protected: grouped into transactions and reflected in the state of transaction outcome● Real: involve sensors, actuators etc. They cannot be undone; they can be compensated.
  • 9. Definitions● A transaction is a sequence of operations that form a single unit of work● A transaction is often initiated by an application program – begin a transaction: START TRANSACTION – end a transaction: COMMIT (if successful) or ROLLBACK (if errors)● Either the whole transaction must succeed or the effect of all operations has to be undone (rollback)● To achieve durable transaction atomicity, the transition to the “committed” state must be accomplished by a single write to non-volatile storage.
  • 10. Structure of a Transaction Program. [Figure: BEGIN WORK() starts the WORK, which ends in either COMMIT WORK() or ROLLBACK WORK().]
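The begin / work / commit-or-rollback structure described above can be sketched in a few lines. This is only an illustration, using Python's built-in `sqlite3` module; the `account` table, the `transfer` function, and the overdraft check are hypothetical names and rules invented for the example, not something from the talk.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single unit of work."""
    try:
        conn.execute("BEGIN")  # corresponds to BEGIN WORK
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        (lowest,) = conn.execute("SELECT MIN(balance) FROM account").fetchone()
        if lowest < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")  # corresponds to COMMIT WORK
        return True
    except ValueError:
        conn.execute("ROLLBACK")  # undo every protected action of this transaction
        return False

# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT/ROLLBACK
# are issued explicitly by the code above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])

ok_first = transfer(conn, "a", "b", 60)    # succeeds and commits
ok_second = transfer(conn, "a", "b", 60)   # would overdraw "a", so it rolls back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the second call the balances are unchanged from the first commit, which is the all-or-nothing behaviour the slide describes.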
  • 11. While at IBM San Jose Research Laboratory, October 1972 to December 1980● Jim Gray developed three key ideas related to transaction concurrency control: – The notion of transaction – Serializability; degrees of consistency – Multi-granularity locking● There are two main transaction issues – concurrent execution of multiple transactions – recovery after hardware failures and system crashes
  • 12. Write Ahead Log (WAL) protocol● The WAL protocol records the old and new states induced by protected actions separately from the actual state changes.● The logged changes are written to stable storage before the actual changes are written back to stable storage (that’s the “Write Ahead” part).● Transactions are committed by simply appending and writing a ‘commit’ record to the recovery log. Logged changes are used to undo protected actions of aborted transactions and of transactions in progress at the time of a system failure.
  • 13. Write Ahead Log (WAL) protocol● Log records are also used to redo committed actions whose actual changes have not been written back to stable storage at the time of a system failure.● The WAL protocol allows changed data to be written to their stable storage home at any time after the log records describing the changes have been written into the stable log.● This gives the Database Manager great flexibility in managing the contents of its volatile data buffer pools.
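The redo/undo behaviour of the WAL protocol can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not Gray's or System R's implementation: `ToyWAL` and its methods are hypothetical names, Python lists and dicts stand in for the stable log and disk, and it assumes locking keeps uncommitted transactions from overwriting each other's keys.

```python
class ToyWAL:
    """Toy write-ahead log: old/new values are logged before data changes."""

    def __init__(self):
        self.log = []      # stands in for the stable recovery log
        self.disk = {}     # stable "home" location of the data
        self.buffer = {}   # volatile buffer pool, lost on a crash

    def read(self, key):
        return self.buffer.get(key, self.disk.get(key))

    def write(self, txn, key, new):
        old = self.read(key)
        self.log.append(("update", txn, key, old, new))  # log record first
        self.buffer[key] = new                           # then the actual change

    def commit(self, txn):
        self.log.append(("commit", txn))  # a single appended record commits

    def flush(self, key):
        # Allowed at any time, because the log record was already written.
        if key in self.buffer:
            self.disk[key] = self.buffer[key]

    def crash(self):
        self.buffer = {}

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:                 # redo committed work in log order
            if rec[0] == "update" and rec[1] in committed:
                self.disk[rec[2]] = rec[4]
        for rec in reversed(self.log):       # undo uncommitted work, newest first
            if rec[0] == "update" and rec[1] not in committed:
                if rec[3] is None:
                    self.disk.pop(rec[2], None)
                else:
                    self.disk[rec[2]] = rec[3]

db = ToyWAL()
db.write("T1", "x", 1)
db.commit("T1")          # committed, but never flushed to disk
db.write("T2", "y", 2)
db.flush("y")            # flushed to disk, but never committed
db.crash()
db.recover()
recovered = dict(db.disk)
```

After recovery the committed change to `x` is present even though it was never flushed, and the uncommitted change to `y` is gone even though it reached disk.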
  • 14. ACID Properties: First Definition● Atomicity: A transaction’s changes to the state are atomic: either all happen or none happen. These changes include database changes, messages, and actions on transducers.● Consistency: A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program.● Isolation: Even though transactions execute concurrently, it appears to each transaction T, that others executed either before T or after T, but not both.● Durability: Once a transaction completes successfully (commits), its changes to the state survive failures.
  • 15. [Gray 1993] Jim Gray and Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, San Mateo, CA (1993).
  • 16. In 1985, Jim and a number of other senior leaders in the field of transaction processing started the HPTS (High Performance Transaction Systems) Workshop [HPTS]. This is a biennial gathering of folks interested in transaction systems (and things related to scalable systems). It includes people from competing companies in industry and also from academia. Over the last 22 years, it has evolved to include many different topics as high-end computing morphed from the mainframe to the Internet.
  • 17. The early years …● Born January 12, 1944● 1961: graduated from Westmoor High School in San Francisco.● 1966: graduated from the University of California at Berkeley with a bachelor’s degree in mathematics and engineering.
  • 18. James Nicholas Gray was born in San Francisco, California on 12 January 1944.● In 1961 Gray graduated from Westmoor High School in San Francisco.● He graduated from the University of California at Berkeley with a bachelor’s degree in mathematics and engineering in 1966.● After spending a year in New Jersey working at Bell Laboratories in Murray Hill and attending classes at the Courant Institute in New York City, he returned to Berkeley and enrolled in the newly-formed computer science department, earning a Ph.D. in 1969 for work on context-free grammars and formal language theory.
  • 19. 5-minute rule for Memory vs. Disk Access (1987). When does it make economic sense to hold pages in memory versus doing IO every time data from the page is accessed? THE FIVE MINUTE RULE: Pages referenced every five minutes should be memory resident.
  • 20. From Tandem Report 1987: Jim Gray and Gianfranco Putzolu● The argument goes as follows: A Tandem disc and half a controller comfortably deliver 15 accesses per second and are priced at 15K$ for a small disc and 20K$ for a large disc (180MB and 540MB respectively).● So the price per access per second is about 1K$. The extra CPU and channel cost for supporting a disc are 1K$/a/s. So one disc access per second costs about 2K$ on a Tandem system.● A megabyte of Tandem main memory costs 5K$, so a kilobyte costs 5$.
  • 21. ● If making a 1KB record resident saves 1 a/s, then it saves about 2K$ worth of disc accesses at a cost of 5$, a good deal. If it saves 0.1 a/s then it saves about 200$, still a good deal. Continuing this, the break-even point is an access every 2000/5 = 400 seconds.● So, any 1KB record accessed more frequently than every 400 seconds should live in main memory. 400 seconds is "about" 5 minutes, hence the name: the Five Minute Rule.
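The arithmetic in the 1987 argument is short enough to check directly. The sketch below only restates the numbers quoted on the two slides above (about 2K$ per disc access per second, 5$ per kilobyte of memory); the constant and function names are made up for the illustration.

```python
DISC_COST_PER_ACCESS_PER_SECOND = 2000.0  # $: ~1K$ disc share + ~1K$ CPU/channel
MEMORY_COST_PER_KB = 5.0                  # $: 5K$ per megabyte of main memory

def disc_saving(accesses_per_second_saved):
    """Dollar value of the disc capacity freed by keeping a record resident."""
    return accesses_per_second_saved * DISC_COST_PER_ACCESS_PER_SECOND

def break_even_seconds():
    """Access interval at which 1KB of memory costs as much as the disc accesses it saves."""
    return DISC_COST_PER_ACCESS_PER_SECOND / MEMORY_COST_PER_KB

saving_at_one = disc_saving(1.0)      # saving 1 access/second
saving_at_tenth = disc_saving(0.1)    # saving 0.1 access/second
interval = break_even_seconds()       # seconds between accesses at break-even
interval_minutes = interval / 60
```

The break-even interval comes out at 400 seconds, a little under seven minutes, which the report rounds to "about" five.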
  • 22. 5-minute rule● The five-minute rule is based on the tradeoff between the cost of RAM and the cost of disk accesses. 22
  • 24. 1997 – Ten years later 24
  • 25. New Storage Metrics: Kaps, Maps, SCAN● Kaps: How many kilobyte objects served per second – The file server, transaction processing metric – This is the OLD metric.● Maps: How many megabyte objects served per sec – The Multi-Media metric● SCAN: How long to scan all the data – the data mining and utility metric● And – Kaps/$, Maps/$, TBscan/$ 25
  • 26. Disk Changes● Disks got cheaper: 20K$ -> 1K$ (or even 200$) – $/Kaps etc. improved 100x (Moore’s law!) (or even 500x) – One-time event (went from mainframe prices to PC prices)● Disk data got cooler (10x per decade): – 1990 disk ~ 1GB and 50Kaps and 5 minute scan – 2000 disk ~ 70GB and 120Kaps and 45 minute scan● So – 1990: 1 Kaps per 20 MB – 2000: 1 Kaps per 500 MB – disk scans take longer (10x per decade)● Backup/restore takes a long time (too long)
  • 27. Storage Ratios Changed● 10x better access time● 10x more bandwidth● 100x more capacity● Data 25x cooler (1Kaps/20MB vs 1Kaps/500MB)● 4,000x lower media price● 20x to 100x lower disk price● Scan takes 10x longer (3 min vs 45 min)● DRAM/disk media price ratio changed – 1970-1990: 100:1 – 1990-1995: 10:1 – 1995-1997: 50:1 – today: ~0.03$/MB disk vs 3$/MB DRAM, 100:1
  • 28. The Five Minute Rule● Trade DRAM for Disk Accesses● Cost of an access (DriveCost / Access_per_second)● Cost of a DRAM page ($/MB / pages_per_MB)● Break even has two terms: a Technology term and an Economic term● Grew page size to compensate for changing ratios.● Still at 5 minutes for random, 1 minute sequential. From his presentations in 2000
  • 29. Data on Disk Can Move to RAM in 10 Years. [Figure: Storage Price vs Time, in megabytes per kilo-dollar (MB/k$) from 0.1 to 10,000, years 1980 to 2000, annotated "100:1" and "10 years".]
  • 30. Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs. [Figure: two panels, Size vs Speed (typical system size in bytes against access time in seconds) and Price vs Speed ($/MB against access time in seconds), each plotting cache, main memory, secondary disc, online tape, nearline tape, and offline tape.]
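Dividing the cost of an access by the cost of a DRAM page, as listed above, gives a break-even interval that factors into a technology term and an economic term. The sketch below works that out; the function name is hypothetical and the input values are illustrative assumptions chosen for the example, not figures taken from these slides.

```python
def break_even_interval(pages_per_mb, accesses_per_second_per_disk,
                        price_per_disk, price_per_mb_dram):
    """Seconds between accesses at which caching a page in DRAM breaks even.

    cost of an access   = price_per_disk / accesses_per_second_per_disk
    cost of a DRAM page = price_per_mb_dram / pages_per_mb
    break-even interval = cost of an access / cost of a DRAM page
    """
    technology_term = pages_per_mb / accesses_per_second_per_disk
    economic_term = price_per_disk / price_per_mb_dram
    return technology_term * economic_term

# Illustrative assumptions (not from the slides): 8KB pages (128 per MB),
# a disk doing 64 random accesses/second, a $2000 drive, DRAM at $15/MB.
with_8kb_pages = break_even_interval(128, 64, 2000.0, 15.0)

# Same prices with 1KB pages (1024 per MB): the interval is 8x longer,
# which is why growing the page size pulls the rule back toward minutes.
with_1kb_pages = break_even_interval(1024, 64, 2000.0, 15.0)
```

With these assumed inputs the 8KB case lands in the range of a few minutes, while the 1KB case is eight times longer.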
  • 31. 5-minute rule holds in 1997● In summary, the five-minute rule still seems to apply to randomly accessed pages, primarily because page sizes have grown from 1KB to 8KB to compensate for changing technology ratios. 31
  • 32. Storage Latency: How Far Away is the Data? (relative latency; analogous place; analogous travel time)● Registers: 1; My Head; 1 min● On Chip Cache: 2; This Room● On Board Cache: 10; This Hotel; 10 min● Memory: 100; Olympia; 1.5 hr● Disk: 10^6; Pluto; 2 Years● Tape/Optical Robot: 10^9; Andromeda; 2,000 Years. From Jim Gray’s Rules of Thumb in Data Engineering presentation
  • 33. What’s a TeraByte?● 1 Terabyte: – 1,000,000,000 business letters: 150 miles of book shelf – 100,000,000 book pages: 15 miles of book shelf – 50,000,000 FAX images: 7 miles of book shelf – 10,000,000 TV pictures (mpeg): 10 days of video – 4,000 LandSat images: 16 earth images (100m) – 100,000,000 web pages: 10 copies of the web HTML● Library of Congress (in ASCII) is 25 TB – 1980: $200 million of disc: 10,000 discs – $5 million of tape silo: 10,000 tapes – 1997: 200 k$ of magnetic disc: 48 discs – 30 k$ nearline tape: 20 tapes● Terror Byte! From Jim Gray’s presentations, 1995
  • 34. How Much Information Is There?● Soon everything can be recorded and indexed● Most data will never be seen by humans● Precious resource: human attention – Auto-Summarization – Auto-Search is key technology● http://www.lesk.com/mlesk/ksg97/ksg.html● [Figure: scale of prefixes Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers A Book, A Photo, Movie, All LoC books (words), All Books MultiMedia, Everything Recorded.]● Small prefixes: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
  • 35. 2007: Twenty Years Later 35
  • 36. The 5-minute rule holds in 2007● The old five-minute rule for RAM and disk now applies to 64KB page sizes (334 seconds). – Five minutes had been the approximate break-even interval for 1KB in 1987 and for 8KB in 1997.● The five-minute break-even interval also applies to RAM and the expensive flash memory of 2007 for page sizes of 64KB and above (365 seconds and 339 seconds). – As the price premium for flash memory decreases, so does the break-even interval (146 seconds and 136 seconds).
  • 37. Flash memory falls between traditional RAM and persistent mass storage based on rotating disks in terms of acquisition cost, access latency, transfer bandwidth, spatial density, power consumption, and cooling costs.
  • 38. 20 years out: Summary and Conclusion● The 20-year-old five-minute rule for RAM and disks still holds, but for ever-larger disk pages.● It should be augmented by two new five-minute rules: – for small pages moving between RAM and flash memory and – for large pages moving between flash memory and traditional disks.● For small pages moving between RAM and disk, Gray and Putzolu were amazingly accurate in predicting a five-hour break-even point 20 years into the future.
  • 39. 39
  • 40. 40
  • 41. Data Cube 41
  • 42. Aggregates in SQL● The SQL standard [Melton, Simon] provides five aggregate functions: COUNT, SUM, MIN, MAX, AVG. SELECT [DISTINCT] AVG(Temp) FROM Weather;● Aggregate functions return a single value. In addition, SQL allows aggregation over distinct values.● Using GROUP BY, SQL can create a table of aggregate values indexed by a set of attributes. SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude;
  • 43. Problems With This Design● Users want histograms● Users want sub-totals and totals – drill-down & roll-up reports● Users want CrossTabs● Conventional wisdom – These are not relational operators – They are in many report writers and query engines
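The two queries on this slide can be tried against a small in-memory table. The sketch below uses Python's built-in `sqlite3` module; the `Weather` rows are invented sample data, and the `ORDER BY` clause is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (Time TEXT, Altitude INTEGER, Temp REAL)")
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [
        ("noon", 0, 20.0),       # made-up sample rows
        ("noon", 0, 22.0),
        ("noon", 1000, 10.0),
        ("midnight", 0, 12.0),
    ],
)

# A plain aggregate collapses the whole table to a single value.
overall_avg = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()[0]

# GROUP BY produces one aggregate value per (Time, Altitude) combination.
grouped = conn.execute(
    "SELECT Time, Altitude, AVG(Temp) FROM Weather "
    "GROUP BY Time, Altitude ORDER BY Time, Altitude"
).fetchall()
```

Here `overall_avg` is one number, while `grouped` is a small table of averages indexed by the grouping attributes.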
  • 44. Other Variants – Illustra● init(&handle): – Allocates the handle and initializes the aggregate computation.● iter(&handle, value): – Aggregates the next value into the current aggregate.● value = final(&handle): – Computes and returns the resulting aggregate by using data saved in the handle. This invocation deallocates the handle. 44
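The init / iter / final pattern has a close analogue in Python's `sqlite3` module, where a user-defined aggregate is a class with `__init__`, `step`, and `finalize`. The sketch below uses that analogue, not Illustra's actual API; the `Spread` aggregate (maximum minus minimum) and the sample table are hypothetical.

```python
import sqlite3

class Spread:
    """User-defined aggregate: max(value) - min(value)."""

    def __init__(self):            # plays the role of init(&handle)
        self.lo = None
        self.hi = None

    def step(self, value):         # plays the role of iter(&handle, value)
        if value is None:
            return                 # ignore NULLs, like the built-in aggregates
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):            # plays the role of value = final(&handle)
        return None if self.lo is None else self.hi - self.lo

conn = sqlite3.connect(":memory:")
conn.create_aggregate("spread", 1, Spread)   # name, number of arguments, class
conn.execute("CREATE TABLE Weather (Temp REAL)")
conn.executemany("INSERT INTO Weather VALUES (?)",
                 [(20.0,), (22.0,), (10.0,), (12.0,)])
temp_spread = conn.execute("SELECT spread(Temp) FROM Weather").fetchone()[0]
```

The state kept on `self` corresponds to the data saved in the handle between calls.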
  • 45. DATA CUBE and ROLLUP● SELECT Model, Year, Color, SUM(Sales) AS total, SUM(Sales) / total(ALL,ALL,ALL) FROM Sales WHERE Model IN {‘Ford’, ‘Chevy’} AND Year BETWEEN 1990 AND 1992 GROUP BY CUBE(Model, Year, Color);● [Figure: Aggregate (Sum); Group By (with total) by Color: RED, WHITE, BLUE; Cross Tab by Color and by Make (Chevy, Ford); The Data Cube and the Sub-Space Aggregates: By Year, By Make, By Color, By Make & Year, By Color & Year, By Make & Color, and Sum.]
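What GROUP BY CUBE computes can be sketched without a database: every row contributes to one total for each subset of the grouping attributes, with the dropped attributes replaced by ALL. The Python below is a minimal illustration of that idea, not an efficient cube algorithm; the `cube` function and the sample sales rows are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

def cube(rows, n_dims):
    """Sum the measure for every combination of kept and ALL-ed dimensions.

    Each row is (dim_1, ..., dim_n, measure).
    """
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        for size in range(n_dims + 1):
            for kept in combinations(range(n_dims), size):
                key = tuple(dims[i] if i in kept else "ALL" for i in range(n_dims))
                totals[key] += measure
    return dict(totals)

# Made-up rows: (Model, Year, Color, Sales)
sales = [
    ("Chevy", 1990, "red", 5),
    ("Chevy", 1990, "blue", 3),
    ("Ford", 1991, "red", 4),
]
result = cube(sales, 3)

grand_total = result[("ALL", "ALL", "ALL")]    # the single Sum cell
chevy_total = result[("Chevy", "ALL", "ALL")]  # a "By Make" sub-total
red_total = result[("ALL", "ALL", "red")]      # a "By Color" sub-total
```

The result holds the base rows, the cross-tab cells, the roll-up sub-totals, and the grand total in one table, which is the point of the operator.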
  • 46. 46
  • 47. 47
  • 48. A Dozen Information Technology Research Goals● 1. Scalability: Devise a software and hardware architecture that scales up by a factor of 10^6. That is, an application’s storage and processing capacity can automatically grow by a factor of a million, doing jobs faster (10^6 x speedup) or doing larger jobs in the same time (10^6 x scale-up), just by adding more resources.● 2. The Turing Test: Build a computer system that wins the imitation game at least 30% of the time.● 3. Speech to text: Hear as well as a native speaker.● 4. Text to speech: Speak as well as a native speaker.● 5. See as well as a person: Recognize objects and motion.
  • 49. A Dozen Information Technology Research Goals● 6. Personal Memex: Record everything a person sees and hears and quickly retrieve any item on request.● 7. World Memex: Build a system that, given a text corpus, can answer questions about and summarize the text as precisely and quickly as a human expert in that field. Do the same for music, images, art and cinema.● 8. Telepresence: Simulate being some other place retrospectively as an observer (Teleobserver): hear and see as well as actually being there and as well as a participant. Simulate being some other place as a participant (Telepresent): interacting with others and with the environment as though you are actually there.
  • 50. A Dozen Information Technology Research Goals● 9. Trouble-Free Systems: Build a system used by millions of people each day and yet administered and managed by a single part-time person.● 10. Secure System: Assure that the system of problem 9 services only authorized users, service cannot be denied by unauthorized users, and information cannot be stolen (and prove it).● 11. Always Up: Assure that the system is unavailable for less than one second per hundred years: eight 9s of availability (and prove it).
  • 51. A Dozen Information Technology Research Goals● 12. Automatic Programmer: Devise a specification language or user interface that – Makes it easy for people to express designs (1,000x easier), – Computer can compile, and – Can describe all applications (is complete). The system should reason about the application, asking questions about exception cases and incomplete specification. But it should not be onerous to use.
  • 52. Computer Industry Laws (Rules of thumb)● Metcalfe’s law● Moore’s first law● Bell’s computer classes (7 price tiers)● Bell’s platform evolution● Bell’s platform economics● Bill’s law● Software economics● Grove’s law● Moore’s second law● Is info-demand infinite?● The death of Grosch’s law
  • 53. Gordon Bell’s Seven Price Tiers● 10$: wrist watch computers● 100$: pocket/palm computers● 1,000$: portable computers● 10,000$: personal computers (desktop)● 100,000$: departmental computers (closet)● 1,000,000$: site computers (glass house)● 10,000,000$: regional computers (glass castle)● Super server: costs more than $100,000● “Mainframe”: costs more than $1 million. Must be an array of processors, disks, tapes, comm ports
  • 54. Information at your fingertips. Bill Gates is known for his long-standing belief that, as he once put it, “any piece of information you want should be available to you. -- Putting Information at Your Fingertips.” Gates championed it as early as 1989, and he was in a position to do something about it. It remained his overriding goal for the next two decades.
  • 55. The Vision: Global Data Federation● Massive datasets live near their owners: – Near the instrument’s software pipeline – Near the applications – Near data knowledge and curation● Each Archive publishes a (web) service – Schema: documents the data – Methods on objects (queries)● Scientists get “personalized” extracts● Uniform access to multiple Archives – A common global schema
  • 56. Gray and Bell worked closely at Digital and at Microsoft’s Bay Area Research Center since 1994● MyLifeBits● Terra Server
  • 57. Gordon Bell’s MyLifeBits● MyLifeBits is a lifetime store of everything. It is the fulfillment of Vannevar Bush’s 1945 Memex vision including full-text search, text and audio annotations, and hyperlinks.● The experiment: Gordon Bell has captured a lifetime’s worth of articles, books, cards, CDs, letters, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings and stored them digitally. He is now paperless, and is beginning to capture phone calls, IM transcripts, television, and radio.
  • 58. 58
  • 59. TerraServer. In late spring of 1996, Paul Flessner, the General Manager of the SQL Server team, asked our lab to build a database application that would test and demonstrate the scalability of the next release of SQL Server, code named “Sphinx”. One of Jim’s greatest abilities was to clearly define and articulate the problem. The SQL team gave us two goals:● 1. Test SQL’s ability to scale up to support a database of one terabyte or larger.● 2. An internet application where SQL marketing could demonstrate Windows and SQL Server’s scalability.
  • 60. About moving research to production: “ideas don’t transfer, people transfer…”
  • 61. TerraServer Requirements● BIG: 1 TB of data including catalog, temporary space, etc.● PUBLIC: available on the world wide web● INTERESTING: to a wide audience● ACCESSIBLE: using standard browsers (IE, Netscape)● REAL: a LOB application (users can buy imagery)● FREE: cannot require an NDA or money from a user for access● FAST: usable on low-speed (56kbps) and high-speed (T-1+) connections● EASY: we do not want a large group to develop, deploy, or maintain the application● CHEAP: an unwritten requirement (1) because TerraServer was only a prototype, test, and free demonstration; and (2) Jim Gray was a very frugal person!
  • 62. An Interesting Internet Server● SOVINFORMSPUTNIK (the Russian Space Agency) and Aerial Images● United States Geological Survey (USGS)● http://msdn.microsoft.com/en-us/library/aa226316(v=sql.70).aspx
  • 63. Thesis: Scaleable Servers● Scaleable Servers – Commodity hardware allows new applications – New applications need huge servers – Clients and servers are built of the same “stuff” • Commodity software and • Commodity hardware● Servers should be able to – Scale up (grow node by adding CPUs, disks, networks) – Scale out (grow by adding nodes) – Scale down (can start small)● Key software technologies – Objects, Transactions, Clusters, Parallelism
  • 65. Scaleable Servers: BOTH SMP and Cluster● Grow up with SMP; 4xP6 is now standard● Grow out with cluster● Cluster has inexpensive parts● [Figure: tiers from personal system to departmental server to SMP super server, alongside a cluster of PCs.]
  • 66. SMPs Have Advantages● Single system image: easier to manage, easier to program (threads in shared memory, disk, Net)● 4x SMP is commodity● Software capable of 16x● Problems: – >4 not commodity – Scale-down problem (starter systems expensive)● There is a BIGGEST one● [Figure: tiers from personal system to departmental server to SMP super server.]
  • 67. Grow UP and OUT● Cluster: – a collection of nodes – as easy to program and manage as a single node● [Figure: tiers from personal system to departmental server to SMP super server, growing out to a cluster with a 1 Terabyte DB and 1 billion transactions per day.]
  • 68. Clusters Have Advantages● Clients and servers made from the same stuff● Inexpensive: – Built with commodity components● Fault tolerance: – Spare modules mask failures● Modular growth – Grow by adding small modules● Unlimited growth: no biggest one 68
  • 69. Windows NT Clusters● Microsoft & 60 vendors defining NT clusters – Almost all big hardware and software vendors involved● No special hardware needed - but it may help● Fault-tolerant first, scaleable second – Microsoft, Oracle, SAP giving demos today● Enables – Commodity fault-tolerance – Commodity parallelism (data mining, virtual reality…) – Also great for workgroups! 69
  • 70. Parallelism: The OTHER aspect of clusters● Clusters of machines allow two kinds of parallelism – Many little jobs: online transaction processing • TPC-A, B, C… – A few big jobs: data search and analysis • TPC-D, DSS, OLAP● Both give automatic parallelism
  • 71. Kinds of Parallel Execution● Pipeline: any sequential program feeds any sequential program● Partition: copies of any sequential program run side by side; outputs split N ways, inputs merge M ways. Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
  • 72. Data Rivers: Split + Merge Streams● N producers, M consumers, N x M data streams● Producers add records to the river, consumers consume records from the river● Purely sequential programming● River does flow control and buffering, and does partition and merge of data records● River = Split/Merge in Gamma = Exchange operator in Volcano. Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
  • 73. Partitioned Execution● Spreads computation and IO among processors● [Figure: a table partitioned into A...E, F...J, K...N, O...S, T...Z, with a Count running on each partition and a final Count combining them.]● Partitioned data gives NATURAL parallelism. Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
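The partitioned count on this slide can be sketched as follows: records are range-partitioned by first letter, each partition is counted independently, and the partial counts are merged. This is only an illustration using Python's standard `concurrent.futures`; the function names and the sample names are made up, and a thread pool stands in for separate processors.

```python
from concurrent.futures import ThreadPoolExecutor

# The five key ranges shown on the slide.
RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]

def partition(names):
    """Split records into one list per key range, by first letter."""
    parts = [[] for _ in RANGES]
    for name in names:
        first = name[0].upper()
        for i, (lo, hi) in enumerate(RANGES):
            if lo <= first <= hi:
                parts[i].append(name)
                break
    return parts

def partitioned_count(names):
    """Count each partition independently, then merge the partial counts."""
    parts = partition(names)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_counts = list(pool.map(len, parts))  # one Count per partition
    return partial_counts, sum(partial_counts)       # the final Count

names = ["Alice", "Bob", "Frank", "Kim", "Oscar", "Tina", "Zed"]
partial_counts, total = partitioned_count(names)
```

Each worker only ever sees its own partition, so the counting code itself stays purely sequential.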
  • 74. N x M way Parallelism● [Figure: partitions A...E, F...J, K...N, O...S, T...Z each feed a Join and then a Sort, whose outputs feed three Merges.]● N inputs, M outputs, no bottlenecks● Partitioned data; partitioned and pipelined data flows. Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
  • 75. Year 2000 4B Machine● The Year 2000 commodity PC● Billion Instructions/Sec (1 Bips processor)● .1 Billion Bytes RAM● Billion Bits/s Net● 10 Billion Bytes Disk (10 GB)● Billion Pixel display – 3000 x 3000 x 24● 1,000 $. Jim Gray & Gordon Bell: 1997 presentations
  • 76. Super Server: 4T Machine● Array of 1,000 4B machines – 1 Bips processors – 1 BB DRAM – 10 BB disks – 1 Bbps comm lines – 1 TB tape robot● A few megabucks● Challenge: – Manageability – Programmability – Security – Availability – Scaleability – Affordability● As easy as a single system● Cyber Brick: a 4B machine (CPU, 50 GB disc, 5 GB RAM)● Future servers are CLUSTERS of processors, discs● Distributed database techniques make clusters work. Jim Gray & Gordon Bell: 1997 presentations
  • 77. Jim Gray’s quest for real problems and real data … led to a collaboration with astronomers (Alex Szalay). Why Astronomy Data?● It has no commercial value – No privacy concerns – Can freely share results with others – Great for experimenting with algorithms● It is real and well documented – High-dimensional data (with confidence intervals) – Spatial data – Temporal data● Many different instruments from many different places and many different times● Federation is a goal● There is a lot of it (petabytes)
  • 78. Availability and ability to handle very large volumes of storage and complex computing is redefining how we do Science
  • 79. Galileo and his telescope. First Paradigm: For thousands of years, Science was about empirically describing natural phenomena
  • 80. Second Paradigm: Theoretical Science using models and generalization (Newton, Kepler, Maxwell)
  • 81. Third Paradigm: Computational Science: Simulating Complex Phenomena. Over the last 25 years scientists have used computer simulation to validate theories. [Figure: a hurricane computer simulation.]
  • 82. Fourth Paradigm: Data Intensive Science. The scientific method was traditionally driven by hypothesis: first scientists predict a response, then collect experimental data to check the predictions. However, in the new data-driven approach researchers start by collecting data and analyze it later.
  • 83. Scientists are collecting data. How to codify data and extract insights and knowledge? [Figure: Experiments and Instruments, Simulations, Literature, and Other Archives feed a store that takes a Question and returns an Answer.]
  • 84. Astronomy● Help build world-wide telescope – All astronomy data and literature online and cross indexed – Tools to analyze the data● Built SkyServer.SDSS.org● Built Analysis system – MyDB – CasJobs (batch job)● Results: – It works and is used every day – Spatial extensions in SQL 2005 – A good example of Data Grid – Good examples of Web Services.
  • 85. World Wide Telescope / Virtual Observatory. http://www.us-vo.org/ http://www.ivoa.net/● Premise: Most data is (or could be) online● So, the Internet is the world’s best telescope: – It has data on every part of the sky – In every measured spectral band: optical, x-ray, radio.. – As deep as the best instruments (2 years ago). – It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no..). – It’s a smart telescope: links objects and data to literature on them.
  • 86. SkyServer.SDSS.org● A modern archive – Access to Sloan Digital Sky Survey Spectroscopic and Optical surveys – Raw Pixel data lives in file servers – Catalog data (derived objects) lives in Database – Online query to any and all● Also used for education – 150 hours of online Astronomy – Implicitly teaches data analysis● Interesting things – Spatial data search – Client query interface via Java Applet – Query from Emacs, Python, …. – Cloned by other surveys (a template design) – Web services are core of it.
  • 87. SkyServer: SkyServer.SDSS.org● Like the TerraServer, but looking the other way: a picture of ¼ of the universe● Sloan Digital Sky Survey Data: Pixels + Data Mining● About 400 attributes per “object”● Spectrograms for 1% of objects
  • 88. SkyQuery
  • 89. SkyQuery (http://skyquery.net/) ● Distributed query tool using a set of web services ● Many astronomy archives from Pasadena, Chicago, Baltimore, Cambridge (England) ● Has grown from 4 to 15 archives, now becoming an international standard ● Web Service poster child ● Allows queries like: SELECT o.objId, o.r, o.type, t.objId FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5) AND o.type=3 AND (o.I - t.m_j)>2
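The cross-match idea behind the query above can be sketched in a few lines of Python. This is only an illustration, not SkyQuery's actual algorithm: it treats the match threshold as a plain angular tolerance and the search area as a cone in degrees, both of which are assumptions (the slide does not state units). The catalogs, the field names `ra`, `dec`, `i`, and the helpers `angular_sep_deg` and `cross_match` are hypothetical.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(cat_a, cat_b, center, radius_deg, tol_deg):
    """Pair objects that lie inside the search cone and within tol_deg of each other."""
    ra0, dec0 = center

    def in_cone(obj):
        return angular_sep_deg(obj["ra"], obj["dec"], ra0, dec0) <= radius_deg

    a_sel = [a for a in cat_a if in_cone(a)]
    b_sel = [b for b in cat_b if in_cone(b)]
    return [(a, b) for a in a_sel for b in b_sel
            if angular_sep_deg(a["ra"], a["dec"], b["ra"], b["dec"]) <= tol_deg]

# Hypothetical miniature catalogs standing in for the two archives.
sdss = [
    {"objId": 1, "ra": 181.30, "dec": -0.76, "type": 3, "i": 18.0},
    {"objId": 2, "ra": 181.35, "dec": -0.70, "type": 6, "i": 17.0},
    {"objId": 3, "ra": 185.00, "dec": 2.00, "type": 3, "i": 19.0},
]
twomass = [
    {"objId": "t1", "ra": 181.3002, "dec": -0.7601, "m_j": 15.5},
    {"objId": "t2", "ra": 181.3501, "dec": -0.7001, "m_j": 14.0},
    {"objId": "t3", "ra": 185.0000, "dec": 2.0000, "m_j": 15.0},
]

# Assumed units: 0.1 degree search cone, 3.5 arcsecond match tolerance.
pairs = cross_match(sdss, twomass, center=(181.3, -0.76),
                    radius_deg=0.1, tol_deg=3.5 / 3600)

# Apply the remaining predicates from the WHERE clause.
result = [(a["objId"], b["objId"]) for a, b in pairs
          if a["type"] == 3 and (a["i"] - b["m_j"]) > 2]
print(result)
```

The nested loop is quadratic in the number of objects in the cone, which is fine for a toy example; a real archive would rely on a spatial index instead.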
  • 90. SkyServer/SkyQuery Evolution: MyDB and Batch Jobs ● Problem: need multi-step data analysis (not just a single query). Solution: allow personal databases on the portal. ● Problem: some queries are monsters. Solution: “batch schedule” on the portal; deposit the answer in the personal database.
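A minimal sketch of the MyDB and batch-job pattern, using Python's built-in `sqlite3` module as a stand-in for the portal's database. Everything here is an assumption for illustration: the `photo` table, its rows, the `mydb` schema name, and the `run_batch` helper are hypothetical and do not reflect how CasJobs is implemented.

```python
import sqlite3
from collections import deque

# One connection holds the shared catalog ("main") and a personal database ("mydb").
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS mydb")

# Shared catalog: a tiny stand-in for the archive's object table.
conn.execute("CREATE TABLE main.photo (objId INTEGER, r REAL, type INTEGER)")
conn.executemany("INSERT INTO main.photo VALUES (?, ?, ?)",
                 [(1, 17.2, 3), (2, 21.5, 3), (3, 16.8, 6), (4, 18.9, 3)])
conn.commit()

# Batch queue: each job names an output table in mydb and the query that fills it.
# The second job reads the first job's output, i.e. a multi-step analysis.
jobs = deque([
    ("galaxies", "SELECT objId, r FROM main.photo WHERE type = 3"),
    ("bright_galaxies", "SELECT objId, r FROM mydb.galaxies WHERE r < 19"),
])

def run_batch(connection, queue):
    """Run queued jobs in order, depositing each answer in the personal database."""
    while queue:
        table, query = queue.popleft()
        # Table names come from trusted job definitions in this sketch.
        connection.execute(f"CREATE TABLE mydb.{table} AS {query}")
        connection.commit()

run_batch(conn, jobs)
rows = conn.execute(
    "SELECT objId FROM mydb.bright_galaxies ORDER BY objId").fetchall()
print(rows)
```

Keeping intermediate results in the user's own database means a later step can query them without rerunning the expensive query against the shared catalog.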
  • 91. Ecosystem Sensor Net (LifeUnderYourFeet.Org) ● Small sensor net monitoring soil ● Sensors feed to a database ● Helping build system to collect & organize data ● Working on data analysis tools ● Prototype for other LIMS (Laboratory Information Management Systems)
  • 92. RNA Structural Genomics ● Goal: Predict secondary and tertiary structure from sequence. Deduce tree of life. ● Technique: Analyze sequence variations sharing a common structure across tree of life ● Representing structurally aligned sequences is a key challenge ● Creating a database-driven alignment workbench accessing public and private sequence data
  • 93. VHA Health Informatics ● VHA: largest standardized electronic medical records system in the US ● Design, populate and tune a ~20 TB Data Warehouse and Analytics environment ● Evaluate population health and treatment outcomes ● Support epidemiological studies – 7 million enrollees – 5 million patients – Example milestones: • 1 billionth vital sign loaded in April ’06 • 30 minutes to population-wide obesity analysis (next slide) • Discovered seasonality in blood pressure (NEJM, fall ’06)
  • 94. HDR Vitals Based Body Mass Index Calculation on VHA FY04 Population. Source: VHA Corporate Data Warehouse. [Table: VHA patients in BMI categories (based upon vitals from FY04). Rows: weight (Wt) from 100 to 300 in steps of 5; columns: height (Ht) from 5 ft 0 in to 6 ft 5 in; cells: patient counts. Legend: BMI < 18 Underweight; BMI 18-24.9 Healthy Weight; BMI 25-29.9 Overweight; BMI 30+ Obese. Category totals legible in the extract: 23,876 (0.7%); 701,089 (21.6%); 1,177,093 (36.2%). Marked DRAFT.]
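The arithmetic behind such a table can be sketched as follows. This is an illustration under stated assumptions, not the VHA warehouse's actual computation: weights are assumed to be in pounds (the slide labels the axis only as Wt), BMI uses the standard imperial formula 703 × lb / in², and values between 24.9 and 25 are treated as healthy weight. The function names and the sample vitals are hypothetical.

```python
from collections import Counter

def bmi_from_imperial(weight_lb, feet, inches):
    """Standard imperial BMI formula: 703 * weight (lb) / height (in) squared."""
    height_in = 12 * feet + inches
    return 703.0 * weight_lb / height_in ** 2

def bmi_category(bmi):
    """Map a BMI value to the categories named in the slide legend."""
    if bmi < 18:
        return "Underweight"
    if bmi < 25:
        return "Healthy Weight"
    if bmi < 30:
        return "Overweight"
    return "Obese"

# Hypothetical vitals: (weight in lb, height feet, height inches).
vitals = [(150, 5, 8), (180, 5, 10), (300, 5, 0), (100, 6, 5)]

# Tally patients per category, as the table does for the whole population.
counts = Counter(bmi_category(bmi_from_imperial(w, ft, inch))
                 for w, ft, inch in vitals)
print(counts)
```

Each cell of the original grid corresponds to one (weight, height) bucket, so the category of a cell follows directly from applying these two functions to its row and column labels.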
  • 95. Jim Gray’s work on the Fourth Paradigm and eScience has had a profound impact on the scientific community. This work continues …
  • 96. Jim Gray eScience Award. Each year, Microsoft Research presents the Jim Gray eScience Award to a researcher who has made an outstanding contribution to the field of data-intensive computing. The award recognizes innovators whose work truly makes science easier for scientists.
  • 98. Jim Gray’s Legacy ● The Prolific Writer – Jim Gray’s two rules for authorship: • The person who types puts their name first, and • It’s easier to add a name to the list of authors than deal with someone’s hurt feelings. ● The Masterful Presenter ● The Sense of Community ● The Patient Listener (Diagram labels: Ideas, Community, People)
  • 99. Jim’s Life was a Textbook on Mentoring ● Making time ● Simply Listening ● Inspiring Self-Confidence ● Lighting the Way ● Nurturing and Pushing ● Following the Muse ● Connecting Good People and Good Ideas Without Boundaries ● Promoting the Young ● Sharing Knowledge Selflessly ● Displaying Professional Integrity ● Advocating for the Field ● Keeping things in Perspective ● Being a friend
  • 101. Lost at Sea … January 28, 2007
  • 102. The Search for Jim Gray
  • 103. The University of California, Berkeley and Gray’s family hosted a tribute to him on May 31, 2008. http://www.youtube.com/user/UCBerkeleyEvents/videos?query=jim+gray
  • 105. Good references ● Microsoft Faculty Summit 2011 – http://research.microsoft.com/en-us/events/fs2011/ – Tony Hey’s presentations at the event – http://research.microsoft.com/en-us/events/fs2011/welcome_introduction_hey_faculitysummit_071811.pdf ● The Fourth Paradigm book – http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf ● Jim Gray’s work – http://research.microsoft.com/en-us/um/people/gray/ ● Alex Szalay’s work on Large Databases and Science – http://www.sdss.jhu.edu/~szalay/servers.html