
[db tech showcase Tokyo 2016] E32: My Life as a Disruptor by Jim Starkey


I’ve championed or developed four distinct disruptive technologies in database management. I started working on databases for the ARPAnet, the precursor of the Internet, which had 47 nodes and was the largest network on earth. I advocated relational technology when it was considered an academic curiosity and introduced a new concurrency control technology that made consistency practical. More recently I created a radically new architecture for distributed ACID SQL databases. Now, my project is a critical re-evaluation of where we are, how we got here, and where we should be going. It’s going to be a wild ride.

Published in: Technology

  1. 1. My Life as a Disruptor Jim Starkey July 9, 2016
  2. 2. My Personal Disruptions • Getting a relational DBMS into DEC – Or anywhere… • Inventing Multi-Version Concurrency Control – Or the peaceful co-existence of interactive users • Breaking the single-node barrier with NuoDB – Newton or Einstein: Pick one • Shedding the relational model with Amorphous – Are tables really necessary?
  3. 3. My Resume in Brief • 1973, Computer Corporation of America, ARPAnet Datacomputer Project • 1975, Digital Equipment Corporation, DBMS-11, Datatrieve, Rdb/ELN • 1984, Founded Interbase Software • 1991, Founded Harbor Software (oh, well) • 1997, Founded Netfrastructure • 2006, Founded NuoDB
  4. 4. My Career, in Short • Started where the network hit the disk • Still there and in a deep rut • And apparently can’t hold a job.
  5. 5. Me, at a Younger Age
  6. 6. Database Systems Role • Database systems do three things: 1. Store and retrieve data (quickly!) 2. Manage concurrent access to data 3. Manage relationships among data records • The rest is just detail and art
  7. 7. Data Models: Hierarchical [Diagram: hierarchy of nodes labeled “Stuff”, “More Stuff”, “Even More Stuff”]
  8. 8. Data Models: CODASYL
  9. 9. Data Models: Relational [Diagram: the values “Jim” and “Shearwater”, each appearing twice]
  10. 10. Discovering Relational • Working on CCA’s Datacomputer project • Read Codd’s early papers on relational • Convinced the idea was right • Went off to sell the world on relational!
  11. 11. Selling Relational: CCA • Company sold formless Model 204 • Research project for hierarchical Datacomputer • Relational: Too academic
  12. 12. Selling Relational: DEC • Relational, oh maybe, sign up and we’ll see • Hired, DEC committed to CODASYL – Classical “bait and switch” • Did DBMS-11, Datatrieve (almost relational) • Relational: Oh, not commercial
  13. 13. Selling Relational: DEC • Created a prototype relational system (JRD) • Matched VAX DBMS performance on first try – Shock waves went through the company! – Relational can perform! – The world wants relational! • So they put a relational front end on the CODASYL VAX DBMS • But I shipped JRD anyway as a prototype database machine
  14. 14. Multi-Version Concurrency Control • Traditional technology is two-phase locking – Visit a record, get a read lock – Update a record, get a write lock – If you can’t get a lock, wait until you can (or deadlock) • But: – Readers block writers – Writers block readers – Mixed interaction and production virtually impossible – So everybody gives up on consistency and uses read committed
  15. 15. Multi-Version Concurrency Control • Record updates create new record version linked to older version • Each record version tagged with transaction id • A transaction takes a snapshot of transaction states when starting • Transaction sees only versions that existed when it started (and its own updates) • Non-blocking except for actual update conflicts
  16. 16. Multi-Version Concurrency Control • MVCC is now implemented in every commercial RDBMS except DB2 • Every implementation is different • Consistent without serializability • Still ignored by the academics
  17. 17. NuoDB: The Next Generation • Problem: Processors aren’t getting faster – More cores & more and faster memory – Also more contention – To increase DBMS performance, something needs to give • Scale-out is the only solution
  18. 18. NuoDB: The Next Generation • Shared nothing distributed architecture • Scalable: Supports 100+ nodes, if needed • Elastic: Nodes can be added or dropped • Designed for 100% availability – No single point of failure – Arbitrary (and controllable) redundancy – Rolling upgrades while pumping transactions • Peer to peer distributed object replication • SQL compliant with ACID transactions
  19. 19. NuoDB: Atom • An atom is a distributed object • Atoms can be serialized to network or disk • Each atom instance knows of other instances • Data replication is at the atom level – Replication strictly atom to atom – Each atom broadcasts local changes • Replication is below the SQL layer
  20. 20. NuoDB: The Next Generation • Two node types: – Transaction nodes: execute SQL – Storage managers: serialize atoms to persistent storage • Concurrency control: Distributed MVCC • Two significant limitations: – Each transaction executes on a single node – Data moves to the transaction
  21. 21. What is Amorphous? • No commercial DBMS has ever implemented Codd’s relational model – Relational algebra is based on set theory – Sets can’t have duplicate members – Implicit duplicate elimination is too error-prone to even consider • Are tables really important? • The Amorphous Data Model: No data model at all (every data record is self-describing)
  22. 22. What is Amorphous? • Start with relational database • Drop the concept of table leaving only records • Add an optional mutable record type (mostly syntactic sugar) • Every record is self-describing (no schema) • Treat all character strings as searchable text • Numbers are just numbers, really! • And dates, URLs, email addresses, etc.
  23. 23. Tables vs. Record Types • The following are semantically equivalent: – Fetch * from EMPLOYEES WITH … – Fetch * with RECORD_TYPE EQ “EMPLOYEES” AND … • And these: – INSERT INTO EMPLOYEES (NAME=“Jim”) – INSERT (RECORD_TYPE=“EMPLOYEES”, NAME=“Jim”) • But record type is optional and can be changed
  24. 24. What is Amorphous? • An API that fits on a single page of paper • Replace SQL with simpler, more powerful block structured language • Support arbitrarily complex single round-trip transactions • Return results as self-describing trees of values • Support only ACID transactions!
  25. 25. Amorphous vs. NuoDB • Each based on peer to peer replicating atoms • NuoDB has differentiated node types • Amorphous has unified storage/execution nodes • Amorphous request execution: – Requests decomposed for parallel execution – Fragments sent to nodes where the data exists – Results returned and merged
  26. 26. Fetch * with “Jim Starkey” • first_name=“Jim”, last_name=“Starkey” • name=“Jim Starkey” • name=“JIM STARKEY” • name=“James Starkey” • name=“Starkey, Jim” • abstract=“Our third speaker this afternoon is Jim Starkey, who …” – In text, Word, or PDF
  27. 27. Lessons • Disruption is long, hard work • Perseverance is essential • Expect to hear: – “It can’t be done” before an idea takes hold – “It was intuitively obvious” afterwards • Bottom line: – Evolution is important – But sometimes a revolution is required
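
The snapshot rule from slide 15 (Multi-Version Concurrency Control) may be easier to follow with a small example. What follows is a minimal in-memory sketch in Python, not the InterBase or NuoDB implementation. The names `Store`, `Transaction`, `Version`, and `UpdateConflict` are hypothetical. Rollback and garbage collection of old versions are left out, and a conflicting write fails immediately instead of waiting for the other transaction to finish.

```python
from dataclasses import dataclass
from typing import Any, Optional


class UpdateConflict(Exception):
    """Raised when a write hits a version the transaction cannot see."""


@dataclass
class Version:
    value: Any
    txn_id: int                         # each version is tagged with its writer
    older: Optional["Version"] = None   # link to the previous version


class Store:
    def __init__(self) -> None:
        self.next_txn_id = 1
        self.committed: set[int] = set()
        self.records: dict[str, Version] = {}   # key -> newest version

    def begin(self) -> "Transaction":
        # Snapshot of transaction states: the set committed at start time.
        txn = Transaction(self, self.next_txn_id, frozenset(self.committed))
        self.next_txn_id += 1
        return txn


class Transaction:
    def __init__(self, store: Store, txn_id: int, snapshot: frozenset) -> None:
        self.store = store
        self.txn_id = txn_id
        self.snapshot = snapshot

    def _visible(self, version: Version) -> bool:
        # Visible if written by this transaction, or by one that had
        # already committed when this transaction started.
        return version.txn_id == self.txn_id or version.txn_id in self.snapshot

    def read(self, key: str) -> Any:
        # Walk the version chain from newest to oldest; never blocks.
        version = self.store.records.get(key)
        while version is not None and not self._visible(version):
            version = version.older
        return version.value if version is not None else None

    def write(self, key: str, value: Any) -> None:
        head = self.store.records.get(key)
        if head is not None and not self._visible(head):
            # Someone else updated this record concurrently.
            raise UpdateConflict(key)
        self.store.records[key] = Version(value, self.txn_id, head)

    def commit(self) -> None:
        self.store.committed.add(self.txn_id)
```

In this sketch a reader that started before a writer committed keeps seeing the older version, and neither one blocks the other. Only a second update to the same record by a concurrent transaction raises `UpdateConflict`, which matches the "non-blocking except for actual update conflicts" bullet.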