Claremont Report on Database Research: In Depth Talk (Eric A. Brewer)


This is a set of slides from the Claremont Report on Database Research, see for more details. These particular slides are from a more in-depth talk by Eric A. Brewer. (Uploaded for discussion at the Stanford InfoBlog,

Published in: Technology

  1. An outside view…
     Prof. Eric A. Brewer, UC Berkeley, Intel Research (until July)

     Thought about it…
       • Most of my wish list hasn’t changed much
       • SIGMOD 97 keynote about search
       • CIDR 2003 keynote about new areas that don’t fit DBMS well
       • So, some review, some new stuff

     Proposal: Layered Database
     Pros:
       • Enable new database-like things
       • Faster innovation for components
       • Many parallel experiments (like Linux)
       • Should be public domain ideally
     Cons:
       • Hard to ensure global properties
       • But those that care will get them…
     Closest is Berkeley DB (?)

     Example: Search Engines
     No use of database technology
     Things that would have been helpful:
       • High availability and replication
       • Atomic version vectors
       • Tools for new declarative languages
       • Join machinery
     Not needed:
       • Complex locks, Query Optimization
       • Transactions, Redo, Undo

     Example: Scientific Computing
     Uses databases, but not a good fit
       • Data often stored in files
       • Most operators are outside the DBMS
       • Database is an expensive replicated file system (in/out but no joins)
     Things that layered system might provide:
       • Multi-version storage system
       • New operators
       • Tools for new declarative languages
       • Workflow

     Other Misfits
     Bioinformatics:
       • Wrong operators
       • Need error propagation
       • Versioning, read mostly
     App Servers:
       • Session state, session migration
       • App server will be a small database
  2. So what happened?
       • Accepted: one size does not fit all…
       • Couldn’t get much traction on layered database
       • Built our own from scratch
       • Stasis, Rusty Sears
       • Open source, could be something special
       • But big picture largely unchanged
       • Too hard to explore the fun spaces
       • But layering DID happen!
       • But whole database is now just transactional storage

     Directions I’d like to see…
     Integrated notion of statistics
       • Store the noise (rather than clean it)
       • Create cleaner views
       • Core probabilistic queries
     Move away from update-in-place
       • Many inputs are sacred (e.g. science)
       • Transactional versioning
       • Provenance & annotation

     Directions (2)
       • Better integration into PL
       • BASE semantics (not just ACID)
       • Repeated automatic extraction
       • Web crawlers do this
       • Much of MapReduce workload
       • Need to integrate with versioning, provenance, statistics
       • Import is a continuous process, not an event

     Many Core
       • Hard to get any performance benefit for I/O bound applications
       • Main memory DB??
       • Limited by off-chip bandwidth
       • Need dataflow optimizations on/off chip

     Backup

     1) Layering enables competition
     Examples from OS community:
       • x86, SPEC benchmarks, Virtual machines
       • SCSI disks, RAID, NAS
       • Routers, Firewalls, Proxies
     Some layers commodities (raw disks)
     Some layers innovative (replication)
     Always have unexpected uses
  3. 2) Many more experiments
       • Centralized planning tries very few things
       • Layering enables many more bets
       • Also enables VC funding
       • Ex: IP layer, ASICs => networking startups
       • Enables niche markets (lower cost of entry)
       • Easier path for XML, bio, spatial, …
       • Most bets fail, but some succeed

     3) Reduces Time to Market
       • Lower cost of entry
       • More important: Just good enough!
       • Few global properties in early versions
       • The web, search engines, even e-commerce
       • P2P
       • WebMethods
       • Global properties added over time!
       • Ugly but fast wins the race…

     Claims
       • If you can’t control, then enable
       • This is the lesson from OS work for CIDR
       • Unix, TCP enabled the web
       • Neither attempted to control usage
       • HTTP in turn enabled P2P
       • DB research suffers from “Albatross 9i”
       • Artifact hides the enabling technology
       • CIDR exists for this reason

     Conclusions
       • Can’t control (or predict) the future… better to enable broad innovation
     Control:
       • Make global properties tractable
       • But limits innovation
     A public domain layered database:
       • Would enable more innovation
       • Allow a broader range of properties

     Rate of Innovation
     Claim: layering increases innovation
       1) Enables competition
       2) Many more experiments
       3) Reduces time to market