Claremont Report on Database Research: Research Directions (Raghu Ramakrishnan)

This is a set of slides from the Claremont Report on Database Research; see http://db.cs.berkeley.edu/claremont/ for more details. These particular slides are from a "Research Directions" talk by Raghu Ramakrishnan. (Uploaded for discussion at the Stanford InfoBlog, http://infoblog.stanford.edu/.)

Published in: Technology, Education

  1. Web Data Management (Raghu Ramakrishnan)
  2. QUIQ Lessons
       • Structured data management powers scalable collaboration environments
       • ASP
       • Multi-tenancy
       • Massively distributed
       • Fine-grained permissions, hierarchical ACLs
       • RDBMSs were a lousy fit
  3. Cloud Computing: Computing as a Service
       [Diagram labels: Cloud Computing; CPU Intensive; Data Intensive]
       • Analytic (e.g., SSDS, Hadoop)
       • Packaged Software
       • High-throughput (e.g., Condor)
       • "Transactional" Storage & Serving (e.g., PNUTS, S3, SSDS, UDB)
  4. Implications
       • Data management as a service
           • Scientists and others who've resisted (installing, maintaining, and) using DBMSs will find it much easier to reap the benefits
           • "Data centers" and "Computing Centers" will come into vogue again
       • Hosted back-ends and RAD tools will make Web application development accessible to all
           • The Web is becoming open
               • E.g., OpenSocial, OpenID
               • Ideas will be the most valuable currency, not the wherewithal to build complex systems
       • Paradigm shifts possible for how we do research in many fields
           • Build applications that embed your algorithms and test them directly in the field: computer scientists can interact directly with users (ironically, this would still be a breakthrough of sorts after four decades!)
           • Many other disciplines (e.g., sociology, microeconomics) can design and conduct online experiments involving unprecedented numbers of participants
  5. PNUTS: DB in the Cloud
       CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … )
       • Parallel database
       • Geographic replication
       • Indexes and views
       • Structured, flexible schema
       • Hosted, managed infrastructure
       [Figure: the following sample rows appear three times on the slide]
           E 75656 C
           A 42342 E
           B 42521 W
           C 66354 W
           D 12352 E
           F 15677 E
  6. Basic Consistency Model
       • Goal: make it easier for applications to reason about updates and cope with asynchrony, as an alternative to "transactions" in an asynchronous world
       • What happens to a record with primary key "Brian"?
       • Guarantees:
           • Every reader will always see some consistent, but possibly stale, version
           • Readers can request a more up-to-date version, but may pay extra latency
               • Special case: critical read (writer/readers see their own writes)
           • Writers can verify that the record is still at the version they expect
       [Figure: timeline of the record over time, in three generations]
           Generation 1: record inserted, update, update, delete (v. 1, v. 2, v. 3)
           Generation 2: record inserted, update, update, update, delete (v. 1, v. 2, v. 3, v. 4)
           Generation 3: record inserted, delete (v. 1)
  7. Lots of Issues to Re-think
       • Massive distribution & replication
           • Asynchrony
           • Availability
           • Consistency
       • DBA to the world
           • Auto-tuning
           • Multi-tenancy
           • Access control (granularity, online ids)
           • Encryption
       • App-support
           • Caching
  8. Querying the Web
       • Search will become more semantic, with best-effort match-making between:
           • Query intent (NLP, query logs …)
           • Interpreted web content
       • Deep web has a lot of structured data
           • How we get a handle on it is an interesting problem
           • But this is only part of the problem … lots of data not here
       • Semantic web isn't working
       • Site-wrapping doesn't scale
       • Solutions?
           • Domain-wrapping
           • Mass collaboration
           • ??
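
The guarantees listed on slide 6 (Basic Consistency Model) can be made concrete with a small toy model. What follows is only a sketch under stated assumptions, not the PNUTS implementation: it keeps one up-to-date "master" copy and one lagging replica in memory, and every name in it (`RecordStore`, `read_any`, `read_critical`, `read_latest`, `test_and_set_write`, `propagate`) is chosen here for illustration. Deletes and generations are not modeled.

```python
from typing import Dict, List, Optional, Tuple

# A stored record is a (version, value) pair; versions start at 1.
Versioned = Tuple[int, str]


class StaleVersionError(Exception):
    """Raised when a writer's expected version no longer matches."""


class RecordStore:
    """Toy model (assumption): one master copy plus one lagging replica."""

    def __init__(self) -> None:
        self.master: Dict[str, Versioned] = {}
        self.replica: Dict[str, Versioned] = {}
        # Updates accepted by the master but not yet applied to the replica.
        self.pending: List[Tuple[str, int, str]] = []

    def write(self, key: str, value: str) -> int:
        """Apply an update at the master and queue it for the replica."""
        version = self.master.get(key, (0, ""))[0] + 1
        self.master[key] = (version, value)
        self.pending.append((key, version, value))
        return version

    def test_and_set_write(self, key: str, expected: int, value: str) -> int:
        """Write only if the record is still at the version the writer expects."""
        current = self.master.get(key, (0, ""))[0]
        if current != expected:
            raise StaleVersionError(f"expected v{expected}, found v{current}")
        return self.write(key, value)

    def propagate(self) -> None:
        """Asynchronous replication, simulated: apply queued updates in order."""
        for key, version, value in self.pending:
            self.replica[key] = (version, value)
        self.pending.clear()

    def read_any(self, key: str) -> Optional[Versioned]:
        """Cheap read: a consistent but possibly stale version."""
        return self.replica.get(key)

    def read_critical(self, key: str, min_version: int) -> Optional[Versioned]:
        """Use the replica if it is new enough, else fall back to the master."""
        local = self.replica.get(key)
        if local is not None and local[0] >= min_version:
            return local
        return self.master.get(key)  # stands in for the extra-latency path

    def read_latest(self, key: str) -> Optional[Versioned]:
        """Most up-to-date version, always served by the master here."""
        return self.master.get(key)


if __name__ == "__main__":
    store = RecordStore()
    store.write("Brian", "inserted")
    store.propagate()
    v2 = store.write("Brian", "updated")      # replica has not seen this yet
    print(store.read_any("Brian"))            # may be stale
    print(store.read_critical("Brian", v2))   # writer sees its own write
```

A writer that remembers the version returned by its last `write` can pass it to `read_critical` to see its own update, or to `test_and_set_write` to detect that someone else changed the record in the meantime.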