CouchConf-Bangalore-Intro-to-document-databases
 

Presentation Transcript

    • Introduction to Document Databases. Dustin Sallings (@dlsspy)
    • HOW DO WE THINK ABOUT DATA? A brief history
    • Timeline, 1850 to 1973 (tracks: seminal events in internet history; beginnings of the internet; hypertext/hypermedia/web; hierarchical/network databases; relational databases): Atlantic Cable (Cyrus W. Field, USA); "As We May Think" (Vannevar Bush); Sputnik (USSR); ARPA; IDS (Charles Bachman, GE); MUMPS; Pick (TRW); IMS (IBM, Vern Watts); oNLine System (NLS) (Doug Engelbart); IMP; ARPANET (UCLA-Stanford); "A Relational Model of Data for Large Shared Data Banks" (E.F. Codd, IBM); Ingres (Michael Stonebraker, Berkeley)
      • 1850, Atlantic cable: taking data transmission up a notch.
      • 1945, "As We May Think": "He urges that men of science should then turn to the massive task of making more accessible our bewildering store of knowledge."
      • 1958, ARPA: created to "prevent technological surprise like the launch of Sputnik", that is, "to prevent technological surprise to the US, but also to create technological surprise for its enemies".
      • 1965, MUMPS (Massachusetts General Hospital Utility Multi-Programming System): largely adopted during the 1970s and early 1980s in healthcare and financial information systems and databases, and still used by many of the same clients today. It is currently used in electronic health record systems, by multiple banking networks, and by online trading and investment services.
      • 1969, IMP: the Interface Message Processor (packet network).
    • Timeline, 1974 to 1997 (tracks: seminal events in internet history; beginnings of the internet; hypertext/hypermedia/web; relational databases; MUMPS; object databases; open source): System R (IBM); Oracle (Larry Ellison); DBM; MUMPS (ANSI); GT.M and Caché (InterSystems), both MUMPS; GemStone/S (GemStone); Versant (Versant) and many other ODBMSs; BerkeleyDB; Metakit; MySQL (Michael Widenius and David Axmark); Lotus Notes (Lotus); TCP/IP (Vint Cerf and Bob Kahn); DNS (Paul Mockapetris); HyperCard (Bill Atkinson); WWW on NeXT (Tim Berners-Lee); line-mode browser (Nicola Pellow); ViolaWWW (Pei Wei); Cello (Tom Bruce); Mosaic (Marc Andreessen)
    • Timeline, 1998 to 2011 (tracks: seminal events in internet history; distributed computing; mobile devices; NoSQL ("Not Only SQL"); issues of scale): "NoSQL" (Carlo Strozzi); memcached; db4o; QDBM; BigTable; Tokyo Cabinet; CouchDB; Neo4j; Amazon Dynamo (paper); Terrastore, Project Voldemort, Riak, Cassandra, Dynomite, JackRabbit, HBase, MongoDB, VertexDB; "NoSQL"; membase; Couchbase Server; Couchbase Mobile; Open Source Summit (Tim O'Reilly); CAP Theorem (Eric Brewer); CAP Theorem formally proven (Seth Gilbert, Nancy Lynch, MIT); iOS and iPhone (Steve Jobs); Android (Andy Rubin); iPad; Kindle Fire; Samsung Galaxy
    • 2011
      • The Web: a dynamic user population in the millions; innovative applications with changing requirements
      • Mobile: off-line applications; server synchronization and sharing
      • NoSQL: distributed systems; availability or consistency
    • Logic Scales!
    • This application runs all the way to the edge. A billion concurrent users on this application will have the same experience as a single user.
    • What About Data?
    • The Relational Database Solution
    • RDBMS Scales ... at what cost?
    • ACID
      • Atomicity: database modifications are all or nothing
      • Consistency: the database goes from one consistent state to another
      • Isolation: transactions never interfere with each other
      • Durability: once a transaction is committed, it stays committed
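To make atomicity concrete, here is a minimal Python sketch using the standard-library `sqlite3` module; the `accounts` table and the `transfer` helper are hypothetical. A failed transfer raises inside the transaction, so the debit that already ran is rolled back together with everything else.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    # `with conn:` commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Abort: the debit above is undone along with the rest of the transaction.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)       # succeeds: both updates are committed
try:
    transfer(conn, "alice", "bob", 500)  # fails: neither update survives
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 30}
```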
    • TRANSACTIONS (Dan Pritchett, eBay)
      • PayPal uses transactions; eBay doesn't (for non-critical data)
      • Two-phase commit is not pragmatic: responsiveness and site availability would suffer
      • New role for the database as a ‘data store’
      • New problem: transactions per watt
      • Other high-volume sites independently arrive at the same strategy
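The availability cost of two-phase commit can be sketched in a few lines of Python. The `Participant` class and `two_phase_commit` function below are hypothetical and leave out logging, timeouts, and coordinator recovery; the point is that a single unreachable participant forces the whole transaction to abort.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical resource manager taking part in a two-phase commit."""
    name: str
    available: bool = True
    staged: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def prepare(self, writes: dict) -> bool:
        # Phase 1: stage the writes and vote yes, or vote no if unreachable.
        if not self.available:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        # Phase 2, success path: apply the staged writes.
        self.committed.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        # Phase 2, failure path: discard the staged writes.
        self.staged = {}


def two_phase_commit(participants, writes) -> bool:
    """Apply `writes` (participant name -> {key: value}) everywhere or nowhere."""
    votes = [p.prepare(writes.get(p.name, {})) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


orders, billing = Participant("orders"), Participant("billing")
first = two_phase_commit([orders, billing], {"orders": {"o1": "placed"}, "billing": {"o1": 25}})
billing.available = False  # one participant becomes unreachable...
second = two_phase_commit([orders, billing], {"orders": {"o2": "placed"}, "billing": {"o2": 40}})
print(first, second, orders.committed)  # True False {'o1': 'placed'}
```

Every write now depends on every participant being reachable at once, which is the responsiveness and availability cost the slide refers to.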
    • CAP THEOREM: PICK TWO!
      • Consistency: reads and writes happen correctly
      • Availability: every operation returns a result
      • Partition Tolerance: the system keeps working even when the network loses or fails to deliver messages
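A toy model of the trade-off, assuming a single value copied on two nodes `a` and `b` (the `ReplicatedRegister` class is hypothetical): during a partition, a CP system refuses the write, while an AP system accepts it and the other side keeps serving a stale copy.

```python
class ReplicatedRegister:
    """Hypothetical value copied on two nodes, 'a' and 'b'."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.copies = {"a": None, "b": None}
        self.partitioned = False

    def write(self, value, via: str = "a") -> bool:
        if not self.partitioned:
            # Healthy network: update both copies so they stay identical.
            self.copies = {"a": value, "b": value}
            return True
        if self.mode == "CP":
            # Choose consistency: refuse the write so the copies cannot diverge.
            return False
        # Choose availability: accept the write locally; the other copy is now stale.
        self.copies[via] = value
        return True

    def read(self, via: str):
        return self.copies[via]


results = {}
for mode in ("CP", "AP"):
    reg = ReplicatedRegister(mode)
    reg.write("v1")
    reg.partitioned = True
    accepted = reg.write("v2", via="a")
    results[mode] = (accepted, reg.read("a"), reg.read("b"))

print(results)  # {'CP': (False, 'v1', 'v1'), 'AP': (True, 'v2', 'v1')}
```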
    • BASE: Basically Available, Soft state, Eventually consistent
    • The NoSQL Solution
    • DATABASE
      • Features-First: Oracle, SQL Server, DB2, MySQL, PostgreSQL, Amazon RDS
      • Scale-First: Couchbase Server, CouchDB, Project Voldemort, Riak, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, Cassandra, HBase and Hypertable
      • Simple Structured Storage: Amazon SimpleDB, Berkeley DB
      • Purpose-Optimized Stores: StreamBase, Vertica, Aster Data, Netezza, Greenplum, VoltDB
    • NOSQL TAXONOMY (Steven Yen, Couchbase)
      • key-value-cache
      • key-value-store
      • eventually-consistent key-value-store
      • ordered-key-value-store
      • data-structures server
      • tuple-store
      • object database
      • document database
      • wide columnar store
    • WHO WILL WIN?
    • THE MOST APPROACHABLE API WITH ENOUGH POWER WILL WIN
    • WHY DOCUMENT DATABASES?
    • DOCUMENT DATABASE APIS ARE ‘APPROACHABLE’
      • HTTP: GET, POST, PUT, DELETE
      • memcached: GET, SET, DELETE, ADD, REPLACE, ...
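The memcached verbs follow simple, well-defined rules: SET stores unconditionally, ADD stores only a new key, REPLACE only overwrites an existing one, and DELETE removes a key. The hypothetical `TinyCache` class below sketches those rules in Python; it is not a memcached client and omits expiry, flags, and CAS.

```python
class TinyCache:
    """In-memory sketch of memcached storage-command rules (no expiry, flags, or CAS)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None models a cache miss

    def set(self, key, value) -> bool:
        self._data[key] = value     # always stores
        return True

    def add(self, key, value) -> bool:
        if key in self._data:       # refuses to overwrite an existing key
            return False
        self._data[key] = value
        return True

    def replace(self, key, value) -> bool:
        if key not in self._data:   # refuses to create a missing key
            return False
        self._data[key] = value
        return True

    def delete(self, key) -> bool:
        if key not in self._data:
            return False
        del self._data[key]
        return True


cache = TinyCache()
outcomes = [
    cache.add("doc:1", '{"v": 1}'),      # True: new key
    cache.add("doc:1", '{"v": 2}'),      # False: key exists
    cache.replace("doc:2", '{"v": 1}'),  # False: key missing
    cache.replace("doc:1", '{"v": 3}'),  # True
    cache.delete("doc:1"),               # True
    cache.get("doc:1"),                  # None: deleted
]
```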
    • Everyone Understands: APIs in every language to work with your data.
    • Documents Are Flexible. Focus point: applications tend to care only about the parts they find interesting while preserving the rest.
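A minimal Python sketch of that focus-point pattern, assuming documents are stored as JSON strings (the field names and both helper functions are made up for illustration): the application parses the whole document, changes only the field it owns, and writes everything else back untouched. The lossy variant shows what happens when code rebuilds a document from only the fields it knows.

```python
import json


def rename_user(raw_doc: str, new_name: str) -> str:
    """Change one field and keep every other field exactly as stored."""
    doc = json.loads(raw_doc)
    doc["name"] = new_name
    return json.dumps(doc)


def rename_user_lossy(raw_doc: str, new_name: str) -> str:
    """Anti-pattern: rebuilds the document from only the fields this code knows."""
    doc = json.loads(raw_doc)
    return json.dumps({"_id": doc["_id"], "name": new_name})


stored = '{"_id": "user:1", "name": "Ada", "prefs": {"theme": "dark"}, "tags": ["admin"]}'
kept = json.loads(rename_user(stored, "Ada L."))
lost = json.loads(rename_user_lossy(stored, "Ada L."))
print(sorted(kept), sorted(lost))  # ['_id', 'name', 'prefs', 'tags'] ['_id', 'name']
```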
    • DOCUMENT DATABASES HAVE “ENOUGH POWER”
      • fast reads from cache
      • fast writes to persistent storage
      • single-document atomic writes
      • document conflict resolution
      • replication of data
      • clustering and fault-tolerance
      • fail-over and rebalancing on the fly
      • rolling upgrades and deployment
      • high availability
      • partition tolerance
      • rapid-development tools
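Single-document atomic writes and conflict detection are often built on revisions, as with CouchDB's `_rev` field: an update must name the revision it was based on, and a stale revision is rejected. The `RevStore` class below is a hypothetical in-memory, single-threaded sketch of that idea, not any product's implementation; its `generation-hash` revision strings only loosely mirror CouchDB's.

```python
import uuid


class ConflictError(Exception):
    """Raised when an update cites a revision that is no longer current."""


class RevStore:
    """Hypothetical single-threaded store; every update must cite the revision it read."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, rev=None):
        current_rev = self._docs[doc_id][0] if doc_id in self._docs else None
        if rev != current_rev:
            # Another writer got there first: reject instead of silently overwriting.
            raise ConflictError(f"{doc_id}: current revision is {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        new_rev = f"{generation}-{uuid.uuid4().hex[:8]}"
        # The body is replaced as a whole, never merged field by field.
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = RevStore()
r1 = store.put("cart:7", {"items": []})
r2 = store.put("cart:7", {"items": ["book"]}, rev=r1)  # writer A read r1 and wins
try:
    store.put("cart:7", {"items": ["pen"]}, rev=r1)    # writer B also read r1: stale
    conflict = False
except ConflictError:
    conflict = True
```

Writer B can re-read the document, reapply its change on top of `r2`, and retry.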
    • Document Reads Are Fast: document caching
    • Document Reads Are Fast: document persistence
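A read-through cache is one simple way to combine the two read slides: memory serves repeated reads, and persistence is the fallback. In this hypothetical Python sketch a plain `dict` stands in for the on-disk store, and there is no eviction.

```python
class ReadThroughCache:
    """Sketch: answer reads from memory, loading from persistence on a miss."""

    def __init__(self, backing_store):
        self.backing = backing_store  # a dict standing in for on-disk storage
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing[key]     # slow path: read from persistence
        self.cache[key] = value       # later reads of this key stay in memory
        return value


disk = {"doc:1": {"title": "Intro to document databases"}}
reads = ReadThroughCache(disk)
first = reads.get("doc:1")   # miss: loaded from the backing store
second = reads.get("doc:1")  # hit: served from memory
print(reads.hits, reads.misses)  # 1 1
```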
    • Document Writes Are Fast (diagram: a write request is written to the cache, placed on the document write queue, and persisted asynchronously)
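The write path in this diagram resembles a write-behind cache. In the hypothetical sketch below, `set` updates memory and queues the key, and a separate `flush` step persists queued keys later; collapsing repeated updates of a queued key into one disk write is a design choice of this sketch. The trade-off is that acknowledged writes still waiting in the queue are lost if the process dies before `flush` runs.

```python
from collections import OrderedDict


class WriteBehindStore:
    """Sketch: acknowledge writes once cached; persist them later from a queue."""

    def __init__(self):
        self.cache = {}
        self.disk = {}             # stands in for persistent storage
        self.queue = OrderedDict()  # dirty keys awaiting persistence, oldest first

    def set(self, key, value):
        self.cache[key] = value
        # A key that is already queued keeps its single entry, so it is written once.
        self.queue[key] = None
        return True                # the client is acknowledged before any disk I/O

    def get(self, key):
        return self.cache.get(key, self.disk.get(key))

    def flush(self, limit=None):
        """Persist up to `limit` queued keys (all if None); return how many were written."""
        written = 0
        while self.queue and (limit is None or written < limit):
            key, _ = self.queue.popitem(last=False)
            self.disk[key] = self.cache[key]
            written += 1
        return written


wb = WriteBehindStore()
wb.set("a", 1)
wb.set("b", 2)
wb.set("a", 3)
queued = len(wb.queue)  # 2: "a" is queued only once
before = dict(wb.disk)  # {}: nothing persisted yet
written = wb.flush()
```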
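    • Document Writes Are Fast (diagram: adding a new document to a B-tree whose inner nodes hold key ranges A-R, A-H, I-R, A-C, D-F, G-H, I-L and M-R/N-R over leaves A to R; the reductions (counts) on the path from the new document up to the root are recomputed, e.g. the root's count goes from 14 to 15 and I-R's from 7 to 8)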
    • Document Writes Are Safe (diagram: the same B-tree after the update, showing new revisions of the changed nodes and a new root (count 15) alongside the previous root (count 14))
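The two B-tree write slides can be approximated with an immutable search tree. The Python sketch below swaps in a plain binary tree for the B-tree in the diagrams (an assumption made for brevity) and stores a key count in every node as its reduction. Inserting a key rebuilds only the nodes on the path to it, each with a recomputed count, and returns a new root; the old root still describes the previous version, unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node; `count` is the reduction (number of keys) of its subtree."""
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    count: int = 1


def size(node):
    return node.count if node else 0


def insert(node, key):
    """Return a NEW root containing `key`; nodes reachable from `node` are never modified."""
    if node is None:
        return Node(key)
    if key == node.key:
        return node  # already present: reuse the existing tree
    if key < node.key:
        left, right = insert(node.left, key), node.right
    else:
        left, right = node.left, insert(node.right, key)
    # Only nodes on the path to `key` are rebuilt, each with a recomputed count.
    return Node(node.key, left, right, 1 + size(left) + size(right))


def keys(node):
    return keys(node.left) + [node.key] + keys(node.right) if node else []


old_root = None
for k in "HDLBFJN":
    old_root = insert(old_root, k)
new_root = insert(old_root, "E")  # the previous version stays readable via old_root
print(old_root.count, new_root.count)  # 7 8
```

Because the untouched subtree is shared (`new_root.right is old_root.right`), a writer only appends the few rebuilt nodes, and a reader holding the old root is never disturbed.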
    • Document Databases Scale Out
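One common way to scale out is to hash each document key to a fixed set of partitions and spread those partitions across nodes; Couchbase Server calls its partitions vBuckets. The partition count and helper functions below are hypothetical, with CRC32 standing in for whatever hash a real system uses. Because a key's partition never changes, adding nodes only means moving whole partitions.

```python
import zlib

NUM_PARTITIONS = 16  # hypothetical; real systems fix their own partition count


def partition_for(key: str) -> int:
    """Stable key -> partition mapping; it never depends on the number of nodes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign_partitions(nodes):
    """Spread partitions round-robin over the nodes currently in the cluster."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key: str, partition_map) -> str:
    return partition_map[partition_for(key)]


two_nodes = assign_partitions(["node-1", "node-2"])
four_nodes = assign_partitions(["node-1", "node-2", "node-3", "node-4"])
# Scaling out changes which node owns a partition, not which partition a key is in.
print(partition_for("user:42"), node_for("user:42", two_nodes), node_for("user:42", four_nodes))
```

Round-robin reassignment keeps the sketch short; a real rebalancer would try to move as few partitions as possible.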
    • Document Databases Replicate (diagram: replication between databases A and B)
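Replication between A and B can be sketched as each side copying the revisions the other lacks, after which every replica picks the same winner without coordinating. This hypothetical Python sketch stores each document as a flat `{revision: body}` dict (CouchDB keeps a revision tree instead) and, loosely following CouchDB, prefers the higher generation and breaks ties by comparing revision hashes. Losing revisions are kept, so an application can still resolve the conflict itself.

```python
def rev_key(rev: str):
    """Sort key for 'generation-hash' revisions: generation first, then hash."""
    generation, rev_hash = rev.split("-", 1)
    return (int(generation), rev_hash)


def replicate(source: dict, target: dict) -> None:
    """One-way replication: copy every revision the target does not have yet."""
    for doc_id, revs in source.items():
        target.setdefault(doc_id, {}).update(revs)


def winner(revs: dict):
    """Deterministic choice, so every replica picks the same revision independently."""
    best = max(revs, key=rev_key)
    return best, revs[best]


# Both replicas start with the same revision, then are edited while disconnected.
a = {"doc": {"1-aaa": {"v": 0}}}
b = {"doc": {"1-aaa": {"v": 0}}}
a["doc"]["2-bbb"] = {"v": "edited on A"}
b["doc"]["2-ccc"] = {"v": "edited on B"}

replicate(a, b)  # A -> B
replicate(b, a)  # B -> A
print(winner(a["doc"]), winner(b["doc"]))
# ('2-ccc', {'v': 'edited on B'}) ('2-ccc', {'v': 'edited on B'})
```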
    • Document Database Queries Are Fast (diagram: a query from startkey to endkey over the same B-tree of key ranges, with reductions root A-R: 14; A-H: 7; I-R: 7; A-C: 3; D-F: 2; G-H: 2; I-L: 3; M-R: 5; leaves A to R)
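A range query from `startkey` to `endkey` over a sorted index needs only two binary searches, and the difference between the two positions gives the row count without visiting every row. In this Python sketch a sorted list (holding the leaf keys from the diagram) stands in for the B-tree; `startkey`, `endkey`, and `inclusive_end` echo CouchDB's view query parameters, but the `range_query` function itself is hypothetical.

```python
from bisect import bisect_left, bisect_right


def range_query(sorted_keys, startkey, endkey, inclusive_end=True):
    """Return (rows, count) for keys between startkey and endkey in a sorted index."""
    lo = bisect_left(sorted_keys, startkey)
    find_end = bisect_right if inclusive_end else bisect_left
    hi = find_end(sorted_keys, endkey)
    # The count comes from the two positions alone, like reading a stored reduction.
    return sorted_keys[lo:hi], max(0, hi - lo)


index = list("ABCDFGHIKLMNOQR")  # leaf keys from the diagram
rows, count = range_query(index, "D", "L")
print(rows, count)  # ['D', 'F', 'G', 'H', 'I', 'K', 'L'] 7
```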
    • Document Databases Are Developer Friendly
    • References
      • “The Roads and Crossroads of Internet History”: HTTP://WWW.NETVALLEY.COM/INTVAL1.HTML
      • “A Brief History of NoSQL”: HTTP://BLOG.KNUTHAUGEN.NO/2010/03/A-BRIEF-HISTORY-OF-NOSQL.HTML
      • “History of the Atlantic Cable and Undersea Communications”: HTTP://ATLANTIC-CABLE.COM/FIELD/INDEX.HTM
      • “A Little History of the World Wide Web”: HTTP://WWW.W3.ORG/HISTORY.HTML
      • “Dan Pritchett on Architecture at eBay”: HTTP://WWW.INFOQ.COM/INTERVIEWS/DAN-PRITCHETT-EBAY-ARCHITECTURE
      • “NoSQL Is a Horseless Carriage”, Steven Yen: HTTP://DL.DROPBOX.COM/U/2075876/NOSQL-STEVE-YEN.PDF