
Re-inventing the Database: What to Keep and What to Throw Away



NoSQL has turned many database concepts upside down. Consistency models, transactions, data models, and query interfaces are being reinvented. Tradeoffs between performance, availability, manageability, and usability are being rethought. In this talk, 10gen President Max Schireson reviews some of the different approaches being taken and offers opinions on the right choices for different uses.

Published in: Technology


  1. Reinventing the Database
     Max Schireson, President, 10gen
  2. My background
     • At Oracle from 1994 to 2003
     • At MarkLogic from 2003 to Feb 2011
     • Joined 10gen Feb 2011
  3. The world has changed (1970 vs. 2011)
     • Main memory: Intel 1103, 1k bits (1970); 4GB of RAM costs $25.99 (2011)
     • Mass storage: IBM 3330 Model 1, 100 MB (1970); 3TB Superspeed USB for $129 (2011)
     • Microprocessor: nearly, with the 4004 being developed; 4 bits and 92,000 instructions per second (1970); Westmere EX has 10 cores, 30MB L3 cache, runs at 2.4GHz (2011)
     • Motor Trend Car of the Year: Ford Torino (1970); Chevy Volt (2011)
     • President: Richard Nixon (1970); Barack Obama (2011)
     • Ted Codd: in his 40s (1970); dead (2011)
     • Me: in diapers (1970); in my 40s (2011)
  4. More recent changes (a decade ago vs. now)
     • Faster: buy a bigger server (a decade ago); buy more servers (now)
     • Faster storage: a SAN with more spindles (a decade ago); SSD (now)
     • More reliable storage: more expensive SAN (a decade ago); more copies of local storage (now)
     • Deployed in: your data center (a decade ago); the cloud, private or public (now)
     • Large user base: thousands of employees (a decade ago); millions of consumers (now)
     • Tracking: business transactions (a decade ago); every click and more (now)
  5. Assumptions behind today's DBMS
     • Relational data model
     • Third normal form
     • ACID
     • SQL
     • Multi-statement transactions
     • Database is hardware agnostic
     • RAM is small and disks are slow
     • If it's too slow you can buy a faster computer
  6. Yesterday's assumptions in today's world
     • Scaleout is hard
       • Distributed joins are hard
       • Making two-phase commits fast is hard
     • Custom solutions proliferate
     • Too slow? Just add a cache
     • ORM tools everywhere
     • More computers and disk are nearly free but SAN and faster computers are expensive
  7. Challenging some assumptions
     • Do you need a database at all?
     • How does it scale out?
     • What type of queries does it need to be able to do?
     • How should it model data?
     • How do you query it?
     • How does it handle transactions and consistency?
     • Is it enterprise software, open source, an appliance, or a cloud service?
     • Does the data fit in memory?
     • What if your disks are SSD?
  8. My opinions
     • Different use cases will produce different answers
     • Existing RDBMS solutions will continue to solve a broad set of problems well, but many applications will work better on top of alternative technologies
     • Many new technologies will find niches but only one or two will become mainstream
  9. Do you need a database at all?
     • Can you better solve your problem with a batch processing framework?
     • Can you better solve your problem with an in-memory object store/cache?
  10. How does it scale out?
     • Scale-out for working set size
     • Scale-out for total data size
     • Scale-out for write volume
     • Scale-out for read volume
     • Scale-out for redundancy
     • How do you incrementally add nodes or change configuration?
     • How do you trade off query performance (which wants fewer index segments) for elasticity (which wants more index segments)?
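The question about incrementally adding nodes can be made concrete with a small sketch. It assumes the simplest possible placement rule, hashing each key modulo the node count; the talk does not name a placement scheme, and `shard_for` and `fraction_moved` are hypothetical helpers written for this illustration.

```python
import hashlib


def shard_for(key: str, node_count: int) -> int:
    """Map a key to a node index with a stable hash (naive modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose node changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if shard_for(k, old_nodes) != shard_for(k, new_nodes)
    )
    return moved / len(keys)


# Hypothetical key names; any strings would do.
keys = [f"user:{i}" for i in range(1000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of keys move when going from 4 to 5 nodes")
```

With evenly spread hashes, roughly four in five keys change nodes in this setup, because a hash has the same remainder modulo 4 and modulo 5 for only 4 of every 20 values. Placement schemes such as consistent hashing aim to shrink that fraction, at the cost of more moving parts.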
  11. What type of queries does it need to be able to do?
     • Is a key/value store enough?
     • Will you be retrieving your data by one key or by many?
     • Is there a primary way you'll be viewing your data?
     • Do you need specialized queries (e.g., time series, geospatial)?
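The difference between retrieving by one key and by many can be sketched with plain dictionaries. This is an illustration only: the records, field names, and the `build_index` helper are made up, and a real store would maintain such indexes itself.

```python
from collections import defaultdict

# Primary store: one key -> one record (all data here is made up).
users = {
    "u1": {"name": "Ada", "city": "London"},
    "u2": {"name": "Grace", "city": "New York"},
    "u3": {"name": "Alan", "city": "London"},
}

# Retrieval by the primary key is a single lookup.
print(users["u2"]["name"])


def build_index(store, field):
    """Build a secondary index: field value -> sorted list of primary keys."""
    index = defaultdict(list)
    for key, record in store.items():
        index[record[field]].append(key)
    return {value: sorted(keys) for value, keys in index.items()}


# Retrieval by another field needs either a full scan or an index that
# the application (or the database) must build and keep up to date.
by_city = build_index(users, "city")
print(by_city["London"])
```

If every access path goes through the primary key, a key/value store may be enough; each additional access path adds an index that has to be maintained on every write.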
  12. Imagine a garage…
     • You hand your valet the keys to your car
     • Before they park your car, they completely disassemble it
     • The pistons are stored in piston storage, brake pads with brake pads, steering wheels with steering wheels
     • Over time, they have storage areas for catalytic converters, DVD-based nav systems, headlight washers, and traction control systems
     • When you ask for your car back, the valet is incredibly fast at reassembly
     • One minor issue: you have to provide the disassembly and reassembly instructions, and they will be followed literally, even if you say the spare tire should be used as a steering wheel and forget to specify re-insertion of spark plugs
     • A technological marvel
     • Might be a good way to store your car if you don't know whether you'll be asking for a car back or lots of brake pads or pistons (for a salvage yard?)
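The garage analogy maps onto a familiar contrast: storing an object whole versus splitting it into per-part tables and joining it back together. The sketch below is one reading of the analogy, not code from the talk; the car fields and the `disassemble` and `reassemble` helpers are hypothetical.

```python
# One car as a single document: stored and returned whole.
car_document = {
    "plate": "ABC123",
    "engine": {"pistons": 4, "spark_plugs": 4},
    "wheels": ["front-left", "front-right", "rear-left", "rear-right"],
}


def disassemble(car):
    """Split a car into per-part 'tables', each row tagged with the car's plate."""
    engines = [{"plate": car["plate"], **car["engine"]}]
    wheels = [{"plate": car["plate"], "position": p} for p in car["wheels"]]
    return engines, wheels


def reassemble(plate, engines, wheels):
    """Join the part tables back into one car; the caller supplies the recipe."""
    engine = next(e for e in engines if e["plate"] == plate)
    return {
        "plate": plate,
        "engine": {k: v for k, v in engine.items() if k != "plate"},
        "wheels": [w["position"] for w in wheels if w["plate"] == plate],
    }


engines, wheels = disassemble(car_document)
print(reassemble("ABC123", engines, wheels) == car_document)
```

The part tables make it easy to ask for every wheel across all cars, which is the salvage-yard case; the document form makes it easy to get one car back without writing the reassembly instructions.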
  13. How should it model data?
     • Relational
       • Row oriented or column oriented
     • Key value
     • Document oriented
     • Graph oriented
  14. How do you query it?
     • Do you want an API, a language, or a map-reduce style interface?
     • Will most of your queries be hand-typed, embedded in code, or dynamically generated?
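To show what a map-reduce style interface asks of the developer, here is a minimal in-process sketch. It is not any particular product's API: `map_reduce`, `map_click`, and `reduce_counts` are hypothetical names, and the click events are made up.

```python
from collections import defaultdict

# Made-up click events; field names are assumptions for illustration.
clicks = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/home"},
    {"user": "u1", "page": "/pricing"},
]


def map_click(event):
    """Map step: emit (key, value) pairs for one input record."""
    yield event["page"], 1


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


page_views = map_reduce(clicks, map_click, reduce_counts)
print(page_views)
```

The same count in a query language would be a one-line aggregate, which suits hand-typed queries; the map and reduce functions are more verbose but are ordinary code, which suits queries embedded in or generated by an application.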
  15. How do you handle transactions and consistency?
     • Do you need transactions at all?
       • Be careful; web services, for example, need to be able to assign userIDs
     • Do you need multi-master updates?
       • If so, how do you resolve conflicts?
     • Do you need immediate consistency?
       • For some queries or all?
     • How do you handle failures?
       • Are you optimizing for read availability or write availability?
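One possible answer to the conflict-resolution question for multi-master updates is last-write-wins. The sketch below assumes that policy purely for illustration; the talk does not recommend one, and the `Version` type, its fields, and `resolve_lww` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: int  # time of the write, assumed comparable across nodes
    node_id: str    # tie-breaker so every replica picks the same winner


def resolve_lww(a: Version, b: Version) -> Version:
    """Last-write-wins: keep the newer write; break ties by node id."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


# Two masters accepted different writes for the same key.
from_east = Version("alice@example.com", timestamp=100, node_id="east")
from_west = Version("alice@example.org", timestamp=105, node_id="west")

winner = resolve_lww(from_east, from_west)
print(winner.value)
```

This policy is simple and deterministic, but it silently discards the losing write and depends on timestamps being comparable across nodes. Applications that cannot lose updates need a different strategy, such as merging values or surfacing the conflict to the application.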
  16. What is it?
     • Enterprise software
     • Open source
       • With commercial support?
     • Appliance
       • Packaged with commodity hardware
       • Specialized hardware
     • Cloud service
       • Available for on-premise deployment?
       • Integrated in another PaaS offering?
       • Where on the net?
  17. Does the data fit in memory?
     • Transactions can be very, very fast
     • Do you trust enough copies in memory (perhaps across multiple data centers) or do you require some sort of sync to persistent storage?
     • How big will the data be and how much do you care about costs?
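The size and cost question reduces to simple arithmetic once a workload is assumed. The sketch below is a back-of-the-envelope estimate only: the record count, record size, copy count, and usable RAM per node are invented numbers, and `nodes_needed` is a hypothetical helper that ignores indexes and per-record overhead.

```python
def nodes_needed(record_count, avg_record_bytes, copies, usable_ram_bytes_per_node):
    """Nodes required to hold `copies` full in-memory copies of the data set."""
    total_bytes = record_count * avg_record_bytes * copies
    # Integer ceiling division: round up to a whole number of nodes.
    return (total_bytes + usable_ram_bytes_per_node - 1) // usable_ram_bytes_per_node


GIB = 1024 ** 3

# Hypothetical workload: 500 million records of 2 KiB each,
# 3 in-memory copies, 64 GiB of usable RAM per node.
print(nodes_needed(500_000_000, 2048, 3, 64 * GIB))
```

Under these assumed numbers the answer is 45 nodes. Changing the copy count or the record size moves the result proportionally, which is why trust in in-memory copies and sensitivity to cost are tied together.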
  18. What if your disks are SSD?
     • Alleviate hotspots
     • Random accesses are measured in microseconds, not milliseconds
     • Degradation from in-memory to on-disk can be more graceful
       • But data representations on disk vs. in memory may be very different, which may create significant overhead
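The microseconds-versus-milliseconds point can be turned into a rough throughput comparison. The latencies below are assumed round numbers chosen for illustration, not measurements, and `random_reads_per_second` is a hypothetical helper that models one device serving reads one at a time.

```python
def random_reads_per_second(latency_us: int) -> int:
    """Serial random reads per second for one device at the given access latency."""
    return 1_000_000 // latency_us


# Assumed round numbers, not measurements: about 100 microseconds for an
# SSD random read, about 10 milliseconds for a spinning-disk random access.
SSD_LATENCY_US = 100
DISK_LATENCY_US = 10_000

print(random_reads_per_second(SSD_LATENCY_US))   # 10000
print(random_reads_per_second(DISK_LATENCY_US))  # 100
```

With these assumptions, a query that misses memory and needs a handful of random reads costs well under a millisecond on SSD but tens of milliseconds on disk, which is the sense in which falling out of memory degrades more gracefully.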
  19. In choosing a solution
     • Examine your requirements
       • They will dictate certain choices
     • Once you have narrowed the field
       • Prefer solutions that may become mainstream
       • Consider TCO: purchase cost, learning curve, productivity, viability
  20. Which solution sets will become mainstream
     • High confidence
       • Horizontally scalable: to take advantage of hardware trends
       • Non-relational: to enable scalability
       • Highly functional: for usage beyond mega-scale
       • Developer-friendly: because decision making has shifted
       • Freely available: for rapid adoption
     • My predictions
       • Document oriented: enables scalability, functionality, developer friendliness, and agility
       • Open source: with multiple PaaS providers