Databases benoitg 2009-03-10
Presentation Transcript

  • Making sense in the brave new world of databases Benoit Grégoire Savoir-faire Linux [email_address]
  • Why am I here?
    • The last time SQL was declared dead wasn't fun.
    • There is this new hype called “NoSQL”
    • I don't want to deal with another database holy war for the next ten years of my career. Especially if it will prevent us from using the right tool for the job.
    • Lots of smart people worked on new tools, but fanboys are beginning to strike...
  • Typical decision criteria
    • Criterion 1: Is it popular/hyped?
    • There is no criterion 2
  • What did I do
    • Survey all databases that:
      • Are OSS
      • Are active projects
  • Why are you here
    • Nobody needs this theoretical mumbo-jumbo, right?
    • Rumor has it Google has a few PhDs on its payroll
  • The problem: The typical web developer's entire database training is:
    • I use MySQL
    • I overheard it has a manual, but I didn't actually check since I use an ORM for everything.
  • ORMs probably did more damage to database understanding than any other factor.
    • Not understanding the fundamental characteristics of the database system prevents you from making good decisions.
    • Relational databases aren't the only option. If you go through the pain of using an ORM, you need to at least want SOMETHING from relational databases:
      • SQL
      • Transactions
      • Relational integrity and schema enforcement
      • Fast joins WITHOUT instantiating objects
      • Replication (but then you probably chose the wrong solution)
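As a minimal sketch of what "wanting something from a relational database" can look like, the example below uses Python's built-in `sqlite3` module. The `authors` and `posts` tables and their contents are hypothetical, chosen only for illustration. It shows a join aggregated inside the database (no per-row objects are built) and a foreign key rejecting an orphan row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# Join and aggregate inside the database: only the summary rows come back.
post_counts = conn.execute("""
    SELECT a.name, COUNT(p.id)
    FROM authors AS a JOIN posts AS p ON p.author_id = a.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()

# Relational integrity: a post pointing at a missing author is rejected.
try:
    conn.execute("INSERT INTO posts VALUES (4, 99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

An ORM would typically load each post and author as an object before counting; here the database does the work and the application sees two small tuples.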
  • So if not NoSQL, then what
    • High Performance Scalable Data Stores (HPSDS)
    • Scalable Non Relational Database (SNRD?)
    • The movement formerly known as NoSQL?
    • Databases
  • Internet scale
    • It's great that it's possible
    • Most web applications are not
    • Those that are don't have all of their components “Internet scale”
    • Plan for it, but you probably don't need it NOW.
      • Google didn't start with BigTable and MapReduce...
  • Major problem: Scaling
    • Read scaling
      • Comparatively easy
    • Write scaling
      • Much harder
    • Computational scaling
      • Not historically part of database feature lists
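One way to see why read scaling is comparatively easy and write scaling is harder is the toy sketch below. The class names `ReadScaledStore` and `ShardedStore` are hypothetical and in-memory dictionaries stand in for database servers. With replicas, reads spread out but every copy still absorbs every write; with hash partitioning, each write lands on a single shard, at the cost of no longer having all the data in one place.

```python
import hashlib
from itertools import cycle


class ReadScaledStore:
    """One primary takes every write; reads rotate over full copies."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(self.replicas)

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:
            # Every replica still has to apply every write.
            replica[key] = value

    def read(self, key):
        return next(self._next_replica).get(key)


class ShardedStore:
    """Each key lives on exactly one shard, so writes spread out."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def write(self, key, value):
        self._shard_for(key)[key] = value

    def read(self, key):
        return self._shard_for(key).get(key)


replicated = ReadScaledStore(replica_count=3)
sharded = ShardedStore(shard_count=4)
for i in range(100):
    replicated.write(f"key-{i}", i)
    sharded.write(f"key-{i}", i)
```

After the loop, each replica holds all 100 keys while the shards hold 100 keys between them. A query that spans keys (a join, for instance) now has to visit several shards, which is where the difficulty starts.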
  • Major problem: Availability and replication
  • Major problem: Transactions
  • Traditional databases: ACID
    • Atomicity
    • Consistency
    • Isolation
    • Durability
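A small sketch of atomicity and consistency, again with Python's built-in `sqlite3`. The `accounts` table and the `transfer` helper are hypothetical. The credit is applied first on purpose, so that a failing debit leaves a partial change that the database has to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(conn, "alice", "bob", 60)
second = transfer(conn, "alice", "bob", 60)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer violates the `CHECK` constraint on the debit, so the credit that was already applied to `bob` is rolled back with it.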
  • Life is about compromises
    • So is computing
    • Good – Fast – Cheap
    • For distributed databases
  • Brewer's CAP Theorem (2000)
    • In a distributed system, choose any two of:
      • Consistency
      • Availability
      • Partition Tolerance
    • No set of failures less than total network failure is allowed to cause the system to respond incorrectly (Gilbert & Lynch)
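The trade-off can be illustrated with a deliberately tiny model: one value replicated on two nodes, with a flag simulating a network partition. `TwoNodeRegister` and its "CP"/"AP" modes are hypothetical names for this sketch, not the API of any real system.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to stay consistent."""


class TwoNodeRegister:
    """One value replicated on two nodes, with a simulated partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: both replicas are updated together.
            self.values = [value, value]
        elif self.mode == "CP":
            # Keep consistency, give up availability.
            raise Unavailable("cannot reach the other replica")
        else:
            # Keep availability, let the replicas diverge.
            self.values[node] = value

    def read(self, node):
        return self.values[node]


cp = TwoNodeRegister("CP")
ap = TwoNodeRegister("AP")
for register in (cp, ap):
    register.write(0, "v1")
    register.partitioned = True

try:
    cp.write(0, "v2")
    cp_refused = False
except Unavailable:
    cp_refused = True

ap.write(0, "v2")
ap_diverged = ap.read(0) != ap.read(1)
```

During the partition the CP register rejects the write and both nodes keep answering `v1`; the AP register accepts it and the two nodes now disagree until something reconciles them.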
  • Latency is all important
    • And unavoidable (the speed of light is unlikely to change): a Montreal-Sydney round trip IS going to take 130 ms
    • Unless maybe...
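The physical floor on round-trip time can be estimated with a back-of-the-envelope calculation. The sketch below makes several assumptions that are not from the slides: approximate city coordinates, a spherical Earth, a straight great-circle path, and light in optical fibre travelling at roughly two thirds of its vacuum speed. Real routes are longer and add switching delays, so measured latency will be higher than either bound.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FRACTION = 2 / 3    # assumption: rule-of-thumb speed of light in glass
EARTH_RADIUS_KM = 6371.0  # assumption: mean radius, spherical Earth

# Approximate coordinates (assumptions), in degrees.
MONTREAL = (45.50, -73.57)
SYDNEY = (-33.87, 151.21)


def great_circle_km(origin, destination):
    """Haversine distance between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def round_trip_ms(distance_km, speed_km_s):
    """Time for a signal to go there and back, in milliseconds."""
    return 2 * distance_km / speed_km_s * 1000


distance_km = great_circle_km(MONTREAL, SYDNEY)
rtt_vacuum_ms = round_trip_ms(distance_km, SPEED_OF_LIGHT_KM_S)
rtt_fiber_ms = round_trip_ms(distance_km, SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)

print(f"distance:       {distance_km:8.0f} km")
print(f"RTT in vacuum:  {rtt_vacuum_ms:8.1f} ms")
print(f"RTT in fibre:   {rtt_fiber_ms:8.1f} ms")
```

Whatever the exact figure, no amount of hardware removes it: a protocol that needs several synchronous round trips between continents pays this cost on every one of them.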
  • BASE
    • Basically Available
    • Soft-state
    • Eventually consistent
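A toy model of eventual consistency, under simplifying assumptions: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, replication messages sit in an in-process queue, and a single shared counter stands in for the version stamps a real system would have to generate without central coordination. A write is acknowledged by one replica immediately and reaches the others later.

```python
from collections import deque


class Replica:
    """Holds key -> (version, value) and keeps the newest version seen."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # undelivered (target, key, version, value)
        self.clock = 0          # assumption: one shared version counter

    def put(self, node, key, value):
        """Acknowledge locally, replicate to the other nodes later."""
        self.clock += 1
        self.replicas[node].apply(key, self.clock, value)
        for target in range(len(self.replicas)):
            if target != node:
                self.pending.append((target, key, self.clock, value))

    def get(self, node, key):
        return self.replicas[node].get(key)

    def deliver_all(self):
        """Simulate the replication traffic finally arriving."""
        while self.pending:
            target, key, version, value = self.pending.popleft()
            self.replicas[target].apply(key, version, value)


store = EventuallyConsistentStore(replica_count=3)
store.put(0, "cart", ["book"])
stale_read = store.get(1, "cart")  # replica 1 has not heard yet
store.deliver_all()
fresh_read = store.get(1, "cart")
```

Between `put` and `deliver_all` a reader on another replica sees the old state (here, nothing); afterwards all replicas agree. The application has to tolerate that window.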
  • Classifying databases is difficult
    • There are projects trying to be more than one basic type (MonetDB, Virtuoso)
    • Some of them use another as backend or frontend
    • Selection criteria
      • Open source
      • Active project
  • Databases classification
  • Relational databases
  • Column-oriented databases
  • Key-Value stores
  • Hierarchical Databases
  • Graph databases
  • Document databases
  • Extensible record stores
  • Object databases
  • What about Map-Reduce?
    • Classical distributed computing:
      • Move program and data to processing nodes
    • Map-Reduce
      • Move program to the data node
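The difference can be sketched in a single process. In the example below, `DataNode` is a hypothetical stand-in for a machine that owns a slice of the data: the map function is shipped to each node and only the small per-node result travels back to be reduced. This illustrates the idea and is not how any particular Map-Reduce framework is invoked.

```python
from collections import Counter
from functools import reduce


class DataNode:
    """Stand-in for a machine that holds one partition of the data."""

    def __init__(self, lines):
        self._lines = lines

    def run(self, map_fn):
        # The function comes to the data; the raw lines never leave the node.
        return map_fn(self._lines)


def count_words(lines):
    """Map phase: per-node word counts."""
    return Counter(word for line in lines for word in line.lower().split())


def merge_counts(left, right):
    """Reduce phase: combine two partial results."""
    return left + right


nodes = [
    DataNode(["SQL is not dead", "NoSQL is hype"]),
    DataNode(["SQL is a tool"]),
]
partials = [node.run(count_words) for node in nodes]
totals = reduce(merge_counts, partials, Counter())
```

The classical approach would copy every line to a processing node first. Here only one `Counter` per node crosses the (simulated) network, which matters when the partitions are terabytes and the program is a few kilobytes.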
  • Comparing concurrency/locks/transactions
    System          Conc. control    Data storage   Replication    Txn
    Redis           Locks            RAM            Async          N
    Scalaris        Locks            RAM            Sync           L
    Tokyo           Locks            RAM or disk    Async          L
    Voldemort       MVCC             RAM or BDB     Async          N
    SimpleDB        None             S3             Async          N
    Riak            MVCC             Pluggable      Async          N
    MongoDB         Field-level      Disk           Async          N
    CouchDB         MVCC             Disk           Async          N
    HBase           Locks            Hadoop         Async          L
    HyperTable      Locks            Files          Sync           L
    Cassandra       MVCC             Disk           Async          L
    BigTable        Locks + stamps   GFS            Sync + Async   L
    ScaleDB         Locks            Disk           Sync           Y
    MySQL Cluster   Locks            Disk or RAM    Sync           Y
    MySQL MyISAM    Locks            Disk           Async          N
    MySQL InnoDB    MVCC             Disk           Async          Y
    Drizzle         Locks            Disk           Sync           Y
    PostgreSQL      MVCC             Disk           Pluggable      Y
  • How early should you try to decouple your data?
    • Splitting data is more difficult than splitting applications
  • Are we headed to the one database to rule them all?
  • Closing thoughts
    • Profile, profile, profile
    • Don't neglect local caching
    • A database is not something that can be abstracted out.
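As a minimal sketch of local caching, the example below wraps a lookup in `functools.lru_cache` from the standard library. The `load_user` function is a hypothetical stand-in for a query that would normally hit the database. The cache statistics show how many calls actually reached the backend.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def load_user(user_id):
    """Stand-in for a database query; a real version would run SQL here."""
    return ("user", user_id)


# Five lookups, but only two distinct users.
for user_id in [1, 2, 1, 1, 2]:
    load_user(user_id)

stats = load_user.cache_info()
print(f"backend calls: {stats.misses}, served from cache: {stats.hits}")
```

A process-local cache like this never learns about writes made elsewhere, so it suits data that changes rarely or where slightly stale reads are acceptable. Profiling first (for instance with the standard `cProfile` module) tells you whether the lookup is worth caching at all.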