Why are you here? <ul><li>Nobody needs this theoretical mumbo-jumbo, right? </li>
<li>Rumor has it Google has a few PhDs on its payroll. </li></ul>
The problem: the typical web developer's entire database training is: <ul><li>I use MySQL. </li>
<li>I overheard it has a manual, but I didn't actually check since I use an ORM for everything. </li></ul>
ORMs have probably done more damage to database understanding than any other factor. <ul><li>Not understanding the fundamental characteristics of the database system prevents you from making good decisions. </li>
<li>Relational databases aren't the only option. If you go through the pain of using an ORM, you need to at least want SOMETHING from relational databases: </li><ul><li>SQL
What about Map-Reduce? <ul><li>Classical distributed computing: </li><ul><li>Move program and data to processing nodes </li></ul><li>Map-Reduce </li><ul><li>Move program to the data node </li></ul></ul>
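The contrast above can be sketched as a toy word count. This is only an illustration under stated assumptions: the `NODES` dictionary stands in for data nodes that each hold their own partition, and all names (`NODES`, `map_words`, `run_map_on_node`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, not part of any real Map-Reduce framework. The point it tries to show is that only the map function travels to each node; the raw lines never leave their partition.

```python
from collections import defaultdict

# Hypothetical "data nodes": each one holds its own partition of the data.
NODES = {
    "node-a": ["to be or not to be"],
    "node-b": ["to do is to be", "be here now"],
}


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word, 1) for word in line.split()]


def run_map_on_node(map_fn, local_lines):
    """Stands in for running map_fn on the node that owns local_lines.

    Only the function is shipped to the node; the data stays in place.
    """
    pairs = []
    for line in local_lines:
        pairs.extend(map_fn(line))
    return pairs


def shuffle(all_pairs):
    """Group the intermediate values by key before the reduce step."""
    groups = defaultdict(list)
    for key, value in all_pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(nodes):
    """Run map on every node, then shuffle and reduce the small results."""
    mapped = []
    for local_lines in nodes.values():
        mapped.extend(run_map_on_node(map_words, local_lines))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(NODES))
```

In a real cluster the intermediate pairs would cross the network during the shuffle, but they are usually much smaller than the raw input, which is why moving the program instead of the data pays off.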
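Comparing concurrency/locks/transactions
<table>
<tr><th>System</th><th>Concurrency control</th><th>Data storage</th><th>Replication</th><th>Txn</th></tr>
<tr><td>Redis</td><td>Locks</td><td>RAM</td><td>Async</td><td>N</td></tr>
<tr><td>Scalaris</td><td>Locks</td><td>RAM</td><td>Sync</td><td>L</td></tr>
<tr><td>Tokyo</td><td>Locks</td><td>RAM or disk</td><td>Async</td><td>L</td></tr>
<tr><td>Voldemort</td><td>MVCC</td><td>RAM or BDB</td><td>Async</td><td>N</td></tr>
<tr><td>SimpleDB</td><td>None</td><td>S3</td><td>Async</td><td>N</td></tr>
<tr><td>Riak</td><td>MVCC</td><td>Pluggable</td><td>Async</td><td>N</td></tr>
<tr><td>MongoDB</td><td>Field-level</td><td>Disk</td><td>Async</td><td>N</td></tr>
<tr><td>CouchDB</td><td>MVCC</td><td>Disk</td><td>Async</td><td>N</td></tr>
<tr><td>HBase</td><td>Locks</td><td>Hadoop</td><td>Async</td><td>L</td></tr>
<tr><td>Hypertable</td><td>Locks</td><td>Files</td><td>Sync</td><td>L</td></tr>
<tr><td>Cassandra</td><td>MVCC</td><td>Disk</td><td>Async</td><td>L</td></tr>
<tr><td>BigTable</td><td>Locks + stamps</td><td>GFS</td><td>Sync + Async</td><td>L</td></tr>
<tr><td>ScaleDB</td><td>Locks</td><td>Disk</td><td>Sync</td><td>Y</td></tr>
<tr><td>MySQL Cluster</td><td>Locks</td><td>Disk or RAM</td><td>Sync</td><td>Y</td></tr>
<tr><td>MySQL MyISAM</td><td>Locks</td><td>Disk</td><td>Async</td><td>N</td></tr>
<tr><td>MySQL InnoDB</td><td>MVCC</td><td>Disk</td><td>Async</td><td>Y</td></tr>
<tr><td>Drizzle</td><td>Locks</td><td>Disk</td><td>Sync</td><td>Y</td></tr>
<tr><td>PostgreSQL</td><td>MVCC</td><td>Disk</td><td>Pluggable</td><td>Y</td></tr>
</table>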
How early should you try to decouple your data? <ul><li>Splitting data is more difficult than splitting applications </li></ul>
Are we headed to the one database to rule them all?