Databases benoitg 2009-03-10

Published in: Technology

  1. 1. Making sense in the brave new world of databases Benoit Grégoire Savoir-faire Linux [email_address]
  2. 2. Why am I here? <ul><li>The last time SQL was declared dead wasn't fun.
  3. 3. There is this new hype called “NoSQL”
  4. 4. I don't want to deal with another database holy war for the next ten years of my career. Especially if it will prevent us from using the right tool for the job.
  5. 5. Lots of smart people worked on new tools, but fanboys are beginning to strike... </li></ul>
  6. 6. Typical decision criteria <ul><li>Criterion 1: Is it popular/hyped?
  7. 7. There is no criterion 2 </li></ul>
  8. 8. What did I do <ul><li>Survey all databases that: </li><ul><li>Are OSS
  9. 9. Are active projects </li></ul></ul>
  10. 10. Why are you here <ul><li>Nobody needs this theoretical mumbo-jumbo, right?
  11. 11. Rumor has it Google has a few PhDs on its payroll </li></ul>
  12. 12. The problem: The typical web developer's entire database training is: <ul><li>I use MySQL
  13. 13. I overheard it has a manual, but I didn't actually check since I use an ORM for everything. </li></ul>
  14. 14. ORMs probably did more damage to database understanding than any other factor. <ul><li>Not understanding the fundamental characteristics of the database system prevents you from making good decisions
  15. 15. Relational databases aren't the only option. If you go through the pain of using an ORM, you need to at least want SOMETHING from relational databases: </li><ul><li>SQL
  16. 16. Transactions
  17. 17. Relational integrity and schema enforcement
  18. 18. Fast joins WITHOUT instantiating objects
  19. 19. Replication (but then you probably chose the wrong solution) </li></ul></ul>
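To make the "fast joins WITHOUT instantiating objects" point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `authors` and `posts` tables are hypothetical and were invented for this illustration. The join and the aggregation run inside the database engine, and only the summarized rows come back, as plain tuples rather than one object per row.

```python
import sqlite3

# Hypothetical schema, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts (author_id, title) VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# The join and the GROUP BY are evaluated by the database;
# Python only receives one small tuple per author.
post_counts = conn.execute("""
    SELECT authors.name, COUNT(posts.id)
    FROM authors
    JOIN posts ON posts.author_id = authors.id
    GROUP BY authors.name
    ORDER BY authors.name
""").fetchall()

print(post_counts)
```

An ORM that loads every post as an object and counts them in application code produces the same answer, but it moves all the rows over the wire first.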
  20. 20. So if not NoSQL, then what <ul><li>High Performance Scalable Data Stores (HPSDS)
  21. 21. Scalable Non Relational Database (SNRD?)
  22. 22. The movement formerly known as NoSQL?
  23. 23. Databases </li></ul>
  24. 24. Internet scale <ul><li>It's great that it's possible
  25. 25. Most web applications are not
  26. 26. Those that are don't have all of their components « Internet scale »
  27. 27. Plan for it, but you probably don't need it NOW. </li><ul><li>Google didn't start with BigTable and MapReduce... </li></ul></ul>
  28. 28. Major problem: Scaling <ul><li>Read scaling </li><ul><li>Comparatively easy </li></ul><li>Write scaling </li><ul><li>Much harder </li></ul><li>Computational scaling </li><ul><li>Not historically part of database feature lists </li></ul></ul>
  29. 29. Major problem: Availability and replication
  30. 30. Major problem: Transactions
  31. 31. Traditional databases: ACID <ul><li>Atomicity
  32. 32. Consistency
  33. 33. Isolation
  34. 34. Durability </li></ul>
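As an illustration of the "A" in ACID, the sketch below uses Python's standard `sqlite3` module with a hypothetical `accounts` table. The `transfer` helper is invented for this example: either both updates commit, or a failure in the second one rolls back the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False

first = transfer(conn, "alice", "bob", 30)
second = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(first, second, balances)
```

The credit to `bob` is deliberately issued before the debit, so the failed transfer shows that an already-applied statement is undone along with the one that failed.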
  35. 35. Life is about compromises <ul><li>So is computing
  36. 36. Good – Fast – Cheap
  37. 37. For distributed databases </li></ul>
  38. 38. Brewer's CAP Theorem (2000) <ul><li>In a distributed system, choose any two of: </li><ul><li>Consistency
  39. 39. Availability
  40. 40. Partition Tolerance </li></ul><li>No set of failures less than total network failure is allowed to cause the system to respond incorrectly (Gilbert & Lynch) </li></ul>
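The trade-off can be sketched with a toy two-node register. This is an illustration only, not a model of any real database: the `TwoNodeRegister` class and its `"CP"` and `"AP"` mode labels are made up for this example. When the network is partitioned, the "CP" variant refuses writes to stay consistent, while the "AP" variant keeps accepting them and lets the replicas diverge.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeRegister:
    """Toy replicated register; mode is 'CP' or 'AP' (labels for this sketch)."""

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False

    def write(self, node_index, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable during partition")
            # Stay available by accepting the write locally: replicas may diverge.
            self.nodes[node_index].value = value
            return
        for node in self.nodes:  # healthy network: replicate synchronously
            node.value = value

    def read(self, node_index):
        return self.nodes[node_index].value


cp = TwoNodeRegister("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
    cp_accepted = True
except RuntimeError:
    cp_accepted = False

ap = TwoNodeRegister("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
ap_diverged = ap.read(0) != ap.read(1)

print(cp_accepted, ap_diverged)
```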
  41. 41. Latency is all-important <ul><li>And unavoidable (the speed of light is unlikely to change): a Montreal-Sydney round trip IS going to take 130 ms
  42. 42. Unless maybe... </li></ul>
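A back-of-the-envelope check of the speed-of-light argument, as a sketch under stated assumptions: the city coordinates are approximate, the path is taken as an ideal great circle, and light in optical fiber is assumed to travel at about two thirds of its vacuum speed. Under those assumptions the round trip comes out at roughly a hundred milliseconds in vacuum and more in fiber, the same order of magnitude as the figure on the slide. Real routes are longer and add switching delays.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean Earth radius
C_VACUUM_KM_S = 299_792.458    # speed of light in vacuum
FIBER_FACTOR = 2 / 3           # assumption: light in fiber travels at ~2/3 c

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

MONTREAL = (45.50, -73.57)   # approximate coordinates
SYDNEY = (-33.87, 151.21)    # approximate coordinates

distance_km = great_circle_km(*MONTREAL, *SYDNEY)
rtt_vacuum_ms = 2 * distance_km / C_VACUUM_KM_S * 1000
rtt_fiber_ms = rtt_vacuum_ms / FIBER_FACTOR

print(f"distance: {distance_km:.0f} km")
print(f"round trip in vacuum: {rtt_vacuum_ms:.0f} ms")
print(f"round trip in fiber:  {rtt_fiber_ms:.0f} ms")
```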
  43. 43. BASE <ul><li>Basically Available
  44. 44. Soft-state
  45. 45. Eventually consistent </li></ul>
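"Eventually consistent" can be sketched with a toy in-memory store. The `EventuallyConsistentStore` and `VersionedReplica` classes, the `anti_entropy` method, and the last-writer-wins rule based on a single counter are all simplifying assumptions made for this illustration. A write is acknowledged by one replica right away and delivered to the others later, so a read from another replica can be stale until the pending updates are applied.

```python
from collections import deque


class VersionedReplica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last-writer-wins: keep the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [VersionedReplica() for _ in range(replica_count)]
        self.pending = deque()  # updates accepted but not yet delivered
        self.clock = 0          # toy global version counter

    def put(self, key, value, via=0):
        self.clock += 1
        self.replicas[via].apply(key, self.clock, value)
        for index in range(len(self.replicas)):
            if index != via:
                self.pending.append((index, key, self.clock, value))

    def get(self, key, via=0):
        return self.replicas[via].get(key)

    def anti_entropy(self):
        # Deliver every queued update; afterwards all replicas agree.
        while self.pending:
            index, key, version, value = self.pending.popleft()
            self.replicas[index].apply(key, version, value)


store = EventuallyConsistentStore()
store.put("k", "v1", via=0)
stale_read = store.get("k", via=2)   # replica 2 has not seen the write yet
store.anti_entropy()
fresh_read = store.get("k", via=2)   # replicas have converged

print(stale_read, fresh_read)
```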
  46. 46. Classifying databases is difficult <ul><li>There are projects trying to be more than one basic type (MonetDB, Virtuoso)
  47. 47. Some of them use another as backend or frontend
  48. 48. Selection criteria </li><ul><li>Open source
  49. 49. Active project </li></ul></ul>
  50. 50. Databases classification
  51. 51. Relational databases
  52. 52. Column-oriented databases
  53. 53. Key-Value stores
  54. 54. Hierarchical Databases
  55. 55. Graph databases
  56. 56. Document databases
  57. 57. Extensible record stores
  58. 58. Object databases
  59. 59. What about Map-Reduce? <ul><li>Classical distributed computing: </li><ul><li>Move program and data to processing nodes </li></ul><li>Map-Reduce </li><ul><li>Move program to the data node </li></ul></ul>
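A single-process sketch of the map, shuffle and reduce phases, using word counting as the usual example. The function names are chosen for this illustration. Each inner list in `partitions` stands in for data that already lives on one node, so the map function is what "travels", not the data. A real framework would run the phases on separate machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(partitions):
    # Apply the map function to each partition where it "lives".
    mapped = [
        pair
        for partition in partitions
        for document in partition
        for pair in map_phase(document)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

partitions = [["sql is dead", "long live sql"], ["nosql is new"]]
counts = map_reduce(partitions)
print(counts)
```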
  60. 60. Comparing concurrency/locks/transactions
        Database        Conc Control     Data storage   Replication    Txn
        Redis           Locks            RAM            Async          N
        Scalaris        Locks            RAM            Sync           L
        Tokyo           Locks            RAM or disk    Async          L
        Voldemort       MVCC             RAM or BDB     Async          N
        SimpleDB        None             S3             Async          N
        Riak            MVCC             Pluggable      Async          N
        MongoDB         Field-level      Disk           Async          N
        CouchDB         MVCC             Disk           Async          N
        HBase           Locks            Hadoop         Async          L
        Hypertable      Locks            Files          Sync           L
        Cassandra       MVCC             Disk           Async          L
        BigTable        Locks + stamps   GFS            Sync + Async   L
        ScaleDB         Locks            Disk           Sync           Y
        MySQL Cluster   Locks            Disk or RAM    Sync           Y
        MySQL MyISAM    Locks            Disk           Async          N
        MySQL InnoDB    MVCC             Disk           Async          Y
        Drizzle         Locks            Disk           Sync           Y
        PostgreSQL      MVCC             Disk           Pluggable      Y
  61. 61. How early should you try to decouple your data? <ul><li>Splitting data is more difficult than splitting applications </li></ul>
  62. 62. Are we headed to the one database to rule them all?
  63. 63. Closing thoughts <ul><li>Profile, profile, profile
  64. 64. Don't neglect local caching
  65. 65. A database is not something that can be abstracted out. </li></ul>
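For the local-caching point, a minimal sketch using `functools.lru_cache` from the Python standard library. `query_backend` is a hypothetical stand-in for a real database call, and the counter exists only to show how many lookups actually reach it. Cached values can go stale, so this suits data that tolerates some staleness.

```python
import functools

backend_calls = {"count": 0}  # counts how often the "database" is really hit

def query_backend(user_id):
    """Hypothetical stand-in for a slow database round trip."""
    backend_calls["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}

@functools.lru_cache(maxsize=1024)
def get_user_name(user_id):
    # Repeated lookups for the same id are served from process memory.
    return query_backend(user_id)["name"]

names = [get_user_name(uid) for uid in (1, 2, 1, 1, 2)]
stats = get_user_name.cache_info()

print(names)
print(backend_calls["count"], stats.hits, stats.misses)
```

In keeping with "profile, profile, profile", `cache_info()` gives a quick hit/miss count to check whether the cache is earning its keep.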