Front Range PHP NoSQL Databases

  1. NoSQL Databases Jon Meredith [email_address]
  2. NOT a product.
  3. NOT a single technology.
  4. Mostly created in response to scaling and reliability problems.
  5. Huge differences between 'NoSQL' systems – but have elements in common.
  6. Object databases
  7. Graph databases
  8. e-commerce
  9. Social networking
  10. All shambling under the NoSQL banner.
  11. My application needs transactions
  12. Data needs to be nicely normalized
  13. I have replication for scalability/reliability
  14. Sets
  15. Arrays
  16. Upgrade/rollback scripts have to operate on the whole database – could be millions of rows.
  17. Doing phased rollouts is hard … the application needs to do work
  18. Google's protocol buffers
  19. Version objects
  20. Every change on the master happens on the slave.
  21. Slaves are read-only. Does not scale INSERT, UPDATE, DELETE queries.
  22. Application responsible for distributing queries to correct server.
  23. Updates travel around the ring
  24. Add back in to the ring
  25. Replication takes time – there is time lag between the first and last server to see an update.
  26. You may not read your writes – not getting aCid properties any more.
  27. The application needs to know how to resolve it
  28. ...Available...
  29. ...Maintainable...
  30. with an RDBMS requires large efforts by application developers and operational staff
  31. App needs to know data location
  32. App needs to handle failover
  33. Network partitions common
  34. Network routes are flapping
  35. Data centers are being destroyed by tornadoes
  36. Best seller lists
  37. Fault tolerant: Keeps N copies of the data
  38. Designed for inconsistency
  39. Totally decentralized – nodes 'gossip' state
  40. Self-healing
  41. Availability
  42. Amazon chose A-P over C
  43. Read operations (get) require 'R' nodes to respond
  44. Write operations (put) require 'W' nodes to respond
  45. If R+W > N, nodes will read their writes (if no failure)
  46. NRW tunes the cluster – typically (3,2,2)
  47. Dynamo minimizes with vector clocks
  48. Vector Clocks
  49. Partitioning
  50. Shopping Cart - Conflict Network Failure
  51. Shopping Cart - Merge
  52. Project Voldemort
  53. Google Earth
  54. Table indexed by {key,timestamp} and a variable number of sparse columns
  55. Columns are grouped into column families. Columns in a family are stored together.
  56. Each table is broken into tablets.
  57. Tablets are stored on a cluster file system (GFS).
  58. BigTable – Column Families Copyright Google
  59. Programmers write two functions map() and reduce().
  60. Table is mapped, then reduced.
  61. Job control system monitors and resubmits.
  62. Map/Reduce Source:
  63. CouchDB Map/Reduce
  64. 2010/01/nosql-databases-part-1-landscape.html
  65. So many projects! Dynamo, BigTable, Redis, Riak, Voldemort, CouchDB, PNUTS, Hadoop/HBase, Cassandra, Hypertable, MongoDB, Terrastore, Scalaris, BerkeleyDB, MemcacheDB, Dynomite, Neo4j, Tokyo Cabinet … and more
  66. Sparse Column Family
  67. Decentralized
  68. RESTful HTTP interface
  69. Fully distributed
  70. Clients for multiple languages
  71. Filesystem
  72. Key/Value Store with structured values
  73. Written in C
  74. Memcache-like protocol
  75. Engine Yard
  76. VideoWiki
  77. Operations like increment, decrement, intersection, push, pop
  78. In-memory (can be backed by disk)
  79. Auto-sharding in client libraries
  80. No fault tolerance (coming after 2.0)
  81. Example: retwis – Twitter clone in PHP
  82. BigTable ColumnFamily data model
  83. Dynamo data distribution
  84. Written in Java
  85. Thrift based interface
  86. Twitter
  87. Used by Ubuntu One
  88. HTTP interface
  89. Uses JavaScript for indexing/mapreduce
  90. Incremental replication
  91. Multi-process
  92. Replicated
  93. Neo4j – Graph Database
  94. PNUTS – Yahoo
  95. Even range queries are hard for distributed hash systems.
  96. No transactions – rules out some classes of applications.
  97. Space is still evolving
  98. They force you to think about distributed design issues like consistency.
  99. Play with them!
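
To play with the R+W > N rule from the quorum slides, here is a minimal Python sketch. It models replicas as plain integers and brute-forces every possible read set and write set. The function name `quorums_overlap` is hypothetical and not taken from Dynamo, Riak, or any client library. It only illustrates why overlapping quorums let a read see the latest write when no node fails.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True if every read set of size r shares a replica with every write set of size w."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# The typical (N, R, W) = (3, 2, 2) setting: R + W = 4 > 3.
print(quorums_overlap(3, 2, 2))  # True

# Dropping to R = 1 gives R + W = 3, which is not > N, so a read can miss the write.
print(quorums_overlap(3, 1, 2))  # False
```

Lowering R or W trades this read-your-writes guarantee for lower latency and higher availability, which is the tuning knob the NRW slide describes.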

Editor's Notes

  1. Introduce. Disclose work for Basho. Working on Dynamo clone for the last couple of years.
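
The vector clock and shopping cart slides (conflict during a network failure, then merge) can be illustrated with a small Python sketch. All names here (`increment`, `descends`, `compare`, `merge_carts`, and the node labels "x", "y", "z") are hypothetical. Merging carts by set union is an assumption chosen for simplicity, not necessarily how any particular Dynamo-style store resolves conflicts.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two clocks as 'equal', 'a_newer', 'b_newer', or 'conflict'."""
    a_sees_b, b_sees_a = descends(a, b), descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "a_newer"
    if b_sees_a:
        return "b_newer"
    return "conflict"  # concurrent updates: the application must resolve them


def merge_carts(cart_a, clock_a, cart_b, clock_b):
    """Assumed resolution: union the items, keep the per-node maximum counter."""
    nodes = clock_a.keys() | clock_b.keys()
    merged_clock = {n: max(clock_a.get(n, 0), clock_b.get(n, 0)) for n in nodes}
    return cart_a | cart_b, merged_clock


# One cart written via node "x", then updated on both sides of a partition.
base_cart, base_clock = {"book"}, increment({}, "x")
cart_y, clock_y = base_cart | {"dvd"}, increment(base_clock, "y")
cart_z, clock_z = base_cart | {"cd"}, increment(base_clock, "z")

print(compare(clock_y, base_clock))    # a_newer
print(compare(clock_y, clock_z))       # conflict
merged_cart, merged_clock = merge_carts(cart_y, clock_y, cart_z, clock_z)
print(sorted(merged_cart))             # ['book', 'cd', 'dvd']
print(compare(merged_clock, clock_y))  # a_newer
```

A union merge never loses an added item, but an item removed on one side of the partition can reappear after the merge, which is one reason the application has to know how to resolve conflicts.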