
Non-Relational Databases at ACCU2011

Slides from my talk at ACCU2011 in Oxford on 16th April 2011. A whirlwind tour of the non-relational database families, with a little more detail on Redis, MongoDB, Neo4j and HBase.

  1. SELECT * FROM databases WHERE query_language <> 'SQL';
     Gavin Heavyside - ACCU Conference - 16 April 2011
  2. SELECT * FROM databases WHERE query_language <> 'SQL' LIMIT 4;
  3. Me
     • Director of Engineering at MyDrive
     • Hands-on coding in Ruby, C++ and others
     • Big data, software architecture, robustness, TDD, devops, data analysis
     • Background in software for telecoms, mobile and embedded systems
     • @gavinheavyside
  4. MyDrive Solutions
     • Driver behaviour analysis and scoring for telematics-based insurance
     • Large-scale geospatial processing of GPS and map data
     • Relational DBs: PostgreSQL, MySQL
     • Non-relational DBs: Redis, HBase
     • Big Data tools: Hadoop
     • Built on a Linux and open-source stack
  5. RDBMS
  6. What is an RDBMS?
     • Codd’s relational model, 1970 (“Codd’s 12 Rules” followed in 1985)
     • Relations
       • e.g. tables, rows, columns
     • Relational operators
       • Manipulate data in tabular form
  7. ACID
     • Atomicity
     • Consistency
     • Isolation
     • Durability
  8. Atomicity
     • All or nothing: a transaction either completes fully or has no effect
     • Atomicity is maintained across failures
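The sketch below shows all-or-nothing behaviour. It assumes Python's standard `sqlite3` module as a stand-in for a transactional RDBMS, and the `accounts` table and `transfer` helper are hypothetical. The exception raised between the debit and the credit plays the part of a failure mid-transaction. Because the connection is used as a context manager, the half-finished debit is rolled back rather than committed.

```python
import sqlite3


def make_db():
    """Create an in-memory database holding two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    with conn:  # commit the initial rows
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
        )
    return conn


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount, fail_midway=False):
    # Used as a context manager, the connection commits if the block
    # finishes and rolls back if an exception escapes from it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            # Simulated failure after the debit but before the credit.
            raise RuntimeError("failure between debit and credit")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


conn = make_db()
try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
print(balances(conn))  # unchanged: the partial debit was rolled back
transfer(conn, "alice", "bob", 30)
print(balances(conn))  # both updates committed together
```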
  9. Consistency
     • The DB moves from one consistent state to another
     • Only valid data is written to the DB
     • It can only enforce the rules it knows about
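The last point is easier to see in a concrete case. This sketch again assumes Python's `sqlite3` module, and the table and `try_write` helper are hypothetical. A `CHECK` constraint lets the database reject a negative balance. Nothing tells it what a valid email address looks like, though, so a nonsense value is stored without complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# balance has a declared rule (CHECK); email has none.
conn.execute(
    """CREATE TABLE accounts (
           name    TEXT PRIMARY KEY,
           balance INTEGER NOT NULL CHECK (balance >= 0),
           email   TEXT
       )"""
)
with conn:
    conn.execute("INSERT INTO accounts VALUES ('alice', 100, 'alice@example.com')")


def try_write(sql, params):
    """Run one statement in its own transaction; return True if the DB accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:  # raised when a declared constraint is violated
        return False


# Breaks a rule the database knows about: rejected.
print(try_write("UPDATE accounts SET balance = ? WHERE name = 'alice'", (-50,)))  # False
# Breaks a rule only the application knows about: accepted.
print(try_write("UPDATE accounts SET email = ? WHERE name = 'alice'", ("not-an-email",)))  # True
```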
  10. Isolation
      • Transactions can’t see data from other, incomplete transactions
      • Blocking and deadlocks
        • Dirty reads
        • MVCC
  11. Locking
      • Row locking
      • Whole-table locking
      • A transaction might require lots of locks
      • Blocking
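When transactions hold some locks while waiting for others, blocking can turn into deadlock: a cycle in which every transaction waits on another. One common way to detect this is to keep a wait-for graph and search it for a cycle. The sketch below illustrates that idea; it does not reproduce any particular database's lock manager. The transaction names and the `find_deadlock` function are hypothetical.

```python
def find_deadlock(wait_for):
    """Return the transactions forming a cycle in the wait-for graph, or None.

    wait_for maps each transaction to the set of transactions holding locks
    it is waiting for. A cycle means none of them can proceed: a deadlock.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {tx: WHITE for tx in wait_for}
    path = []

    def visit(tx):
        colour[tx] = GREY
        path.append(tx)
        for other in sorted(wait_for.get(tx, ())):
            state = colour.get(other, WHITE)
            if state == GREY:
                # Back edge: the cycle is the part of the path from `other` onwards.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        colour[tx] = BLACK
        return None

    for tx in sorted(wait_for):
        if colour[tx] == WHITE:
            cycle = visit(tx)
            if cycle:
                return cycle
    return None


# T1 holds row A and waits for row B; T2 holds row B and waits for row A.
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # ['T1', 'T2']
# T3 merely waits behind T1: blocking, but no deadlock.
print(find_deadlock({"T1": set(), "T3": {"T1"}}))  # None
```

Once a cycle is found, a database typically breaks it by aborting one of the transactions in it.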
  12. MVCC
      • Multi-Version Concurrency Control
      • Maintain several versions of objects
      • Read and write timestamps on transactions
      • Reads are never blocked
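Here is a toy, single-threaded sketch of the idea, assuming a simple integer clock for timestamps. Each committed write adds a new version tagged with its commit (write) timestamp. Each transaction reads the newest version no later than its read timestamp, which is taken when it begins. `MVCCStore` and `Transaction` are hypothetical names. A real MVCC engine also needs conflict detection between concurrent writers, and that is left out here.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: each committed write adds a new version.

    Readers see the snapshot that existed when their transaction began,
    so they never wait for writers. Write-write conflict checks are omitted.
    """

    def __init__(self):
        self._versions = {}             # key -> [(commit_ts, value), ...], oldest first
        self._clock = itertools.count(1)
        self._last_commit_ts = 0

    def begin(self):
        # The transaction's read timestamp is the latest commit so far.
        return Transaction(self, read_ts=self._last_commit_ts)

    def _read(self, key, read_ts):
        # Newest version committed at or before read_ts; older ones stay readable.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= read_ts:
                return value
        return None

    def _commit(self, writes):
        commit_ts = next(self._clock)   # the write timestamp
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        self._last_commit_ts = commit_ts
        return commit_ts


class Transaction:
    def __init__(self, store, read_ts):
        self._store = store
        self.read_ts = read_ts
        self._writes = {}               # buffered: invisible to others until commit

    def read(self, key):
        if key in self._writes:         # a transaction sees its own writes
            return self._writes[key]
        return self._store._read(key, self.read_ts)

    def write(self, key, value):
        self._writes[key] = value

    def commit(self):
        return self._store._commit(self._writes)


store = MVCCStore()
setup = store.begin()
setup.write("x", "v1")
setup.commit()

reader = store.begin()                 # snapshot in which x = v1
writer = store.begin()
writer.write("x", "v2")
print(reader.read("x"))                # v1: the uncommitted write is invisible
writer.commit()
print(reader.read("x"))                # v1: still reading its original snapshot
print(store.begin().read("x"))         # v2: a new transaction sees the new version
```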
  13. Durability
      • Data from a successfully committed transaction is never lost
  14. What’s wrong with relational DBs?
  15. http://www.flickr.com/photos/exfordy/4734358134/
  16. All the cool kids use non-relational DBs...
      • Facebook, LinkedIn, Twitter, Google
  17. ...and relational DBs
  18. What’s wrong with relational DBs?
      • Nothing
      • ‘Impedance mismatch’
      • Scaling
  19. Scaling an RDBMS
      • Launch a successful service
      • Read saturation: add caching
      • Write saturation: add hardware (£££)
      • Queries slow: denormalise
      • Reads still too slow: prematerialise common queries, stop joining
      • Writes too slow: drop secondary indexes and triggers
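The first step, adding a cache to absorb reads, is often done with the cache-aside pattern. Reads go to the cache first and fall back to the database on a miss, and the cached copy is invalidated when the data is written. Below is a minimal sketch, with a plain dictionary standing in for the database (an assumption) and a hypothetical `CacheAside` class.

```python
class CacheAside:
    """Cache-aside reads in front of a slower store (illustrative only)."""

    def __init__(self, store):
        self.store = store   # stands in for the database
        self.cache = {}
        self.db_reads = 0    # how many reads actually reached the database

    def get(self, key):
        if key in self.cache:       # hit: no database work at all
            return self.cache[key]
        self.db_reads += 1          # miss: read from the database...
        value = self.store.get(key)
        self.cache[key] = value     # ...and keep a copy for next time
        return value

    def put(self, key, value):
        self.store[key] = value     # write to the database first,
        self.cache.pop(key, None)   # then drop the now-stale cached copy


db = {"user:1": "alice"}
users = CacheAside(db)
users.get("user:1")
users.get("user:1")
print(users.db_reads)  # 1: the second read was served from the cache
users.put("user:1", "alice smith")
print(users.get("user:1"), users.db_reads)  # alice smith 2
```

Writes still go straight to the database, which is why write saturation needs a different remedy.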
  20. Denormalising
      • Normalised logical data design
        • Joins
        • Materialised views can optimise queries
      • Denormalised logical data design
        • Eliminates joins
        • Application must ensure data consistency
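The trade-off can be shown with plain Python dictionaries standing in for tables; this is an illustration only, and all names and data are hypothetical. The normalised form joins orders to customers at read time. The denormalised form copies the customer's name into each order, so reads need no join, but application code has to apply a rename to every copy.

```python
# Normalised: each fact is stored once; reads join orders to customers.
customers = {1: {"name": "Alice"}}
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1}]

# Denormalised: the customer's name is copied into every order.
denorm_orders = [
    {"id": 10, "customer_id": 1, "customer_name": "Alice"},
    {"id": 11, "customer_id": 1, "customer_name": "Alice"},
]


def order_names_normalised():
    # The join happens at read time, on every read.
    return [(o["id"], customers[o["customer_id"]]["name"]) for o in orders]


def order_names_denormalised():
    # No join: the name is already stored in each order.
    return [(o["id"], o["customer_name"]) for o in denorm_orders]


def rename_customer(customer_id, new_name):
    customers[customer_id]["name"] = new_name
    # The database won't do this for us: skip this loop and the
    # denormalised copies silently go stale.
    for o in denorm_orders:
        if o["customer_id"] == customer_id:
            o["customer_name"] = new_name


rename_customer(1, "Alicia")
print(order_names_normalised() == order_names_denormalised())  # True
```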
  21. Scaling a distributed DB
      • Just add more commodity servers...
      • ...we wish
  22. CAP Theorem
      • Eric Brewer, 2000
      • A distributed system can’t simultaneously be
        • Consistent
        • Available
        • Partition-tolerant
  23. BASE
      • Basically Available
      • Soft state
      • Eventually consistent
      • A relaxation of the C in CAP
  24. Eventual Consistency
      • All nodes eventually see the same data
      • Different strategies for how many replicas must respond
        • One
        • Quorum
        • All
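The One/Quorum/All choice can be made concrete with N replicas, where writes are acknowledged by W of them and reads consult R of them. If R + W > N, every read set overlaps every write set, so a read sees the latest write. The sketch below uses hypothetical names and plain integer versions (an assumption). It deliberately writes to the first W replicas and reads from the last R, which is the worst case for overlap.

```python
class Replica:
    def __init__(self):
        self.version = 0     # version number of the value held
        self.value = None


def write(replicas, w, value, version):
    """Apply a write to only `w` replicas; the others have not caught up yet."""
    for rep in replicas[:w]:
        rep.version, rep.value = version, value


def read(replicas, r):
    """Ask `r` replicas and return the value carrying the highest version."""
    if not 1 <= r <= len(replicas):
        raise ValueError("r must be between 1 and the number of replicas")
    return max(replicas[-r:], key=lambda rep: rep.version).value


N = 3
replicas = [Replica() for _ in range(N)]
write(replicas, w=N, value="old", version=1)  # "All": every replica acknowledges
write(replicas, w=2, value="new", version=2)  # quorum write: W = 2
print(read(replicas, r=2))  # new: R + W = 4 > N, so the sets must overlap
print(read(replicas, r=1))  # old: R + W = 3 = N, this read misses the write
```

Once the lagging replica catches up (for example through read repair or background anti-entropy), even the R = 1 read returns the new value. That catching up is the "eventually" in eventual consistency.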
  25. Horizontal Scaling
      • Partitioning
      • Sharding
      • Dynamo-style
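Dynamo-style systems partition data with consistent hashing. Nodes and keys are hashed onto the same ring, and each key belongs to the next node clockwise. Adding a node then moves only the keys that fall into the new node's arcs, rather than reshuffling almost everything as `hash(key) % n` would. Below is a minimal sketch. The `HashRing` class, the use of MD5 and the 64 virtual nodes per server are assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(key):
    """Stable position on the ring (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # positions per server, to spread load more evenly
        self._positions = []   # sorted ring positions
        self._owners = []      # _owners[i] is the server at _positions[i]
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            idx = bisect.bisect_left(self._positions, pos)
            self._positions.insert(idx, pos)
            self._owners.insert(idx, node)

    def node_for(self, key):
        """Owner of `key`: the first virtual node after its position, wrapping round."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owners[idx % len(self._positions)]


ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("db4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(all(ring.node_for(k) == "db4" for k in moved))  # True: keys only move to db4
print(f"{len(moved)} of {len(keys)} keys moved")
```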
  26. http://vimeo.com/13667174
  27. Non-relational Database Families
      • Document-oriented
      • Graph
      • Column-oriented
      • Key-value & DHT (distributed hash table)
      • Others
  28. Document Databases
  29. Document Databases
      • IBM Lotus Notes
      • CouchDB
      • MongoDB
      • Riak
  30. http://mongodb.org
  31. MongoDB
      • JSON-style documents
      • Indexes on any field
      • Replication, auto-sharding
      • Map/Reduce
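To illustrate "indexes on any field" over JSON-style documents, here is a toy in-memory collection. It is hypothetical and does not mirror the MongoDB driver API. Documents are Python dicts, and `find` supports equality matches only. An index maps a field's values to document ids, so `find` can skip documents that cannot match.

```python
import json
from collections import defaultdict


def _key(value):
    # Lists and dicts are unhashable, so index on a canonical JSON encoding.
    return json.dumps(value, sort_keys=True)


class Collection:
    """Toy document collection (illustrative; not the MongoDB driver API)."""

    def __init__(self):
        self.docs = {}      # _id -> document (a dict, i.e. JSON-style)
        self.indexes = {}   # field -> {encoded value -> set of _ids}
        self._next_id = 1

    def insert(self, doc):
        doc = dict(doc, _id=self._next_id)
        self._next_id += 1
        self.docs[doc["_id"]] = doc
        for field, index in self.indexes.items():  # keep existing indexes current
            if field in doc:
                index[_key(doc[field])].add(doc["_id"])
        return doc["_id"]

    def create_index(self, field):
        # Any field can be indexed, even one that only some documents have.
        index = defaultdict(set)
        for _id, doc in self.docs.items():
            if field in doc:
                index[_key(doc[field])].add(_id)
        self.indexes[field] = index

    def find(self, query):
        """Return documents whose fields equal every value in `query`."""
        candidates = set(self.docs)
        for field, value in query.items():
            if field in self.indexes:  # narrow the candidates using the index
                candidates &= self.indexes[field].get(_key(value), set())
        return [self.docs[i] for i in sorted(candidates)
                if all(self.docs[i].get(f) == v for f, v in query.items())]


people = Collection()
people.insert({"name": "Alice", "langs": ["ruby", "c++"], "city": "Oxford"})
people.insert({"name": "Bob", "city": "London"})
people.create_index("city")
print([d["name"] for d in people.find({"city": "Oxford"})])  # ['Alice']
```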
  32. MongoDB Demo
  33. Other Features
      • Document linking and embedding
      • GridFS: store large files
      • Geospatial indexes and searches
  34. OM
  35. Graph DBs http://www.flickr.com/photos/thefangmonster/2301364418/
  36. Graph Databases
      • Nodes, relationships and properties
      • Query by traversing the graph
      • Natural fit for recommendations, shortest paths, social graphs
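A query "by traversing the graph" can be sketched with an adjacency list and breadth-first search. Here a plain dictionary stands in for a graph database. That is an assumption: relationship types and properties are omitted, and this is not Neo4j's traversal API. `shortest_path` finds a shortest chain of relationships, and `friends_of_friends` is a naive two-hop recommendation; both names are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first traversal; returns the nodes on a shortest path, or None.

    `graph` maps each node to the nodes it has a relationship with.
    """
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back along the recorded hops
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def friends_of_friends(graph, person):
    """People two hops away who are not already direct friends."""
    friends = set(graph.get(person, ()))
    two_hops = {fof for f in friends for fof in graph.get(f, ())}
    return sorted(two_hops - friends - {person})


knows = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}
print(shortest_path(knows, "alice", "erin"))  # ['alice', 'bob', 'dave', 'erin']
print(friends_of_friends(knows, "alice"))     # ['dave']
```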
  37. Graph DBs
      • FlockDB
      • Neo4j
      • Apache Hama
      • Google Pregel
  38. Neo4j
      • Embedded
      • Server
      • REST
      • Components: indexing, management, RDF, geospatial
  39. Key-Value & DHT
  40. Key-Value & DHT
      • Amazon Dynamo
      • Project Voldemort
      • Redis
      • Tokyo Cabinet
      • Amazon SimpleDB
  41. http://redis.io
  42. Redis
      • By Salvatore Sanfilippo (@antirez)
      • Sponsored by VMware
      • Data-structure server
      • Strings, hashes, lists
      • Sets, sorted sets
      • All operations in memory, backed by disk
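Sorted sets are the least familiar of these structures. Each member is unique and carries a score, and members come back ordered by score, which makes leaderboards a natural use. The class below is a plain-Python sketch of those semantics, not a Redis client. Its method names echo the Redis commands `ZADD`, `ZINCRBY`, `ZSCORE` and `ZREVRANGE`, but the argument order here is an assumption.

```python
class SortedSet:
    """In-memory sketch of Redis-style sorted-set semantics (not a Redis client).

    Members are unique and each has a numeric score; ordering is by score,
    with ties broken by member, as Redis does.
    """

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        self._scores[member] = score

    def zincrby(self, member, amount):
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zscore(self, member):
        return self._scores.get(member)

    def zrevrange(self, start, stop):
        """Members from highest to lowest score; `stop` is inclusive, -1 means last."""
        ordered = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        end = None if stop == -1 else stop + 1
        return [member for member, _ in ordered[start:end]]


scores = SortedSet()
scores.zadd("alice", 120)
scores.zadd("bob", 95)
scores.zincrby("bob", 40)
scores.zadd("carol", 120)
print(scores.zrevrange(0, 1))   # ['bob', 'carol']
print(scores.zrevrange(0, -1))  # ['bob', 'carol', 'alice']
```

Redis keeps a sorted set ordered as it changes (using a skip list alongside a hash table). This sketch simply sorts on every range query.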
  43. Interactive Documentation
  44. Redis Demo
  45. Other features
      • Replication (master/slaves)
      • Persistence
        • Snapshotting
        • Append-only log file
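An append-only log file records every write command before applying it. After a restart, the in-memory data is rebuilt by replaying the log. The sketch below shows that idea. The `AppendOnlyStore` class and its JSON-lines log format are assumptions, not Redis's actual file format. It calls `fsync` after every write, which is the safest and slowest policy; Redis lets you choose how often to sync (`appendfsync always`, `everysec` or `no`).

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Key-value store that logs each write before applying it in memory."""

    def __init__(self, path):
        self.data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:  # replay every logged command, in order
                    self._apply(json.loads(line))
        self._log = open(path, "a", encoding="utf-8")

    def _apply(self, cmd):
        if cmd["op"] == "set":
            self.data[cmd["key"]] = cmd["value"]
        elif cmd["op"] == "del":
            self.data.pop(cmd["key"], None)

    def _write(self, cmd):
        self._log.write(json.dumps(cmd) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # make sure the command reached the disk
        self._apply(cmd)              # only then change the in-memory state

    def set(self, key, value):
        self._write({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._write({"op": "del", "key": key})

    def close(self):
        self._log.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.aof")
    db = AppendOnlyStore(path)
    db.set("greeting", "hello")
    db.set("counter", 1)
    db.delete("greeting")
    db.close()                   # simulate a restart

    db = AppendOnlyStore(path)   # rebuilt purely from the log
    print(db.data)               # {'counter': 1}
    db.close()
```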
  46. Object Hash Mappers
      • cf. ORM
      • OHM
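An object-hash mapper plays the role for a key-value store that an ORM plays for a relational database. It maps each object to a hash, typically a Redis hash of string fields under a key such as `user:42`. Below is a minimal sketch using a Python dataclass. The key naming scheme and the `to_hash`/`from_hash` helpers are hypothetical, and no Redis client is involved.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int
    name: str
    age: int


def to_hash(obj):
    """Flatten an object into a key plus string fields, as a Redis hash stores them."""
    key = f"{type(obj).__name__.lower()}:{obj.id}"
    return key, {f.name: str(getattr(obj, f.name)) for f in fields(obj)}


def from_hash(cls, hash_fields):
    """Rebuild an object; assumes simple field types constructible from a string."""
    return cls(**{f.name: f.type(hash_fields[f.name]) for f in fields(cls)})


key, h = to_hash(User(42, "Alice", 30))
print(key, h)  # user:42 {'id': '42', 'name': 'Alice', 'age': '30'}
print(from_hash(User, h) == User(42, "Alice", 30))  # True
```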
  47. Other KV Stores
      • Berkeley DB
      • Memcached
      • Microsoft Dynomite
  48. Column-Oriented DBs http://www.flickr.com/photos/nationalmediamuseum/3588099765/
  49. Column-Oriented Databases
      • Google Bigtable
      • Cassandra
      • Hypertable
      • HBase
  50. HBase http://www.flickr.com/photos/negativz/14470756/
  51. HBase
      • Apache top-level project
      • Implementation of Google’s Bigtable design
      • Distributed
      • High write throughput
      • ‘Real-time’ read/write
  52. HBase
      • Automatic partitioning
      • Scales linearly and automatically
      • Commodity hardware
      • Fault tolerant
      • MapReduce
  53. Data Model
      • Schema-less
      • Versioned cells
      • Cells addressed by row key / column family / column qualifier / timestamp
      • Column families
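Here is a toy model of that logical layout. A table declares its column families up front, any number of column qualifiers can be used within a family, and each cell keeps several timestamped versions. This is a hypothetical in-memory sketch, not the HBase client API. The `max_versions` limits used here are arbitrary choices for the example. Real HBase also keeps rows sorted by key and spreads them across distributed regions.

```python
from collections import defaultdict


class Table:
    """Toy model of HBase's logical data layout (not the HBase client API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)   # column families are declared up front
        self.max_versions = max_versions
        # row key -> (family, qualifier) -> [(timestamp, value), ...], newest first
        self.rows = defaultdict(dict)

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # Qualifiers are free-form: no per-column schema within a family.
        versions = self.rows[row].setdefault((family, qualifier), [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        del versions[self.max_versions:]                 # keep only the newest N

    def get(self, row, family, qualifier, ts=None):
        """Latest value, or the newest value at or before `ts` if one is given."""
        for version_ts, value in self.rows.get(row, {}).get((family, qualifier), []):
            if ts is None or version_ts <= ts:
                return value
        return None


t = Table(families=["info", "trips"], max_versions=2)
t.put("driver:1", "info", "name", "Alice", ts=100)
t.put("driver:1", "trips", "2011-04-16", "12.5km", ts=100)
t.put("driver:1", "info", "name", "Alice Smith", ts=200)
print(t.get("driver:1", "info", "name"))          # Alice Smith
print(t.get("driver:1", "info", "name", ts=150))  # Alice
```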
  54. http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
  55. http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
  56. Other DBs
      • Couchbase
      • Kyoto Cabinet
      • Many more I’ve omitted
  57. Wrap Up
      • RDBMS vs non-relational
      • Distributed DBs
      • Non-relational families
  58. The End
      • @gavinheavyside
      • gavin.heavyside@mydrivesolutions.com
