Introduction to NoSQL Databases


Published in: Technology
  • Surveying the NoSQL Landscape, By Derek Stainer
  • Indexing types include single-key, compound, unique, non-unique, and geospatial

    1. Introduction to NoSQL Databases
       San Diego NoSQL Meetup – Aug 2010
       By Derek Stainer
    2. Agenda
       • Introduction
       • Objective
       • Explore NoSQL Databases
       • Conclusion
    3. Introduction
       • UCSD Graduate in Computer Science
       • Java Developer for 10 years
       • Creator of
       • Curator of NoSQL information
    4. Objective
       • Deeper dive into each type of NoSQL database
       • Discuss 1-2 NoSQL databases in each family of databases
    5. NoSQL Taxonomy
       • Key/Value
       • Document
       • Column
       • Graph
       • Others
           • Geospatial
           • File System
           • Object
    6. Key/Value Databases
       • Global collection of Key/Value pairs
       • Inspired by Amazon’s Dynamo and Distributed Hashtables
       • Designed to handle massive load
       • Multiple Types
           • In memory, e.g. Memcache
           • On Disk, e.g. Redis, SimpleDB
           • Eventually Consistent, e.g. Dynamo, Voldemort
    7. Key/Value: Voldemort
       • Created by LinkedIn, now open source
       • Inspired by Amazon’s Dynamo
       • Written in Java
       • Pluggable Storage
           • BerkeleyDB, In Memory, MySQL
       • Pluggable Serialization
           • JSON, Thrift, Protocol Buffers, etc.
       • Cluster Rebalancing
    8. Key/Value: Voldemort [cont’d]
       • Versioning, based on Vector Clocks
           • Reconciliation occurs on reads
       • Partitioning and Replication based on Dynamo
           • Consistent Hashing
           • Virtual Nodes
           • Gossip
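
Vector-clock versioning with reconciliation on reads can be illustrated with a small sketch. This is not Voldemort's implementation; the function names and the node names (`node-a`, `node-b`) are hypothetical. A clock is assumed to be a mapping from node name to a counter, and two versions conflict when neither clock has seen everything the other has.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify the relationship between two versions."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"  # neither side has seen the other: a conflict


def merge(a, b):
    """Element-wise maximum, used after the reader reconciles a conflict."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


v1 = increment({}, "node-a")   # first write, coordinated by node-a
v2 = increment(v1, "node-a")   # second write via node-a
v3 = increment(v1, "node-b")   # competing write via node-b, also based on v1

relation = compare(v2, v3)     # the two writes are siblings
resolved = merge(v2, v3)       # clock for the reconciled value
```

In this sketch a reader that receives both `v2` and `v3` sees them as concurrent, picks or combines the values in application code, and writes the result back under the merged clock.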
    9. Other Key/Value Stores
       • Amazon’s Dynamo
       • Riak
       • Redis
       • Memcache
       • SimpleDB
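
The consistent hashing and virtual nodes used by Dynamo-inspired stores can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of any particular store: the `HashRing` class, the use of MD5, the `node#i` naming of virtual nodes, and the node names are all hypothetical choices.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=8):
        # Each physical node appears at several ring positions (virtual nodes).
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key, replicas=2):
        """Walk clockwise from the key, collecting distinct physical nodes."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replicas:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.preference_list("user:42", replicas=2)
```

The point of the ring is that adding a node only takes over the key ranges next to its own virtual nodes; every other key keeps its previous owner, which is what makes rebalancing cheap compared with `hash(key) % node_count`.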
    10. Document Databases
        • Similar to a Key/Value database, with one major difference: the value is a document
        • Inspired by Lotus Notes
        • Flexible Schema
            • Any number of fields can be added
        • Documents stored in JSON or BSON formats
        • Examples: CouchDB, MongoDB
    11. Sample Document
        {
          "day": [ 2010, 1, 23 ],
          "products": {
            "apple": { "price": 10, "quantity": 6 },
            "kiwi": { "price": 20, "quantity": 2 }
          },
          "checkout": 100
        }
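
As a quick check of the sample document, the sketch below parses a valid-JSON rendering of it (commas between fields, no leading zero in the month) with Python's standard `json` module and recomputes the checkout total from the nested product entries. The variable names are illustrative only.

```python
import json

raw = """
{
  "day": [2010, 1, 23],
  "products": {
    "apple": {"price": 10, "quantity": 6},
    "kiwi": {"price": 20, "quantity": 2}
  },
  "checkout": 100
}
"""

order = json.loads(raw)

# Walk the embedded "products" object; no fixed schema is required,
# so another product could be added without changing this code.
total = sum(
    item["price"] * item["quantity"] for item in order["products"].values()
)
```

Here `total` matches the document's own `checkout` field (10 × 6 + 20 × 2).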
    12. Document: CouchDB
        • Development began around 2005 by Damien Katz, a former Lotus Notes developer
        • Couch – Cluster Of Unreliable Commodity Hardware
        • Top-level Apache project
        • Commercially supported by CouchIO
        • Licensed under the Apache License
        • Written in Erlang
        • Documents are stored in JSON
    13. Document: CouchDB [cont’d]
        • B-Tree Storage Engine
        • MVCC model, no locking
        • No joins, primary keys, or foreign keys (UUIDs are auto-assigned)
        • Built-in bi-directional replication
            • Can even run offline, then come back and sync changes
        • Custom persistent views using MapReduce
        • REST API
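
The idea behind MapReduce views can be sketched in plain Python. CouchDB itself defines views as JavaScript functions stored in design documents and keeps the results in a persistent index; the helper below (`build_view`, `map_order`, `reduce_sum`, and the sample documents) is a hypothetical in-memory illustration of the map and reduce steps only.

```python
from collections import defaultdict


def map_order(doc):
    """Map step: emit one (product name, line total) pair per product."""
    for name, item in doc["products"].items():
        yield name, item["price"] * item["quantity"]


def reduce_sum(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Run the map step over every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


docs = [
    {"_id": "order-1", "products": {"apple": {"price": 10, "quantity": 6},
                                    "kiwi": {"price": 20, "quantity": 2}}},
    {"_id": "order-2", "products": {"apple": {"price": 10, "quantity": 1}}},
]

revenue_by_product = build_view(docs, map_order, reduce_sum)
```

With these two orders the view holds revenue per product: 70 for `apple` and 40 for `kiwi`.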
    14. Document: MongoDB
        • Development started in 2007
        • Commercially supported and developed by 10gen
        • Stores documents using BSON
        • Supports ad hoc queries
            • Can query against embedded objects and arrays
        • Supports multiple types of indexing
    15. Document: MongoDB [cont’d]
        • Officially supported drivers available for multiple languages
            • C, C++, Java, JavaScript, Perl, PHP, Python and Ruby
        • Community supported drivers include:
            • Scala, Node.js, Haskell, Erlang, Smalltalk
        • Replication uses a master/slave model
        • Scales horizontally via sharding
        • Written in C++
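
Querying against embedded objects and arrays can be illustrated without a running server. The sketch below mimics only two pieces of MongoDB's query semantics, dot notation for embedded fields and "array contains value" matching, using hypothetical helpers (`resolve`, `matches`) and made-up documents. It is not MongoDB's implementation and ignores operators such as `$gt` or `$in`.

```python
_MISSING = object()  # sentinel for "path not present in the document"


def resolve(doc, dotted_path):
    """Follow a dotted path such as 'author.city' through nested dicts."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc, query):
    """Equality-only filter: every path in the query must match."""
    for path, expected in query.items():
        actual = resolve(doc, path)
        if actual is _MISSING:
            return False
        if isinstance(actual, list) and not isinstance(expected, list):
            # A scalar matches an array field if the array contains it.
            if expected not in actual:
                return False
        elif actual != expected:
            return False
    return True


posts = [
    {"title": "Intro", "author": {"name": "derek", "city": "San Diego"},
     "tags": ["nosql", "mongodb"]},
    {"title": "Graphs", "author": {"name": "todd"}, "tags": ["neo4j"]},
]

query = {"author.city": "San Diego", "tags": "nosql"}
hits = [post["title"] for post in posts if matches(post, query)]
```

With a real deployment and the pymongo driver, a filter of the same shape would be passed to `collection.find(...)`; the collection and field names here are assumptions for illustration.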
    16. Column Family Databases
        • Each key is associated with multiple attributes (i.e. columns)
        • Hybrid row/column stores
        • Inspired by Google BigTable
        • Examples: HBase, Cassandra
    17. Column: HBase
        • Based on Google’s BigTable
        • Apache top-level project (TLP)
        • Cloudera (certifications, EC2 AMIs, etc.)
        • Layered over HDFS (Hadoop Distributed File System)
        • Input/Output for MapReduce Jobs
        • APIs
            • Thrift, REST
    18. Column: HBase [cont’d]
        • Automatic partitioning
        • Automatic re-balancing/re-partitioning
        • Fault tolerant
            • HDFS
            • Multiple Replicas
        • Highly distributed
    19. Column: HBase [cont’d]
        • Lars George
    20. Column: Cassandra
        • Created at Facebook for Inbox search
        • Facebook -> Google Code -> ASF
        • Commercial support available from Riptano
        • Features taken from both Dynamo and BigTable
            • Dynamo: Consistent hashing, Partitioning, Replication
            • BigTable: Column Families, MemTables, SSTables
    21. Column: Cassandra [cont’d]
        • Symmetric nodes
        • No single point of failure
        • Linearly scalable
        • Ease of administration
        • Flexible/Automated Provisioning
        • Flexible Replica Replacement
        • High Availability
        • Eventual Consistency
            • However, consistency is tunable
    22. Column: Cassandra [cont’d]
        • Partitioning
            • Random
                • Good distribution of data between nodes
                • Range scans not possible
            • Order Preserving
                • Can lead to unbalanced nodes
                • Range scans, Natural Order
            • Custom
        • Extremely fast reads/writes (low latency)
        • Thrift API
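
The trade-off between random and order-preserving partitioning can be sketched as follows. This is a simplification, not Cassandra's partitioner code: the node names, the split points, and the use of `hash % node_count` in place of token ranges on a ring are assumptions made to keep the example short.

```python
import hashlib
from bisect import bisect_right

NODES = ["node-a", "node-b", "node-c"]
SPLIT_POINTS = ["h", "q"]  # hypothetical boundaries: [..h) [h..q) [q..]


def random_partition(key: str) -> str:
    """Place a key by its hash: even spread, but neighbors are scattered."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return NODES[digest % len(NODES)]


def order_preserving_partition(key: str) -> str:
    """Place a key by its sort order: adjacent keys share a node."""
    return NODES[bisect_right(SPLIT_POINTS, key)]


keys = [f"user:{i:03d}" for i in range(100)]

random_owners = {random_partition(k) for k in keys}
ordered_owners = {order_preserving_partition(k) for k in keys}
```

Because every key here starts with `user:`, the order-preserving scheme sends all of them to `node-c`. A range scan over `user:000` to `user:099` then touches one node, but that node carries the whole load. The hashed scheme spreads the same keys across nodes, and a range scan would have to ask every node.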
    23. Column: Cassandra [cont’d]
        • Column
            • Basic unit of storage
        • Column Family
            • Collection of like records
            • Record level atomicity
            • Indexed
        • Keyspace
            • Top level namespace
            • Usually one per application
    24. Column: Cassandra [cont’d]
        • Eric Evans
    25. Column: Cassandra [cont’d]
        • Column Details
            • Name
                • byte[]
                • Queried against
                • Determines sort order
            • Value
                • byte[]
                • Opaque to Cassandra
            • Timestamp
                • long
                • Conflict resolution (last write wins)
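
The column triple (name, value, timestamp) and last-write-wins conflict resolution can be sketched like this. The `Column` class and the `write` and `read` helpers are hypothetical, and a row is modeled as a plain dictionary. Keeping the existing column on a timestamp tie is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes       # queried against; determines sort order
    value: bytes      # opaque payload
    timestamp: int    # supplied by the client, used to resolve conflicts


def write(row, column):
    """Last write wins: keep whichever column has the newer timestamp."""
    current = row.get(column.name)
    if current is None or column.timestamp > current.timestamp:
        row[column.name] = column


def read(row):
    """Return the row's columns ordered by column name."""
    return [row[name] for name in sorted(row)]


row = {}
write(row, Column(b"email", b"old@example.com", timestamp=100))
write(row, Column(b"name", b"Ada", timestamp=100))
write(row, Column(b"email", b"new@example.com", timestamp=200))
write(row, Column(b"email", b"stale@example.com", timestamp=150))  # arrives late

values = [column.value for column in read(row)]
```

The late-arriving write with timestamp 150 is discarded because the stored `email` column already carries timestamp 200, so the order in which replicas receive writes does not change the outcome.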
    26. Graph Databases
        • Inspired by Euler’s graph theory, G = (V, E)
        • Focused on modeling the structure of the data
        • Property Graph Data Model
        • Examples: Neo4j, InfiniteGraph
    27. Sample Property Graph
        • Todd Hoff
    28. Graph: Neo4j
        • Data Model: Property Graph
            • Nodes: Person, Place, Thing, etc.
            • Relationships: Lives, Likes, Owns, etc.
            • Properties on Both
        • Primary operation is graph traversal between nodes
        • Written in Java
        • Embedded database
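
A property graph and a traversal over it can be sketched in a few lines. This is an in-memory illustration of the data model, not Neo4j's API: the node ids, names, relationship types, and the `traverse` function are all made up for the example.

```python
from collections import deque

# Nodes and relationships both carry properties.
nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Place", "name": "San Diego"},
    4: {"label": "Person", "name": "Carol"},
}
relationships = [
    (1, "LIKES", 2, {"since": 2009}),
    (2, "LIKES", 4, {"since": 2010}),
    (1, "LIVES_IN", 3, {}),
    (2, "LIVES_IN", 3, {}),
]


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal along outgoing relationships of one type."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for source, kind, target, _props in relationships:
            if source == node_id and kind == rel_type and target not in seen:
                seen.add(target)
                reached.append(nodes[target]["name"])
                queue.append((target, depth + 1))
    return reached


direct = traverse(1, "LIKES", max_depth=1)
two_hops = traverse(1, "LIKES", max_depth=2)
```

Starting from Alice, one hop along `LIKES` reaches Bob, and two hops also reach Carol. A recommendation engine of the kind listed under typical uses would run traversals like this one, at far larger scale and against an indexed store rather than a Python list.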
    29. Graph: Neo4j [cont’d]
        • Disk-based
            • Graph stored in custom binary format
        • Transactional
            • JTA/JTS, XA, 2PC, MVCC
        • Scales
            • Billions of nodes/relationships/properties per JVM
        • Robust
            • 6+ years in 24/7 production
    30. Graph: Neo4j [cont’d]
        • Multiple language bindings
            • Jython, CPython
            • JRuby (including RESTful API)
            • Clojure
            • Scala (including RESTful API)
        • Uses
            • Social Graph, e.g. Facebook
            • Recommendation Engines
            • Financial Audit
    31. Graph: Neo4j [cont’d]
        • Licensed under AGPLv3
        • Dual Commercial License Available
            • First server is free
            • Second server is inexpensive
        • Commercial support provided by Neo Technology
    32. Other Graph Databases
        • InfiniteGraph
        • HyperGraphDB
        • sones
    33. Conclusion
    34. Thank You!
    35. References
        • NoSQL Databases - Part 1 – Landscape, Vineet Gupta
        • NoSQL for Dummies, Tobias Ivarsson
        • NoSQL Databases, Marin Dimitrov
        • CouchDB vs. MongoDB, Gabriele Lana
        • HBase, Ryan Rawson
        • Introduction to Cassandra, Gary Dusbabek
        • Cassandra Explained, Eric Evans
        • Towards Robust Distributed Systems, Eric Brewer
        • Cassandra - A Decentralized Structured Storage System, Lakshman and Malik (LADIS)
    36. References [cont’d]
        • Bigtable: A Distributed Storage System for Structured Data, Google Inc.
        • Dynamo: Amazon’s Highly Available Key-value Store, Amazon Inc.
        • HBase Architecture 101 – Storage, Lars George
        • BASE: An ACID Alternative, Dan Pritchett