Introduction to NoSQL Databases

Published in: Technology
  • Surveying the NoSQL Landscape, By Derek Stainer
  • Indexing types include single-key, compound, unique, non-unique, and geospatial

    1. Introduction to NoSQL Databases
         San Diego NoSQL Meetup, Aug 2010
         By Derek Stainer
         http://nosqldatabases.com
    2. Agenda
         • Introduction
         • Objective
         • Explore NoSQL Databases
         • Conclusion
    3. Introduction
         • UCSD Graduate in Computer Science
         • Java Developer for 10 years
         • Creator of http://nosqldatabases.com
         • Curator of NoSQL information
    4. Objective
         • Deeper dive into each type of NoSQL database
         • Discuss 1-2 NoSQL databases in each family of databases
    5. NoSQL Taxonomy
         • Key/Value
         • Document
         • Column
         • Graph
         • Others
             • Geospatial
             • File System
             • Object
    6. Key/Value Databases
         • Global collection of key/value pairs
         • Inspired by Amazon's Dynamo and distributed hash tables
         • Designed to handle massive load
         • Multiple types
             • In memory, e.g. Memcache
             • On disk, e.g. Redis, SimpleDB
             • Eventually consistent, e.g. Dynamo, Voldemort
    7. Key/Value: Voldemort
         • Created by LinkedIn, now open source
         • Inspired by Amazon's Dynamo
         • Written in Java
         • Pluggable storage
             • BerkeleyDB, In Memory, MySQL
         • Pluggable serialization
             • JSON, Thrift, Protocol Buffers, etc.
         • Cluster rebalancing
    8. Key/Value: Voldemort [cont'd]
         • Versioning based on vector clocks
             • Reconciliation occurs on reads
         • Partitioning and replication based on Dynamo
             • Consistent hashing
             • Virtual nodes
             • Gossip
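
To make "versioning based on vector clocks" with "reconciliation on reads" concrete, here is a minimal sketch in plain Python. It is not Voldemort's API: the function names, the node names, and the representation of a clock as a dict of per-node counters are all assumptions chosen for illustration. A version is discarded on read only if another version's clock strictly dominates it; versions whose clocks are concurrent are both returned so the client can merge them.

```python
def compare(a, b):
    """Compare two vector clocks (dicts mapping node name -> counter)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # a happened after b
    return "concurrent"      # neither dominates: a conflict


def reconcile_on_read(versions):
    """Drop versions that are strictly older than some other version."""
    return [
        (value, clock)
        for value, clock in versions
        if not any(compare(clock, other) == "before" for _, other in versions)
    ]


# Hypothetical replicas' answers for one key.
versions = [
    ("cart-v1", {"node_a": 1}),
    ("cart-v2", {"node_a": 2}),               # supersedes cart-v1
    ("cart-v3", {"node_a": 1, "node_b": 1}),  # concurrent with cart-v2
]
survivors = reconcile_on_read(versions)
```

Here `cart-v1` is dropped, while `cart-v2` and `cart-v3` both survive as siblings because neither clock dominates the other.
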
    9. Other Key/Value Stores
         • Amazon's Dynamo
         • Riak
         • Redis
         • Memcache
         • SimpleDB
    10. Document Databases
         • Similar to a key/value database, with one major difference: the value is a document
         • Inspired by Lotus Notes
         • Flexible schema
             • Any number of fields can be added
         • Documents stored in JSON or BSON formats
         • Examples: CouchDB, MongoDB
    11. Sample Document
         {
           "day": [ 2010, 1, 23 ],
           "products": {
             "apple": { "price": 10, "quantity": 6 },
             "kiwi": { "price": 20, "quantity": 2 }
           },
           "checkout": 100
         }
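
As a quick illustration of working with such a document, the sketch below parses it with Python's standard `json` module and walks the nested `products` object. The document text is written as strict JSON (commas between fields, no leading zero in numbers) so that it parses; the variable names are illustrative only.

```python
import json

raw = """
{
  "day": [2010, 1, 23],
  "products": {
    "apple": {"price": 10, "quantity": 6},
    "kiwi": {"price": 20, "quantity": 2}
  },
  "checkout": 100
}
"""

order = json.loads(raw)

# Nested values are reached by key, with no fixed schema required.
total = sum(item["price"] * item["quantity"] for item in order["products"].values())
```

With these values, `total` works out to 10 × 6 + 20 × 2 = 100, which matches the document's `checkout` field.
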
    12. Document: CouchDB
         • Development began around 2005 by Damien Katz, a former Lotus Notes developer
         • Couch stands for Cluster Of Unreliable Commodity Hardware
         • Top-level Apache project
         • Commercially supported by CouchIO
         • Licensed under the Apache License
         • Written in Erlang
         • Documents are stored in JSON
    13. Document: CouchDB [cont'd]
         • B-Tree storage engine
         • MVCC model, no locking
         • No joins, primary keys, or foreign keys (UUIDs are auto-assigned)
         • Built-in bi-directional replication
             • Can even run offline, then come back and sync changes
         • Custom persistent views using MapReduce
         • REST API
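
The idea behind a MapReduce view can be sketched without a running database. The following is a plain-Python emulation, not CouchDB code: CouchDB views are normally written as JavaScript functions stored in a design document, and the helper `build_view`, the sample documents, and their field names are assumptions for illustration. The map step emits key/value pairs per document, and the reduce step collapses the values that share a key.

```python
from collections import defaultdict

# Hypothetical documents; only some of them are orders.
docs = [
    {"_id": "a", "type": "order", "customer": "ann", "checkout": 100},
    {"_id": "b", "type": "order", "customer": "bob", "checkout": 40},
    {"_id": "c", "type": "order", "customer": "ann", "checkout": 25},
    {"_id": "d", "type": "note", "text": "not an order"},
]


def map_fn(doc):
    """Emit (customer, checkout) for every order document."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["checkout"]


def reduce_fn(values):
    """Collapse all values emitted under one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


view = build_view(docs, map_fn, reduce_fn)
```

The resulting `view` maps each customer to a checkout total. A real view is persisted and updated incrementally as documents change; this sketch recomputes everything on each call.
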
    14. Document: MongoDB
         • Development started in 2007
         • Commercially supported and developed by 10gen
         • Stores documents using BSON
         • Supports ad hoc queries
             • Can query against embedded objects and arrays
         • Supports multiple types of indexing
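
MongoDB addresses fields of embedded objects with dot notation, such as `products.apple.price`. The sketch below imitates only that idea in plain Python so it runs without a server. It is not the MongoDB driver API: `get_path` and `find` are hypothetical helpers, they support equality matching on nested dicts only, and they ignore arrays, operators, and indexes.

```python
def get_path(doc, path):
    """Follow a dotted path through nested dicts; return None if it is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, criteria):
    """Return documents whose dotted-path fields equal the requested values."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == expected for path, expected in criteria.items())
    ]


# Hypothetical collection of order documents.
orders = [
    {"_id": 1, "products": {"apple": {"price": 10, "quantity": 6}}},
    {"_id": 2, "products": {"apple": {"price": 12, "quantity": 1}}},
]
matches = find(orders, {"products.apple.price": 10})
```

Only the document with `_id` 1 matches, because the query reaches into the embedded `apple` object rather than comparing top-level fields.
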
    15. Document: MongoDB [cont'd]
         • Officially supported drivers available for multiple languages
             • C, C++, Java, JavaScript, Perl, PHP, Python, and Ruby
         • Community-supported drivers include:
             • Scala, Node.js, Haskell, Erlang, Smalltalk
         • Replication uses a master/slave model
         • Scales horizontally via sharding
         • Written in C++
    16. Column Family Databases
         • Each key is associated with multiple attributes (i.e. columns)
         • Hybrid row/column stores
         • Inspired by Google BigTable
         • Examples: HBase, Cassandra
    17. Column: HBase
         • Based on Google's BigTable
         • Apache top-level project (TLP)
         • Cloudera (certifications, EC2 AMIs, etc.)
         • Layered over HDFS (Hadoop Distributed File System)
         • Input/output for MapReduce jobs
         • APIs
             • Thrift, REST
    18. Column: HBase [cont'd]
         • Automatic partitioning
         • Automatic re-balancing/re-partitioning
         • Fault tolerant
             • HDFS
             • Multiple replicas
         • Highly distributed
    19. Column: HBase [cont'd]
         [Figure omitted; credited to Lars George]
    20. Column: Cassandra
         • Created at Facebook for Inbox search
         • Facebook -> Google Code -> ASF
         • Commercial support available from Riptano
         • Features taken from both Dynamo and BigTable
             • Dynamo: consistent hashing, partitioning, replication
             • BigTable: column families, MemTables, SSTables
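
Consistent hashing, the Dynamo-derived technique named above, can be sketched in a few lines. This is an illustrative toy, not Cassandra's or Voldemort's implementation: the `Ring` class, the choice of MD5 for tokens, and the count of 64 virtual nodes per server are all assumptions. Each server is placed on the ring at many points (virtual nodes), and a key belongs to the first point clockwise from the key's own token.

```python
import bisect
import hashlib


def token(text):
    """Map a string to a position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each physical node appears at `vnodes` positions on the ring.
        self._points = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._points]

    def owner(self, key):
        """Return the node owning the first ring position after the key's token."""
        index = bisect.bisect_right(self._tokens, token(key)) % len(self._points)
        return self._points[index][1]


keys = [f"user:{i}" for i in range(1000)]
before = Ring(["n1", "n2", "n3"])
after = Ring(["n1", "n2", "n3", "n4"])

# Keys whose owner changed when n4 joined the cluster.
moved = [k for k in keys if before.owner(k) != after.owner(k)]
```

The property worth noticing is that every key in `moved` now belongs to the new node `n4`; keys never shuffle between the existing nodes. That is what keeps rebalancing cheap compared with hashing keys modulo the node count.
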
    21. Column: Cassandra [cont'd]
         • Symmetric nodes
             • No single point of failure
         • Linearly scalable
         • Ease of administration
             • Flexible/automated provisioning
             • Flexible replica replacement
         • High availability
         • Eventual consistency
             • However, consistency is tunable
    22. Column: Cassandra [cont'd]
         • Partitioning
             • Random
                 • Good distribution of data between nodes
                 • Range scans not possible
             • Order preserving
                 • Can lead to unbalanced nodes
                 • Range scans, natural order
             • Custom
         • Extremely fast reads/writes (low latency)
         • Thrift API
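
The trade-off between random and order-preserving partitioning comes down to how keys are laid out. The sketch below is a toy model in plain Python, not Cassandra code: it assumes MD5 as the hash for the random case and uses made-up row keys. Sorting by hash scatters neighbouring keys, while sorting by the key itself keeps a key range contiguous, which is what makes a range scan a simple slice.

```python
import bisect
import hashlib

# Hypothetical row keys.
keys = ["alice", "bob", "carol", "dave", "erin", "frank"]


def random_token(key):
    """Token used by the random layout: a hash of the key."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


by_random = sorted(keys, key=random_token)  # layout under random partitioning
by_order = sorted(keys)                     # layout under order-preserving partitioning


def range_scan(sorted_keys, start, end):
    """Return the contiguous slice of keys in [start, end]; needs key order."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_right(sorted_keys, end)
    return sorted_keys[lo:hi]


scan = range_scan(by_order, "bob", "dave")
```

`range_scan` is only meaningful on `by_order`. In `by_random` the same keys are present but in hash order, so the keys between `bob` and `dave` are generally not adjacent and would have to be fetched one by one.
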
    23. Column: Cassandra [cont'd]
         • Column
             • Basic unit of storage
         • Column Family
             • Collection of like records
             • Record-level atomicity
             • Indexed
         • Keyspace
             • Top-level namespace
             • Usually one per application
    24. Column: Cassandra [cont'd]
         [Figure omitted; credited to Eric Evans]
    25. Column: Cassandra [cont'd]
         • Column details
             • Name
                 • byte[]
                 • Queried against
                 • Determines sort order
             • Value
                 • byte[]
                 • Opaque to Cassandra
             • Timestamp
                 • long
                 • Conflict resolution (last write wins)
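
The three-part column described above maps naturally onto a small record type. This is a minimal sketch under stated assumptions, not Cassandra's storage code: the `Column` dataclass, the `resolve` and `sorted_row` helpers, and the rule for breaking timestamp ties are all invented for illustration. It shows the name driving sort order and the timestamp driving last-write-wins conflict resolution, while the value is never inspected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes      # queried against; determines sort order
    value: bytes     # opaque payload
    timestamp: int   # supplied by the writer


def resolve(a, b):
    """Last write wins: keep the version of a column with the newer timestamp."""
    if a.name != b.name:
        raise ValueError("can only resolve two versions of the same column")
    # Tie-breaking in favour of `a` is an arbitrary choice made for this sketch.
    return a if a.timestamp >= b.timestamp else b


def sorted_row(columns):
    """Order the columns of one row by name, as a column family would."""
    return sorted(columns, key=lambda column: column.name)


old = Column(b"email", b"ann@old.example", timestamp=100)
new = Column(b"email", b"ann@new.example", timestamp=200)
winner = resolve(old, new)

row = sorted_row([Column(b"name", b"Ann", 1), winner, Column(b"city", b"San Diego", 1)])
```

After this runs, `winner` is the version written at timestamp 200, and `row` lists the columns in name order: `city`, `email`, `name`.
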
    26. Graph Databases
         • Inspired by Euler's graph theory, G = (V, E)
         • Focused on modeling the structure of the data
         • Property graph data model
         • Examples: Neo4j, InfiniteGraph
    27. Sample Property Graph
         [Figure omitted; credited to Todd Hoff]
    28. Graph: Neo4j
         • Data model: property graph
             • Nodes: Person, Place, Thing, etc.
             • Relationships: Lives, Likes, Owns, etc.
             • Properties on both
         • Primary operation is graph traversal between nodes
         • Written in Java
         • Embedded database
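
A property graph and a traversal over it can be modelled with ordinary Python containers. The sketch below is not the Neo4j API: the node and relationship data, the tuple layout, and the `traverse` function are assumptions made for illustration. Nodes and relationships both carry properties, and the traversal walks outgoing relationships breadth-first up to a depth limit.

```python
from collections import deque

# Nodes keyed by id, each with a dict of properties.
nodes = {
    1: {"label": "Person", "name": "Ann"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Place", "name": "San Diego"},
    4: {"label": "Thing", "name": "Bicycle"},
}

# Relationships as (source id, type, target id, properties).
relationships = [
    (1, "LIKES", 2, {"since": 2009}),
    (2, "LIVES_IN", 3, {}),
    (2, "OWNS", 4, {}),
]


def traverse(start, max_depth):
    """Breadth-first traversal along outgoing relationships, up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    visited_names = []
    while queue:
        node_id, depth = queue.popleft()
        visited_names.append(nodes[node_id]["name"])
        if depth == max_depth:
            continue
        for source, _rel_type, target, _props in relationships:
            if source == node_id and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited_names


reachable = traverse(1, max_depth=2)
```

Starting from Ann with a two-hop limit, the traversal reaches Bob first, then the place Bob lives in and the thing Bob owns. Scanning the whole relationship list at every step is fine for a sketch; a graph database avoids that by storing each node's relationships alongside the node.
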
    29. Graph: Neo4j [cont'd]
         • Disk-based
             • Graph stored in custom binary format
         • Transactional
             • JTA/JTS, XA, 2PC, MVCC
         • Scales
             • Billions of nodes/relationships/properties per JVM
         • Robust
             • 6+ years in 24/7 production
    30. Graph: Neo4j [cont'd]
         • Multiple language bindings
             • Jython, CPython
             • JRuby (including RESTful API)
             • Clojure
             • Scala (including RESTful API)
         • Uses
             • Social graph, e.g. Facebook
             • Recommendation engines
             • Financial audit
    31. Graph: Neo4j [cont'd]
         • Licensed under AGPLv3
         • Dual commercial license available
             • First server is free
             • Second server is inexpensive
         • Commercial support provided by Neo Technology
    32. Other Graph Databases
         • InfiniteGraph
         • HyperGraphDB
         • sones
    33. Conclusion
    34. Thank You!
    35. References
         • NoSQL Databases - Part 1 – Landscape, Vineet Gupta. http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
         • NoSQL for Dummies, Tobias Ivarsson. http://www.slideshare.net/thobe/nosql-for-dummies
         • NoSQL Databases, Marin Dimitrov. http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
         • CouchDB vs. MongoDB, Gabriele Lana. http://www.slideshare.net/gabriele.lana/couchdb-vs-mongodb-2982288
         • Hbase, Ryan Rawson. http://www.slideshare.net/adorepump/hbase-nosql
         • Introduction to Cassandra, Gary Dusbabek. http://www.slideshare.net/gdusbabek/introduction-to-cassandra-june-2010
         • Cassandra Explained, Eric Evans. http://www.slideshare.net/jericevans/cassandra-explained
         • Towards Robust Distributed Systems, Eric Brewer. http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
         • Cassandra - A Decentralized Structured Storage System, Lakshman, Ladis. http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
    36. References [cont'd]
         • Bigtable: A Distributed Storage System for Structured Data, Google Inc. http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf
         • Dynamo: Amazon's Highly Available Key-value Store, Amazon Inc. http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
         • HBase Architecture 101 – Storage, Lars George. http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
         • BASE: An ACID Alternative, Dan Pritchett
