Introduction to NoSQL Databases


Slide notes:
  • Surveying the NoSQL Landscape, By Derek Stainer
  • Indexing types include single-key, compound, unique, non-unique, and geospatial

    1. Introduction to NoSQL Databases
       San Diego NoSQL Meetup – Aug 2010
       By Derek Stainer
    2. Agenda
       • Introduction
       • Objective
       • Explore NoSQL Databases
       • Conclusion
    3. Introduction
       • UCSD graduate in Computer Science
       • Java developer for 10 years
       • Creator of
       • Curator of NoSQL information
    4. Objective
       • Deeper dive into each type of NoSQL database
       • Discuss 1-2 NoSQL databases in each family of databases
    5. NoSQL Taxonomy
       • Key/Value
       • Document
       • Column
       • Graph
       • Others
          • Geospatial
          • File System
          • Object
    6. Key/Value Databases
       • Global collection of key/value pairs
       • Inspired by Amazon’s Dynamo and distributed hash tables
       • Designed to handle massive load
       • Multiple types
          • In memory, e.g. Memcache
          • On disk, e.g. Redis, SimpleDB
          • Eventually consistent, e.g. Dynamo, Voldemort
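Dynamo-inspired key/value stores place keys on nodes with consistent hashing, the distributed-hash-table technique behind Dynamo. The following is a minimal sketch under stated assumptions: the `ConsistentHashRing` class, MD5 ring positions, and eight virtual nodes per server are illustrative choices, not the internals of Dynamo, Voldemort, or any other product. It shows why the approach copes with growth: adding a node reassigns only the keys that the new node takes over.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes_per_node=8):
        self.vnodes_per_node = vnodes_per_node
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Map a string onto a 128-bit ring position.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node claims several ring positions ("virtual nodes"),
        # which spreads its share of the keyspace more evenly.
        for i in range(self.vnodes_per_node):
            token = self._hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def remove_node(self, node):
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def node_for(self, key):
        # Walk clockwise to the first token at or after the key's position,
        # wrapping to the start of the ring past the largest token.
        pos = self._hash(key)
        idx = bisect.bisect_left(self._tokens, pos) % len(self._tokens)
        return self._owner[self._tokens[idx]]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys whose ranges node-d took over change owner.
print(all(after[k] == "node-d" for k in moved))  # True
```

Several tokens per server are used because a single token per server tends to produce uneven ranges; virtual nodes smooth out each server's share.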
    7. Key/Value: Voldemort
       • Created by LinkedIn, now open source
       • Inspired by Amazon’s Dynamo
       • Written in Java
       • Pluggable storage
          • BerkeleyDB, in memory, MySQL
       • Pluggable serialization
          • JSON, Thrift, Protocol Buffers, etc.
       • Cluster rebalancing
    8. Key/Value: Voldemort [cont’d]
       • Versioning based on vector clocks
          • Reconciliation occurs on reads
       • Partitioning and replication based on Dynamo
          • Consistent hashing
          • Virtual nodes
          • Gossip
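Voldemort’s versioning relies on vector clocks, with conflicts reconciled when data is read. Below is a generic sketch of that idea, not Voldemort’s API: clocks are plain dicts of per-node counters, and the helper names (`increment`, `compare`, `reconcile_on_read`) are hypothetical. A version is discarded only if another version’s clock dominates it; concurrent versions survive as siblings for the application to merge.

```python
from typing import Dict, List, Tuple

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter bumped (one local write)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every event `b` has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"


def reconcile_on_read(versions: List[Tuple[str, VectorClock]]):
    """Keep only versions that no other version supersedes.

    One survivor means the replicas agree; several survivors are concurrent
    siblings that the client has to merge.
    """
    survivors = []
    for value, clock in versions:
        superseded = any(compare(other, clock) == "after" for _, other in versions)
        if not superseded and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


v1 = increment({}, "node-a")  # {'node-a': 1}
v2 = increment(v1, "node-a")  # {'node-a': 2}, a later write on node-a
v3 = increment(v1, "node-b")  # {'node-a': 1, 'node-b': 1}, a parallel write
print(compare(v2, v1))  # after
print(compare(v2, v3))  # concurrent
print(reconcile_on_read([("old", v1), ("x", v2), ("y", v3)]))
```

Here the stale `v1` value is dropped, while `v2` and `v3` come back together because neither clock dominates the other.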
    9. Other Key/Value Stores
       • Amazon’s Dynamo
       • Riak
       • Redis
       • Memcache
       • SimpleDB
    10. Document Databases
        • Similar to a key/value database, with one major difference: the value is a document
        • Inspired by Lotus Notes
        • Flexible schema
           • Any number of fields can be added
        • Documents stored in JSON or BSON formats
        • Examples: CouchDB, MongoDB
    11. Sample Document
        {
          "day": [2010, 1, 23],
          "products": {
            "apple": { "price": 10, "quantity": 6 },
            "kiwi": { "price": 20, "quantity": 2 }
          },
          "checkout": 100
        }
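Because the value is an ordinary document, nested fields can be read without any schema. The sketch below parses the sample with Python’s standard `json` module (strict JSON requires commas between members and rejects leading zeros such as `01`) and checks that `checkout` equals the sum of price times quantity. Reading `checkout` as that total is an assumption about the sample’s intent.

```python
import json

SAMPLE = """
{
  "day": [2010, 1, 23],
  "products": {
    "apple": { "price": 10, "quantity": 6 },
    "kiwi": { "price": 20, "quantity": 2 }
  },
  "checkout": 100
}
"""

doc = json.loads(SAMPLE)

# Nested fields are reached by walking the embedded objects directly.
line_totals = {
    name: item["price"] * item["quantity"]
    for name, item in doc["products"].items()
}
total = sum(line_totals.values())

print(line_totals)               # {'apple': 60, 'kiwi': 40}
print(total == doc["checkout"])  # True
```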
    12. Document: CouchDB
        • Development began ~2005 by Damien Katz, a former Lotus Notes developer
        • Couch – Cluster Of Unreliable Commodity Hardware
        • Top-level Apache project
        • Commercially supported by CouchIO
        • Licensed under the Apache License
        • Written in Erlang
        • Documents are stored in JSON
    13. Document: CouchDB [cont’d]
        • B-tree storage engine
        • MVCC model, no locking
        • No joins, primary keys, or foreign keys (UUIDs are auto-assigned)
        • Built-in bi-directional replication
           • Can run offline, then reconnect and sync changes back
        • Custom persistent views using MapReduce
        • REST API
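CouchDB’s persistent views are defined as map and optional reduce functions, normally written in JavaScript. The pure-Python emulation below only illustrates the model and is not CouchDB’s API. `map_fn` emits (key, value) pairs per document, rows are sorted by key, and `reduce_fn` collapses the values for each key, much like the built-in `_sum` reduce. The documents and field names are hypothetical, and real CouchDB maintains the view index incrementally rather than recomputing it on every query.

```python
from itertools import groupby

# Hypothetical documents; in CouchDB each would carry an auto-assigned _id.
docs = [
    {"_id": "a1", "type": "order", "customer": "alice", "total": 30},
    {"_id": "b2", "type": "order", "customer": "bob", "total": 15},
    {"_id": "c3", "type": "order", "customer": "alice", "total": 25},
    {"_id": "d4", "type": "note", "text": "not an order"},
]


def map_fn(doc):
    # Like a view's map function: emit zero or more (key, value) pairs per doc.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_fn(values):
    # Like the built-in _sum reduce: collapse all values for one key.
    return sum(values)


def query_view(documents, map_fn, reduce_fn=None):
    # View rows are kept sorted by key.
    rows = sorted((key, value) for doc in documents for key, value in map_fn(doc))
    if reduce_fn is None:
        return rows
    return [
        (key, reduce_fn([value for _, value in group]))
        for key, group in groupby(rows, key=lambda row: row[0])
    ]


print(query_view(docs, map_fn))             # [('alice', 25), ('alice', 30), ('bob', 15)]
print(query_view(docs, map_fn, reduce_fn))  # [('alice', 55), ('bob', 15)]
```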
    14. Document: MongoDB
        • Development started in 2007
        • Commercially supported and developed by 10gen
        • Stores documents using BSON
        • Supports ad hoc queries
           • Can query against embedded objects and arrays
        • Supports multiple types of indexing
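MongoDB’s ad hoc queries reach into embedded objects and arrays with dotted field paths. With PyMongo and a running server this looks like `collection.find({"items.sku": "kiwi"})`. The self-contained sketch below imitates only the simple equality case in plain Python so it runs without a database. `resolve`, `matches`, `find`, and the sample orders are hypothetical, and MongoDB’s real matching rules cover far more operators and edge cases.

```python
def resolve(value, path):
    """Yield every value reachable from `value` along a list of path segments.

    Numeric segments index into arrays; other segments applied to an array
    are tried against each element, loosely mirroring MongoDB dot notation.
    """
    if not path:
        yield value
        return
    head, rest = path[0], path[1:]
    if isinstance(value, dict):
        if head in value:
            yield from resolve(value[head], rest)
    elif isinstance(value, list):
        if head.isdigit() and int(head) < len(value):
            yield from resolve(value[int(head)], rest)
        for element in value:
            yield from resolve(element, path)


def matches(doc, query):
    """True if every dotted-field/value pair in `query` matches `doc`.

    A field that resolves to an array matches if the array contains the value.
    """
    return all(
        any(c == expected or (isinstance(c, list) and expected in c)
            for c in resolve(doc, field.split(".")))
        for field, expected in query.items()
    )


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


orders = [
    {"_id": 1, "customer": {"name": "alice", "city": "San Diego"},
     "items": [{"sku": "apple", "qty": 6}, {"sku": "kiwi", "qty": 2}],
     "tags": ["fruit", "bulk"]},
    {"_id": 2, "customer": {"name": "bob", "city": "La Jolla"},
     "items": [{"sku": "kiwi", "qty": 1}],
     "tags": ["fruit"]},
]

print([d["_id"] for d in find(orders, {"customer.city": "San Diego"})])  # [1]
print([d["_id"] for d in find(orders, {"items.sku": "kiwi"})])          # [1, 2]
print([d["_id"] for d in find(orders, {"tags": "bulk"})])               # [1]
print([d["_id"] for d in find(orders, {"items.0.sku": "kiwi"})])        # [2]
```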
    15. Document: MongoDB [cont’d]
        • Officially supported drivers available for multiple languages
           • C, C++, Java, JavaScript, Perl, PHP, Python, and Ruby
        • Community-supported drivers include:
           • Scala, Node.js, Haskell, Erlang, Smalltalk
        • Replication uses a master/slave model
        • Scales horizontally via sharding
        • Written in C++
    16. Column Family Databases
        • Each key is associated with multiple attributes (i.e., columns)
        • Hybrid row/column stores
        • Inspired by Google BigTable
        • Examples: HBase, Cassandra
    17. Column: HBase
        • Based on Google’s BigTable
        • Apache top-level project (TLP)
        • Cloudera (certifications, EC2 AMIs, etc.)
        • Layered over HDFS (Hadoop Distributed File System)
        • Input/output for MapReduce jobs
        • APIs
           • Thrift, REST
    18. Column: HBase [cont’d]
        • Automatic partitioning
        • Automatic re-balancing/re-partitioning
        • Fault tolerant
           • HDFS
           • Multiple replicas
        • Highly distributed
    19. Column: HBase [cont’d]
        [Diagram; credit: Lars George]
    20. Column: Cassandra
        • Created at Facebook for Inbox search
        • Facebook -> Google Code -> ASF
        • Commercial support available from Riptano
        • Features taken from both Dynamo and BigTable
           • Dynamo: consistent hashing, partitioning, replication
           • BigTable: column families, MemTables, SSTables
    21. Column: Cassandra [cont’d]
        • Symmetric nodes
        • No single point of failure
        • Linearly scalable
        • Ease of administration
        • Flexible/automated provisioning
        • Flexible replica placement
        • High availability
        • Eventual consistency
           • However, consistency is tunable
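Cassandra’s tunable consistency lets each request choose how many replicas must respond. A common way to reason about it, inherited from Dynamo, is the rule W + R > N: if a write is acknowledged by W of N replicas and a read consults R of them, the two sets share at least W + R - N replicas. The sketch below encodes that rule and cross-checks it by brute force. Mapping the level names ONE, QUORUM, and ALL to 1, a majority, and N replicas is a simplification that ignores multi-datacenter levels.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


def overlaps(n: int, w: int, r: int) -> bool:
    """True if any read of r replicas must include one of the w replicas that
    acknowledged the latest write, i.e. when w + r > n."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def overlaps_brute_force(n: int, w: int, r: int) -> bool:
    """Check the same property by trying every write set and read set."""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


N = 3
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    ok = overlaps(N, levels[write_level], levels[read_level])
    print(f"write={write_level} read={read_level} -> {ok}")
# write=ONE read=ONE -> False
# write=QUORUM read=QUORUM -> True
# write=ONE read=ALL -> True
```

Reading and writing at QUORUM is the usual middle ground: neither side has to wait for every replica, yet the read set always includes a replica that saw the write.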
    22. Column: Cassandra [cont’d]
        • Partitioning
           • Random
              • Good distribution of data between nodes
              • Range scans not possible
           • Order preserving
              • Can lead to unbalanced nodes
              • Range scans, natural order
           • Custom
        • Extremely fast reads/writes (low latency)
        • Thrift API
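The trade-off between random and order-preserving partitioning can be reproduced in a few lines. In this sketch, with assumed node tokens and keys, `random_token` hashes keys with MD5 (in the spirit of Cassandra’s RandomPartitioner, though the token range here is a full 128 bits) while `ordered_token` uses the key itself. Each key belongs to the first node whose token is greater than or equal to the key’s token, wrapping around the ring. Keys sharing a prefix pile onto one node under order-preserving partitioning, but only that scheme keeps keys in natural order for range scans.

```python
import bisect
import hashlib


def random_token(key: str) -> int:
    # Hash-based token: neighboring keys land far apart on the ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def ordered_token(key: str) -> str:
    # Order-preserving: the key itself is the token.
    return key


def owner(token, node_tokens):
    """Index of the node owning `token`: the first node token >= it, wrapping."""
    return bisect.bisect_left(node_tokens, token) % len(node_tokens)


def distribute(keys, token_fn, node_tokens):
    """Count how many keys each node would own."""
    counts = [0] * len(node_tokens)
    for key in keys:
        counts[owner(token_fn(key), node_tokens)] += 1
    return counts


keys = [f"user:{i:03d}" for i in range(100)]

# Four nodes evenly spaced around a 128-bit hash ring.
hash_nodes = [i * 2**128 // 4 for i in range(1, 5)]
# Four nodes splitting the alphabet, which suits evenly spread keys but not
# keys that all share the prefix "user:".
ordered_nodes = ["f", "m", "s", "z"]

print(distribute(keys, random_token, hash_nodes))      # roughly even
print(distribute(keys, ordered_token, ordered_nodes))  # [0, 0, 0, 100]

# Only order-preserving tokens keep keys in natural order for range scans.
print(sorted(keys, key=ordered_token) == keys)  # True
```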
    23. Column: Cassandra [cont’d]
        • Column
           • Basic unit of storage
        • Column family
           • Collection of like records
           • Record-level atomicity
           • Indexed
        • Keyspace
           • Top-level namespace
           • Usually one per application
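The keyspace, column family, row, and column hierarchy can be pictured as nested maps. The sketch below is an in-memory illustration only: `store`, `insert`, `slice_row`, and the `Blog`/`Users` names are hypothetical, not Cassandra’s Thrift API, and record-level atomicity and indexing are not modeled. Column names and values are `bytes`, and a row’s columns come back sorted by name.

```python
from collections import defaultdict

# keyspace -> column family -> row key -> {column name: (value, timestamp)}
store = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def insert(keyspace: str, column_family: str, row_key: str,
           name: bytes, value: bytes, timestamp: int) -> None:
    """Write one column into a row (hypothetical helper)."""
    store[keyspace][column_family][row_key][name] = (value, timestamp)


def slice_row(keyspace: str, column_family: str, row_key: str,
              start: bytes = b"", finish: bytes = b""):
    """Return (name, value) pairs for one row, sorted by column name.

    An empty `finish` means no upper bound.
    """
    row = store[keyspace][column_family][row_key]
    return [
        (name, row[name][0])
        for name in sorted(row)
        if name >= start and (not finish or name <= finish)
    ]


insert("Blog", "Users", "alice", b"name", b"Alice", 1)
insert("Blog", "Users", "alice", b"email", b"alice@example.com", 1)
insert("Blog", "Users", "alice", b"city", b"San Diego", 1)

print(slice_row("Blog", "Users", "alice"))
print(slice_row("Blog", "Users", "alice", start=b"d", finish=b"m"))
```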
    24. Column: Cassandra [cont’d]
        [Diagram; credit: Eric Evans]
    25. Column: Cassandra [cont’d]
        • Column details
           • Name
              • byte[]
              • Queried against
              • Determines sort order
           • Value
              • byte[]
              • Opaque to Cassandra
           • Timestamp
              • long
              • Conflict resolution (last write wins)
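Each column carries a client-supplied timestamp, and conflicting versions are resolved by last write wins. A minimal sketch of that rule follows. The `Column` dataclass and the `resolve`/`merge_replicas` helpers are hypothetical, and breaking timestamp ties by comparing values is an assumption made so that every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes      # queried against; determines sort order
    value: bytes     # opaque to the store
    timestamp: int   # supplied by the client


def resolve(a: Column, b: Column) -> Column:
    """Last write wins: keep the version with the higher timestamp."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Assumed tie-break: compare values so every replica converges identically.
    return a if a.value >= b.value else b


def merge_replicas(*replicas):
    """Merge per-replica maps {name: Column} into one map sorted by name."""
    merged = {}
    for replica in replicas:
        for name, column in replica.items():
            merged[name] = resolve(merged[name], column) if name in merged else column
    return dict(sorted(merged.items()))


r1 = {b"email": Column(b"email", b"old@example.com", 100)}
r2 = {
    b"email": Column(b"email", b"new@example.com", 200),
    b"city": Column(b"city", b"San Diego", 150),
}
merged = merge_replicas(r1, r2)
print(merged[b"email"].value)  # b'new@example.com'
print(list(merged))            # [b'city', b'email']
```

Because the outcome depends only on the versions being compared, replicas that exchange their columns in any order end up with the same result.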
    26. Graph Databases
        • Inspired by Euler’s graph theory, G = (V, E)
        • Focused on modeling the structure of the data
        • Property graph data model
        • Examples: Neo4j, InfiniteGraph
    27. Sample Property Graph
        [Diagram; credit: Todd Hoff]
    28. Graph: Neo4j
        • Data model: property graph
           • Nodes: Person, Place, Thing, etc.
           • Relationships: Lives, Likes, Owns, etc.
           • Properties on both
        • Primary operation is graph traversal between nodes
        • Written in Java
        • Embedded database
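Traversal is the primary operation in a property graph. The sketch below stores a small, hypothetical graph in plain Python, with properties on both nodes and relationships, and walks it breadth-first up to a given depth. It illustrates the data model only and is not Neo4j’s Java API.

```python
from collections import deque

# Hypothetical property graph: nodes and relationships both carry properties.
nodes = {
    "alice": {"type": "Person", "name": "Alice"},
    "bob": {"type": "Person", "name": "Bob"},
    "sd": {"type": "Place", "name": "San Diego"},
    "bike": {"type": "Thing", "name": "Bike"},
}
relationships = [
    ("alice", "LIVES", "sd", {"since": 2005}),
    ("bob", "LIVES", "sd", {"since": 2008}),
    ("alice", "LIKES", "bob", {}),
    ("bob", "OWNS", "bike", {"new": True}),
]


def neighbors(node_id, rel_type=None):
    """Outgoing neighbors of node_id, optionally filtered by relationship type."""
    for start, rtype, end, _props in relationships:
        if start == node_id and (rel_type is None or rtype == rel_type):
            yield end


def traverse(start, max_depth):
    """Breadth-first traversal returning {node_id: depth} up to max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depths[current] == max_depth:
            continue
        for nxt in neighbors(current):
            if nxt not in depths:
                depths[nxt] = depths[current] + 1
                queue.append(nxt)
    return depths


print(list(neighbors("alice", "LIKES")))  # ['bob']
print(traverse("alice", 2))  # {'alice': 0, 'sd': 1, 'bob': 1, 'bike': 2}
print(nodes["bike"]["name"])  # Bike
```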
    29. Graph: Neo4j [cont’d]
        • Disk-based
           • Graph stored in custom binary format
        • Transactional
           • JTA/JTS, XA, 2PC, MVCC
        • Scales
           • Billions of nodes/relationships/properties per JVM
        • Robust
           • 6+ years in 24/7 production
    30. Graph: Neo4j [cont’d]
        • Multiple language bindings
           • Jython, CPython
           • JRuby (including RESTful API)
           • Clojure
           • Scala (including RESTful API)
        • Uses
           • Social graphs, e.g. Facebook
           • Recommendation engines
           • Financial audit
    31. Graph: Neo4j [cont’d]
        • Licensed under AGPLv3
        • Dual commercial license available
           • First server is free
           • Second server is inexpensive
        • Commercial support provided by Neo Technology
    32. Other Graph Databases
        • InfiniteGraph
        • HyperGraphDB
        • sones
    33. Conclusion
    34. Thank You!
    35. References
        • NoSQL Databases – Part 1 – Landscape, Vineet Gupta
        • NoSQL for Dummies, Tobias Ivarsson
        • NoSQL Databases, Marin Dimitrov
        • CouchDB vs. MongoDB, Gabriele Lana
        • HBase, Ryan Rawson
        • Introduction to Cassandra, Gary Dusbabek
        • Cassandra Explained, Eric Evans
        • Towards Robust Distributed Systems, Eric Brewer
        • Cassandra – A Decentralized Structured Storage System, Avinash Lakshman and Prashant Malik (LADIS 2009)
    36. References [cont’d]
        • Bigtable: A Distributed Storage System for Structured Data, Google Inc.
        • Dynamo: Amazon’s Highly Available Key-value Store, Amazon Inc.
        • HBase Architecture 101 – Storage, Lars George
        • BASE: An ACID Alternative, Dan Pritchett