Introduction to NoSQL Databases
Published in: Technology

2 Comments
57 Likes
Statistics
Notes
  •    Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • nosql is uploded version of sql ............currently google,facebook,amegan,,,,,,,,,,are using ,,,b/c it is the largest data base,,more than previous,,,,,,,,,they also pay for this......
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
37,286
On Slideshare
0
From Embeds
0
Number of Embeds
17
Actions
Shares
0
Downloads
0
Comments
2
Likes
57
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Surveying the NoSQL Landscape, By Derek Stainer
  • Indexing types include single-key, compound, unique, non-unique, and geospatial
  • Transcript

    • 1. Introduction to
      NoSQL Databases
      San Diego NoSQL Meetup – Aug 2010
      By Derek Stainer
      http://nosqldatabases.com
    • 2. Agenda
      Introduction
      Objective
      Explore NoSQL Databases
      Conclusion
    • 3. Introduction
      UCSD Graduate in Computer Science
      Java Developer for 10 years
      Creator of http://nosqldatabases.com
      Curator of NoSQL information
    • 4. Objective
      Deeper dive into each type of NoSQL database
      Discuss 1-2 NoSQL databases in each family of databases
    • 5. NoSQL Taxonomy
      Key/Value
      Document
      Column
      Graph
      Others
      Geospatial
      File System
      Object
    • 6. Key/Value Databases
      Global collection of Key/Value pairs
      Inspired by Amazon’s Dynamo and Distributed Hashtables
      Designed to handle massive load
      Multiple Types
      In memory, e.g. Memcache
      On disk, e.g. Redis, SimpleDB
      Eventually consistent, e.g. Dynamo, Voldemort
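As a rough illustration of the distributed hash table idea behind these stores, the sketch below spreads keys over nodes with a consistent-hash ring. The class name `HashRing`, the node names, the choice of MD5, and the count of 64 ring points per node are all assumptions made for this example; none of them come from Dynamo or Voldemort themselves.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each node owns several points ("virtual nodes")."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

# Adding a node should only move the keys that the new node takes over.
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
after = {k: bigger.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d:",
      all(after[k] == "node-d" for k in moved))
```

The point of the ring is visible in the last lines: growing the cluster relocates only the share of keys claimed by the new node, instead of reshuffling everything as a plain `hash(key) % node_count` scheme would.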
    • 7. Key/Value: Voldemort
      Created by LinkedIn, now open source
      Inspired by Amazon’s Dynamo
      Written in Java
      Pluggable Storage
      BerkeleyDB, In Memory, MySQL
      Pluggable Serialization
      JSON, Thrift, Protocol Buffers, etc.
      Cluster Rebalancing
    • 8. Key/Value: Voldemort
      Versioning, based on Vector Clocks
      Reconciliation occurs on reads.
      Partitioning and Replication based on Dynamo
      Consistent Hashing
      Virtual Nodes
      Gossip
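A minimal sketch of how vector clocks can tell "newer" versions apart from "concurrent" ones, which is what lets reconciliation be deferred to read time. The function names and node names are hypothetical, and this is plain Python rather than Voldemort's actual versioning code.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if version a has seen every event that version b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"  # conflicting versions: the reader has to reconcile them


def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled value: element-wise maximum of both clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


v1 = increment({}, "node-a")   # first write, handled by node-a
v2 = increment(v1, "node-b")   # update via node-b, after seeing v1
v3 = increment(v1, "node-c")   # update via node-c, also after seeing v1

print(compare(v2, v1))  # a is newer
print(compare(v2, v3))  # concurrent

# A reader that reconciles both versions writes back a clock covering both.
resolved = increment(merge(v2, v3), "node-a")
print(compare(resolved, v2), "/", compare(resolved, v3))
```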
    • 9. Other Key/Value Stores
      Other Key/Value Stores
      Amazon’s Dynamo
      Riak
      Redis
      Memcache
      SimpleDB
    • 10. Document Databases
      Similar to a Key/Value database, with one major difference: the value is a document
      Inspired by Lotus Notes
      Flexible Schema
      Any number of fields can be added
      Documents stored in JSON or BSON formats
      Examples: CouchDB, MongoDB
    • 11. Sample Document
      {
      "day": [ 2010, 1, 23 ],
      "products": {
      "apple": { "price": 10, "quantity": 6 },
      "kiwi": { "price": 20, "quantity": 2 }
      },
      "checkout": 100
      }
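To make the "value is a document" idea concrete, the sketch below loads the sample document in Python and works with its nested fields. It assumes the document is written as strictly valid JSON (commas between fields, no leading zero in the month); the `coupon` field is a made-up addition used only to illustrate the flexible schema.

```python
import json

raw = """
{
  "day": [2010, 1, 23],
  "products": {
    "apple": {"price": 10, "quantity": 6},
    "kiwi": {"price": 20, "quantity": 2}
  },
  "checkout": 100
}
"""

order = json.loads(raw)

# The value is a structured document, so nested fields can be addressed directly.
total = sum(p["price"] * p["quantity"] for p in order["products"].values())
print(total == order["checkout"])  # True: 10*6 + 20*2 == 100

# Flexible schema: a new field can be added to this one document only.
order["coupon"] = "SUMMER"  # hypothetical field, not part of the slide
print(sorted(order))  # ['checkout', 'coupon', 'day', 'products']
```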
    • 12. Document: CouchDB
      Development began around 2005 by Damien Katz, a former Lotus Notes developer
      Couch – Cluster Of Unreliable Commodity Hardware
      Top level Apache Project
      Commercially supported by CouchIO
      Licensed under Apache License
      Written in Erlang
      Documents are stored in JSON
    • 13. Document: CouchDB [cont’d]
      B-Tree Storage Engine
      MVCC model, no locking
      No joins, primary key or foreign key (UUIDs are auto assigned)
      Built-in bi-directional replication
      Can even run offline, then sync changes once back online
      Custom persistent views using MapReduce
      REST API
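The MapReduce view idea can be sketched in a few lines of plain Python. This is only an in-memory imitation: in CouchDB itself, view functions are normally written in JavaScript and stored in design documents, and the document fields and function names below are invented for the example.

```python
from collections import defaultdict

docs = [
    {"_id": "1", "type": "order", "customer": "alice", "total": 100},
    {"_id": "2", "type": "order", "customer": "bob", "total": 40},
    {"_id": "3", "type": "order", "customer": "alice", "total": 25},
    {"_id": "4", "type": "customer", "name": "alice"},
]


def map_order_totals(doc):
    """Map step: emit (key, value) pairs for the documents the view cares about."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted under one key."""
    return sum(values)


def build_view(documents, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Keep the result ordered by key, as a view index would be.
    return {key: reduce_fn(grouped[key]) for key in sorted(grouped)}


view = build_view(docs, map_order_totals, reduce_sum)
print(view)  # {'alice': 125, 'bob': 40}
```

Because documents without `type == "order"` emit nothing, the schema-less collection needs no cleanup before the view is built.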
    • 14. Document: MongoDB
      Development started in 2007
      Commercially supported and developed by 10Gen
      Stores documents using BSON
      Supports ad hoc queries
      Can query against embedded objects and arrays
      Supports multiple types of indexing
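What "querying against embedded objects and arrays" means can be imitated in plain Python with dotted paths. The helpers `resolve` and `find` and the sample people are hypothetical, and this is not MongoDB's driver API or its real matching algorithm; it is only a sketch of the idea.

```python
def resolve(doc, path):
    """Yield every value reachable by a dotted path, descending into lists."""
    if isinstance(doc, list):
        for item in doc:
            yield from resolve(item, path)
        return
    head, _, rest = path.partition(".")
    if not isinstance(doc, dict) or head not in doc:
        return
    value = doc[head]
    if rest:
        yield from resolve(value, rest)
    elif isinstance(value, list):
        yield value       # the array as a whole
        yield from value  # and each of its elements
    else:
        yield value


def find(collection, query):
    """Return documents where every path in the query matches the given value."""
    return [
        doc for doc in collection
        if all(expected in resolve(doc, path) for path, expected in query.items())
    ]


people = [
    {"name": "Ann", "address": {"city": "San Diego"}, "tags": ["java", "nosql"]},
    {"name": "Raj", "address": {"city": "Austin"}, "tags": ["python"]},
]

print([p["name"] for p in find(people, {"address.city": "San Diego"})])  # ['Ann']
print([p["name"] for p in find(people, {"tags": "python"})])             # ['Raj']
```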
    • 15. Document: MongoDB [cont’d]
      Officially supported drivers available for multiple languages
      C, C++, Java, JavaScript, Perl, PHP, Python and Ruby
      Community supported drivers include:
      Scala, Node.js, Haskell, Erlang, Smalltalk
      Replication uses a master/slave model
      Scales horizontally via sharding
      Written in C++
    • 16. Column Family Databases
      Each key is associated with multiple attributes (i.e. Columns)
      Hybrid row/column stores
      Inspired by Google BigTable
      Examples: HBase, Cassandra
    • 17. Column: HBase
      Based on Google’s BigTable
      Apache Project TLP
      Cloudera (certifications, EC2 AMI’s, etc.)
      Layered over HDFS (Hadoop Distributed File System)
      Input/Output for MapReduce Jobs
      APIs
      Thrift, REST
    • 18. Column: HBase [cont’d]
      Automatic partitioning
      Automatic re-balancing/re-partitioning
      Fault tolerant
      HDFS
      Multiple Replicas
      Highly distributed
    • 19. Column: HBase [cont’d]
      (Image credit: Lars George)
    • 20. Column: Cassandra
      Created at Facebook for Inbox search
      Facebook -> Google Code -> ASF
      Commercial Support available from Riptano
      Features taken from both Dynamo and BigTable
      Dynamo – Consistent hashing, Partitioning, Replication
      BigTable – Column Families, MemTables, SSTables
    • 21. Column: Cassandra [cont’d]
      Symmetric nodes
      No single point of failure
      Linearly scalable
      Ease of administration
      Flexible/Automated Provisioning
      Flexible Replica Replacement
      High Availability
      Eventual Consistency
      However, consistency is tuneable
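One common way to reason about tunable consistency is the quorum rule: with N replicas, a read of R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below tabulates that rule for the consistency levels ONE, QUORUM and ALL; the replication factor of 3 and the function names are assumptions chosen for the example.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_overlap(n: int, r: int, w: int) -> bool:
    """Read and write replica sets share at least one node when R + W > N."""
    return r + w > n


N = 3  # replication factor (assumed for this example)
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

for write_name, w in levels.items():
    for read_name, r in levels.items():
        verdict = "overlap" if replicas_overlap(N, r, w) else "may be stale"
        print(f"write={write_name:<6} read={read_name:<6} {verdict}")
```

Writing and reading at ONE gives the lowest latency but only eventual consistency, while QUORUM on both sides trades some latency for reads that always touch a replica holding the latest acknowledged write.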
    • 22. Column: Cassandra [cont’d]
      Partitioning
      Random
      Good distribution of data between nodes
      Range scans not possible
      Order Preserving
      Can lead to unbalanced nodes
      Range scans, Natural Order
      Custom
      Extremely fast reads/writes (low latency)
      Thrift API
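The trade-off between the two partitioning schemes can be seen by ordering the same row keys two ways: by an MD5 hash of the key (the random scheme) and by the key itself (the order-preserving scheme). The key names and helper functions are made up for the example, and the sketch ignores how tokens are actually assigned to nodes.

```python
import hashlib

keys = ["alice", "bob", "carol", "dave", "erin", "frank"]


def md5_token(key: str) -> int:
    """Position of a key when rows are placed by hash."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# Random partitioning: storage order follows the hash of the key.
random_order = sorted(keys, key=md5_token)

# Order-preserving partitioning: storage order follows the key itself.
natural_order = sorted(keys)


def contiguous_slice(ordered, start, end):
    """Everything stored between two keys, in storage order."""
    i, j = ordered.index(start), ordered.index(end)
    return ordered[min(i, j):max(i, j) + 1]


# With natural order, the slice between "bob" and "dave" is the key range.
print(contiguous_slice(natural_order, "bob", "dave"))

# With hashed order, neighbouring rows are unrelated, so a contiguous read
# between the same two keys does not correspond to a key range.
print(contiguous_slice(random_order, "bob", "dave"))
```

The flip side is load: hashed placement spreads keys evenly regardless of their values, whereas natural order can pile many adjacent keys onto one node.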
    • 23. Column: Cassandra [cont’d]
      Column
      Basic unit of storage
      Column Family
      Collection of like records
      Record level atomicity
      Indexed
      Keyspace
      Top level namespace
      Usually one per application
    • 24. Column: Cassandra [cont’d]
      (Image credit: Eric Evans)
    • 25. Column: Cassandra [cont’d]
      Column Details
      Name
      byte[]
      Queried against
      Determines sort order
      Value
      byte[]
      Opaque to Cassandra
      Timestamp
      long
      Conflict resolution (last write wins)
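A small sketch of the column triple and of last-write-wins reconciliation between two replicas of the same row. The `Column` dataclass, `reconcile`, `merge_rows` and the sample data are illustrative names only; in particular, keeping the first copy on a timestamp tie is a simplification assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes     # queried against; determines sort order
    value: bytes    # opaque to the store
    timestamp: int  # supplied by the client, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep whichever copy carries the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


def merge_rows(replica_a: dict, replica_b: dict) -> list:
    """Merge two replicas of one row; columns come back sorted by name."""
    merged = dict(replica_a)
    for name, col in replica_b.items():
        merged[name] = reconcile(merged[name], col) if name in merged else col
    return [merged[name] for name in sorted(merged)]


replica_1 = {
    b"email": Column(b"email", b"old@example.com", 100),
    b"name": Column(b"name", b"Ann", 100),
}
replica_2 = {
    b"email": Column(b"email", b"new@example.com", 250),
    b"city": Column(b"city", b"San Diego", 120),
}

for col in merge_rows(replica_1, replica_2):
    print(col.name, col.value, col.timestamp)
```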
    • 26. Graph Databases
      Inspired by Euler and graph theory, G = (V, E)
      Focused on modeling the structure of the data
      Property Graph Data Model
      Examples: Neo4j, InfiniteGraph
    • 27. Sample Property Graph
      (Image credit: Todd Hoff)
    • 28. Graph: Neo4j
      Data Model: Property Graph
      Nodes – Person, Place, Thing, etc.
      Relationships – Lives, Likes, Owns, etc.
      Properties on Both
      Primary operation is graph traversal between nodes
      Written in Java
      Embedded database
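The property graph model and its primary operation, traversal, can be sketched with ordinary Python structures. Everything below (the node and relationship data, `neighbors`, `traverse`) is invented for illustration and is not Neo4j's API.

```python
from collections import deque

# Nodes and relationships both carry properties.
nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Place", "name": "San Diego"},
    4: {"label": "Thing", "name": "Bicycle"},
}
relationships = [
    (1, "KNOWS", 2, {"since": 2008}),
    (1, "LIVES_IN", 3, {}),
    (2, "LIVES_IN", 3, {}),
    (2, "OWNS", 4, {}),
]


def neighbors(node_id, rel_type=None):
    """Nodes reachable over one outgoing relationship, optionally of one type."""
    for start, kind, end, _props in relationships:
        if start == node_id and rel_type in (None, kind):
            yield end


def traverse(start, max_depth, rel_type=None):
    """Breadth-first traversal; returns (name, depth) for each node reached."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbors(node_id, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append((nodes[nxt]["name"], depth + 1))
                queue.append((nxt, depth + 1))
    return found


print(traverse(1, max_depth=2))
# [('Bob', 1), ('San Diego', 1), ('Bicycle', 2)]
print(traverse(1, max_depth=2, rel_type="KNOWS"))
# [('Bob', 1)]
```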
    • 29. Graph: Neo4j [cont’d]
      Disk-based
      Graph stored in custom binary format
      Transactional
      JTA/JTS, XA, 2PC, MVCC
      Scales
      Billions of nodes/relationships/properties per JVM
      Robust
      6+ years in 24/7 production
    • 30. Graph: Neo4j [cont’d]
      Multiple language bindings
      Jython, CPython
      JRuby (including RESTful API)
      Clojure
      Scala (including RESTful API)
      Uses
      Social Graph, e.g. Facebook
      Recommendation Engines
      Financial Audit
    • 31. Graph: Neo4j [cont’d]
      Licensed under AGPLv3
      Dual Commercial License Available
      First server is free
      Second server is inexpensive
      Commercial support provided by Neo Technology
    • 32. Other Graph Databases
      InfiniteGraph
      HyperGraphDB
      sones
    • 33. Conclusion
    • 34. Thank You!
    • 35. References
      NoSQL Databases - Part 1 – Landscape, Vineet Gupta, http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
      NoSQL for Dummies, Tobias Ivarsson, http://www.slideshare.net/thobe/nosql-for-dummies
      NoSQL Databases, Marin Dimitrov, http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
      CouchDB vs. MongoDB, Gabriele Lana, http://www.slideshare.net/gabriele.lana/couchdb-vs-mongodb-2982288
      HBase, Ryan Rawson, http://www.slideshare.net/adorepump/hbase-nosql
      Introduction to Cassandra, Gary Dusbabek, http://www.slideshare.net/gdusbabek/introduction-to-cassandra-june-2010
      Cassandra Explained, Eric Evans, http://www.slideshare.net/jericevans/cassandra-explained
      Towards Robust Distributed Systems, Eric Brewer, http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
      Cassandra - A Decentralized Structured Storage System, Lakshman, Ladis, http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
    • 36. References [cont’d]
      Bigtable: A Distributed Storage System for Structured Data, Google Inc., http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf
      Dynamo: Amazon’s Highly Available Key-value Store, Amazon Inc., http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
      HBase Architecture 101 – Storage, Lars George, http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
      BASE: An ACID Alternative, Dan Pritchett