Introduction to NoSQL Databases
Slide notes
  • Surveying the NoSQL Landscape, By Derek Stainer
  • Indexing types include single-key, compound, unique, non-unique, and geospatial

Transcript

  • 1. Introduction to
    NoSQL Databases
    San Diego NoSQL Meetup – Aug 2010
    By Derek Stainer
    http://nosqldatabases.com
  • 2. Agenda
    Introduction
    Objective
    Explore NoSQL Databases
    Conclusion
  • 3. Introduction
    UCSD Graduate in Computer Science
    Java Developer for 10 years
    Creator of http://nosqldatabases.com
    Curator of NoSQL information
  • 4. Objective
    Deeper dive into each type of NoSQL database
    Discuss 1-2 NoSQL databases in each family of databases
  • 5. NoSQL Taxonomy
    Key/Value
    Document
    Column
    Graph
    Others
      Geospatial
      File System
      Object
  • 6. Key/Value Databases
    Global collection of Key/Value pairs
    Inspired by Amazon’s Dynamo and Distributed Hash Tables
    Designed to handle massive load
    Multiple Types
      In memory, e.g. Memcache
      On disk, e.g. Redis, SimpleDB
      Eventually consistent, e.g. Dynamo, Voldemort
  • 7. Key/Value: Voldemort
    Created by LinkedIn, now open source
    Inspired by Amazon’s Dynamo
    Written in Java
    Pluggable Storage
    BerkeleyDB, In Memory, MySQL
    Pluggable Serialization
    JSON, Thrift, Protocol Buffers, etc.
    Cluster Rebalancing
  • 8. Key/Value: Voldemort
    Versioning, based on Vector Clocks
    Reconciliation occurs on reads.
    Partitioning and Replication based on Dynamo
    Consistent Hashing
    Virtual Nodes
    Gossip
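Versioning with vector clocks can be sketched in a few lines. The Python below is an illustrative model under simplifying assumptions, not Voldemort's Java API: a clock is a dict from a (hypothetical) node name to a write counter, one version supersedes another only when every counter is at least as large, and whatever is left after discarding superseded versions is handed to the reader to reconcile.

```python
from typing import Dict, List, Tuple

# A vector clock maps a (hypothetical) node name to that node's write counter.
Clock = Dict[str, int]


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of `clock` with `node`'s counter bumped (a new write on that node)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: Clock, b: Clock) -> bool:
    """True if version `a` has seen every event that `b` has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: Clock, b: Clock) -> str:
    """Classify version `a` relative to `b`: 'equal', 'after', 'before', or 'concurrent'."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"


def reconcile(versions: List[Tuple[Clock, str]]) -> List[Tuple[Clock, str]]:
    """Drop every version strictly superseded by another one.

    The survivors are either a single winner or a set of concurrent
    siblings that the reading client must merge itself.
    """
    survivors = []
    for clock, value in versions:
        superseded = any(compare(other, clock) == "after" for other, _ in versions)
        if not superseded:
            survivors.append((clock, value))
    return survivors


v1 = increment({}, "A")   # first write, coordinated by node A
v2 = increment(v1, "A")   # A updates it again
v3 = increment(v1, "B")   # meanwhile B updates the older copy
print(compare(v2, v3))
print(reconcile([(v1, "x"), (v2, "y"), (v3, "z")]))
```

Here v1 is discarded because v2 descends from it, while v2 and v3 are concurrent, so both are returned for the client to resolve.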
  • 9. Other Key/Value Stores
    Amazon’s Dynamo
    Riak
    Redis
    Memcache
    SimpleDB
  • 10. Document Databases
    Similar to a Key/Value database, with one major difference: the value is a document
    Inspired by Lotus Notes
    Flexible Schema
    Any number of fields can be added
    Documents stored in JSON or BSON formats
    Examples: CouchDB, MongoDB
  • 11. Sample Document
    {
      "day": [ 2010, 1, 23 ],
      "products": {
        "apple": { "price": 10, "quantity": 6 },
        "kiwi": { "price": 20, "quantity": 2 }
      },
      "checkout": 100
    }
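Since the sample is ordinary JSON, any JSON parser can read it. This Python sketch uses the standard `json` module on a strictly valid copy of the sample and checks it, assuming (the slide does not say so) that `checkout` is meant to be the sum of price × quantity; the figures agree, since 10 × 6 + 20 × 2 = 100. The `order_total` helper is hypothetical.

```python
import json

# The sample document as strictly valid JSON: fields separated by commas,
# and no leading zeros on numbers (JSON rejects 01).
SAMPLE = """
{
  "day": [2010, 1, 23],
  "products": {
    "apple": {"price": 10, "quantity": 6},
    "kiwi": {"price": 20, "quantity": 2}
  },
  "checkout": 100
}
"""


def order_total(doc: dict) -> int:
    """Hypothetical helper: sum price * quantity over every product in the document."""
    return sum(item["price"] * item["quantity"] for item in doc["products"].values())


doc = json.loads(SAMPLE)
# Assumption: "checkout" holds the order total.
print(order_total(doc), doc["checkout"])
```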
  • 12. Document: CouchDB
    Development began ~2005 by Damien Katz, a former Lotus Notes developer
    Couch – Cluster Of Unreliable Commodity Hardware
    Top-level Apache project
    Commercially supported by CouchIO
    Licensed under Apache License
    Written in Erlang
    Documents are stored in JSON
  • 13. Document: CouchDB [cont’d]
    B-Tree Storage Engine
    MVCC model, no locking
    No joins, primary keys, or foreign keys (UUIDs are auto-assigned)
    Built-in bi-directional replication
    Can even run offline, then reconnect and sync changes back
    Custom persistent views using MapReduce
    REST API
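The idea behind a MapReduce view can be emulated in memory. CouchDB itself runs map and reduce functions (JavaScript by default) inside the server and keeps the results as a persistent, incrementally updated index; the Python below only imitates the data flow of a grouped view: map every document to (key, value) rows, sort by key, and reduce each key's values. The `run_view`, `map_products`, and `sum_reduce` names and the sample documents are hypothetical.

```python
from itertools import groupby
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Tuple[Any, Any]


def run_view(docs: Iterable[Dict[str, Any]],
             map_fn: Callable[[Dict[str, Any]], Iterable[Row]],
             reduce_fn: Callable[[List[Any]], Any]) -> List[Row]:
    """Emulate a grouped view: map every doc, sort rows by key, reduce each key group."""
    rows = sorted((row for doc in docs for row in map_fn(doc)), key=lambda r: r[0])
    return [(key, reduce_fn([value for _, value in group]))
            for key, group in groupby(rows, key=lambda r: r[0])]


def map_products(doc: Dict[str, Any]) -> Iterable[Row]:
    """Hypothetical map function: emit (product name, quantity) for each product sold."""
    for name, item in doc.get("products", {}).items():
        yield name, item["quantity"]


def sum_reduce(values: List[int]) -> int:
    """Hypothetical reduce function: total the quantities for one key."""
    return sum(values)


sales = [
    {"_id": "a", "products": {"apple": {"price": 10, "quantity": 6}}},
    {"_id": "b", "products": {"apple": {"price": 10, "quantity": 1},
                              "kiwi": {"price": 20, "quantity": 2}}},
]
print(run_view(sales, map_products, sum_reduce))
```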
  • 14. Document: MongoDB
    Development started in 2007
    Commercially supported and developed by 10Gen
    Stores documents using BSON
    Supports ad hoc queries
    Can query against embedded objects and arrays
    Supports multiple types of indexing
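Querying embedded objects and arrays in MongoDB relies on dot notation (for example `{"products.apple.price": 10}`), and an equality match on an array field succeeds when any element matches. The pure-Python sketch below imitates only those two rules on plain dicts so it runs without a server; it is not the MongoDB or PyMongo API, it ignores operators such as `$gt`, and it does not follow paths through arrays of subdocuments. The `resolve`, `matches`, and `find` helpers are hypothetical.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel for "path does not exist"


def resolve(doc: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; numeric segments index into lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return _MISSING
    return current


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Equality-only query: each field must equal the value, or be an array containing it."""
    for path, expected in query.items():
        actual = resolve(doc, path)
        if actual is _MISSING:
            return False
        if actual == expected:
            continue
        if isinstance(actual, list) and expected in actual:
            continue
        return False
    return True


def find(docs: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [doc for doc in docs if matches(doc, query)]


orders = [
    {"day": [2010, 1, 23], "products": {"apple": {"price": 10, "quantity": 6}}},
    {"day": [2010, 2, 1], "products": {"kiwi": {"price": 20, "quantity": 2}}},
]
print(find(orders, {"products.apple.price": 10}))  # reaches into an embedded object
print(find(orders, {"day": 2}))                    # matches any element of the array
```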
  • 15. Document: MongoDB [cont’d]
    Officially supported drivers available for multiple languages
    C, C++, Java, JavaScript, Perl, PHP, Python and Ruby
    Community-supported drivers include:
    Scala, Node.js, Haskell, Erlang, Smalltalk
    Replication uses a master/slave model
    Scales horizontally via sharding
    Written in C++
  • 16. Column Family Databases
    Each key is associated with multiple attributes (i.e. Columns)
    Hybrid row/column stores
    Inspired by Google BigTable
    Examples: HBase, Cassandra
  • 17. Column: HBase
    Based on Google’s BigTable
    Top-level Apache project (TLP)
    Cloudera (certifications, EC2 AMIs, etc.)
    Layered over HDFS (Hadoop Distributed File System)
    Input/output for MapReduce jobs
    APIs
    Thrift, REST
  • 18. Column: HBase [cont’d]
    Automatic partitioning
    Automatic re-balancing/re-partitioning
    Fault tolerant
    HDFS
    Multiple Replicas
    Highly distributed
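HBase's automatic partitioning divides a table into regions, each covering a contiguous, sorted range of row keys, so a row belongs to the region with the greatest start key that does not exceed the row key. The toy model below captures just that lookup and a manual split using Python's `bisect`; the server names and the `RegionMap` class are hypothetical, and real HBase splits regions automatically when they grow too large and records their locations in its own metadata table.

```python
import bisect
from typing import List


class RegionMap:
    """Toy model of a table split into regions by sorted row-key ranges."""

    def __init__(self) -> None:
        # The first region starts at the empty key, so every row key has an owner.
        self.start_keys: List[bytes] = [b""]
        self.servers: List[str] = ["server-1"]  # hypothetical server names

    def locate(self, row_key: bytes) -> str:
        """Return the server of the region whose range [start, next_start) contains row_key."""
        index = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.servers[index]

    def split(self, split_key: bytes, new_server: str) -> None:
        """Split the region containing split_key; keys >= split_key move to new_server."""
        if split_key in self.start_keys:
            raise ValueError("a region already starts at this key")
        index = bisect.bisect_right(self.start_keys, split_key)
        self.start_keys.insert(index, split_key)
        self.servers.insert(index, new_server)


regions = RegionMap()
regions.split(b"m", "server-2")
print(regions.locate(b"apple"), regions.locate(b"zebra"))
```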
  • 19. Column: HBase [cont’d]
    [Image courtesy of Lars George]
  • 20. Column: Cassandra
    Created at Facebook for Inbox search
    Facebook -> Google Code -> ASF
    Commercial Support available from Riptano
    Features taken from both Dynamo and BigTable
    Dynamo – Consistent hashing, Partitioning, Replication
    BigTable – Column Families, MemTables, SSTables
  • 21. Column: Cassandra [cont’d]
    Symmetric nodes
    No single point of failure
    Linearly scalable
    Ease of administration
    Flexible/Automated Provisioning
    Flexible Replica Placement
    High Availability
    Eventual Consistency
      However, consistency is tunable
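Tunable consistency is commonly reasoned about with the quorum rule: with N replicas, a read from R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below simply evaluates that inequality. The level names mirror Cassandra's ONE, QUORUM, and ALL, but the helper functions are hypothetical and the model ignores failure handling entirely.

```python
from typing import Callable, Dict


def quorum(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


# How many replicas each (Cassandra-style) level waits for, given n replicas.
LEVELS: Dict[str, Callable[[int], int]] = {
    "ONE": lambda n: 1,
    "QUORUM": quorum,
    "ALL": lambda n: n,
}


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


def overlapping(n: int, read_level: str, write_level: str) -> bool:
    return read_sees_latest_write(n, LEVELS[read_level](n), LEVELS[write_level](n))


N = 3  # hypothetical replication factor
for write_level in LEVELS:
    for read_level in LEVELS:
        print(f"write {write_level:<6} read {read_level:<6} -> {overlapping(N, read_level, write_level)}")
```

Reading and writing at QUORUM is the usual middle ground: with N = 3 it needs only two replicas per operation yet still satisfies 2 + 2 > 3.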
  • 22. Column: Cassandra [cont’d]
    Partitioning
      Random
        Good distribution of data between nodes
        Range scans not possible
      Order Preserving
        Can lead to unbalanced nodes
        Range scans, Natural Order
      Custom
    Extremely fast reads/writes (low latency)
    Thrift API
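The trade-off between the random and order-preserving partitioners shows up in a small consistent-hashing ring. In this Python sketch a node owns every token up to and including its own, wrapping around the ring; the random token is an MD5 hash of the key (the hash Cassandra's random partitioner uses), while the order-preserving token is the key itself. Node names, token values, and the `user{i}` keys are hypothetical.

```python
import bisect
import hashlib
from collections import Counter
from typing import Any, Callable, Dict


def random_token(key: str) -> int:
    """Random-partitioner style token: the key's MD5 digest as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def ordered_token(key: str) -> str:
    """Order-preserving token: the key itself, so ring order equals key order."""
    return key


class Ring:
    """Each node owns the ring segment ending at (and including) its token."""

    def __init__(self, tokens: Dict[str, Any], token_fn: Callable[[str], Any]) -> None:
        # Sort nodes by token so the owner can be found by binary search.
        ordered = sorted(tokens.items(), key=lambda item: item[1])
        self.nodes = [node for node, _ in ordered]
        self.tokens = [token for _, token in ordered]
        self.token_fn = token_fn

    def owner(self, key: str) -> str:
        """First node whose token is >= the key's token, wrapping past the last node."""
        index = bisect.bisect_left(self.tokens, self.token_fn(key))
        return self.nodes[index % len(self.nodes)]


keys = [f"user{i}" for i in range(1000)]

# Three nodes with evenly spaced tokens across the MD5 space.
random_ring = Ring({"A": 2**128 // 3, "B": 2 * 2**128 // 3, "C": 2**128 - 1}, random_token)
# The same three nodes with hand-picked key boundaries.
ordered_ring = Ring({"A": "g", "B": "p", "C": "z"}, ordered_token)

print("random :", Counter(random_ring.owner(k) for k in keys))
print("ordered:", Counter(ordered_ring.owner(k) for k in keys))
```

With these ordered tokens every `user...` key lands on node C, which illustrates how order preservation can unbalance nodes (though a range scan over those keys touches one node), while the hashed tokens spread the same keys across all three nodes.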
  • 23. Column: Cassandra [cont’d]
    Column
      Basic unit of storage
    Column Family
      Collection of like records
      Record-level atomicity
      Indexed
    Keyspace
      Top-level namespace
      Usually one per application
  • 24. Column: Cassandra [cont’d]
    [Image courtesy of Eric Evans]
  • 25. Column: Cassandra [cont’d]
    Column Details
      Name
        byte[]
        Queried against
        Determines sort order
      Value
        byte[]
        Opaque to Cassandra
      Timestamp
        long
        Conflict resolution (last write wins)
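Last-write-wins conflict resolution on columns can be sketched with a small dataclass. In this Python model each column carries a name, an opaque value, and a client-supplied timestamp; when two versions of the same column meet, the larger timestamp wins, and a row is returned sorted by column name. Breaking timestamp ties by comparing values, and the `merge`/`apply_writes` helper names, are assumptions made here so the result does not depend on arrival order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    name: bytes      # queried against; determines sort order within a row
    value: bytes     # opaque to the store
    timestamp: int   # supplied by the client; used for conflict resolution


def merge(a: Column, b: Column) -> Column:
    """Last write wins; ties are broken by value (an assumption for determinism)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return max(a, b, key=lambda c: (c.timestamp, c.value))


def apply_writes(writes: List[Column]) -> List[Column]:
    """Reconcile a batch of writes into one row, returned sorted by column name."""
    row: Dict[bytes, Column] = {}
    for col in writes:
        row[col.name] = merge(row[col.name], col) if col.name in row else col
    return [row[name] for name in sorted(row)]


writes = [
    Column(b"email", b"old@example.com", 100),
    Column(b"city", b"San Diego", 150),
    Column(b"email", b"new@example.com", 200),
]
for col in apply_writes(writes):
    print(col.name, col.value, col.timestamp)
```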
  • 26. Graph Databases
    Inspired by graph theory (Euler), G = (V, E)
    Focused on modeling the structure of the data
    Property Graph Data Model
    Examples: Neo4j, InfiniteGraph
  • 27. Sample Property Graph
    [Image courtesy of Todd Hoff]
  • 28. Graph: Neo4j
    Data Model: Property Graph
    Nodes – Person, Place, Thing, etc.
    Relationships – Lives, Likes, Owns, etc.
    Properties on both nodes and relationships
    Primary operation is graph traversal between nodes
    Written in Java
    Embedded database
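Graph traversal over a property graph can be sketched with a breadth-first search. The Python below is not Neo4j's API: it is a hypothetical in-memory model where nodes and relationships both carry properties, and `traverse` follows outgoing relationships of one type up to a fixed depth (a friends-of-friends style query). The `KNOWS` and `LIVES_IN` relationship types and the sample people are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    node_id: str
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    end: str
    rel_type: str
    properties: Dict[str, object] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph with typed, directed relationships."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.outgoing: Dict[str, List[Relationship]] = {}

    def add_node(self, node_id: str, **properties: object) -> None:
        self.nodes[node_id] = Node(node_id, dict(properties))
        self.outgoing.setdefault(node_id, [])

    def relate(self, start: str, rel_type: str, end: str, **properties: object) -> None:
        # Both endpoints must already exist as nodes.
        self.outgoing[start].append(Relationship(start, end, rel_type, dict(properties)))

    def traverse(self, start: str, rel_type: str, max_depth: int) -> List[Tuple[str, int]]:
        """Breadth-first walk along outgoing `rel_type` relationships, up to max_depth hops."""
        seen: Set[str] = {start}
        queue = deque([(start, 0)])
        found: List[Tuple[str, int]] = []
        while queue:
            current, depth = queue.popleft()
            if depth == max_depth:
                continue
            for rel in self.outgoing[current]:
                if rel.rel_type == rel_type and rel.end not in seen:
                    seen.add(rel.end)
                    found.append((rel.end, depth + 1))
                    queue.append((rel.end, depth + 1))
        return found


g = PropertyGraph()
for name in ["alice", "bob", "carol", "dave"]:
    g.add_node(name, kind="Person")
g.add_node("san_diego", kind="Place")
g.relate("alice", "KNOWS", "bob", since=2008)
g.relate("bob", "KNOWS", "carol")
g.relate("carol", "KNOWS", "dave")
g.relate("alice", "LIVES_IN", "san_diego")
print(g.traverse("alice", "KNOWS", max_depth=2))
```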
  • 29. Graph: Neo4j [cont’d]
    Disk-based
    Graph stored in custom binary format
    Transactional
    JTA/JTS, XA, 2PC, MVCC
    Scales
    Billions of nodes/relationships/properties per JVM
    Robust
    6+ years in 24/7 production
  • 30. Graph: Neo4j [cont’d]
    Multiple language bindings
    Jython, CPython
    JRuby (including RESTful API)
    Clojure
    Scala (including RESTful API)
    Uses
    Social graphs, e.g. Facebook
    Recommendation Engines
    Financial Audit
  • 31. Graph: Neo4j [cont’d]
    Licensed under AGPLv3
    Dual Commercial License Available
    First server is free
    Second server inexpensive
    Commercial support provided by Neo Technology
  • 32. Other Graph Databases
    InfiniteGraph
    HyperGraphDB
    sones
  • 33. Conclusion
  • 34. Thank You!
  • 35. References
    NoSQL Databases – Part 1 – Landscape, Vineet Gupta, http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
    NoSQL for Dummies, Tobias Ivarsson, http://www.slideshare.net/thobe/nosql-for-dummies
    NoSQL Databases, Marin Dimitrov, http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
    CouchDB vs. MongoDB, Gabriele Lana, http://www.slideshare.net/gabriele.lana/couchdb-vs-mongodb-2982288
    HBase, Ryan Rawson, http://www.slideshare.net/adorepump/hbase-nosql
    Introduction to Cassandra, Gary Dusbabek, http://www.slideshare.net/gdusbabek/introduction-to-cassandra-june-2010
    Cassandra Explained, Eric Evans, http://www.slideshare.net/jericevans/cassandra-explained
    Towards Robust Distributed Systems, Eric Brewer, http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
    Cassandra – A Decentralized Structured Storage System, Lakshman, Malik (LADIS 2009), http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
  • 36. References [cont’d]
    Bigtable: A Distributed Storage System for Structured Data, Google Inc., http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf
    Dynamo: Amazon’s Highly Available Key-value Store, Amazon Inc., http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
    HBase Architecture 101 – Storage, Lars George, http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
    BASE: An ACID Alternative, Dan Pritchett