Introduction to NoSQL Databases

  • 1. Introduction to
    NoSQL Databases
    San Diego NoSQL Meetup – Aug 2010
    By Derek Stainer
    http://nosqldatabases.com
  • 2. Agenda
    Introduction
    Objective
    Explore NoSQL Databases
    Conclusion
  • 3. Introduction
    UCSD Graduate in Computer Science
    Java Developer for 10 years
    Creator of http://nosqldatabases.com
    Curator of NoSQL information
  • 4. Objective
    Deeper dive into each type of NoSQL database
    Discuss 1-2 NoSQL databases in each family of databases
  • 5. NoSQL Taxonomy
    Key/Value
    Document
    Column
    Graph
    Others
      Geospatial
      File System
      Object
  • 6. Key/Value Databases
    Global collection of Key/Value pairs
    Inspired by Amazon’s Dynamo and Distributed Hash Tables (DHTs)
    Designed to handle massive load
    Multiple Types
      In Memory, e.g. Memcached
      On Disk, e.g. Redis, SimpleDB
      Eventually Consistent, e.g. Dynamo, Voldemort
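At its core, every store on this slide exposes the same contract: opaque values addressed only by their key. The sketch below is a minimal in-memory illustration of that contract; the class `KeyValueStore` and its `put`/`get`/`delete` methods are hypothetical and are not the API of Memcached, Redis, or any other product.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key/value store; values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Overwrites any existing value stored under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by key only; the store never inspects the value.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed before deletion.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:42", b'{"name": "Ada"}')
print(store.get("user:42"))                     # b'{"name": "Ada"}'
print(store.delete("user:42"), store.get("user:42"))  # True None
```

Because the value is opaque, any querying by content has to happen in the application; that limitation is what the document databases later in the talk address.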
  • 7. Key/Value: Voldemort
    Created by LinkedIn, now open source
    Inspired by Amazon’s Dynamo
    Written in Java
    Pluggable Storage: BerkeleyDB, In Memory, MySQL
    Pluggable Serialization: JSON, Thrift, Protocol Buffers, etc.
    Cluster Rebalancing
  • 8. Key/Value: Voldemort
    Versioning based on Vector Clocks
      Reconciliation occurs on reads
    Partitioning and Replication based on Dynamo
      Consistent Hashing
      Virtual Nodes
      Gossip
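Vector clocks let the system tell whether one version of a value supersedes another or whether the two were written concurrently, in which case the conflict is handed back on read for reconciliation. The sketch below shows the standard increment/compare/merge operations on a clock represented as a dict of node name to counter; the function names and node names are illustrative assumptions, not Voldemort's API.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with `node`'s counter bumped (a write on that node)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum; used once a reader has reconciled concurrent versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


base = increment({}, "A")   # {'A': 1}
v1 = increment(base, "A")   # {'A': 2}
v2 = increment(base, "B")   # {'A': 1, 'B': 1}
print(compare(base, v1))    # before
print(compare(v1, v2))      # concurrent -> conflict surfaced on read
resolved = increment(merge(v1, v2), "A")
print(compare(v1, resolved), compare(v2, resolved))  # before before
```

The reconciled version's clock dominates both siblings, so later reads can discard them.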
  • 9. Other Key/Value Stores
    Amazon’s Dynamo
    Riak
    Redis
    Memcached
    SimpleDB
  • 10. Document Databases
    Similar to a Key/Value database, with one major difference: the value is a document
    Inspired by Lotus Notes
    Flexible Schema
      Any number of fields can be added
    Documents stored in JSON or BSON formats
    Examples: CouchDB, MongoDB
  • 11. Sample Document
    {
      "day": [ 2010, 1, 23 ],
      "products": {
        "apple": { "price": 10, "quantity": 6 },
        "kiwi": { "price": 20, "quantity": 2 }
      },
      "checkout": 100
    }
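One way to confirm the sample document is well-formed and internally consistent is to parse it with Python's standard `json` module and recompute the checkout total from price × quantity. Strict JSON requires commas between fields and forbids leading zeros such as `01`, so the string below uses `1` for the month.

```python
import json

SAMPLE = """
{
  "day": [2010, 1, 23],
  "products": {
    "apple": {"price": 10, "quantity": 6},
    "kiwi": {"price": 20, "quantity": 2}
  },
  "checkout": 100
}
"""

doc = json.loads(SAMPLE)

# No schema was declared, but nested fields are still directly navigable.
total = sum(item["price"] * item["quantity"] for item in doc["products"].values())
print(total, total == doc["checkout"])  # 100 True
```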
  • 12. Document: CouchDB
    Development began around 2005 by Damien Katz, a former Lotus Notes developer
    Couch – Cluster Of Unreliable Commodity Hardware
    Top-level Apache project
    Commercially supported by CouchIO
    Licensed under the Apache License
    Written in Erlang
    Documents are stored in JSON
  • 13. Document: CouchDB [cont’d]
    B-Tree Storage Engine
    MVCC model, no locking
    No joins, primary keys, or foreign keys (UUIDs are auto-assigned)
    Built-in bi-directional replication
      Can even run offline, then reconnect and sync changes back
    Custom persistent views using MapReduce
    REST API
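A CouchDB view is defined by a map function that emits key/value rows for each document, optionally followed by a reduce over those rows; the rows are kept sorted by key. The Python sketch below only simulates that idea over in-memory dicts. Real CouchDB views are normally written as JavaScript functions stored in design documents and queried over the REST API, and the helper names here (`map_products`, `build_view`, `reduce_sum`) are hypothetical.

```python
from itertools import groupby
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[str, int]


def map_products(doc: Doc) -> Iterable[Row]:
    """Map step: emit (product name, quantity) for every product in an order."""
    for name, item in doc.get("products", {}).items():
        yield name, item["quantity"]


def build_view(docs: List[Doc], map_fn: Callable[[Doc], Iterable[Row]]) -> List[Row]:
    """Run the map over all documents and keep rows sorted by key, like a view index."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def reduce_sum(rows: List[Row]) -> Dict[str, int]:
    """Grouped reduce step: sum the values for each distinct key (rows must be sorted)."""
    return {key: sum(v for _, v in group) for key, group in groupby(rows, key=lambda r: r[0])}


orders = [
    {"products": {"apple": {"price": 10, "quantity": 6}, "kiwi": {"price": 20, "quantity": 2}}},
    {"products": {"apple": {"price": 10, "quantity": 3}}},
]
view = build_view(orders, map_products)
print(view)              # [('apple', 6), ('apple', 3), ('kiwi', 2)]
print(reduce_sum(view))  # {'apple': 9, 'kiwi': 2}
```

This sketch rebuilds the view from scratch on each call; the "persistent" part of CouchDB views is that the index is stored and updated incrementally as documents change.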
  • 14. Document: MongoDB
    Development started in 2007
    Commercially supported and developed by 10gen
    Stores documents using BSON
    Supports ad hoc queries
    Can query against embedded objects and arrays
    Supports multiple types of indexing: single-key, compound, unique, non-unique, and geospatial
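Querying "against embedded objects and arrays" means a filter can reach into nested fields with a dotted path such as `products.apple.price`, and an equality test on an array field matches when any element is equal. The sketch below imitates those two rules in plain Python over a list of dicts; `get_path`, `matches`, and `find` are hypothetical helpers, not the MongoDB driver API, and they support only equality.

```python
from typing import Any, Dict, List

_MISSING = object()


def get_path(doc: Any, path: str) -> Any:
    """Follow a dot-separated path (e.g. 'products.apple.price') into nested dicts."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return _MISSING
    return current


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Equality on every field; an array field matches if any element is equal."""
    for path, expected in query.items():
        value = get_path(doc, path)
        if isinstance(value, list):
            if expected not in value and value != expected:
                return False
        elif value is _MISSING or value != expected:
            return False
    return True


def find(docs: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [d for d in docs if matches(d, query)]


orders = [
    {"_id": 1, "tags": ["fruit"], "products": {"apple": {"price": 10, "quantity": 6}}},
    {"_id": 2, "tags": ["fruit", "sale"], "products": {"kiwi": {"price": 20, "quantity": 2}}},
]
print([d["_id"] for d in find(orders, {"products.apple.price": 10})])  # [1]
print([d["_id"] for d in find(orders, {"tags": "sale"})])              # [2]
```

A real server would use the indexes listed above to avoid scanning every document; this sketch always scans.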
  • 15. Document: MongoDB [cont’d]
    Officially supported drivers available for multiple languages: C, C++, Java, JavaScript, Perl, PHP, Python and Ruby
    Community-supported drivers include: Scala, Node.js, Haskell, Erlang, Smalltalk
    Replication uses a master/slave model
    Scales horizontally via sharding
    Written in C++
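Sharding in MongoDB partitions a collection into chunks, each covering a contiguous range of a chosen shard key, and routes each request to the shard that owns the matching chunk. The sketch below shows only that routing step with a sorted list of split points and `bisect`; the split points and shard names are hypothetical, and each shard owns exactly one chunk here, whereas a real cluster has many chunks per shard that a balancer moves around.

```python
import bisect
from typing import List


class RangeShardRouter:
    """Route a shard-key value to the shard whose range contains it.

    Chunk i covers [bounds[i-1], bounds[i]); the first chunk is open below
    and the last chunk is open above. Shard names are hypothetical.
    """

    def __init__(self, bounds: List[str], shards: List[str]) -> None:
        if len(shards) != len(bounds) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.bounds = bounds
        self.shards = shards

    def shard_for(self, key: str) -> str:
        # bisect_right returns the index of the first bound strictly greater than key.
        return self.shards[bisect.bisect_right(self.bounds, key)]


router = RangeShardRouter(bounds=["g", "p"], shards=["shard0", "shard1", "shard2"])
for user in ["alice", "grace", "zoe"]:
    print(user, router.shard_for(user))
```

Range-based placement keeps neighbouring keys together, so a range query on the shard key touches few shards; the trade-off is that a monotonically increasing key sends all new writes to the last chunk.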
  • 16. Column Family Databases
    Each key is associated with multiple attributes (i.e. Columns)
    Hybrid row/column stores
    Inspired by Google BigTable
    Examples: HBase, Cassandra
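The Bigtable model behind these stores is a sparse map from row key to columns named `family:qualifier`, with rows kept sorted by key so that contiguous key ranges can be scanned efficiently. The toy sketch below keeps only that shape. Cell timestamps/versions are deliberately omitted, the class name is hypothetical, and the reversed-domain row keys follow the style of the web-table example in the Bigtable paper.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical in-memory model: row key -> {"family:qualifier" -> value}.
Row = Dict[str, str]


class SortedColumnTable:
    """Toy Bigtable-style table: rows sorted by key, each holding sparse columns."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Row] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep row keys in sorted order
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start: str, stop: str) -> List[Tuple[str, Row]]:
        """Return rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedColumnTable()
table.put("com.example/b", "info:title", "Page B")
table.put("com.example/a", "info:title", "Page A")
table.put("com.example/a", "anchor:org.other", "link text")
table.put("org.other/x", "info:title", "Other")
for key, columns in table.scan("com.example/", "com.example0"):
    print(key, sorted(columns))
```

Each row carries only the columns actually written, which is what makes the model "sparse"; rows from the same site end up adjacent because of the reversed-domain keys.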
  • 17. Column: HBase
    Based on Google’s BigTable
    Apache top-level project (TLP)
    Cloudera (certifications, EC2 AMIs, etc.)
    Layered over HDFS (Hadoop Distributed File System)
    Can serve as input/output for MapReduce jobs
    APIs: Thrift, REST
  • 18. Column: HBase [cont’d]
    Automatic partitioning
    Automatic re-balancing/re-partitioning
    Fault tolerant
      HDFS
      Multiple Replicas
    Highly distributed
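Automatic partitioning in HBase works by dividing a table into regions, each a contiguous range of sorted row keys, and splitting a region in two when it grows too large so the halves can be served (and re-balanced) independently. The toy sketch below splits on a row-count threshold at the middle key purely for illustration; real HBase decides on data size rather than row count, and the `Region` class and `split_if_needed` function are hypothetical.

```python
from typing import List


class Region:
    """A contiguous, sorted range of row keys served by one region server (toy model)."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = sorted(keys)


def split_if_needed(regions: List[Region], max_rows: int) -> List[Region]:
    """Split any region holding more than max_rows rows at its middle key (one pass)."""
    result: List[Region] = []
    for region in regions:
        if len(region.keys) > max_rows:
            mid = len(region.keys) // 2
            result.append(Region(region.keys[:mid]))
            result.append(Region(region.keys[mid:]))
        else:
            result.append(region)
    return result


regions = [Region([f"row{i:02d}" for i in range(6)])]
regions = split_if_needed(regions, max_rows=4)
print([(r.keys[0], r.keys[-1]) for r in regions])  # [('row00', 'row02'), ('row03', 'row05')]
```

Because the split point is a row key, each daughter region is still a contiguous key range, so range scans continue to work after the split.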
  • 19. Column: HBase [cont’d]
    [Diagram; source: Lars George]
  • 20. Column: Cassandra
    Created at Facebook for Inbox Search
    Facebook -> Google Code -> ASF
    Commercial support available from Riptano
    Features taken from both Dynamo and Bigtable
      Dynamo: Consistent Hashing, Partitioning, Replication
      Bigtable: Column Families, MemTables, SSTables
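Consistent hashing, the Dynamo technique named here (and on the Voldemort slide, together with virtual nodes), places both nodes and keys on a hash ring; a key belongs to the first node token found clockwise from its hash, and replicas go to the next distinct nodes. The sketch below is a minimal illustration assuming MD5 for ring positions and eight virtual-node tokens per physical node; the class and method names are hypothetical. Its usage shows the key property: removing a node only moves the keys that node owned.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # MD5 is used only to spread positions evenly around the ring, not for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hash ring where each physical node owns several virtual-node tokens."""

    def __init__(self, nodes: List[str], vnodes: int = 8) -> None:
        self.vnodes = vnodes
        self._tokens: List[int] = []
        self._owner: Dict[int, str] = {}
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            token = _hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def preference_list(self, key: str, replicas: int) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct physical nodes."""
        start = bisect.bisect_right(self._tokens, _hash(key))
        result: List[str] = []
        for offset in range(len(self._tokens)):
            node = self._owner[self._tokens[(start + offset) % len(self._tokens)]]
            if node not in result:
                result.append(node)
            if len(result) == replicas:
                break
        return result


ring = ConsistentHashRing(["A", "B", "C", "D"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.preference_list(k, 1)[0] for k in keys}
ring.remove_node("D")
after = {k: ring.preference_list(k, 1)[0] for k in keys}
moved = sum(before[k] != after[k] for k in keys)
# Only keys that previously lived on D change owner.
print(moved, sum(v == "D" for v in before.values()))
```

Virtual nodes give each physical node many small arcs instead of one large one, which evens out load and spreads a departed node's keys across all survivors.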
  • 21. Column: Cassandra [cont’d]
    Symmetric nodes
    No single point of failure
    Linearly scalable
    Ease of administration
    Flexible/Automated Provisioning
    Flexible Replica Placement
    High Availability
    Eventual Consistency
      However, consistency is tunable
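Tunable consistency means the client chooses, per request, how many of the N replicas must answer (Cassandra exposes this as consistency levels such as ONE, QUORUM, and ALL). The usual rule of thumb in Dynamo-style systems is that if reads wait for R replicas and writes for W replicas with R + W > N, every read set overlaps every write set, so a read sees at least one replica holding the latest acknowledged write. The sketch below checks that rule by brute force over small N; it is an illustration of the arithmetic, not Cassandra code.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-replica read set intersect every W-replica write set?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


N = 3
for r, w in [(1, 1), (2, 2), (1, 3), (3, 1), (1, 2)]:
    print(f"N={N} R={r} W={w}: overlap={quorums_always_overlap(N, r, w)}, R+W>N={r + w > N}")
```

With N = 3, QUORUM reads and writes (R = W = 2) satisfy the rule, while ONE/ONE trades that guarantee for lower latency.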
  • 22. Column: Cassandra [cont’d]
    Partitioning
      Random
        Good distribution of data between nodes
        Range scans not possible
      Order Preserving
        Can lead to unbalanced nodes
        Range scans, Natural Order
      Custom
    Extremely fast reads/writes (low latency)
    Thrift API
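The difference between the two built-in partitioners comes down to what a key's token is. The random partitioner hashes the key (Cassandra's RandomPartitioner uses MD5), so neighbouring keys scatter across the ring and load spreads evenly, but key order is lost. The order-preserving partitioner uses the key itself as the token, so range scans work, but skewed keys pile onto one node. The sketch below contrasts the two orderings and the resulting imbalance; the functions and the split point `"m"` are illustrative assumptions.

```python
import hashlib
from typing import List


def random_token(key: str) -> int:
    # MD5-based token, in the spirit of a random partitioner.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def order_preserving_token(key: str) -> str:
    # The key itself is the token, so token order equals key order.
    return key


keys = ["apple", "apricot", "banana", "blueberry", "cherry"]
by_random: List[str] = sorted(keys, key=random_token)
by_order: List[str] = sorted(keys, key=order_preserving_token)
print("random partitioner order:", by_random)
print("order-preserving order:  ", by_order)

# Two nodes splitting an order-preserving key space at "m": every key lands on node 0.
node_counts = [sum(k < "m" for k in keys), sum(k >= "m" for k in keys)]
print(node_counts)  # [5, 0]
```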
  • 23. Column: Cassandra [cont’d]
    Column
      Basic unit of storage
    Column Family
      Collection of like records
      Record level atomicity
      Indexed
    Keyspace
      Top level namespace
      Usually one per application
  • 24. Column: Cassandra [cont’d]
    [Diagram; source: Eric Evans]
  • 25. Column: Cassandra [cont’d]
    Column Details
      Name
        byte[]
        Queried against
        Determines sort order
      Value
        byte[]
        Opaque to Cassandra
      Timestamp
        long
        Conflict resolution (last write wins)
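The three column fields above are enough to reconcile replicas: when copies of a column disagree, the one with the higher client-supplied timestamp wins, and columns within a row sort by name bytes. The sketch below models that with a frozen dataclass. The tie-break on equal timestamps (compare values) is an assumption made only to keep the example deterministic, and `reconcile`/`merge_rows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    name: bytes      # compared when sorting and querying
    value: bytes     # opaque payload
    timestamp: int   # supplied by the client; higher wins


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins; on a timestamp tie, compare values (an assumed tie-break)."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


def merge_rows(*replicas: List[Column]) -> List[Column]:
    """Combine replica copies of a row, reconciling per column name, sorted by name."""
    winners: Dict[bytes, Column] = {}
    for row in replicas:
        for col in row:
            current = winners.get(col.name)
            winners[col.name] = col if current is None else reconcile(current, col)
    return [winners[name] for name in sorted(winners)]


replica1 = [Column(b"email", b"old@example.com", 100), Column(b"age", b"36", 100)]
replica2 = [Column(b"email", b"new@example.com", 200)]
for col in merge_rows(replica1, replica2):
    print(col.name, col.value, col.timestamp)
```

Because only timestamps are compared, clients need reasonably synchronized clocks; a write stamped with a lagging clock can be silently overridden by an older one.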
  • 26. Graph Databases
    Inspired by Euler’s graph theory, G = (V, E)
    Focused on modeling the structure of the data
    Property Graph Data Model
    Examples: Neo4j, InfiniteGraph
  • 27. Sample Property Graph
    [Diagram; source: Todd Hoff]
  • 28. Graph: Neo4j
    Data Model: Property Graph
      Nodes – Person, Place, Thing, etc.
      Relationships – Lives, Likes, Owns, etc.
      Properties on both nodes and relationships
    Primary operation is graph traversal between nodes
    Written in Java
    Embedded database
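In the property graph model both nodes and typed relationships carry key/value properties, and the primary operation is traversal: starting at a node and following relationships of a given type. The sketch below models that in memory with a breadth-first traversal bounded by depth. `PropertyGraph`, `relate`, `traverse`, and the `KNOWS`/`LIVES` relationship types are hypothetical illustrations, not Neo4j's API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    props: Dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    end: str
    rel_type: str
    props: Dict[str, object] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.out: Dict[str, List[Relationship]] = {}

    def add_node(self, node_id: str, **props: object) -> None:
        self.nodes[node_id] = Node(node_id, dict(props))
        self.out.setdefault(node_id, [])

    def relate(self, start: str, rel_type: str, end: str, **props: object) -> None:
        self.out[start].append(Relationship(start, end, rel_type, dict(props)))

    def traverse(self, start: str, rel_type: str, max_depth: int) -> Dict[str, int]:
        """Breadth-first traversal along rel_type; returns reachable node id -> depth."""
        depths = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if depths[current] == max_depth:
                continue
            for rel in self.out[current]:
                if rel.rel_type == rel_type and rel.end not in depths:
                    depths[rel.end] = depths[current] + 1
                    queue.append(rel.end)
        del depths[start]
        return depths


g = PropertyGraph()
for name in ["alice", "bob", "carol", "dave"]:
    g.add_node(name, kind="Person")
g.add_node("sandiego", kind="Place", name="San Diego")
g.relate("alice", "KNOWS", "bob", since=2005)
g.relate("bob", "KNOWS", "carol")
g.relate("carol", "KNOWS", "dave")
g.relate("alice", "LIVES", "sandiego")
print(g.traverse("alice", "KNOWS", max_depth=2))  # {'bob': 1, 'carol': 2}
```

Each hop follows stored adjacency lists rather than joining tables, which is why traversal-heavy workloads such as social graphs and recommendations suit this model.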
  • 29. Graph: Neo4j [cont’d]
    Disk-based
      Graph stored in custom binary format
    Transactional
      JTA/JTS, XA, 2PC, MVCC
    Scales
      Billions of nodes/relationships/properties per JVM
    Robust
      6+ years in 24/7 production
  • 30. Graph: Neo4j [cont’d]
    Multiple language bindings
      Jython, CPython
      JRuby (including RESTful API)
      Clojure
      Scala (including RESTful API)
    Uses
      Social graphs, e.g. Facebook
      Recommendation Engines
      Financial Audit
  • 31. Graph: Neo4j [cont’d]
    Licensed under AGPLv3
    Dual Commercial License Available
      First server is free
      Second server is inexpensive
    Commercial support provided by Neo Technology
  • 32. Other Graph Databases
    InfiniteGraph
    HyperGraphDB
    sones
  • 33. Conclusion
  • 34. Thank You!
  • 35. References
    NoSQL Databases - Part 1 – Landscape, Vineet Gupta. http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
    NoSQL for Dummies, Tobias Ivarsson. http://www.slideshare.net/thobe/nosql-for-dummies
    NoSQL Databases, Marin Dimitrov. http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
    CouchDB vs. MongoDB, Gabriele Lana. http://www.slideshare.net/gabriele.lana/couchdb-vs-mongodb-2982288
    HBase, Ryan Rawson. http://www.slideshare.net/adorepump/hbase-nosql
    Introduction to Cassandra, Gary Dusbabek. http://www.slideshare.net/gdusbabek/introduction-to-cassandra-june-2010
    Cassandra Explained, Eric Evans. http://www.slideshare.net/jericevans/cassandra-explained
    Towards Robust Distributed Systems, Eric Brewer. http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
    Cassandra - A Decentralized Structured Storage System, Avinash Lakshman and Prashant Malik (LADIS 2009). http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
  • 36. References [cont’d]
    Bigtable: A Distributed Storage System for Structured Data, Google Inc. http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf
    Dynamo: Amazon’s Highly Available Key-value Store, Amazon Inc. http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
    HBase Architecture 101 – Storage, Lars George. http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
    BASE: An ACID Alternative, Dan Pritchett