No sql landscape_nosqltips
Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • NoSQL does not mean "no SQL", nor that it is against SQL or RDBMS databases. NoSQL is better characterized as non-RDBMS data stores, but even that is not completely true.
  • SQL and NoSQL are very compatible and often used together. SQL usually takes the OLTP role while NoSQL slots in for special purposes.
  • Brewer's Theorem (Inktomi): Consistency, Availability, Partition tolerance. You can have any two but not all three. A single-node system gives you C and A; add P and you must choose between C and A.
  • Membase is a distributed (elastic) map and CouchDB is a document store. The two companies combined to form Couchbase.
  • RDF = Resource Description Framework. A triplestore holds subject, predicate, object triples, where the predicate is the relationship. OWL = Web Ontology Language, used for the semantic web.

No sql landscape_nosqltips Presentation Transcript

  • The NoSQL Landscape
    • Objective – Reasonable understanding of the non-relational or NoSQL data stores and how they relate to RDBMS databases we are all used to working with.
  • About Me
    • Chief Architect – youwho.com
    • Former dot com CTO
    • NoSQL advocate
    • nosqltips.blogspot.com
    • @nosqltips on twitter
  • Agenda
    • What is NoSQL?
    • Landscape
    • Vocabulary and concepts
    • CAP Theorem
    • SQL vs NoSQL comparison
    • Overview of each type w/ examples
    • Question and Answer
  • Vocabulary
    • CAP Theorem – Consistency, Availability, Partition tolerance
    • ACID – Atomic, Consistent, Isolated, Durable
    • BASE – Basically Available, Soft state, Eventually consistent
    • RDF – Resource Description Framework
    • Sharding – Partitioning, distributed
    • Web Scale – Google, Twitter, Facebook, etc
  • CAP Tuning
    • NRW
      • N: Number of Data Copies
      • R: Read Quorum
      • W: Write Quorum
    • Hard Consistency – RDBMS
    • Soft Consistency – No Guarantees
    • Eventual Consistency – Most NoSQL
  • CAP Tuning Chart
    • NRW setting – Outcome
    • N=3 – "Magic number" of data replicas
    • W=N, R=1 – Read optimized, strong consistency
    • W=1, R=N – Write optimized, strong consistency
    • W+R > N – Strong consistency on read and write
    • W+R <= N – Weak eventual consistency; a read may not see the latest data
    • N > W > 1 – Eventual consistency; most NoSQL data stores live here
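The chart's rules can be written as a small function. This is only a sketch of the arithmetic on the slide; the function name `classify_nrw` and the label strings are made up for illustration and do not come from any particular data store.

```python
def classify_nrw(n: int, r: int, w: int) -> str:
    """Label an N/R/W setting using the rules from the CAP tuning chart.

    n: number of data copies, r: read quorum, w: write quorum.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    if r + w > n:
        # Any read quorum shares at least one replica with any write quorum,
        # so a read always touches a copy that holds the latest write.
        if w == n and r == 1:
            return "read optimized, strong consistency"
        if w == 1 and r == n:
            return "write optimized, strong consistency"
        return "strong consistency"
    # Quorums may miss each other, so a read can return stale data.
    return "eventual consistency"


for setting in [(3, 1, 3), (3, 3, 1), (3, 2, 2), (3, 1, 2), (3, 1, 1)]:
    print(setting, "->", classify_nrw(*setting))
```

With N=3, the setting R=2, W=2 is the usual middle ground: it overlaps (2+2 > 3) without forcing every replica to answer on either path.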
  • Eventual Consistency
    • All replicas have same data – eventually
    • Milliseconds to seconds
    • Not all applications are compatible
    • Various ways to ensure latest data
      • Vector Clocks, Read Repair, Gossiping
      • Application determines correct data
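Vector clocks are one of the techniques named above for deciding which replica holds the latest data. The following is a minimal sketch of the idea, assuming a clock is a plain mapping from node name to counter; the helper names `tick`, `merge`, and `compare` are illustrative and not taken from any specific product.

```python
from typing import Dict

Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with this node's counter incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a: Clock, b: Clock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for a relative to b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither version descends from the other


v1 = tick({}, "node-a")   # first write, handled by node-a
v2 = tick(v1, "node-a")   # second write on node-a
v3 = tick(v1, "node-b")   # a different write on node-b, also based on v1
print(compare(v1, v2))    # v1 is an ancestor of v2
print(compare(v2, v3))    # siblings: the application must pick or merge
```

When `compare` reports concurrent versions, the store cannot choose a winner on its own, which is where "application determines correct data" comes in.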
  • Comparison
    • SQL
      • Prefers big-box, self-redundant hardware
      • Tries to keep things from breaking
      • Solidly in CA land
      • P is difficult and expensive
      • Query by SQL
      • Stored procedures
    • NoSQL
      • Prefers commodity hardware, distributed
      • Assumes things break or are broken
      • Mostly AP, some tunable
      • P is generally easy
      • Query by custom API, sometimes SQL-like
      • Map/Reduce
  • Comparison
    • SQL
      • ACID transactions
      • Advanced indexing
      • Foreign key support
      • Strong lock support
      • Schema centric
      • API – usually JPA or JDBC
      • Strong access control
    • NoSQL
      • BASE transactions
      • Indexing – key only to advanced
      • Foreign keys – usually none
      • Locking – usually none
      • Usually schema-less
      • API – depends on implementation
      • Access control – usually none
  • Comparison
    • SQL
      • Complex disk store, random access
      • Easy for developers with JPA/Hibernate/SQL
      • Multi-platform
      • General purpose
      • Strong commercial support
      • Great tool support
    • NoSQL
      • Usually append only: 1 seek, 1 read
      • Puts more work on the application developer
      • Favors Linux/Unix
      • More special purpose
      • Commercial support ranges from strong to none
      • Tool support – not so much
  • Column Stores
    • Data stored by column instead of row
    • Schema-less
    • Non-relational, data is de-normalized
    • Column format stores sparse data efficiently
    • Column families cannot change
    • 10,000+ columns by 100 million+ rows
    • Easy sharding (partitioning)
    • Usually not ACID compliant
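To make the data model concrete, here is a rough in-memory sketch of a table with fixed column families and free-form, sparse columns inside each family. It illustrates only the logical model from the slide, not the on-disk layout of any real column store; the class `SparseTable` and its methods are hypothetical.

```python
class SparseTable:
    """Toy column-family table: families are fixed, columns are not."""

    def __init__(self, families):
        self.families = frozenset(families)  # fixed when the table is created
        self.rows = {}  # row key -> {family: {column: value}}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Only cells that are actually written take up space.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get(self, row_key, family, column, default=None):
        return self.rows.get(row_key, {}).get(family, {}).get(column, default)

    def cell_count(self):
        return sum(len(cols) for fams in self.rows.values() for cols in fams.values())


table = SparseTable(["profile", "stats"])
table.put("user1", "profile", "name", "Ann")
table.put("user2", "stats", "logins", 3)    # user2 has no profile columns at all
print(table.get("user1", "profile", "name"))
print(table.get("user1", "stats", "logins"))  # missing cell, nothing stored
print(table.cell_count())
```

Because absent cells are simply not stored, a table that is logically thousands of columns wide costs only as much as the cells that exist.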
  • Column stores
    • BigTable – Google, 2006 paper
    • Hadoop/HBase – Part of Apache Hadoop
    • Cassandra – Facebook, LAN/WAN replication
    • Hypertable – Pluggable DFS, HQL
    • Vertica – Full SQL implementation
    • Amazon SimpleDB – Cloud store
  • Document Stores
    • CAP tunable
    • Either key/value or bucket/key/value
    • Easy/Auto sharding - Consistent hashing
    • Usually ACID compliant
    • Not SQL compliant, maybe custom query
    • Easy implementation via map or custom api
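Consistent hashing, mentioned above as the basis for easy sharding, can be sketched in a few lines. This is an illustration under stated assumptions: the `HashRing` class, the use of SHA-256, and the choice of 64 virtual nodes per server are all picked for the example and are not taken from any particular store.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (SHA-256 is an arbitrary choice)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual nodes per server, smooths the distribution
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first ring position clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The point of the design is that adding a server only pulls keys onto the new server; keys never shuffle between the existing ones, which is what makes automatic sharding cheap.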
  • Document stores
    • Amazon – Dynamo and S3 (cloud based)
    • Riak – CAP tunable, built in map/reduce
    • CouchDB – ACID, REST api
    • MongoDB – Indexing, query support
    • Voldemort – Java, pluggable serialization
    • MySQL – Key access, denormalize schema, kill indexes
  • Memory Stores
    • Mostly in the CA realm
    • P can be tough depending on implementation
    • Some are distributed, some local only
    • Usually key-value stores
    • Many are disk backed, append only files
    • Designed for very high-speed access
  • Memory stores
    • Couchbase – Membase + CouchDB
    • Memcached – Local map
    • Coherence – Commercial Oracle, distributed
    • Redis – Supports hash, list, set, and sorted set, data structure server
    • Tokyo/Kyoto Cabinet – disk backed map
    • Infinispan – JSR-107 (JCache) implementation
    • Scalaris – Erlang, strong consistency
  • Graph/Triple Store
    • Model relationships well, bi-directional
    • Node/edges – edges can be weighted or not
    • RDF Triple – subject -> predicate -> object, w3c standard for semantic web
    • Many implement SPARQL, object api
    • Sharding can be difficult because of the graph's interconnected nature
    • Schema-less – nodes, edges, properties
    • Fast set operations
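The subject -> predicate -> object model can be illustrated with a tiny in-memory triple store. This is a sketch only: the `TripleStore` class is hypothetical, the names in the sample data are invented, and wildcard matching here is a stand-in for the kind of pattern a SPARQL query would express, not an implementation of SPARQL.

```python
class TripleStore:
    """Toy store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("alice", "worksAt", "acme")

print(store.match(predicate="knows"))  # every "knows" relationship
print(store.match(subject="alice"))    # everything stated about alice
print(store.match(obj="bob"))          # follow the edge backwards: who points at bob?
```

Querying by object as well as by subject is what makes the relationships effectively bi-directional.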
  • Graph/Triple Stores
    • Neo4j – ACID transactions, object API
    • AllegroGraph – Reference impl of SPARQL
    • Bigdata – dynamic sharding
    • Trinity – Microsoft Research
    • InfiniteGraph – Distributed, cross-platform
    • FlockDB – Twitter, fast set operations
    • InfoGrid – Object based, REST API
  • Interesting Integrations
    • Lucene - Document Store with Search as Query Language
    • Solr and Elasticsearch – Scalable Lucene
    • Riak Search – Erlang impl of Lucene APIs
    • Solandra – Lucene on Cassandra backend
    • Couchdb-lucene – Integration
    • DistributedLucene – Lucene on Hadoop
    • Neo4j – Full Text Search on Graph Store
  • Worth Mentioning
    • Configuration Dbs – ZooKeeper, Doozer
      • Distributed configuration, locks, synchronization
      • Used to make other apps scalable
    • XML Dbs – eXist, BaseX, Xindice
      • XML only, XQuery, XPath, ACID, GUI support
      • non-distributed
  • Case Study - HBase
    • Apache – part of Hadoop/HDFS
    • Requires ZooKeeper
    • Java based
    • Runs well on Amazon EC2
    • Excellent language support
    • Supports REST interface
  • HBase continued
    • Map/Reduce via Hadoop
    • Schema-less, column families fixed
    • Nearly unlimited columns and rows
    • HBQL – partial SQL + JDBC support
    • Some ACID support, atomicity, durability
    • Integration with Hive for data warehousing, ad-hoc query support - HiveQL
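For readers new to Map/Reduce, the three phases can be sketched in plain Python. This is a single-process illustration of the programming model, not the Hadoop API; all function names here are made up for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return sum(values)


lines = ["NoSQL is not no SQL", "SQL and NoSQL"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
print(counts)
```

In a real cluster the map and reduce calls run in parallel next to the data, and the shuffle moves intermediate pairs across the network; the shape of the user-supplied mapper and reducer stays the same.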
  • Case Study - Riak
    • Data Model – Bucket/Key/Value
    • Value has MIME type, byte[]
    • Value supports one-way Links, basic graph
    • Erlang, Protocol Buffers, REST interfaces
    • Pre/Post Commit Hooks
    • CAP Tunable per bucket
    • Map/Reduce – Erlang and JavaScript
  • Riak Continued
    • Vector Clocks
    • Read repair for R < N
    • Peer-to-Peer, Nothing Shared Architecture
    • Replication across data centers
    • Pluggable storage
    • API for Most Languages + REST
    • Commercial Support
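Read repair can be sketched as follows. To keep the example short it assumes each replica is a plain dict and each value carries a simple integer version instead of a vector clock; the function `read_with_repair` is hypothetical and is not Riak's API.

```python
def read_with_repair(replicas, key, r):
    """Read `key` from the first `r` replicas and repair any stale copies seen.

    Each replica maps key -> (version, value). A missing key counts as version 0.
    """
    if not 1 <= r <= len(replicas):
        raise ValueError("r must be between 1 and the number of replicas")
    contacted = replicas[:r]
    answers = [replica.get(key, (0, None)) for replica in contacted]
    newest = max(answers, key=lambda answer: answer[0])
    for replica, answer in zip(contacted, answers):
        if answer[0] < newest[0]:
            replica[key] = newest  # push the newest version to the stale copy
    return newest[1]


# N=3 copies: one is current, one is stale, one never received the write.
replicas = [{"k": (2, "new")}, {"k": (1, "old")}, {}]
print(read_with_repair(replicas, "k", r=2))
print(replicas)
```

Only the replicas that took part in the read are repaired in this sketch, so the third copy stays empty until a later read or another mechanism reaches it.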
  • Case Study - Redis
    • Supports hash, list, set, and sorted set
    • Fast set operations
    • Atomic updates
    • Everything stored in memory
    • Persistence to disk – periodic save, append only file, can be compacted
    • Good API support, JDBC subset driver
  • Redis Continued
    • Master – slave replication, read scalability, redundancy, slave can sync to disk
    • Can swap out values, keys must be in memory
    • Can be used as pub/sub messaging system
    • Can send multiple commands in single request
    • Built to be extremely fast
    • Supports very high speed atomic counters
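The combination of in-memory data, an append-only file, and compaction can be sketched in pure Python. This is an illustration of the general technique only: the `AppendOnlyStore` class and its JSON log format are invented for the example and have nothing to do with Redis's actual file format or client API.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy in-memory key-value store persisted through an append-only log."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged command.
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                op, key, value = json.loads(line)
                if op == "set":
                    self.data[key] = value
                elif op == "del":
                    self.data.pop(key, None)

    def _append(self, record):
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()

    def set(self, key, value):
        self._append(["set", key, value])
        self.data[key] = value

    def delete(self, key):
        self._append(["del", key, None])
        self.data.pop(key, None)

    def incr(self, key):
        value = int(self.data.get(key, 0)) + 1
        self.set(key, value)
        return value

    def compact(self):
        # Rewrite the log so it holds one "set" per live key.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for key, value in self.data.items():
                f.write(json.dumps(["set", key, value]) + "\n")
        self._log.close()
        os.replace(tmp, self.path)
        self._log = open(self.path, "a", encoding="utf-8")

    def close(self):
        self._log.close()


with tempfile.TemporaryDirectory() as folder:
    store = AppendOnlyStore(os.path.join(folder, "demo.log"))
    for _ in range(5):
        store.incr("hits")     # five log entries for one key
    store.compact()            # log shrinks to a single entry
    store.close()
    reopened = AppendOnlyStore(os.path.join(folder, "demo.log"))
    print(reopened.data)
    reopened.close()
```

Writes only ever append, which keeps disk access sequential, and compaction bounds the log size by discarding commands that later ones have superseded.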
  • Case Study - Neo4j
    • Java based – cross platform
    • ACID transactions
    • Durable persistence
    • Handles billions of nodes/edges on a single machine
    • Supports bulk data loading
    • Good language support
  • Neo4j Continued
    • Spatial index support
    • RDF triples/OWL/SPARQL support
    • Replication and HA – commercial version
    • Object oriented API
    • Sharding at client level
    • Dual open source and commercial license
  • Resources
    • fallabs.com/tokyocabinet
    • fallabs.com/kyotocabinet
    • redis.io
    • www.membase.org
    • neo4j.org
    • en.wikipedia.org/wiki/Triplestore
    • en.wikipedia.org/wiki/Graph_theory
    • research.microsoft.com/en-us/projects/trinity
  • Resources
    • www.jboss.org/infinispan
    • basho.com
    • nosqlpedia.com/wiki/Consistency_models_in_nonrelational_dbs
    • www.hypertable.org
    • project-voldemort.com
    • www.allthingsdistributed.com/2007/10/amazons_dynamo.html
  • Resources
    • nosql-database.org
    • couchdb.apache.org
    • engineering.twitter.com/2010/05/introducing-flockdb.html
    • infinitegraph.com
    • http://www.w3.org/TR/rdf-concepts/