No sql landscape_nosqltips

The NoSQL Landscape Objective – Reasonable understanding of the non-relational or NoSQL data stores and how they relate to RDBMS databases we are all used to working with.

About Me Chief Architect – youwho.com Former dot com CTO NoSql advocate nosqltips.blogspot.com @nosqltips on twitter

Agenda What is NoSQL? Landscape Vocabulary and concepts CAP Theorem SQL vs NoSQL comparison Overview of each type w/ examples Question and Answer

Vocabulary CAP Theorem – consistency, availability, partition tolerance ACID – Atomic, Consistent, Isolated, Durable BASE – Basically Available, Soft state, Eventually consistent RDF – Resource Description Framework Sharding – Partitioning, distributed Web Scale – Google, Twitter, Facebook, etc.

CAP Tuning NRW N: Number of Data Copies R: Read Quorum W: Write Quorum Hard Consistency – RDBMS Soft Consistency – No Guarantees Eventual Consistency – Most NoSQL

CAP Tuning Chart
NRW | Outcome
N=3 | Magic number of data replicas
W=N, R=1 | Read optimized – strong consistency
W=1, R=N | Write optimized – strong consistency
W+R > N | Strong consistency on read and write
W+R <= N | Weak eventual consistency; a read may not see the latest data
N > W > 1 | Eventual consistency – most NoSQL data stores live here
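The W+R rule in the chart can be checked mechanically. The following is a minimal sketch, not taken from any particular data store; the function name `classify_nrw` and the labels it returns are assumptions made for illustration.

```python
def classify_nrw(n: int, r: int, w: int) -> str:
    """Classify a replication setting using the W+R rule from the chart.

    n: number of data copies, r: read quorum, w: write quorum.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    # If read and write quorums overlap, every read touches at least
    # one replica that saw the latest acknowledged write.
    if r + w > n:
        return "strong"
    return "eventual"


if __name__ == "__main__":
    for n, r, w in [(3, 1, 3), (3, 3, 1), (3, 2, 2), (3, 1, 1), (3, 1, 2)]:
        print(f"N={n} R={r} W={w}: {classify_nrw(n, r, w)}")
```

With N=3, the settings W=3/R=1, W=1/R=3 and W=2/R=2 all satisfy W+R > N, while W=2/R=1 and W=1/R=1 do not.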

Eventual Consistency All replicas have same data – eventually Milliseconds to seconds Not all applications are compatible Various ways to ensure latest data Vector Clocks, Read Repair, Gossiping Application determines correct data
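As an illustration of how vector clocks let an application decide which replica holds the latest data, here is a minimal sketch. It represents a clock as a plain dict of node name to counter; the function names `compare` and `merge` are hypothetical and do not correspond to any specific store's API.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks.

    Returns "after" if a descends from b, "before" if b descends from a,
    "equal" if they match, and "concurrent" if neither descends from the
    other (a conflict the application has to resolve).
    """
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum, used once a conflict has been resolved."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


if __name__ == "__main__":
    replica_1 = {"node_a": 2, "node_b": 1}
    replica_2 = {"node_a": 1, "node_b": 1}
    replica_3 = {"node_a": 1, "node_b": 2}
    print(compare(replica_1, replica_2))  # replica_1 is newer
    print(compare(replica_1, replica_3))  # conflicting siblings
```

A read repair step could use `compare` to overwrite stale replicas when one clock is strictly "after" another, and hand "concurrent" versions back to the application.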

Comparison
SQL | NoSQL
Prefers big-box, self-redundant hardware | Prefers commodity hardware, distributed
Keep things from breaking | Assume things break or are broken
Solidly in CA land | Mostly AP, some tunable
P is difficult and expensive | P generally easy
Query by SQL | Custom API, SQLish
Stored procedures | Map/Reduce

Comparison
SQL | NoSQL
ACID transactions | BASE transactions
Advanced indexing | Key only to advanced
Foreign key support | Usually none
Strong lock support | Usually none
Schema centric | Usually schema-less
API – usually JPA or JDBC | Depends on implementation
Strong access control | Usually none

Comparison
SQL | NoSQL
Complex disk store, random access | Usually append only, 1 seek, 1 read
Easy for dev with JPA/Hibernate/SQL | Puts more work on application dev
Multi-platform | Favors Linux/Unix
General purpose | More special purpose
Strong commercial support | Strong to no commercial support
Great tool support | Not so much

Column Stores Data stored by column instead of row Schema-less Non-relational, data is de-normalized Column format stores sparse data efficiently Column families cannot change 10,000+ columns by 100 million+ rows Easy sharding (partitioning) Usually not ACID compliant
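To make "data stored by column instead of row" concrete, the sketch below pivots a few sparse rows into a per-column layout. The sample data and the helper name `to_columns` are made up for illustration; real column stores add column families, timestamps, and on-disk formats that are not modeled here.

```python
from collections import defaultdict

# Row-oriented view: each row lists only the cells it actually has.
rows = {
    "user1": {"name": "Ann", "email": "ann@example.com"},
    "user2": {"name": "Bob"},
    "user3": {"name": "Cy", "phone": "555-0100"},
}


def to_columns(rows: dict) -> dict:
    """Pivot rows into {column: {row_key: value}}.

    Missing cells are simply absent, so a sparse column costs space
    only for the rows that populate it.
    """
    columns = defaultdict(dict)
    for row_key, cells in rows.items():
        for column, value in cells.items():
            columns[column][row_key] = value
    return dict(columns)


if __name__ == "__main__":
    for column, cells in to_columns(rows).items():
        print(column, cells)
```

Reading a single column (for example every `name`) now touches one small dict instead of scanning every row.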

Column stores BigTable – Google, 2006 paper Hadoop/HBase – Part of Apache Hadoop Cassandra – Facebook, LAN/WAN replication Hypertable – Pluggable DFS, HQL Vertica – Full SQL implementation Amazon SimpleDB – Cloud store

Document Stores CAP tunable Either key/value or bucket/key/value Easy/Auto sharding - Consistent hashing Usually ACID compliant Not SQL compliant, maybe custom query Easy implementation via map or custom api
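Consistent hashing, mentioned above as the basis for easy sharding, can be sketched in a few lines. This is an illustrative ring with virtual nodes; the class name `HashRing`, the `vnodes` default, and the use of SHA-256 are assumptions, not the scheme of any specific store.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text: str) -> int:
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each node owns several positions so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no nodes in the ring")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.remove("node-c")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after removing node-c")
```

The design point is that removing a node only relocates the keys that node owned; keys on the surviving nodes stay where they were.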

Document stores Amazon – Dynamo and S3 (cloud based) Riak – CAP tunable, built in map/reduce CouchDB – ACID, REST api MongoDB – Indexing, query support Voldemort – Java, pluggable serialization MySQL – Key access, denormalize schema, kill indexes

Memory Stores Mostly in the CA realm P can be tough depending on implementation Some are distributed, some local only Usually key-value stores Many are disk backed, append only files Designed for very high-speed access

Memory stores Couchbase – Membase + CouchDB Memcached – Local map Coherence – Commercial Oracle, distributed Redis – Supports hash, list, set, and sorted set, data structure server Tokyo/Kyoto Cabinet – disk backed map Infinispan – JSR-107 JCache impl Scalaris – Erlang, strong consistency

Graph/Triple Store Model relationships well, bi-directional Node/edges – edges can be weighted or not RDF Triple – subject -> predicate -> object, W3C standard for semantic web Many implement SPARQL, object API Sharding can be difficult because of graph nature Schema-less – nodes, edges, properties Fast set operations
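The subject -> predicate -> object model can be shown with a tiny in-memory store. This is a sketch for illustration only; the class `TripleStore` and its `match` method are hypothetical and far simpler than a SPARQL engine.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None) -> list:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    store = TripleStore()
    store.add("alice", "knows", "bob")
    store.add("bob", "knows", "carol")
    store.add("alice", "worksAt", "acme")
    print(store.match(predicate="knows"))   # every "knows" edge
    print(store.match(subject="alice"))     # everything about alice
```

Each triple doubles as a directed, labeled edge, which is why the same data can be viewed as a graph.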

Graph/Triple Stores Neo4j – ACID transactions, object API AllegroGraph – Reference impl of SPARQL Bigdata – dynamic sharding Trinity – Microsoft Research InfiniteGraph – Distributed, cross-platform FlockDB – Twitter, fast set operations InfoGrid – Object based, REST API

Interesting Integrations Lucene - Document Store with Search as Query Language SOLR and Elastic Search – Scalable Lucene Riak Search – Erlang impl of Lucene APIs Solandra – Lucene on Cassandra backend Couchdb-lucene – Integration DistributedLucene – Lucene on Hadoop Neo4j – Full Text Search on Graph Store

Worth Mentioning Configuration DBs – ZooKeeper, Doozer Distributed configuration, locks, synchronization Used to make other apps scalable XML DBs – eXist, BaseX, Xindice XML only, XQuery, XPath, ACID, GUI support, non-distributed

Case Study - HBase Apache – part of Hadoop/HDFS Requires ZooKeeper Java based Runs well on Amazon EC2 Excellent language support Supports REST interface

HBase continued Map/Reduce via Hadoop Schema-less, column families fixed Nearly unlimited columns and rows HBQL – partial sql + JDBC support Some ACID support, atomicity, durability Integration with Hive for data warehousing, ad-hoc query support - HiveQL

Case Study - Riak Data Model – Bucket/Key/Value Value has MIME type, byte[] Value supports one-way Links, basic graph Erlang, Protocol Buffers, REST interfaces Pre/Post Commit Hooks CAP Tunable per bucket Map/Reduce – Erlang and Javascript
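The Map/Reduce style of query over a bucket/key/value model can be sketched without any server. The code below is plain Python and does not use Riak's API; the bucket contents and the names `map_phase`, `reduce_phase`, and `word_count` are assumptions for illustration.

```python
from collections import Counter
from functools import reduce

# A "bucket" modeled as a dict of key -> value (here, short text values).
bucket = {
    "doc1": "nosql stores scale out",
    "doc2": "sql stores scale up",
    "doc3": "graph stores model relationships",
}


def map_phase(key: str, value: str) -> Counter:
    """Runs once per object: emit word counts for a single value."""
    return Counter(value.split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Combines partial results; must be safe to apply in any grouping."""
    return left + right


def word_count(bucket: dict) -> Counter:
    partials = (map_phase(k, v) for k, v in bucket.items())
    return reduce(reduce_phase, partials, Counter())


if __name__ == "__main__":
    print(word_count(bucket).most_common(3))
```

In a distributed store the map phase would run next to each object's replica and only the partial results would travel over the network.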

Riak Continued Vector Clocks Read repair for R < N Peer-to-Peer, Nothing Shared Architecture Replication across data centers Pluggable storage API for Most Languages + REST Commercial Support

Case Study - Redis Supports hash, list, set, and sorted set Fast set operations Atomic updates Everything stored in memory Persistence to disk – periodic save, append only file, can be compacted Good API support, JDBC subset driver
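The idea of an append-only file that can be compacted is sketched below with an in-memory list standing in for the file. This is not Redis's actual file format or rewrite procedure; the class `AppendOnlyStore` and its methods are hypothetical.

```python
class AppendOnlyStore:
    """In-memory key/value map whose changes are also logged append-only."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the on-disk append-only file

    def set(self, key, value):
        self.log.append(("set", key, value))
        self.data[key] = value

    def delete(self, key):
        self.log.append(("del", key, None))
        self.data.pop(key, None)

    @staticmethod
    def replay(log) -> dict:
        """Rebuild the in-memory state from a log, as done after a restart."""
        data = {}
        for op, key, value in log:
            if op == "set":
                data[key] = value
            else:
                data.pop(key, None)
        return data

    def compact(self):
        """Rewrite the log so it holds one entry per live key."""
        self.log = [("set", k, v) for k, v in self.data.items()]


if __name__ == "__main__":
    store = AppendOnlyStore()
    for i in range(5):
        store.set("counter", i)
    store.set("temp", "x")
    store.delete("temp")
    print(len(store.log), "entries before compaction")
    store.compact()
    print(len(store.log), "entries after compaction")
```

Compaction keeps the log proportional to the number of live keys rather than the number of writes, while replay still reproduces the same state.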

Redis Continued Master – slave replication, read scalability, redundancy, slave can sync to disk Can swap out values, keys must be in memory Can be used as pub/sub messaging system Can send multiple commands in single request Built to be extremely fast Supports very high speed atomic counters

Case Study - Neo4j Java based – cross platform ACID transactions Durable persistence Handles billions of nodes/edges on a single machine Supports bulk data loading Good language support

Neo4j Continued Spatial index support RDF triples/OWL/SPARQL support Replication and HA – commercial version Object oriented API Sharding at client level Dual open source and commercial license

Resources fallabs.com/tokyocabinet fallabs.com/kyotocabinet redis.io www.membase.org neo4j.org en.wikipedia.org/wiki/Triplestore en.wikipedia.org/wiki/Graph_theory research.microsoft.com/en-us/projects/trinity

Resources www.jboss.org/infinispan basho.com nosqlpedia.com/wiki/Consistency_models_in_nonrelational_dbs www.hypertable.org project-voldemort.com www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Resources nosql-database.org couchdb.apache.org engineering.twitter.com/2010/05/introducing-flockdb.html infinitegraph.com http://www.w3.org/TR/rdf-concepts/
