No sql landscape_nosqltips


Published in: Technology
  • NoSQL does not mean no SQL, or that it is against SQL or RDBMS databases. NoSQL is better characterized as non-RDBMS data stores, but even that is not completely true.
  • SQL and NoSQL are very compatible and often used together. SQL usually takes the OLTP role while NoSQL slots in for special purposes.
  • Brewer's Theorem (Eric Brewer, Inktomi): Consistency, Availability, Partition tolerance. You can have any 2 but not all 3. A single-node system gives you C and A. Add P and you must choose between C and A.
  • Membase is a distributed (elastic) map. CouchDB is a document store. The two companies combined to form Couchbase.
  • RDF = Resource Description Framework
  • RDF – Resource Description Framework. Triplestore – subject, predicate, object; the predicate is the relationship. OWL – Web Ontology Language (semantic web).

    1. 1. The NoSQL Landscape <ul><li>Objective – Reasonable understanding of the non-relational or NoSQL data stores and how they relate to RDBMS databases we are all used to working with. </li></ul>
    2. 2. About Me <ul><li>Chief Architect – </li></ul><ul><li>Former dot com CTO </li></ul><ul><li>NoSQL advocate </li></ul><ul><li>@nosqltips on Twitter </li></ul>
    3. 3. Agenda <ul><li>What is NoSQL? </li></ul><ul><li>Landscape </li></ul><ul><li>Vocabulary and concepts </li></ul><ul><li>CAP Theorem </li></ul><ul><li>SQL vs NoSQL comparison </li></ul><ul><li>Overview of each type w/ examples </li></ul><ul><li>Question and Answer </li></ul>
    4. 9. Vocabulary <ul><li>CAP Theorem – consistency, availability, partition tolerance </li></ul><ul><li>ACID – Atomic, Consistent, Isolated, Durable </li></ul><ul><li>BASE – Basically Available, Soft state, Eventually consistent </li></ul><ul><li>RDF – Resource Description Framework </li></ul><ul><li>Sharding – Partitioning, distributed </li></ul><ul><li>Web Scale – Google, Twitter, Facebook, etc. </li></ul>
    5. 11. CAP Tuning <ul><li>NRW </li></ul><ul><ul><li>N: Number of Data Copies </li></ul></ul><ul><ul><li>R: Read Quorum </li></ul></ul><ul><ul><li>W: Write Quorum </li></ul></ul><ul><li>Hard Consistency – RDBMS </li></ul><ul><li>Soft Consistency – No Guarantees </li></ul><ul><li>Eventual Consistency – Most NoSQL </li></ul>
    6. 12. CAP Tuning Chart (NRW setting – Outcome) <ul><li>N=3 – Magic number of data replicas </li></ul><ul><li>W=N, R=1 – Read optimized, strong consistency </li></ul><ul><li>W=1, R=N – Write optimized, strong consistency </li></ul><ul><li>W+R > N – Strong consistency on read and write </li></ul><ul><li>W+R <= N – Weak/eventual consistency; a read may not see the latest data </li></ul><ul><li>N > W > 1 – Eventual consistency; most NoSQL data stores live here </li></ul>
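The W+R > N rule from the tuning chart can be checked mechanically. The following is a minimal sketch, not taken from any particular data store; the function name `classify_quorum` and the two labels it returns are assumptions made for illustration.

```python
def classify_quorum(n: int, r: int, w: int) -> str:
    """Classify an N/R/W setting using the W+R > N rule from the chart.

    n: number of data copies, r: read quorum, w: write quorum.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    if r + w > n:
        # Every read quorum shares at least one replica with every
        # write quorum, so a read always reaches the latest write.
        return "strong"
    # A read quorum can miss every replica that holds the latest write.
    return "eventual"
```

With N=3, both the read-optimized setting (W=3, R=1) and the balanced setting (W=2, R=2) classify as strong, while W=1, R=1 classifies as eventual.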
    7. 13. Eventual Consistency <ul><li>All replicas have same data – eventually </li></ul><ul><li>Milliseconds to seconds </li></ul><ul><li>Not all applications are compatible </li></ul><ul><li>Various ways to ensure latest data </li></ul><ul><ul><li>Vector Clocks, Read Repair, Gossiping </li></ul></ul><ul><ul><li>Application determines correct data </li></ul></ul>
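Vector clocks are one of the mechanisms listed above for deciding which replica holds the latest data. A minimal sketch, assuming a clock is a plain dict from node id to update counter; the helper names (`increment`, `descends`, `compare`) are hypothetical and not the API of any specific store.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of updates seen from that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Order two clocks, or report a conflict if neither descends from the other."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"
```

When `compare` reports a conflict, the two versions were written concurrently, which is the case where the application has to determine the correct data.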
    8. 15. Comparison <ul><li>SQL </li></ul><ul><li>Prefers big-box, self redundant </li></ul><ul><li>Keep things from breaking </li></ul><ul><li>Solidly in CA land </li></ul><ul><li>P is difficult and expensive </li></ul><ul><li>Query by SQL </li></ul><ul><li>Stored procedures </li></ul><ul><li>NoSQL </li></ul><ul><li>Prefers commodity hardware, distributed </li></ul><ul><li>Assume things break or are broken </li></ul><ul><li>Mostly AP, some tunable </li></ul><ul><li>P generally easy </li></ul><ul><li>Custom API, SQLish </li></ul><ul><li>Map/Reduce </li></ul>
    9. 16. Comparison <ul><li>SQL </li></ul><ul><li>ACID transactions </li></ul><ul><li>Advanced indexing </li></ul><ul><li>Foreign key support </li></ul><ul><li>Strong lock support </li></ul><ul><li>Schema centric </li></ul><ul><li>API – usually JPA or JDBC </li></ul><ul><li>Strong access control </li></ul><ul><li>NoSQL </li></ul><ul><li>BASE transactions </li></ul><ul><li>Indexing – key only to advanced </li></ul><ul><li>Foreign keys – usually none </li></ul><ul><li>Locking – usually none </li></ul><ul><li>Usually schema-less </li></ul><ul><li>API – depends on implementation </li></ul><ul><li>Access control – usually none </li></ul>
    10. 17. Comparison <ul><li>SQL </li></ul><ul><li>Complex disk store, random access </li></ul><ul><li>Easy for dev with JPA/Hibernate/SQL </li></ul><ul><li>Multi-platform </li></ul><ul><li>General purpose </li></ul><ul><li>Strong commercial support </li></ul><ul><li>Great tool support </li></ul><ul><li>NoSQL </li></ul><ul><li>Usually append only, 1 seek, 1 read </li></ul><ul><li>Puts more work on application dev </li></ul><ul><li>Favors Linux/Unix </li></ul><ul><li>More special purpose </li></ul><ul><li>Commercial support ranges from strong to none </li></ul><ul><li>Limited tool support </li></ul>
    11. 19. Column Stores <ul><li>Data stored by column instead of row </li></ul><ul><li>Schema-less </li></ul><ul><li>Non-relational, data is de-normalized </li></ul><ul><li>Column format stores sparse data efficiently </li></ul><ul><li>Column families cannot change </li></ul><ul><li>10,000+ columns by 100 million+ rows </li></ul><ul><li>Easy sharding (partitioning) </li></ul><ul><li>Usually not ACID compliant </li></ul>
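The combination of fixed column families, free-form columns, and efficient sparse storage can be pictured with nested maps. This is a toy in-memory sketch under that assumption; the class `ColumnFamilyTable` and its methods are hypothetical and do not mirror the API of BigTable, HBase, or Cassandra.

```python
class ColumnFamilyTable:
    """Toy table: families are fixed up front, columns are created on write."""

    def __init__(self, families):
        self.families = frozenset(families)  # cannot change after creation
        # Data is grouped per column family: family -> row key -> column -> value
        self._data = {family: {} for family in self.families}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._data[family].setdefault(row_key, {})[column] = value

    def get(self, row_key, family, column, default=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        return self._data[family].get(row_key, {}).get(column, default)

    def cell_count(self):
        """Only cells that were written take up space (sparse storage)."""
        return sum(
            len(columns)
            for rows in self._data.values()
            for columns in rows.values()
        )
```

A row that never writes a given column stores nothing for it, which is why a table can have thousands of possible columns without paying for empty cells.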
    12. 20. Column stores <ul><li>BigTable – Google, 2006 paper </li></ul><ul><li>Hadoop/HBase – Part of Apache Hadoop </li></ul><ul><li>Cassandra – Facebook, LAN/WAN replication </li></ul><ul><li>Hypertable – Pluggable DFS, HQL </li></ul><ul><li>Vertica – Full SQL implementation </li></ul><ul><li>Amazon SimpleDB – Cloud store </li></ul>
    13. 21. Document Stores <ul><li>CAP tunable </li></ul><ul><li>Either key/value or bucket/key/value </li></ul><ul><li>Easy/Auto sharding - Consistent hashing </li></ul><ul><li>Usually ACID compliant </li></ul><ul><li>Not SQL compliant, maybe custom query </li></ul><ul><li>Easy implementation via map or custom API </li></ul>
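Consistent hashing is what makes the automatic sharding mentioned above cheap: adding or removing a node only moves the keys that belonged to that node. A minimal sketch, assuming SHA-256 as the hash and 64 virtual nodes per physical node; the class `HashRing` and both of those choices are illustrative, not what any specific store uses.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add(node, vnodes)

    def add(self, node, vnodes=64):
        # Several virtual positions per node spread its load around the ring.
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, name) for pos, name in self._ring if name != node]

    def node_for(self, key):
        """Return the node owning `key`: the first position at or after its hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring
```

Removing a node leaves every key that lived on the other nodes where it was; only the removed node's keys are reassigned.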
    14. 22. Document stores <ul><li>Amazon – Dynamo and S3 (cloud based) </li></ul><ul><li>Riak – CAP tunable, built in map/reduce </li></ul><ul><li>CouchDB – ACID, REST API </li></ul><ul><li>MongoDB – Indexing, query support </li></ul><ul><li>Voldemort – Java, pluggable serialization </li></ul><ul><li>MySQL – Key access, denormalize schema, kill indexes </li></ul>
    15. 23. Memory Stores <ul><li>Mostly in the CA realm </li></ul><ul><li>P can be tough depending on implementation </li></ul><ul><li>Some are distributed, some local only </li></ul><ul><li>Usually key-value stores </li></ul><ul><li>Many are disk backed, append only files </li></ul><ul><li>Designed for very high-speed access </li></ul>
    16. 24. Memory stores <ul><li>Couchbase – Membase + CouchDB </li></ul><ul><li>Memcached – Local map </li></ul><ul><li>Coherence – Commercial Oracle, distributed </li></ul><ul><li>Redis – Supports hash, list, set, and sorted set, data structure server </li></ul><ul><li>Tokyo/Kyoto Cabinet – disk backed map </li></ul><ul><li>Infinispan – JSR-107 JCache impl </li></ul><ul><li>Scalaris – Erlang, strong consistency </li></ul>
    17. 25. Graph/Triple Store <ul><li>Model relationships well, bi-directional </li></ul><ul><li>Node/edges – edges can be weighted or not </li></ul><ul><li>RDF Triple – subject -> predicate -> object, W3C standard for semantic web </li></ul><ul><li>Many implement SPARQL, object API </li></ul><ul><li>Sharding can be difficult because of graph nature </li></ul><ul><li>Schema-less – nodes, edges, properties </li></ul><ul><li>Fast set operations </li></ul>
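The subject -> predicate -> object model can be sketched as a set of 3-tuples with wildcard matching. This is an in-memory illustration only; the class `TripleStore` and its `match` method are hypothetical and far simpler than a SPARQL engine.

```python
class TripleStore:
    """Toy RDF-style store holding (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._triples
            if all(p is None or p == v for p, v in zip(pattern, triple))
        )
```

For example, after adding `("alice", "knows", "bob")` and `("bob", "knows", "carol")`, calling `match(predicate="knows", obj="carol")` returns the single triple whose subject is `"bob"`.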
    18. 26. Graph/Triple Stores <ul><li>Neo4j – ACID transactions, object API </li></ul><ul><li>AllegroGraph – Reference impl of SPARQL </li></ul><ul><li>Bigdata – dynamic sharding </li></ul><ul><li>Trinity – Microsoft Research </li></ul><ul><li>InfiniteGraph – Distributed, cross-platform </li></ul><ul><li>FlockDB – Twitter, fast set operations </li></ul><ul><li>InfoGrid – Object based, REST API </li></ul>
    19. 27. Interesting Integrations <ul><li>Lucene - Document Store with Search as Query Language </li></ul><ul><li>Solr and Elasticsearch – Scalable Lucene </li></ul><ul><li>Riak Search – Erlang impl of Lucene APIs </li></ul><ul><li>Solandra – Lucene on Cassandra backend </li></ul><ul><li>couchdb-lucene – Integration </li></ul><ul><li>DistributedLucene – Lucene on Hadoop </li></ul><ul><li>Neo4j – Full Text Search on Graph Store </li></ul>
    20. 28. Worth Mentioning <ul><li>Configuration DBs – ZooKeeper, Doozer </li></ul><ul><ul><li>Distributed configuration, locks, synchronization </li></ul></ul><ul><ul><li>Used to make other apps scalable </li></ul></ul><ul><li>XML DBs – eXist, BaseX, Xindice </li></ul><ul><ul><li>XML only, XQuery, XPath, ACID, GUI support </li></ul></ul><ul><ul><li>non-distributed </li></ul></ul>
    21. 31. Case Study - HBase <ul><li>Apache – part of Hadoop/HDFS </li></ul><ul><li>Requires ZooKeeper </li></ul><ul><li>Java based </li></ul><ul><li>Runs well on Amazon EC2 </li></ul><ul><li>Excellent language support </li></ul><ul><li>Supports REST interface </li></ul>
    22. 32. HBase continued <ul><li>Map/Reduce via Hadoop </li></ul><ul><li>Schema-less, column families fixed </li></ul><ul><li>Nearly unlimited columns and rows </li></ul><ul><li>HBQL – partial SQL + JDBC support </li></ul><ul><li>Some ACID support, atomicity, durability </li></ul><ul><li>Integration with Hive for data warehousing, ad-hoc query support - HiveQL </li></ul>
    23. 33. Case Study - Riak <ul><li>Data Model – Bucket/Key/Value </li></ul><ul><li>Value has MIME type, byte[] </li></ul><ul><li>Value supports one-way Links, basic graph </li></ul><ul><li>Erlang, Protocol Buffers, REST interfaces </li></ul><ul><li>Pre/Post Commit Hooks </li></ul><ul><li>CAP Tunable per bucket </li></ul><ul><li>Map/Reduce – Erlang and JavaScript </li></ul>
    24. 34. Riak Continued <ul><li>Vector Clocks </li></ul><ul><li>Read repair for R < N </li></ul><ul><li>Peer-to-Peer, Nothing Shared Architecture </li></ul><ul><li>Replication across data centers </li></ul><ul><li>Pluggable storage </li></ul><ul><li>API for Most Languages + REST </li></ul><ul><li>Commercial Support </li></ul>
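Read repair for R < N can be illustrated with a few in-memory replicas. This sketch is a simplification and not Riak's implementation: it assumes each replica is a dict mapping key to a `(version, value)` pair with non-negative integer versions, where a real store would compare vector clocks. The function name `read_with_repair` is hypothetical.

```python
def read_with_repair(replicas, key, r):
    """Read `key` from the first `r` replicas and repair stale copies among them.

    replicas: list of dicts, each mapping key -> (version, value).
    Returns the newest value seen, or None if no contacted replica has the key.
    """
    contacted = replicas[:r]
    responses = [replica[key] for replica in contacted if key in replica]
    if not responses:
        return None
    newest = max(responses, key=lambda pair: pair[0])
    for replica in contacted:
        # A missing key counts as older than any stored version.
        if replica.get(key, (-1, None))[0] < newest[0]:
            replica[key] = newest  # push the newest copy to the stale replica
    return newest[1]
```

Replicas outside the read quorum are left untouched until a later read contacts them, which is why data converges eventually and not immediately.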
    25. 35. Case Study - Redis <ul><li>Supports hash, list, set, and sorted set </li></ul><ul><li>Fast set operations </li></ul><ul><li>Atomic updates </li></ul><ul><li>Everything stored in memory </li></ul><ul><li>Persistence to disk – periodic save, append only file, can be compacted </li></ul><ul><li>Good API support, JDBC subset driver </li></ul>
    26. 36. Redis Continued <ul><li>Master – slave replication, read scalability, redundancy, slave can sync to disk </li></ul><ul><li>Can swap out values, keys must be in memory </li></ul><ul><li>Can be used as pub/sub messaging system </li></ul><ul><li>Can send multiple commands in single request </li></ul><ul><li>Built to be extremely fast </li></ul><ul><li>Supports very high speed atomic counters </li></ul>
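The append-only file persistence with compaction listed for Redis can be sketched in a few lines. This is an illustration of the general technique, not Redis's file format or code; the class `AppendOnlyStore`, the JSON-lines encoding, and the file layout are all assumptions.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store: all data in memory, every write appended to a log."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            self._replay()

    def _replay(self):
        # Rebuild the in-memory state by replaying the log from the start.
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                key, value = json.loads(line)
                self.data[key] = value

    def set(self, key, value):
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        self.data[key] = value

    def compact(self):
        """Rewrite the log with one entry per live key, then swap it in."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as log:
            for key, value in self.data.items():
                log.write(json.dumps([key, value]) + "\n")
        os.replace(tmp_path, self.path)
```

Writes only ever append, so overwriting the same key many times grows the log; `compact` shrinks it back to one line per key without changing what a restart would load.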
    27. 37. Case Study - Neo4j <ul><li>Java based – cross platform </li></ul><ul><li>ACID transactions </li></ul><ul><li>Durable persistence </li></ul><ul><li>Handles billions of nodes/edges on a single machine </li></ul><ul><li>Supports bulk data loading </li></ul><ul><li>Good language support </li></ul>
    28. 38. Neo4j Continued <ul><li>Spatial index support </li></ul><ul><li>RDF triples/OWL/SPARQL support </li></ul><ul><li>Replication and HA – commercial version </li></ul><ul><li>Object oriented API </li></ul><ul><li>Sharding at client level </li></ul><ul><li>Dual open source and commercial license </li></ul>