No sql landscape_nosqltips

  • NoSQL does not mean no SQL, or that it is against SQL or RDBMS databases. NoSQL is better characterized as non-RDBMS data stores, but even that is not completely true.
  • SQL and NoSQL are very compatible and often used together. SQL usually takes the OLTP role while NoSQL slots in for special purposes.
  • Brewer's Theorem (Inktomi): Consistency, Availability, Partition tolerance. You can have any 2 but not all 3. C and A are possible in a single-node system; add P and you must choose between C and A.
  • Membase is a distributed (elastic) map; CouchDB is a document store. The two companies combined to form Couchbase.
  • RDF = Resource Description Framework
  • RDF – Resource Description Framework. Triplestore – subject, predicate, object; the predicate is the relationship. OWL – Web Ontology Language, used for the semantic web.

Transcript

  • 1. The NoSQL Landscape
    • Objective – Reasonable understanding of the non-relational or NoSQL data stores and how they relate to RDBMS databases we are all used to working with.
  • 2. About Me
    • Chief Architect – youwho.com
    • Former dot com CTO
    • NoSql advocate
    • nosqltips.blogspot.com
    • @nosqltips on twitter
  • 3. Agenda
    • What is NoSQL?
    • Landscape
    • Vocabulary and concepts
    • CAP Theorem
    • SQL vs NoSQL comparison
    • Overview of each type w/ examples
    • Question and Answer
  • 4.  
  • 5.  
  • 6.  
  • 7.  
  • 8.  
  • 9. Vocabulary
    • CAP Theorem – consistency, availability, partition tolerance
    • ACID – Atomic, Consistent, Isolated, Durable
    • BASE – Basically Available, Soft state, Eventually consistent
    • RDF – Resource Description Framework
    • Sharding – Partitioning, distributed
    • Web Scale – Google, Twitter, Facebook, etc
  • 10.  
  • 11. CAP Tuning
    • NRW
      • N: Number of Data Copies
      • R: Read Quorum
      • W: Write Quorum
    • Hard Consistency – RDBMS
    • Soft Consistency – No Guarantees
    • Eventual Consistency – Most NoSQL
  • 12. CAP Tuning Chart (NRW – Outcome)
    • N=3 – Magic number of data replicas
    • W=N, R=1 – Read optimized, strong consistency
    • W=1, R=N – Write optimized, strong consistency
    • W+R > N – Strong consistency on read and write
    • W+R <= N – Weak, eventual consistency; a read may not see the latest data
    • N > W > 1 – Eventual consistency; most NoSQL data stores live here
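The quorum rules in the tuning chart can be checked mechanically. The following is a minimal sketch that applies only the W+R comparison from the chart; the function name `classify_nrw` and the returned labels are illustrative assumptions, not part of any particular data store's API.

```python
def classify_nrw(n: int, r: int, w: int) -> str:
    """Classify a replica configuration using the W+R vs. N rule.

    n: number of data copies, r: read quorum, w: write quorum.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    if r + w > n:
        # Read and write quorums overlap, so a read always
        # touches at least one replica holding the latest write.
        if w == n and r == 1:
            return "strong, read optimized"
        if w == 1 and r == n:
            return "strong, write optimized"
        return "strong"
    # Quorums may miss each other: a read can return stale data.
    return "eventual"


if __name__ == "__main__":
    for r, w in [(1, 3), (3, 1), (2, 2), (1, 1)]:
        print(f"N=3 R={r} W={w}: {classify_nrw(3, r, w)}")
```

The overlap argument is the whole design: when W+R > N, any set of R replicas must share at least one member with any set of W replicas.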
  • 13. Eventual Consistency
    • All replicas have same data – eventually
    • Milliseconds to seconds
    • Not all applications are compatible
    • Various ways to ensure latest data
      • Vector Clocks, Read Repair, Gossiping
      • Application determines correct data
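Vector clocks, mentioned above, let a store tell whether one version of a value descends from another or whether the two were written concurrently and need reconciliation. Below is a generic sketch, assuming a clock is a plain dict mapping a node name to a counter; the helper names `increment`, `compare`, and `merge` are hypothetical and do not mirror any specific product.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: dict, b: dict) -> str:
    """Relate clock a to clock b: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # conflicting siblings; the application must resolve
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a: dict, b: dict) -> dict:
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


if __name__ == "__main__":
    v1 = increment({}, "node_a")
    v2 = increment(v1, "node_b")   # descends from v1
    v3 = increment(v1, "node_c")   # also descends from v1, but not from v2
    print(compare(v2, v1), compare(v2, v3), merge(v2, v3))
```

A "concurrent" result is the case where, as the slide says, the application determines the correct data.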
  • 14.  
  • 15. Comparison
    • SQL
      • Prefers big-box, self redundant
      • Keep things from breaking
      • Solidly in CA land
      • P is difficult and expensive
      • Query by SQL
      • Stored procedures
    • NoSQL
      • Prefers commodity hardware, distributed
      • Assume things break or are broken
      • Mostly AP, some tunable
      • P generally easy
      • Custom API, SQLish
      • Map/Reduce
  • 16. Comparison
    • SQL
      • ACID transactions
      • Advanced indexing
      • Foreign key support
      • Strong lock support
      • Schema centric
      • API – usually JPA or JDBC
      • Strong access control
    • NoSQL
      • BASE transactions
      • Indexing – key only to advanced
      • Foreign keys – usually none
      • Locking – usually none
      • Usually schema-less
      • API – depends on implementation
      • Access control – usually none
  • 17. Comparison
    • SQL
      • Complex disk store, random access
      • Easy for dev with JPA/Hibernate/SQL
      • Multi-platform
      • General purpose
      • Strong commercial support
      • Great tool support
    • NoSQL
      • Usually append only, 1 seek, 1 read
      • Puts more work on application dev
      • Favors Linux/Unix
      • More special purpose
      • Strong to no commercial support
      • Tool support – not so much
  • 18.  
  • 19. Column Stores
    • Data stored by column instead of row
    • Schema-less
    • Non-relational, data is de-normalized
    • Column format stores sparse data efficiently
    • Column families cannot change
    • 10,000+ columns by 100 million+ rows
    • Easy sharding (partitioning)
    • Usually not ACID compliant
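To make the column-store properties above concrete, here is a toy in-memory sketch. It assumes a layout of row key → column family → column → value, with the set of families fixed at creation and columns added freely per row; the class `ColumnFamilyTable` is hypothetical and is not how any listed product stores data on disk.

```python
class ColumnFamilyTable:
    """Toy sparse table: fixed column families, free-form columns."""

    def __init__(self, families):
        self._families = frozenset(families)   # cannot change after creation
        self._rows = {}                        # row_key -> family -> column -> value

    def put(self, row_key, family, column, value):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get(self, row_key, family, column, default=None):
        return self._rows.get(row_key, {}).get(family, {}).get(column, default)

    def cell_count(self):
        """Only cells that were written take space; absent columns cost nothing."""
        return sum(len(cols) for fams in self._rows.values() for cols in fams.values())


if __name__ == "__main__":
    table = ColumnFamilyTable(["profile", "activity"])
    table.put("user:1", "profile", "name", "Ada")
    table.put("user:2", "activity", "last_login", "2011-05-01")
    print(table.get("user:1", "profile", "name"), table.cell_count())
```

Each row may carry a completely different set of columns, which is what makes sparse data cheap to store.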
  • 20. Column stores
    • BigTable – Google, 2006 paper
    • Hadoop/HBase – Part of Apache Hadoop
    • Cassandra – Facebook, LAN/WAN replication
    • Hypertable – Pluggable DFS, HQL
    • Vertica – Full SQL implementation
    • Amazon SimpleDB – Cloud store
  • 21. Document Stores
    • CAP tunable
    • Either key/value or bucket/key/value
    • Easy/Auto sharding - Consistent hashing
    • Usually ACID compliant
    • Not SQL compliant, maybe custom query
    • Easy implementation via map or custom API
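The consistent hashing mentioned above can be sketched as a hash ring: nodes and keys are hashed onto the same circle, and each key belongs to the first node position at or after it. This is a minimal illustration, assuming SHA-256 as the hash and 64 virtual nodes per server; the class name `HashRing` and both parameters are assumptions, not taken from any specific store.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._hashes = []    # sorted ring positions
        self._owners = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self._vnodes):
            position = self._hash(f"{node}#{i}")
            bisect.insort(self._hashes, position)
            self._owners[position] = node

    def remove(self, node):
        self._hashes = [h for h in self._hashes if self._owners[h] != node]
        self._owners = {h: n for h, n in self._owners.items() if n != node}

    def lookup(self, key):
        if not self._hashes:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[self._hashes[index]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("user:42"))
```

The property that makes sharding "easy" is that removing a node only moves the keys that node owned; every other key keeps its placement.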
  • 22. Document stores
    • Amazon – Dynamo and S3 (cloud based)
    • Riak – CAP tunable, built in map/reduce
    • CouchDB – ACID, REST API
    • MongoDB – Indexing, query support
    • Voldemort – Java, pluggable serialization
    • MySQL – Key access, denormalize schema, kill indexes
  • 23. Memory Stores
    • Mostly in the CA realm
    • P can be tough depending on implementation
    • Some are distributed, some local only
    • Usually key-value stores
    • Many are disk backed, append only files
    • Designed for very high-speed access
  • 24. Memory stores
    • Couchbase – Membase + CouchDB
    • Memcached – Local map
    • Coherence – Commercial Oracle, distributed
    • Redis – Supports hash, list, set, and sorted set, data structure server
    • Tokyo/Kyoto Cabinet – disk backed map
    • Infinispan – JSR-107 JCache impl
    • Scalaris – Erlang, strong consistency
  • 25. Graph/Triple Store
    • Model relationships well, bi-directional
    • Node/edges – edges can be weighted or not
    • RDF Triple – subject -> predicate -> object, W3C standard for semantic web
    • Many implement SPARQL, object API
    • Sharding can be difficult because of graph nature
    • Schema-less – nodes, edges, properties
    • Fast set operations
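The subject -> predicate -> object model above can be illustrated with a very small in-memory store that answers pattern queries, where `None` acts as a wildcard. This is a sketch only: the class `TripleStore` and its `match` method are hypothetical, and real triple stores expose SPARQL and use indexes rather than a linear scan.

```python
class TripleStore:
    """Toy triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    store = TripleStore()
    store.add("alice", "knows", "bob")
    store.add("bob", "knows", "carol")
    store.add("alice", "worksAt", "acme")
    print(store.match(predicate="knows"))
```

Because the predicate is stored as data, a relationship can be followed in either direction by fixing the object instead of the subject.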
  • 26. Graph/Triple Stores
    • Neo4j – ACID transactions, object API
    • AllegroGraph – Reference impl of SPARQL
    • Bigdata – dynamic sharding
    • Trinity – Microsoft research
    • Infinite Graph – Distributed, cross-platform
    • FlockDB – Twitter, fast set operations
    • InfoGrid – Object based, REST API
  • 27. Interesting Integrations
    • Lucene - Document Store with Search as Query Language
    • Solr and Elasticsearch – Scalable Lucene
    • Riak Search – Erlang impl of Lucene APIs
    • Solandra – Lucene on Cassandra backend
    • Couchdb-lucene – Integration
    • DistributedLucene – Lucene on Hadoop
    • Neo4j – Full Text Search on Graph Store
  • 28. Worth Mentioning
    • Configuration Dbs – ZooKeeper, Doozer
      • Distributed configuration, locks, synchronization
      • Used to make other apps scalable
    • XML Dbs – eXist, BaseX, Xindice
      • XML only, XQuery, XPath, ACID, GUI support
      • non-distributed
  • 29.  
  • 30.  
  • 31. Case Study - HBase
    • Apache – part of Hadoop/HDFS
    • Requires ZooKeeper
    • Java based
    • Runs well on Amazon EC2
    • Excellent language support
    • Supports REST interface
  • 32. HBase continued
    • Map/Reduce via Hadoop
    • Schema-less, column families fixed
    • Nearly unlimited columns and rows
    • HBQL – partial sql + JDBC support
    • Some ACID support, atomicity, durability
    • Integration with Hive for data warehousing, ad-hoc query support - HiveQL
  • 33. Case Study - Riak
    • Data Model – Bucket/Key/Value
    • Value has MIME type, byte[]
    • Value supports one-way Links, basic graph
    • Erlang, Protocol Buffers, REST interfaces
    • Pre/Post Commit Hooks
    • CAP Tunable per bucket
    • Map/Reduce – Erlang and JavaScript
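The map/reduce style of query over bucket/key/value data can be sketched in a few lines. This is a generic, single-process illustration and not Riak's API: records are assumed to be `(bucket, key, mime_type, value_bytes)` tuples, and `map_reduce`, `bytes_by_mime`, and `total` are hypothetical names.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map_fn over every record, group emitted pairs by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def bytes_by_mime(record):
    """Map step: emit (mime_type, size_in_bytes) for one stored value."""
    _bucket, _key, mime_type, value = record
    return [(mime_type, len(value))]


def total(_key, values):
    """Reduce step: sum the sizes emitted for one MIME type."""
    return sum(values)


if __name__ == "__main__":
    records = [
        ("images", "a.png", "image/png", b"x"),
        ("images", "b.png", "image/png", b"yy"),
        ("docs", "c.txt", "text/plain", b"zzz"),
    ]
    print(map_reduce(records, bytes_by_mime, total))
```

In a distributed store the map step would run on the nodes holding the data and only the emitted pairs would travel over the network.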
  • 34. Riak Continued
    • Vector Clocks
    • Read repair for R < N
    • Peer-to-Peer, Nothing Shared Architecture
    • Replication across data centers
    • Pluggable storage
    • API for Most Languages + REST
    • Commercial Support
  • 35. Case Study - Redis
    • Supports hash, list, set, and sorted set
    • Fast set operations
    • Atomic updates
    • Everything stored in memory
    • Persistence to disk – periodic save, append only file, can be compacted
    • Good API support, JDBC subset driver
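The append-only file idea above (log every write, replay the log on startup, compact it when it grows) can be sketched as follows. This is not Redis's actual file format or API: the JSON-lines encoding, the class `AppendOnlyStore`, and its method names are all assumptions made for illustration.

```python
import json
import os


class AppendOnlyStore:
    """In-memory key-value map backed by an append-only log file."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        if os.path.exists(path):
            self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged operation.
        with open(self._path, encoding="utf-8") as log:
            for line in log:
                op, key, value = json.loads(line)
                if op == "set":
                    self._data[key] = value
                else:
                    self._data.pop(key, None)

    def _append(self, record):
        with open(self._path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")

    def set(self, key, value):
        self._append(["set", key, value])
        self._data[key] = value

    def delete(self, key):
        self._append(["del", key, None])
        self._data.pop(key, None)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def compact(self):
        # Rewrite the log with one "set" per live key, then swap it in atomically.
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as log:
            for key, value in self._data.items():
                log.write(json.dumps(["set", key, value]) + "\n")
        os.replace(tmp_path, self._path)
```

Reads never touch the disk, which fits the "everything stored in memory" point; the log exists only so the state can be rebuilt after a restart.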
  • 36. Redis Continued
    • Master – slave replication, read scalability, redundancy, slave can sync to disk
    • Can swap out values, keys must be in memory
    • Can be used as pub/sub messaging system
    • Can send multiple commands in single request
    • Built to be extremely fast
    • Supports very high speed atomic counters
  • 37. Case Study - Neo4j
    • Java based – cross platform
    • ACID transactions
    • Durable persistence
    • Handle billions of nodes/edges single machine
    • Supports bulk data loading
    • Good language support
  • 38. Neo4j Continued
    • Spatial index support
    • RDF triples/OWL/SPARQL support
    • Replication and HA – commercial version
    • Object oriented API
    • Sharding at client level
    • Dual open source and commercial license
  • 39. Resources
    • fallabs.com/tokyocabinet
    • fallabs.com/kyotocabinet
    • redis.io
    • www.membase.org
    • neo4j.org
    • en.wikipedia.org/wiki/Triplestore
    • en.wikipedia.org/wiki/Graph_theory
    • research.microsoft.com/en-us/projects/trinity
  • 40. Resources
    • www.jboss.org/infinispan
    • basho.com
    • nosqlpedia.com/wiki/Consistency_models_in_nonrelational_dbs
    • www.hypertable.org
    • project-voldemort.com
    • www.allthingsdistributed.com/2007/10/amazons_dynamo.html
  • 41. Resources
    • nosql-database.org
    • couchdb.apache.org
    • engineering.twitter.com/2010/05/introducing-flockdb.html
    • infinitegraph.com
    • http://www.w3.org/TR/rdf-concepts/