No sql landscape_nosqltips
Published in: Technology
  • NoSQL does not mean no SQL, or that it is against SQL or RDBMS databases. NoSQL is better characterized as non-RDBMS data stores, but even that is not completely true.
  • SQL and NoSQL are very compatible and are often used together. SQL usually takes the OLTP role while NoSQL slots in for special purposes.
  • Brewer's Theorem (Inktomi): Consistency, Availability, Partition tolerance. You can have any 2 but not all 3. C & A are possible in a single-node system; add P and you must choose between C and A.
  • Membase is a distributed (elastic) map. CouchDB is a document store. The companies combined to form Couchbase.
  • RDF = Resource Description Framework
  • RDF – Resource Description Framework. Triplestore – subject, predicate, object; the predicate is the relationship. OWL – Web Ontology Language (semantic web).
  • Transcript

    • 1. The NoSQL Landscape
      • Objective – Reasonable understanding of the non-relational or NoSQL data stores and how they relate to RDBMS databases we are all used to working with.
    • 2. About Me
      • Chief Architect – youwho.com
      • Former dot com CTO
      • NoSQL advocate
      • nosqltips.blogspot.com
      • @nosqltips on twitter
    • 3. Agenda
      • What is NoSQL?
      • Landscape
      • Vocabulary and concepts
      • CAP Theorem
      • SQL vs NoSQL comparison
      • Overview of each type w/ examples
      • Question and Answer
    • 4.  
    • 5.  
    • 6.  
    • 7.  
    • 8.  
    • 9. Vocabulary
      • CAP Theorem – consistency, availability, partition tolerance
      • ACID – Atomic, Consistent, Isolated, Durable
      • BASE – Basically Available, Soft state, Eventually consistent
      • RDF – Resource Description Framework
      • Sharding – Partitioning, distributed
      • Web Scale – Google, Twitter, Facebook, etc
    • 10.  
    • 11. CAP Tuning
      • NRW
        • N: Number of Data Copies
        • R: Read Quorum
        • W: Write Quorum
      • Hard Consistency – RDBMS
      • Soft Consistency – No Guarantees
      • Eventual Consistency – Most NoSQL
    • 12. CAP Tuning Chart
      • NRW – Outcome
      • N=3 – Magic number of data replicas
      • W=N, R=1 – Read optimized, strong consistency
      • W=1, R=N – Write optimized, strong consistency
      • W+R > N – Strong consistency on read and write
      • W+R <= N – Weak, eventual consistency; a read may not see the latest data
      • N > W > 1 – Eventual consistency; most NoSQL data stores live here
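The N/R/W rules in the tuning chart can be sketched as a small classifier. This is only an illustration of the quorum arithmetic; the function name `classify_nrw` and the returned labels are assumptions made for this example, not part of any data store's API.

```python
def classify_nrw(n: int, r: int, w: int) -> str:
    """Classify an N/R/W setting using the rules from the tuning chart.

    n: number of data copies, r: read quorum, w: write quorum.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    if r + w > n:
        # Every read quorum overlaps every write quorum in at least one
        # replica, so a read always reaches a copy that saw the last write.
        if w == n and r == 1:
            return "strong consistency, read optimized"
        if w == 1 and r == n:
            return "strong consistency, write optimized"
        return "strong consistency"
    # R + W <= N: a read quorum can miss every replica that took the write.
    return "eventual consistency"


if __name__ == "__main__":
    for setting in [(3, 1, 3), (3, 3, 1), (3, 2, 2), (3, 1, 1)]:
        print(setting, "->", classify_nrw(*setting))
```

With N=3, the common middle ground of R=2 and W=2 still satisfies W+R > N, while R=1 and W=1 trades that guarantee for lower latency.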
    • 13. Eventual Consistency
      • All replicas have same data – eventually
      • Milliseconds to seconds
      • Not all applications are compatible
      • Various ways to ensure latest data
        • Vector Clocks, Read Repair, Gossiping
        • Application determines correct data
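As a rough illustration of how vector clocks let a store or an application decide which copy is newer, the sketch below compares two clocks. It assumes a clock is a plain dict mapping a node name to a counter; `compare_clocks` and its return strings are hypothetical names chosen for this example.

```python
def compare_clocks(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {node_name: counter} dicts."""
    nodes = set(a) | set(b)
    # A missing entry counts as 0: that node has not touched this copy.
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        # Neither copy descends from the other; the application (or a
        # merge rule) has to determine the correct data.
        return "concurrent"
    if a_ahead:
        return "a is newer"
    if b_ahead:
        return "b is newer"
    return "equal"


if __name__ == "__main__":
    print(compare_clocks({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))
    print(compare_clocks({"n1": 2}, {"n2": 1}))
```

The "concurrent" case is the one where the application has to step in, which is why not every application is compatible with eventual consistency.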
    • 14.  
    • 15. Comparison
      • SQL
      • Prefers big-box, self redundant
      • Keep things from breaking
      • Solidly in CA land
      • P is difficult and expensive
      • Query by SQL
      • Stored procedures
      • NoSQL
      • Prefers commodity hardware, distributed
      • Assume things break or are broken
      • Mostly AP, some tunable
      • P generally easy
      • Custom API, SQLish
      • Map/Reduce
    • 16. Comparison
      • SQL
      • ACID transactions
      • Advanced indexing
      • Foreign key support
      • Strong lock support
      • Schema centric
      • API – usually JPA or JDBC
      • Strong access control
      • NoSQL
      • BASE transactions
      • Indexing – key only to advanced
      • Foreign keys – usually none
      • Lock support – usually none
      • Usually schema-less
      • API – depends on implementation
      • Access control – usually none
    • 17. Comparison
      • SQL
      • Complex disk store, random access
      • Easy for dev with JPA/Hibernate/SQL
      • Multi-platform
      • General purpose
      • Strong commercial support
      • Great tool support
      • NoSQL
      • Usually append only, 1 seek, 1 read
      • Puts more work on application dev
      • Favors Linux/Unix
      • More special purpose
      • Strong to no commercial support
      • Tool support – not so much
    • 18.  
    • 19. Column Stores
      • Data stored by column instead of row
      • Schema-less
      • Non-relational, data is de-normalized
      • Column format stores sparse data efficiently
      • Column families cannot change
      • 10,000+ columns by 100 million+ rows
      • Easy sharding (partitioning)
      • Usually not ACID compliant
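A toy model can make two of the points above concrete: the column families are fixed up front, while the columns inside a family are free-form and only the cells that exist take up space. The class `SparseColumnTable` is hypothetical and says nothing about how any real column store lays out data on disk.

```python
from collections import defaultdict


class SparseColumnTable:
    """Toy table with fixed column families and free-form, sparse columns."""

    def __init__(self, families):
        self.families = frozenset(families)  # fixed when the table is created
        # row_key -> {(family, column): value}; absent cells are not stored
        self.rows = defaultdict(dict)

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][(family, column)] = value

    def get(self, row_key, family, column, default=None):
        return self.rows.get(row_key, {}).get((family, column), default)


if __name__ == "__main__":
    table = SparseColumnTable(["profile", "stats"])
    table.put("user1", "profile", "name", "Ada")
    table.put("user2", "stats", "logins", 7)  # different columns per row
    print(table.get("user1", "profile", "name"))
    print(table.get("user1", "stats", "logins"))  # never written, so None
```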
    • 20. Column stores
      • BigTable – Google, 2006 paper
      • Hadoop/HBase – Part of Apache Hadoop
      • Cassandra – Facebook, LAN/WAN replication
      • Hypertable – Pluggable DFS, HQL
      • Vertica – Full SQL implementation
      • Amazon SimpleDB – Cloud store
    • 21. Document Stores
      • CAP tunable
      • Either key/value or bucket/key/value
      • Easy/Auto sharding - Consistent hashing
      • Usually ACID compliant
      • Not SQL compliant, maybe custom query
      • Easy implementation via map or custom api
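The consistent hashing mentioned above can be sketched as a hash ring. This is a minimal illustration under stated assumptions: the `HashRing` class, the use of SHA-256, and the count of 64 virtual nodes per server are choices made for this example, not taken from any particular store.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("user:42"))
```

The property that makes sharding easy is that adding a node only takes keys away from existing nodes; no key moves between two nodes that were already on the ring.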
    • 22. Document stores
      • Amazon – Dynamo and S3 (cloud based)
      • Riak – CAP tunable, built in map/reduce
      • CouchDB – ACID, REST api
      • MongoDB – Indexing, query support
      • Voldemort – Java, pluggable serialization
      • MySQL – Key access, denormalize schema, kill indexes
    • 23. Memory Stores
      • Mostly in the CA realm
      • P can be tough depending on implementation
      • Some are distributed, some local only
      • Usually key-value stores
      • Many are disk backed, append only files
      • Designed for very high-speed access
    • 24. Memory stores
      • CouchBase – Membase + CouchDb
      • Memcached – Local map
      • Coherence – Commercial Oracle, distributed
      • Redis – Supports hash, list, set, and sorted set, data structure server
      • Tokyo/Kyoto Cabinet – disk backed map
      • Infinispan – JSR-107 jcache impl
      • Scalaris – Erlang, strong consistency
    • 25. Graph/Triple Store
      • Model relationships well, bi-directional
      • Node/edges – edges can be weighted or not
      • RDF Triple – subject -> predicate -> object, w3c standard for semantic web
      • Many implement SPARQL, object api
      • Sharding can be difficult because of graph nature
      • Schema-less – nodes, edges, properties
      • Fast set operations
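The subject -> predicate -> object model can be illustrated with a tiny in-memory triple store. This is a sketch only: `TripleStore` and its `match` method are hypothetical, and the wildcard matching merely hints at what a SPARQL pattern does.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    store = TripleStore()
    store.add("alice", "knows", "bob")
    store.add("bob", "knows", "carol")
    store.add("alice", "worksFor", "acme")
    print(store.match(predicate="knows"))  # all "knows" relationships
    print(store.match(subject="alice"))    # everything stated about alice
```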
    • 26. Graph/Triple Stores
      • Neo4j – ACID transactions, object API
      • AllegroGraph – Reference impl of SPARQL
      • Bigdata – dynamic sharding
      • Trinity – Microsoft research
      • Infinite Graph – Distributed, cross-platform
      • FlockDb – Twitter, fast set operations
      • Infogrid – Object based, REST api
    • 27. Interesting Integrations
      • Lucene - Document Store with Search as Query Language
      • Solr and Elasticsearch – Scalable Lucene
      • Riak Search – Erlang impl of Lucene APIs
      • Solandra – Lucene on Cassandra backend
      • Couchdb-lucene – Integration
      • DistributedLucene – Lucene on Hadoop
      • Neo4j – Full Text Search on Graph Store
    • 28. Worth Mentioning
      • Configuration Dbs – ZooKeeper, Doozer
        • Distributed configuration, locks, synchronization
        • Used to make other apps scalable
      • XML Dbs – eXist, BaseX, Xindice
        • XML only, XQuery, XPath, ACID, GUI support
        • non-distributed
    • 29.  
    • 30.  
    • 31. Case Study - HBase
      • Apache – part of Hadoop/HDFS
      • Requires ZooKeeper
      • Java based
      • Runs well on Amazon EC2
      • Excellent language support
      • Supports REST interface
    • 32. HBase continued
      • Map/Reduce via Hadoop
      • Schema-less, column families fixed
      • Nearly unlimited columns and rows
      • HBQL – partial sql + JDBC support
      • Some ACID support, atomicity, durability
      • Integration with Hive for data warehousing, ad-hoc query support - HiveQL
    • 33. Case Study - Riak
      • Data Model – Bucket/Key/Value
      • Value has MIME type, byte[]
      • Value supports one-way Links, basic graph
      • Erlang, Protocol Buffers, REST interfaces
      • Pre/Post Commit Hooks
      • CAP Tunable per bucket
      • Map/Reduce – Erlang and Javascript
    • 34. Riak Continued
      • Vector Clocks
      • Read repair for R < N
      • Peer-to-Peer, Nothing Shared Architecture
      • Replication across data centers
      • Pluggable storage
      • API for Most Languages + REST
      • Commercial Support
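To show the idea behind read repair, the sketch below reads from R replicas, returns the newest value, and writes it back to any contacted replica that was stale. It is deliberately simplified and is not Riak's implementation: it assumes each replica is a dict of key -> (version, value) and uses a plain integer version where Riak uses vector clocks. The name `read_with_repair` is hypothetical.

```python
def read_with_repair(replicas, key, r):
    """Read `key` from the first `r` replicas and repair stale copies.

    replicas: list of dicts mapping key -> (version, value).
    """
    contacted = replicas[:r]
    # A replica that has never seen the key answers with version 0.
    responses = [rep.get(key, (0, None)) for rep in contacted]
    version, value = max(responses, key=lambda resp: resp[0])
    for rep in contacted:
        if rep.get(key, (0, None))[0] < version:
            rep[key] = (version, value)  # repair the stale replica
    return value


if __name__ == "__main__":
    replicas = [{"k": (1, "old")}, {"k": (2, "new")}, {"k": (1, "old")}]
    print(read_with_repair(replicas, "k", r=2))
    print(replicas)
```

In this sketch a replica outside the read quorum stays stale until a later read or write reaches it, which is one reason the data is only eventually consistent when R < N.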
    • 35. Case Study - Redis
      • Supports hash, list, set, and sorted set
      • Fast set operations
      • Atomic updates
      • Everything stored in memory
      • Persistence to disk – periodic save, append only file, can be compacted
      • Good API support, JDBC subset driver
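The combination of in-memory data with an append-only file that can be compacted can be sketched in a few lines. This is a toy illustration of the idea, not Redis's file format or code; `AppendOnlyStore`, its JSON-lines log, and the `.tmp` rewrite file are all assumptions made for this example.

```python
import json
import os


class AppendOnlyStore:
    """Toy key-value store: data lives in memory, writes go to a log file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Replay the log; later entries overwrite earlier ones.
            with open(path, "r", encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self.data[key] = value

    def set(self, key, value):
        # Append first, then update memory, so a restart can replay the write.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        self.data[key] = value

    def compact(self):
        """Rewrite the log so it holds one entry per live key."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as log:
            for key, value in self.data.items():
                log.write(json.dumps([key, value]) + "\n")
        os.replace(tmp, self.path)
```

Reads never touch the disk in this sketch; the log exists only so that a restart can rebuild the in-memory state.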
    • 36. Redis Continued
      • Master – slave replication, read scalability, redundancy, slave can sync to disk
      • Can swap out values, keys must be in memory
      • Can be used as pub/sub messaging system
      • Can send multiple commands in single request
      • Built to be extremely fast
      • Supports very high speed atomic counters
    • 37. Case Study - Neo4j
      • Java based – cross platform
      • ACID transactions
      • Durable persistence
      • Handles billions of nodes/edges on a single machine
      • Supports bulk data loading
      • Good language support
    • 38. Neo4j Continued
      • Spatial index support
      • RDF triples/OWL/SPARQL support
      • Replication and HA – commercial version
      • Object oriented API
      • Sharding at client level
      • Dual open source and commercial license
    • 39. Resources
      • fallabs.com/tokyocabinet
      • fallabs.com/kyotocabinet
      • redis.io
      • www.membase.org
      • neo4j.org
      • en.wikipedia.org/wiki/Triplestore
      • en.wikipedia.org/wiki/Graph_theory
      • research.microsoft.com/en-us/projects/trinity
    • 40. Resources
      • www.jboss.org/infinispan
      • basho.com
      • nosqlpedia.com/wiki/Consistency_models_in_nonrelational_dbs
      • www.hypertable.org
      • project-voldemort.com
      • www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    • 41. Resources
      • nosql-database.org
      • couchdb.apache.org
      • engineering.twitter.com/2010/05/introducing-flockdb.html
      • infinitegraph.com
      • http://www.w3.org/TR/rdf-concepts/
