NoSQL Distilled-distilled


  1. rICh Morrow, quicloud LLC
  2. • This talk is essentially the first couple of chapters of “NoSQL Distilled” (Sadalage, Fowler)
     • Highly recommend this book!
  3. • App development productivity
        ▪ Fixes “impedance mismatch”
     • Large scale
        ▪ Happily handles the “three Vs” of “big data”: Volume, Velocity, Variety
  4. You’ve always needed a “backing store”
     • …could be files
        ▪ great for a single user or application
     • …could be databases
        ▪ great for multiple users/applications
     • …and on the DB side, could be:
        ▪ Application Database (used by a single app)
        ▪ Integration Database (used by several apps)
  5. • Concurrency
        ▪ Simple problem, very tough to solve
     • Application Datastores
        ▪ One app, many users
     • Integration Datastores
        ▪ One set of data, many apps, lots of potential for headbanging
  6. {
       "id": "1001",
       "firstName": "Ann",
       "lastName": "Williams",
       "age": 55,
       "purchasedItems": {
         "0321290533": {qty, price…},
         "0321601912": {qty, price…},
         "0131495054": {qty, price…}
       },
       "paymentDetails": { cc info… },
       "address": {
         "street": "1234 Park",
         "city": "San Francisco",
         "state": "CA",
         "zip": "94102"
       }
     }
     1 object = 10, 20, 100? tables. Ugh…
     Your code has one structure, but your RDBMS stores it in another…
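To make the mismatch on this slide concrete, here is a minimal Python sketch that takes one in-memory aggregate and splits it into rows for separate tables. The table names (`customers`, `addresses`, `order_items`), the helper `to_relational_rows`, and the `qty`/`price` values are assumptions made up for illustration; the slide elides them.

```python
import json

# One aggregate, shaped like the slide's example.
# The qty/price values are hypothetical; the slide leaves them out.
customer = {
    "id": "1001",
    "firstName": "Ann",
    "lastName": "Williams",
    "age": 55,
    "purchasedItems": {
        "0321290533": {"qty": 1, "price": 30.0},
        "0321601912": {"qty": 2, "price": 45.0},
    },
    "address": {
        "street": "1234 Park",
        "city": "San Francisco",
        "state": "CA",
        "zip": "94102",
    },
}


def to_relational_rows(agg):
    """Split one aggregate into rows for three hypothetical tables."""
    addr = agg["address"]
    return {
        "customers": [(agg["id"], agg["firstName"], agg["lastName"], agg["age"])],
        "addresses": [(agg["id"], addr["street"], addr["city"], addr["state"], addr["zip"])],
        "order_items": [
            (agg["id"], isbn, item["qty"], item["price"])
            for isbn, item in agg["purchasedItems"].items()
        ],
    }


# Relational shape: one object fans out over several tables.
rows = to_relational_rows(customer)

# Aggregate shape: the same object is stored and fetched as a single unit.
as_document = json.dumps(customer)
```

Even this small object already needs three tables and a join key repeated in each; the document form round-trips through `json.dumps`/`json.loads` unchanged.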
  7. A great "all purpose" storage + query tool
     • ACID compliant
     • Supports many users
     • Supports many apps
     • 3NF stores data efficiently
        ▪ Disk wasn't always cheap
     • Fast and tunable
     • Introduced a common interface (SQL)
        ▪ Which every vendor quickly then “broke”
  8. • Impedance mismatch
        ▪ Many teams build (then have to maintain) custom ORM or SOA proxies
     • Weren't built to be distributed
        ▪ Google, Amazon, et al. hit hard walls on RDBMS capabilities
        ▪ Often required expensive, proprietary hardware
     • Ooops, I sharded myself!
        ▪ Additional complexity
        ▪ Cross-shard joins now extremely expensive
  9. • Velocity
        ▪ Faster responses required
     • Volume
        ▪ 100s of TB, PB now common
        ▪ “Web Scale” can mean 100s of thousands of concurrent transactions
        ▪ Both of those increasing rapidly
     • Variety
        ▪ Mixed structure, semi-structured, unstructured
  10. • Bigtable paper (by Google)
         ▪ Heavily influenced the “Columnar” branch of NoSQL
      • Dynamo paper (by Amazon)
         ▪ Heavily influenced the “KeyValue” branch of NoSQL
         ▪ This is NOT DynamoDB!!!
      Design considerations:
      • Distributed from the start
      • Clusters of inexpensive commodity hardware are cheaper & more fault tolerant at scale
      • Relaxed and/or tunable C&A (from CAP theorem)
      • Deal with unheard-of volume & velocity
      • Schemaless (bye bye impedance mismatch)
  11. • Consistency
         ▪ How consistent the data looks to 2 or more viewers
         ▪ “Eventual” consistency possible (and common)!
      • Availability
         ▪ Responsiveness of the system
      • Partition Tolerance
         ▪ How well does the system respond to partition failures?
         ▪ This is normally “untunable”, unlike the C&A
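One common way to picture "tunable" consistency is the replica-quorum rule used by many Dynamo-style stores: with N replicas, a read that consults R of them is guaranteed to see a write acknowledged by W of them whenever R + W > N. The slides do not spell this rule out, so treat the sketch below as an illustration under that assumption; the function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Quorum rule: any r-replica read set meets any w-replica write set iff r + w > n."""
    return r + w > n


def brute_force_overlap(n, r, w):
    """Check the same property by enumerating every possible read and write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# With 3 replicas: R=2, W=2 always overlaps (stronger consistency),
# while R=1, W=1 may miss the latest write (faster, eventually consistent).
strong = quorums_overlap(3, 2, 2)
relaxed = quorums_overlap(3, 1, 1)
```

Lower R and W mean fewer replicas to wait on, which favors availability and latency; higher values favor consistency.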
  12. • Because “Cloud” and “Big Data” were just not confusing enough people in IT
      • "Not ONLY SQL" - incredibly unfortunate "little o"
      • Name born out of a Bay Area meetup in 2009
         ▪ …and regretted / derided ever since
  13. Fancy term for “multiple datastores”
      • ...you're already doing it
         ▪ Browser side cache
         ▪ Memcache
         ▪ Query cache
         ▪ OLAP systems
      • ...just add NoSQL
      • Tell your RDBMS not to worry – it will (probably) still live a long, happy life
  14. • Generally Open Source
      • Schemaless
         ▪ Easily change schema or do 'schema on read'
      • Cluster-oriented
         ▪ With the exception of Graph DBs
      • Generally favor "Web Scale" over ACID
      • Generally better for APPLICATION Databases
      • Aggregate data models
         ▪ Let you treat a group of data as a unit
         ▪ Again, graph DBs are an exception here…
  15. • KeyValue
         ▪ Fast lookup on a single “hashed” key
      • Document
         ▪ Each “Document” self-defines its own structure
      • Columnar (or Column-Family)
         ▪ Great for “sparse” data (millions of columns)
      • Graph [bit of a black sheep in the NoSQL family]
         ▪ Specialized to crawl graph relations like social networks, resource flows, etc.
         ▪ Less popular at the moment, but gaining steam fast
  16. • Can only look up by (normally a single) key
         ▪ Extremely fast for that key
      • Value can be anything
      • Example: DynamoDB, Riak
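The key-value model above can be sketched in a few lines of Python. This is a toy in-memory store, not how DynamoDB or Riak are implemented: the class `TinyKeyValueStore`, the fixed partition count, and the use of MD5 to pick a partition are all assumptions chosen for illustration.

```python
import hashlib


class TinyKeyValueStore:
    """Toy key-value store: keys are hashed onto a fixed set of in-memory partitions."""

    def __init__(self, partitions=4):
        self._partitions = [dict() for _ in range(partitions)]

    def _partition_for(self, key):
        # Hash the key so that lookups go straight to one partition.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self._partitions[int(digest, 16) % len(self._partitions)]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        return self._partition_for(key).get(key, default)


store = TinyKeyValueStore()
# The store never inspects the value, so it can be anything.
store.put("customer:1001", {"firstName": "Ann"})
store.put("session:abc", b"\x00\x01opaque bytes")
```

Note what is missing: there is no way to ask "which customers live in CA?" without already knowing the keys, which is the trade-off the slide describes.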
  17. • Document can contain anything
         ▪ JSON extremely popular
         ▪ But can also be XML, CSV, semi-structured, unstructured, custom… literally anything
      • Can query on aggregates inside of document
         ▪ Can even index on aggregates
      • Can retrieve part of the document
      • Extremely memory intensive
      • Example: MongoDB, CouchDB
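As a rough illustration of querying inside a document and retrieving only part of it, here is a small Python sketch over plain dictionaries. It does not use the MongoDB or CouchDB APIs; the helpers `get_path` and `find`, the dotted-path convention, and the second sample document are hypothetical.

```python
# Two sample documents; note they do not have to share the same fields.
documents = [
    {"_id": "1001", "firstName": "Ann", "address": {"city": "San Francisco", "state": "CA"}},
    {"_id": "1002", "firstName": "Bob", "nickname": "Bo", "address": {"city": "Denver", "state": "CO"}},
]


def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; return None if any step is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, value, fields=None):
    """Return documents whose value at `path` equals `value`, optionally keeping only `fields`."""
    matches = [d for d in docs if get_path(d, path) == value]
    if fields is None:
        return matches
    return [{f: get_path(d, f) for f in fields} for d in matches]


# Query on a nested field and retrieve only part of each matching document.
result = find(documents, "address.state", "CA", fields=["firstName", "address.city"])
```

This sketch scans every document on each query; an index on `address.state` would avoid the scan.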
  18. • Great for “sparse” data (populated columns vary greatly between rows)
      • Group columns into families
      • Think of it as a “two level” aggregate
         ▪ First level “key” is rowID or aggregate of interest
         ▪ 2nd level values are the columns
      • You can visualize the data as row- or column-oriented
      • Example: HBase, Cassandra
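The "two level" aggregate can be pictured as nested maps: row key, then column family, then whichever columns that row happens to populate. The Python sketch below is only a mental model, not the HBase or Cassandra storage format; `put`, `get_family`, and the sample rows are hypothetical.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def get_family(row_key, family):
    """Return only the columns this row populated in the given family (no side effects)."""
    return dict(table.get(row_key, {}).get(family, {}))


# Sparse data: each row stores only the columns it actually has.
put("1001", "profile", "firstName", "Ann")
put("1001", "orders", "0321290533", 1)
put("1002", "profile", "nickname", "Bo")
```

Absent columns cost nothing here, which is why this layout suits rows whose populated columns vary widely.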
  19. • Built to efficiently crawl & search graph trees
         ▪ Social Networks
         ▪ Resource flows
         ▪ “people of interest”
      • Don’t run well on clusters
      • Example: Neo4J (and not much else right now)
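The kind of traversal a graph database is specialized for can be sketched as a breadth-first search over an adjacency list. This is plain Python rather than Neo4J's query language, and the `follows` graph and the `within_hops` helper are made up for illustration.

```python
from collections import deque

# Hypothetical "who follows whom" social graph as an adjacency list.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each node reachable within max_hops to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]
    return hops


friends_of_friends = within_hops(follows, "ann", 2)
```

In a relational store each extra hop typically means another self-join; here it is one more level of the same loop.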
  20. • RDBMS were not designed with many of today’s problems in mind
      • NoSQL DBs were built from the ground up to deal with these “Three V” issues
      • NoSQL can either replace or (more commonly) supplement existing RDBMS functions
         ▪ Move hot tables out to DynamoDB
         ▪ Write a greenfield app from the ground up with only a NoSQL datastore
      • Consistency & Availability are often tunable
      • Many flavors exist & each has its own best use cases
         ▪ Research heavily before deciding upon a platform
  21. Thanks!
