No sql distilled-distilled
 

    Presentation Transcript

    • rICh Morrow, quicloud LLC
    • This talk is essentially the first couple of chapters of “NoSQL Distilled” (Sadalage, Fowler)
    • Highly recommend this book!

    • App development productivity
      • Fixes “impedance mismatch”
    • Large scale
      • Happily handles the “three Vs” of “big data”
        ▪ Volume
        ▪ Velocity
        ▪ Variety

    • You’ve always needed a “backing store”
      • …could be files
        ▪ Great for a single user or application
      • …could be databases
        ▪ Great for multiple users/applications
      • …and on the DB side, could be:
        ▪ Application Database (used by a single app)
        ▪ Integration Database (used by several apps)

    • Concurrency
      • Simple problem, very tough to solve
    • Application Datastores
      • One app, many users
    • Integration Datastores
      • One set of data, many apps, lots of potential for headbanging

    • { "id": "1001", "firstName": "Ann", "lastName": "Williams", "age": 55, "purchasedItems": { "0321290533": {qty, price… }, "0321601912": {qty, price… }, "0131495054": {qty, price… } }, "paymentDetails": { cc info… }, "address": { "street": "1234 Park", "city": "San Francisco", "state": "CA", "zip": "94102" } }
    • 1 object = 10, 20, 100? tables. Ugh…
    • Your code has one structure, but your RDBMS stores it in another…

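To make the mismatch concrete, here is a minimal Python sketch that splits one in-memory customer aggregate into rows for three tables. The table names, the `to_relational_rows` helper, and the `qty`/`price` values are illustrative assumptions; the slide elides those details.

```python
# Hypothetical customer aggregate. The qty/price values are made up for
# illustration; the slide only shows "qty, price…".
customer = {
    "id": "1001",
    "firstName": "Ann",
    "lastName": "Williams",
    "purchasedItems": {
        "0321290533": {"qty": 1, "price": 39.99},
        "0321601912": {"qty": 2, "price": 54.99},
    },
    "address": {
        "street": "1234 Park",
        "city": "San Francisco",
        "state": "CA",
        "zip": "94102",
    },
}


def to_relational_rows(agg):
    """Split one aggregate into rows for three hypothetical tables."""
    return {
        "customer": [(agg["id"], agg["firstName"], agg["lastName"])],
        "address": [(agg["id"], *agg["address"].values())],
        "purchased_item": [
            (agg["id"], isbn, item["qty"], item["price"])
            for isbn, item in agg["purchasedItems"].items()
        ],
    }


rows = to_relational_rows(customer)
for table, table_rows in rows.items():
    print(table, table_rows)
```

Even this small object already touches three tables, and reading it back means joining them again. A document or key-value store would persist `customer` as a single unit.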
    • A great "all purpose" storage + query tool
      • ACID compliant
      • Supports many users
      • Supports many apps
      • 3NF stores data efficiently
        ▪ Disk wasn't always cheap
      • Fast and tunable
      • Introduced a common interface (SQL)
        ▪ Which every vendor then quickly “broke”

    • Impedance mismatch
      • Many teams build (then have to maintain) custom ORM or SOA proxies
    • Weren't built to be distributed
      • Google, Amazon, et al. hit hard walls on RDBMS capabilities
      • Often required expensive, proprietary hardware
    • Ooops, I sharded myself!
      • Additional complexity
      • Cross-shard joins now extremely expensive

    • Velocity
      • Faster responses required
    • Volume
      • 100s of TB, PB now common
      • “Web Scale” can mean 100s of thousands of concurrent transactions
    • Both of those increasing rapidly
    • Variety
      • Mixed structure, semi-structured, unstructured

    • Bigtable paper (by Google)
      • Heavily influenced the “Columnar” branch of NoSQL
    • Dynamo paper (by Amazon)
      • Heavily influenced the “Key-Value” branch of NoSQL
      • This is NOT DynamoDB!!!
    • Design considerations:
      • Distributed from the start
      • Clusters of inexpensive commodity hardware are cheaper & more fault tolerant at scale
      • Relaxed and/or tunable C&A (from CAP theorem)
      • Deal with unheard-of volume & velocity
      • Schemaless (bye bye impedance mismatch)

    • Consistency
      • How consistent the data looks to 2 or more viewers
      • “Eventual” consistency possible (and common)!
    • Availability
      • Responsiveness of the system
    • Partition Tolerance
      • How well does the system respond to partition failures?
      • This is normally “untunable”, unlike the C&A

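One common way the consistency/availability trade-off gets tuned in Dynamo-style stores is by choosing how many replicas must take part in a write (`w`) and in a read (`r`) out of `n` copies. The slides do not describe this mechanism; the sketch below is an assumption-laden toy, with a hypothetical `Replica` class and no real networking, showing why a read can be stale when `r + w <= n`.

```python
class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self):
        self.value = None
        self.version = 0


def write(replicas, w, value, version):
    """Apply the write to the first w replicas only; the rest lag behind."""
    for replica in replicas[:w]:
        replica.value = value
        replica.version = version


def read(replicas, r):
    """Ask the last r replicas and return the newest value among them."""
    newest = max(replicas[-r:], key=lambda replica: replica.version)
    return newest.value


replicas = [Replica() for _ in range(3)]  # n = 3 copies
write(replicas, w=2, value="blue", version=1)

stale = read(replicas, r=1)  # r + w = 3, not > n: may miss the write
fresh = read(replicas, r=2)  # r + w = 4 > n: read set overlaps write set
print(stale, fresh)
```

Smaller `r` and `w` answer faster and tolerate more unavailable replicas, at the cost of possibly stale reads until the lagging copies catch up.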
    • Because “Cloud” and “Big Data” were just not confusing enough people in IT
    • "Not ONLY SQL" - incredibly unfortunate "little o"
    • Name born out of a Bay Area meetup in 2009
      • …and regretted / derided ever since

    • Fancy term for “multiple datastores”
      • ...you're already doing it
        ▪ Browser-side cache
        ▪ Memcache
        ▪ Query cache
        ▪ OLAP systems
      • ...just add NoSQL
      • Tell your RDBMS not to worry: it will (probably) still live a long, happy life

    • Generally Open Source
    • Schemaless
      • Easily change schema or do 'schema on read'
    • Cluster-oriented
      • With the exception of Graph DBs
    • Generally favor "Web Scale" over ACID
    • Generally better for APPLICATION Databases
    • Aggregate data models
      • Let you treat a group of data as a unit
      • Again, graph DBs are an exception here…

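"Schema on read" means the store accepts records of any shape and the reading code decides how to interpret them. A minimal sketch, assuming two hypothetical record shapes (an older one with `firstName`/`lastName` and a newer one with a single `name` field) and a hypothetical `read_customer` helper:

```python
def read_customer(doc):
    """Normalize a stored record into one shape at read time."""
    # Prefer the newer single "name" field; fall back to the older pair.
    name = doc.get("name") or " ".join(
        part for part in (doc.get("firstName"), doc.get("lastName")) if part
    )
    return {"id": str(doc["id"]), "name": name, "age": doc.get("age")}


# Two records written at different times with different structures.
stored = [
    {"id": 1001, "firstName": "Ann", "lastName": "Williams", "age": 55},
    {"id": "1002", "name": "Bob Lee"},
]

customers = [read_customer(doc) for doc in stored]
print(customers)
```

The trade-off is that the schema does not disappear; it moves out of the database and into every piece of code that reads the data.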
    • Key-Value
      • Fast lookup on a single “hashed” key
    • Document
      • Each “Document” self-defines its own structure
    • Columnar (or Column-Family)
      • Great for “sparse” data (millions of columns)
    • Graph [bit of a black sheep in the NoSQL family]
      • Specialized to crawl graph relations like social networks, resource flows, etc.
      • Less popular at the moment, but gaining steam fast

    • Can only look up by (normally a single) Key
      • Extremely fast for that key
    • Value can be anything
    • Example: DynamoDB, Riak

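The key-value model can be sketched in a few lines of Python. This toy, with a hypothetical `TinyKeyValueStore` class, hashes each key to pick one of several in-memory "nodes". It uses simple modulo placement for brevity, not the consistent hashing that production stores typically use, and it is not how DynamoDB or Riak are implemented.

```python
import hashlib


class TinyKeyValueStore:
    """Toy key-value store: each key is hashed to exactly one node."""

    def __init__(self, node_count=3):
        # Each "node" is just a dict standing in for a separate server.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        # The value is opaque to the store: it can be anything.
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        # A lookup touches only the one node that owns the key.
        return self._node_for(key).get(key, default)


store = TinyKeyValueStore()
store.put("cart:1001", {"0321290533": 1, "0321601912": 2})
cart = store.get("cart:1001")
missing = store.get("cart:9999")
print(cart, missing)
```

There is no way to ask "which carts contain item X" without scanning every value, which is the price paid for the fast single-key lookup.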
    • Document can contain anything
      • JSON extremely popular
      • But can also be XML, CSV, semi-structured, unstructured, custom… literally anything
    • Can query on aggregates inside of document
      • Can even index on aggregates
    • Can retrieve part of the document
    • Extremely memory intensive
    • Example: MongoDB, CouchDB

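Querying inside a document and retrieving only part of it can be illustrated with plain Python dicts. The `get_path`, `find`, and `project` helpers and the dotted-path syntax are assumptions made for this sketch; they mimic the general idea rather than the API of MongoDB or CouchDB.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, expected):
    """Return the documents whose value at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]


def project(doc, paths):
    """Retrieve only the requested parts of a document."""
    return {path: get_path(doc, path) for path in paths}


# Documents in one collection do not have to share a structure.
collection = [
    {"id": "1001", "firstName": "Ann",
     "address": {"city": "San Francisco", "zip": "94102"}},
    {"id": "1002", "firstName": "Bob", "tags": ["vip"]},
]

matches = find(collection, "address.city", "San Francisco")
partial = project(matches[0], ["firstName", "address.zip"])
print(matches, partial)
```

Here `find` scans every document; a real document store would normally serve such a query from an index built on the nested field.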
    • Great for “sparse” data (populated columns vary greatly between rows)
    • Group columns into families
    • Think of it as a “two level” aggregate
      • First level “key” is rowID or aggregate of interest
      • Second level values are the columns
    • You can visualize the data as row- or column-oriented
    • Example: HBase, Cassandra

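The "two level" aggregate can be pictured as nested maps: row key, then column family, then the columns that row actually has. The `TinyColumnFamilyStore` class below is a hypothetical in-memory sketch of that shape, not the storage layout of HBase or Cassandra.

```python
from collections import defaultdict


class TinyColumnFamilyStore:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # Rows are sparse: only the columns that are set take up space.
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Row-oriented view: all columns of one family for one row."""
        # .get() avoids creating empty entries for unknown keys.
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def column_view(self, family, column):
        """Column-oriented view: one column across the rows that have it."""
        return {
            row_key: families[family][column]
            for row_key, families in self.rows.items()
            if column in families.get(family, {})
        }


store = TinyColumnFamilyStore()
store.put("1001", "profile", "firstName", "Ann")
store.put("1001", "profile", "age", 55)
store.put("1002", "profile", "firstName", "Bob")  # no "age" column here

ann_profile = store.get_family("1001", "profile")
ages = store.column_view("profile", "age")
print(ann_profile, ages)
```

The same data is read per row in `get_family` and per column in `column_view`, which is the "row or column-oriented" visualization the slide mentions.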
    • Built to efficiently crawl & search graph trees
      • Social networks
      • Resource flows
      • “People of interest”
    • Don’t run well on clusters
    • Example: Neo4j (and not much else right now)

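The kind of query a graph database is built for, such as "who is within two hops of this person", is a graph traversal. A minimal breadth-first sketch over a hypothetical adjacency-list social graph follows; the names and the `within_hops` helper are invented for illustration, and a real graph DB would expose this through its own query language.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not its own contact
    return distances


# Hypothetical "follows" relationships.
graph = {
    "ann": ["bob", "cara"],
    "bob": ["dave"],
    "cara": ["dave"],
    "dave": ["erin"],
}

nearby = within_hops(graph, "ann", max_hops=2)
print(nearby)
```

In a relational schema each extra hop is typically another self-join, which is why this access pattern gets its own category of datastore.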
    • RDBMS were not designed with many of today’s problems in mind
      • NoSQL DBs were built from the ground up to deal with these “three V” issues
    • NoSQL can either replace or (more commonly) supplement existing RDBMS functions
      • Move hot tables out to DynamoDB
      • Write a greenfield app from the ground up with only a NoSQL datastore
    • Consistency & Availability are often tunable
    • Many flavors exist & each has its own best use cases
      • Research heavily before deciding upon a platform

    • Thanks!