An Introduction to NoSQL
Brad Anderson - DevNexus
March 21, 2011
DevNexus 2011

"An Introduction to NoSQL"
DevNexus talk 3/21/2011


    1. An Introduction to NoSQL
       Brad Anderson - DevNexus
       March 21, 2011
    2. Me
       • ‘boorad’ most places (twitter, github, etc.)
       • Erlang Programmer
         • Cloudant BigCouch, Ericsson Monaco, Verdeeco
         • Java, Python, D, Javascript, Common Lisp
       • NoSQL East - October 2009
       • Data Warehousing / Big Data
       • pre-lunch talks... always.
    3. Agenda
       • NoSQL is BULLSHIT
       • You Don’t Need It
       • You Can’t Query It
    4. The Name
       • Play on MySQL (Eric Evans, Rackspace)
       • Not Only SQL (Emil Eifrem)
       • Broad Umbrella
       • Shitty Marketing Term and we’re stuck with it
    5. Why do you need NoSQL?
    6. Why do you need NoSQL? YOU DON’T!
    7. Seriously, you don’t...
       • Vastly different performance characteristics
       • Immature APIs and tools / ecosystems
       • Bugs, most are actively being developed
       • Your situation doesn’t warrant it
    8. Why do they exist?
       • Every one of these new data storage systems came from a particular pain someone was having.
       • Each system was created to specifically solve the pain point the authors were experiencing.
       • This pain usually involves a metric shit-tonne of data and distributed processing is required.
       • Schema-free
    9. Prediction: Pain
    10. Examples
       • Google - index Internet (mapreduce/bigtable)
       • Yahoo - keep up with Google (Hadoop)
       • Amazon - shopping cart (Dynamo)
       • Facebook - inbox search (Cassandra)
       • Lotus - Notes legacy restrictions (CouchDB)
       • Cloudant - physics research (BigCouch)
       • Basho - CRM product (Riak)
       • Neo - graph traversal (Neo4J)
    11. Pain of Scaling
       • Scale Reads with master-slave replication
       • Scale Writes with master-master replication
       • Partitioning Vertically (by functional groups)
       • Partitioning Horizontally (by key, e.g. ‘date’)
       • Caching works, kinda
    12. What to do?
       • Distribute both data and processing
         • horizontal scaling
       • Organize data differently
       • Use appropriate on-disk storage
    13. Sorting Hat Says...
       • Distribution Model
       • Data Model
       • Disk Data Structure
    14. Distribution Model
       • Embedded (no distribution)
       • Replication / Sharding
       • Chord - peer to peer
       • Dynamo
         • consistent hashing, vnodes, vector clocks
    15. No Distribution
       • BDB
       • Neo4J
    16. Replication / Sharding Distribution
       • MongoDB
       • CouchDB
       • Redis
    17. Dynamo Distribution
       • BigCouch
       • Riak
       • Voldemort
       • Cassandra
         • no vnodes
         • no vector clocks
       • Hibari ?
    18. Dynamo - how does it work?
       [Diagram: ring of nodes (Node 1, Node 2, Node 3, Node 4, ...) with key ranges labeled by letter (A, B, C, ... Z); N=3, W=2]
    19. Dynamo - how does it work?
       PUT http://boorad.cloudant.com/dbname/blah?w=2
       [Same ring diagram; N=3, W=2]
    20. Dynamo - how does it work?
       PUT http://boorad.cloudant.com/dbname/blah?w=2
       [Same ring diagram; N=3, W=2]
    21. Dynamo - how does it work?
       PUT http://boorad.cloudant.com/dbname/blah?w=2
       [Same ring diagram with hash(blah) marked on the ring; N=3, W=2]
    22. Dynamo - how does it work?
       PUT http://boorad.cloudant.com/dbname/blah?w=2
       [Same ring diagram with hash(blah) marked on the ring; N=3, W=2]
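
The Dynamo slides show a PUT for the key blah being hashed onto a ring with N=3 and W=2. Here is a minimal Python sketch of that idea, not how BigCouch or any other system actually implements it. It assumes an MD5-based ring with one position per node (no vnodes) and in-memory dicts standing in for nodes. The names `Ring`, `preference_list`, `put` and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring: one position per node, N replicas per key."""

    def __init__(self, node_names, n=3):
        self.n = n
        # Nodes sorted by their position on the ring.
        self.positions = sorted((ring_hash(name), name) for name in node_names)
        # Each "node" is just a dict in this process (assumption for illustration).
        self.stores = {name: {} for name in node_names}
        # Nodes listed here are treated as unreachable.
        self.down = set()

    def preference_list(self, key):
        """Return the N distinct nodes found walking clockwise from hash(key)."""
        hashes = [pos for pos, _ in self.positions]
        start = bisect.bisect_right(hashes, ring_hash(key))
        count = min(self.n, len(self.positions))
        return [
            self.positions[(start + i) % len(self.positions)][1]
            for i in range(count)
        ]

    def put(self, key, value, w=2):
        """Write to every reachable replica; report success if at least W acked.

        A failed put may still leave the value on fewer than W replicas.
        """
        acks = 0
        for node in self.preference_list(key):
            if node in self.down:
                continue
            self.stores[node][key] = value
            acks += 1
        return acks >= w


ring = Ring(["node1", "node2", "node3", "node4"], n=3)
replicas = ring.preference_list("blah")       # the 3 nodes responsible for "blah"
ok = ring.put("blah", {"some": "doc"}, w=2)    # True while at least 2 replicas are up
```

With N=3 and W=2, one of the three replicas can be unreachable and the write still succeeds; with two of them down it is refused.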
    23. CAP Theorem
       • Pick Two (at any given time)
         • Consistency
         • Availability
         • Partition Tolerance
       • CP refuses requests, AP eventually consistent
       • Must Read: http://codahale.com/you-cant-sacrifice-partition-tolerance/
    24. Data Model
       • Key/Value
       • Document
       • Column
       • Graph
    25. Key / Value
       • BDB
       • Riak
       • Voldemort
       • Redis
       • Hibari
    26. Document
       • CouchDB
       • MongoDB
       • SimpleDB
    27. Column Stores
       • HBase
       • Cassandra
       • Hypertable
    28. Graph Databases
       • Neo4J
       • AllegroGraph
       • FlockDB
    29. Disk Data Structure
       • btree - many different kinds
       • mmap - compact bson
       • memtable/sstable or log structured merge tree
       • log-structured linear hashing
       • adjacency lists / adjacency matrices
    30. Querying NoSQL
       • Key Lookups
         • fast, easy, limiting
       • Secondary Indexes
         • Immature part of most systems
         • Roll your own
       • MapReduce
       • Mongo query language
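
The "roll your own" secondary index and MapReduce bullets on the querying slide can be pictured with a small in-memory Python sketch. It assumes a plain dict stands in for the key/value store. The `map_reduce` helper, the `by_city` map function and the sample documents are all made up for illustration and do not correspond to any particular product's API.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc_id, doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical documents, addressable only by their primary key.
docs = {
    "u1": {"name": "alice", "city": "Atlanta"},
    "u2": {"name": "bob", "city": "Boston"},
    "u3": {"name": "carol", "city": "Atlanta"},
}


def by_city(doc_id, doc):
    """Map step: emit (city, doc_id) so documents can be found by city."""
    yield doc["city"], doc_id


# A hand-rolled secondary index: city -> sorted list of document ids.
city_index = map_reduce(docs, by_city, lambda key, ids: sorted(ids))

# The same machinery used for an aggregate: number of documents per city.
city_counts = map_reduce(
    docs,
    lambda doc_id, doc: [(doc["city"], 1)],
    lambda key, ones: sum(ones),
)
```

A key lookup such as `docs["u2"]` stays fast and simple, while anything like "all users in Atlanta" has to go through an index that the application builds and keeps up to date itself.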
    31. Polyglot Persistence
       [Diagram with boxes labeled: RDBMS, batch processes, Cache, Raw Data, Hadoop, NoSQL, Apps, NoSQL]
    32. Drivers
       • Spring
         • commons, hadoop, kv, document, graph
         • membase, hbase, cassandra coming
       • Serialization
         • Thrift, Protocol Buffers, Avro
       • Native
         • Cassandra, Hadoop, Voldemort
       • JInterface to Erlang?
    33. Good Luck! You’ll Need It.
    34. Questions?