A walk down NOSQL Lane in the cloud

Introduction to NOSQL and various NOSQL solutions.

  1. A Walk down NOSQL Lane in the Cloud
     New York City Cloud Computing Group, February 2011
     Alexander Sicular, @siculars
  2. Who is this blowhard?
     Columbia University pays my mortgage.
     I have spent the better part of a decade in Medical Informatics.
     I am not shilling for any of these companies.
     I am not a computer scientist; I am a computer science enthusiast, particularly in the area of informatics.
  3. When I put my data in the “cloud”, to me it just means that it’s virtualized in someone else’s server room.
  4. ...the Silver Lining
     Many, many providers, and only growing: Amazon, Rackspace, Joyent, CouchOne, Cloudant, Azure, GAE, Heroku, no.de
     Outsourced management
     Zero capex
     Controlled costs
  5. ...With a Chance of Rain?
     Vendor lock-in
     Unreliable performance: I/O, CPU, memory
     Bare metal > software virtualization
  6. NoSQL or NOSQL?
     Not Only SQL
     Non/post-relational
     Big tent policy; umbrella term
     Fragmented
     Photo: http://www.flickr.com/photos/morgennebel/2933723145/
  7. Your Usage Patterns
     Read vs. write
     Mutable vs. immutable
     Product considerations: in-place updates, write-only logs
  8. This vs. That
     Riak wiki comparisons page: http://wiki.basho.com/Riak-Comparisons.html
     Popular one-page comparison of a number of NOSQL players by Kristof Kovacs: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
  9. NOSQL Concepts Are Not Brand New
     Memcached, since 2003: http://memcached.org
     Google papers, 2004-2006
     Amazon Dynamo, 2007
     Consistent hashing for memcached clients (libketama), 2007: http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
     Using relational systems as a key-value blob store, 2009: FriendFeed (not the first), http://bret.appspot.com/entry/how-friendfeed-uses-mysql
  10. Why NOSQL?
      Support for “very large” data sets
      Schemaless
      Denormalized
      Green field: new applications
      Photo: http://www.flickr.com/photos/gailtang/1243984297/
  11. Academia
      Google:
        Bigtable: http://labs.google.com/papers/bigtable.html
        GFS: http://labs.google.com/papers/gfs.html
        MapReduce: http://labs.google.com/papers/mapreduce.html
      Amazon:
        Dynamo: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
      NOSQL Summer: http://nosqlsummer.org/papers
  12. Under the Hood: Terminology
      Write-only (append-only) log: http://en.wikipedia.org/wiki/Log-structured_file_system
      Merkle trees: http://en.wikipedia.org/wiki/Hash_tree
      B-trees: http://en.wikipedia.org/wiki/B-tree
      Vector clocks: http://en.wikipedia.org/wiki/Vector_clock
      Bloom filters: http://en.wikipedia.org/wiki/Bloom_filters
      Big O notation: http://en.wikipedia.org/wiki/Big_o_notation
      Consistent hashing: http://en.wikipedia.org/wiki/Consistent_hashing
  13. CAP Theorem: http://en.wikipedia.org/wiki/CAP_theorem
      Consistency, Availability, Partition tolerance: pick two?
      http://guide.couchdb.org/draft/consistency.html
  14. CouchDB
      CouchOne, Cloudant
      Erlang
      HTTP interface
      Offline usage
      Extreme replication
      Sharded scaling scenarios
      Works on phones
      Incrementally updated indexes (B-tree)
  15. CouchDB Internal Architecture (diagram): http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG
  16. MongoDB
      10Gen, MongoHQ, MongoLab
      C++
      huMONGOus
      Soft landing for those coming from MySQL (relational databases)
      Native JavaScript
      Sharded scaling, secondary indexes
      Replicated master/slave
      Located in NYC (go visit them)
  17. MongoDB Sharding Diagram: http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/
  18. MySQL to Mongo Query Similarity: http://nosqlpedia.com/wiki/File:MongoDB.JPG
  19. Riak
      Basho, Joyent
      Erlang
      Distributed
      HTTP, protobuf
      Native JavaScript, Erlang
      Multiple backends
      Homogeneous
      CAP tunable
  20. Hadoop
      Cloudera, Apache Foundation
      Java
      High latency, batch oriented
      HDFS is based on GFS
      Huge ecosystem: Pig, Hive, Flume
      Users: Yahoo, FB, Twitter, Fortune 500
      Open-source Google stack via the Google papers
  21. HBase
      Java
      Low-latency store that sits on top of Hadoop
      Modeled after Google Bigtable
      Column oriented
      Thrift, protobuf
      Backend for the new Facebook Messaging service
  22. Cassandra
      Apache
      Java
      Column oriented
      Like Bigtable and Dynamo
      Originated at Facebook
      At Twitter: distributed counting
      http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King
      http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
  23. Redis
      OpenRedis
      C
      REmote DIctionary Server
      Incredibly fast: memcached on steroids
      Replicated master/slave
      Specific data structures
  24. Commonalities
      Open source
      Adherence to common or standard:
        data formats: JSON, BSON, UTF-8, binary
        data transport mechanisms: HTTP, Thrift, protobuf, simple wire protocols
  25. Ok. So Now What?
      Analyze your requirements
      Mailing lists
      IRC, Twitter
      Project pages, wikis
      GitHub/Google Code/Bitbucket: project pages, language-specific clients
  26. Variety Pack
      Hybrid architectures will become the norm:
        Twitter: MySQL, Cassandra, Hadoop
        Google: MySQL, GAE (Bigtable)
        Facebook: MySQL, Cassandra, HBase, memcached
        Yahoo: MySQL, Hadoop
        LinkedIn: Voldemort
      Photo: http://www.flickr.com/photos/uncleweed/82245324/
  27. Questions?
      New York City Cloud Computing Group, February 2011
      Alexander Sicular, @siculars
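Slides 7 and 12 contrast in-place updates with write-only (append-only) logs. In a log-structured store an update never overwrites old data: it appends a new record and moves an in-memory pointer, which keeps writes sequential and turns crash recovery into a replay of the log. This is roughly the idea behind Bitcask, one of Riak's storage backends. A minimal sketch, assuming a hypothetical `AppendOnlyStore` class with one JSON record per line; compaction of superseded records is deliberately omitted.

```python
import json
import os
from typing import Any, Dict, Optional


class AppendOnlyStore:
    """Hypothetical log-structured key-value store: writes append, reads seek via an in-memory index."""

    def __init__(self, path: str):
        self.path = path
        self.index: Dict[str, int] = {}  # key -> byte offset of its latest record
        if os.path.exists(path):
            self._rebuild_index()

    def _rebuild_index(self) -> None:
        # Replay the whole log; a later record for a key supersedes earlier ones.
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                self.index[json.loads(line)["k"]] = offset
                offset += len(line)

    def put(self, key: str, value: Any) -> None:
        record = (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)  # so tell() reports where this record will land
            offset = f.tell()
            f.write(record)
        self.index[key] = offset  # the old record stays on disk until a compaction pass

    def get(self, key: str) -> Optional[Any]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["v"]
```

Updating `user:1` twice leaves two records in the file, but `get` only ever reads the newest one. Reopening the file rebuilds the same index by replaying the log.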
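Slides 9 and 12 mention consistent hashing, which memcached clients such as libketama use to spread keys across servers so that adding or removing a server remaps only a fraction of the keys. The sketch below illustrates the general technique and does not reproduce libketama's actual algorithm. `HashRing`, the MD5-derived ring positions, and the default of 100 virtual points per node are all assumptions chosen for clarity.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    # Map a string to a point on a 2**32 ring using the first 4 bytes of MD5 (an assumption).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas: int = 100):
        self.replicas = replicas  # virtual points per physical node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = _ring_position(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the rare collision with an existing point
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get(self, key: str) -> str:
        if not self._points:
            raise KeyError("ring is empty")
        # Walk clockwise to the first point at or after the key's position, wrapping around.
        idx = bisect.bisect(self._points, _ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a fourth node joins, the only keys that move are those that now land on one of its points. Every other key keeps its old server. With a plain `hash(key) % num_servers` scheme, going from three servers to four would move most keys.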
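Slide 9 cites FriendFeed's use of MySQL as a key-value blob store. The post it links describes storing schemaless entities as opaque blobs and keeping secondary indexes in separate tables. The sketch below shows only that general idea. It uses Python's built-in `sqlite3` as a stand-in for MySQL and JSON as the blob format, both assumptions. The table names and the `open_store`/`put`/`get`/`find_by_user` helpers are hypothetical and do not come from FriendFeed's schema.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE entities (
    id   TEXT PRIMARY KEY,  -- opaque key
    body BLOB NOT NULL      -- schemaless JSON document
);
-- Secondary index stored as its own table and maintained by the application.
CREATE TABLE index_user_id (
    user_id   TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    PRIMARY KEY (user_id, entity_id)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def put(conn: sqlite3.Connection, doc: dict) -> str:
    entity_id = doc.get("id") or uuid.uuid4().hex
    body = json.dumps({**doc, "id": entity_id}).encode("utf-8")
    with conn:  # write the blob and its index row in one transaction
        conn.execute("INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)", (entity_id, body))
        if "user_id" in doc:
            conn.execute(
                "INSERT OR IGNORE INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (doc["user_id"], entity_id),
            )
    return entity_id


def get(conn: sqlite3.Connection, entity_id: str):
    row = conn.execute("SELECT body FROM entities WHERE id = ?", (entity_id,)).fetchone()
    return json.loads(row[0]) if row else None


def find_by_user(conn: sqlite3.Connection, user_id: str) -> list:
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i JOIN entities e ON e.id = i.entity_id "
        "WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]
```

The database never parses `body`, so new fields need no `ALTER TABLE`. The trade-off is that every queryable field needs an index table that the application must keep in sync. This sketch does not clean up stale index rows when a document's `user_id` changes.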
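The MapReduce paper on slide 11 structures a batch job in three phases. Hadoop (slide 20) follows the same model. A map step emits key/value pairs, a shuffle groups the pairs by key, and a reduce step folds each group. Below is a single-process word-count sketch of those phases. The function names are hypothetical, and none of this is Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input document.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Fold one key's values into a single result.
    return key, sum(values)


def map_reduce(docs: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

In a real cluster, the map and reduce calls run on many machines in parallel, and the shuffle is a network transfer. That transfer is one reason the slide calls Hadoop high latency and batch oriented.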
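Vector clocks (slide 12) let replicated stores in the Dynamo mold, Riak among them, decide whether one version of a value descends from another or whether two writes were concurrent and need reconciling. A minimal sketch, assuming a clock is a plain dict from hypothetical node names to counters:

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for clock a relative to b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

When `compare` returns `"concurrent"`, the store cannot pick a winner on its own. Dynamo-style systems may keep both versions and let the application reconcile them. The reconciled value carries the `merge` of both clocks, so it descends from each.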
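A Bloom filter (slide 12) answers “could this key be here?” in fixed space. A `False` is definite, but a `True` may be a false positive. Stores such as Cassandra and HBase can keep one per on-disk data file, which lets a read skip files that cannot contain the key. A minimal sketch. The bit-array size, the number of hashes, and the SHA-256 double-hashing scheme are illustrative assumptions, not any database's actual parameters.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by double hashing over one SHA-256 digest (an assumption).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if every one of the item's bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

Adding more keys without enlarging `num_bits` sets more bits and raises the false-positive rate. Sizing the filter for the expected number of keys is the main tuning decision.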
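Slide 12 also lists Merkle (hash) trees. The Dynamo paper on slide 11 uses them for anti-entropy: replicas compare tree hashes, and matching roots mean nothing needs to be transferred. A minimal root-only sketch, assuming each leaf is one serialized key/value record. A real anti-entropy process would also descend the tree to locate the differing key ranges.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash each leaf, then hash adjacent pairs level by level until one root remains."""
    if not leaves:
        return _h(b"")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Two replicas can exchange a single 32-byte root instead of their whole key range. A mismatch tells them some record differs, but not which one, and the lower levels of the tree narrow that down.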
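Slide 19 calls Riak “CAP tunable”. In the Dynamo design that Riak follows, each key lives on N replicas, a read waits for R of them, and a write waits for W. Choosing R + W > N makes every read quorum overlap every write quorum. A minimal sketch of that arithmetic. `QuorumConfig` and its methods are hypothetical and are not Riak's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas per key
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self):
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    def quorums_overlap(self) -> bool:
        # Any R replicas and any W replicas out of N must share at least one when R + W > N.
        return self.r + self.w > self.n

    def tolerated_failures(self) -> dict:
        # How many replicas may be unreachable while reads/writes still reach their quorum.
        return {"read": self.n - self.r, "write": self.n - self.w}
```

Lowering R and W favors availability and latency. Raising them favors consistency. Overlap is necessary but not sufficient for reading the latest write: with the sloppy quorums described in the Dynamo paper, a quorum may include fallback nodes rather than the key's home replicas.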
