A walk down NOSQL Lane in the cloud


Introduction to NOSQL and various NOSQL solutions.


  1. A Walk down NOSQL Lane in the Cloud. New York City Cloud Computing Group, February 2011. Alexander Sicular (@siculars)
  2. Who is this blowhard? Columbia University pays my mortgage. For the better part of a decade in Medical Informatics. Am not shilling for any of these companies. Am not a computer scientist. Am a computer science enthusiast, particularly in the area of Informatics.
  3. When I put my data in the “cloud”, to me it just means that it’s virtualized in someone else’s server room.
  4. ...the Silver Lining: many, many providers, and only growing (Amazon, Rackspace, Joyent, CouchOne, Cloudant, Azure, GAE, Heroku, no.de); outsourced management; zero capex; controlled costs.
  5. ...With a Chance of Rain? Vendor lock-in; unreliable performance (I/O, CPU, memory); bare metal > software virtualization.
  6. NoSQL or NOSQL? Not Only SQL; non/post-relational; big tent policy; umbrella term; fragmented. (image: http://www.flickr.com/photos/morgennebel/2933723145/)
  7. Your Usage Patterns: read vs. write; mutable vs. immutable. Product considerations: in-place updates; write-only logs.
  8. This vs. That: Riak wiki comparisons page, http://wiki.basho.com/Riak-Comparisons.html; popular one-page comparison of a number of NOSQL players by Kristof Kovacs, http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
  9. NOSQL concepts are Not Brand New: Memcached since 2003, http://memcached.org; Google papers, 2004-2006; Amazon Dynamo, 2007; consistent hashing for memcached clients (libketama), 2007, http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients; using relational systems as a key-value blob store, 2009: FriendFeed (not the first), http://bret.appspot.com/entry/how-friendfeed-uses-mysql
  10. Why NOSQL: support for “very large” data sets; schemaless; denormalized; green field, new applications. (image: http://www.flickr.com/photos/gailtang/1243984297/)
  11. Academia. Google: Bigtable, http://labs.google.com/papers/bigtable.html; GFS, http://labs.google.com/papers/gfs.html; MapReduce, http://labs.google.com/papers/mapreduce.html. Amazon: Dynamo, http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf. NOSQL Summer: http://nosqlsummer.org/papers
  12. Under the Hood Terminology: write-only log, http://en.wikipedia.org/wiki/Log-structured_file_system; Merkle trees, http://en.wikipedia.org/wiki/Hash_tree; B-trees, http://en.wikipedia.org/wiki/B-tree; vector clocks, http://en.wikipedia.org/wiki/Vector_clock; Bloom filters, http://en.wikipedia.org/wiki/Bloom_filters; Big O notation, http://en.wikipedia.org/wiki/Big_o_notation; consistent hashing, http://en.wikipedia.org/wiki/Consistent_hashing
  13. CAP Theorem (http://en.wikipedia.org/wiki/CAP_theorem): Consistency, Availability, Partition tolerance. Pick two? http://guide.couchdb.org/draft/consistency.html
  14. CouchDB: CouchOne, Cloudant; Erlang; HTTP interface; offline usage; extreme replication scenarios; sharded scaling; works on phones; updated indexing (B-tree).
  15. CouchDB Internal Architecture (diagram): http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG
  16. MongoDB: 10Gen, MongoHQ, MongoLab; C++; “huMONGOus”; soft landing for those coming from MySQL (relational databases); native JavaScript; secondary indexes; sharded scaling; replicated master/slave; located in NYC (go visit them).
  17. MongoDB Sharding Diagram: http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/
  18. MySQL to Mongo Query Similarity (diagram): http://nosqlpedia.com/wiki/File:MongoDB.JPG
  19. Riak: Basho, Joyent; Erlang; HTTP, protobuf; native JavaScript, Erlang; multiple backends; homogeneous; distributed; CAP tunable.
  20. Hadoop: Cloudera, Apache Foundation; Java; high latency; batch oriented; HDFS is GFS based; open source Google stack via the Google papers; huge ecosystem: Yahoo, FB, Twitter, Fortune 500; Pig, Hive, Flume.
  21. HBase: Java; low-latency store; sits on top of Hadoop; modeled after Google Bigtable; column oriented; Thrift, protobuf; backend for the new Facebook Messaging service.
  22. Cassandra: Apache; Java; column oriented; like Bigtable and Dynamo; originated at Facebook; at Twitter, distributed counting: http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King, http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
  23. Redis: OpenRedis; C; REmote DIctionary Server; incredibly fast; memcached on steroids; replicated master/slave; specific data structures.
  24. Commonalities: open source; adherence to common or standard data formats (JSON, BSON, UTF-8, binary data) and transport mechanisms (HTTP, Thrift, protobuf, simple wire protocols).
  25. Ok. So Now What? Analyze your requirements; mailing lists; IRC, Twitter; project pages, wiki; GitHub/Google Code/Bitbucket: project-page-specific language clients.
  26. Variety Pack: hybrid architectures will become the norm. Twitter: MySQL, Cassandra, Hadoop; Google: MySQL, GAE (Bigtable); Facebook: MySQL, Cassandra, HBase, memcached; Yahoo: MySQL, Hadoop; LinkedIn: Voldemort. (image: http://www.flickr.com/photos/uncleweed/82245324/)
  27. Questions? New York City Cloud Computing Group, February 2011. Alexander Sicular (@siculars)
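Slides 7 and 12 mention write-only logs (log-structured storage). A minimal Python sketch of the core idea, assuming a JSON-lines data file: writes only ever append, and an in-memory index maps each key to the byte offset of its newest record, so an update is just another append. The class `AppendOnlyStore` and its file format are hypothetical; compaction, deletes, and recovery from a half-written last line are left out.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store whose data file is only ever appended to.

    Every put() appends one JSON line; an in-memory dict maps each key to the
    byte offset of its newest record. Old versions stay in the file until a
    (not implemented) compaction pass rewrites it.
    """

    def __init__(self, path: str):
        self.path = path
        self.index = {}
        if os.path.exists(path):
            self._rebuild_index()

    def _rebuild_index(self) -> None:
        # Replay the whole log in order; the last record for a key wins.
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                self.index[json.loads(line)["k"]] = offset
                offset += len(line)

    def put(self, key: str, value) -> None:
        record = (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # append position = current file size
            f.write(record)
        self.index[key] = offset

    def get(self, key: str):
        offset = self.index[key]  # KeyError if the key was never written
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["v"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = AppendOnlyStore(os.path.join(tmp, "data.log"))
        store.put("user:1", {"name": "ada"})
        store.put("user:1", {"name": "ada", "city": "NYC"})  # update = another append
        print(store.get("user:1"))
```

Because nothing is overwritten in place, writes are sequential and a crash cannot corrupt earlier records; the cost is that stale versions accumulate until the log is compacted.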
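Slide 9 cites FriendFeed's use of MySQL as a key-value blob store. The sketch below is loosely modeled on the pattern the linked post describes: whole documents go into one table as opaque serialized bodies, and each queryable attribute gets a separate index table that the application maintains itself. SQLite (from Python's standard library) stands in for MySQL and JSON stands in for the serialized blobs; the table names and the functions `put`, `get`, and `find_by_user` are hypothetical.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE entities (
    id   TEXT PRIMARY KEY,   -- opaque key
    body TEXT NOT NULL       -- schemaless JSON document
);
CREATE TABLE index_user_id (
    user_id   TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    PRIMARY KEY (user_id, entity_id)
);
"""


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def put(db: sqlite3.Connection, entity: dict) -> str:
    """Store a dict as an opaque body and add a row to the user_id index."""
    entity = dict(entity)  # don't mutate the caller's dict
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    with db:  # commit body and index row together
        db.execute("INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
                   (entity_id, json.dumps(entity)))
        if "user_id" in entity:
            db.execute("INSERT OR IGNORE INTO index_user_id VALUES (?, ?)",
                       (entity["user_id"], entity_id))
    return entity_id


def get(db: sqlite3.Connection, entity_id: str) -> dict:
    row = db.execute("SELECT body FROM entities WHERE id = ?", (entity_id,)).fetchone()
    if row is None:
        raise KeyError(entity_id)
    return json.loads(row[0])


def find_by_user(db: sqlite3.Connection, user_id: str) -> list:
    rows = db.execute(
        "SELECT e.body FROM index_user_id i JOIN entities e ON e.id = i.entity_id "
        "WHERE i.user_id = ?", (user_id,)).fetchall()
    # Old index rows are never deleted, so re-check the attribute on read.
    docs = [json.loads(body) for (body,) in rows]
    return [d for d in docs if d.get("user_id") == user_id]
```

Adding a new attribute needs no `ALTER TABLE` on `entities`; only attributes you want to query by need an index table.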
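Slides 9 and 12 cite consistent hashing, including the libketama post for memcached clients. A minimal Python sketch of a hash ring with virtual nodes, assuming MD5 as the ring hash and 100 ring points per server; `HashRing` and its parameters are hypothetical and are not libketama's API.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class HashRing:
    """Consistent-hash ring with virtual nodes (several ring points per server)."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100):
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 only spreads values evenly over a 128-bit ring; no security role here.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def get(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first server point."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a fourth server only moves the keys that now fall just before one of its points, and every moved key goes to the new server; with plain `hash(key) % n_servers`, most keys would change servers.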
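Slides 11 and 20 point at Google's MapReduce paper and Hadoop. The programming model can be shown in a single process with the classic word count: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. This is a sketch of the model only; it is not Hadoop's API and has none of the distribution or fault tolerance.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

For example, `word_count(["the cloud", "The NOSQL cloud"])` counts `the` and `cloud` twice and `nosql` once. Because each map call sees one record and each reduce call sees one key, both phases can be spread across machines.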
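Slide 12 lists vector clocks. Dynamo-style stores attach one to each value so they can tell whether one version supersedes another or whether two writes happened concurrently. A minimal sketch using plain dicts of per-node counters; the function names are hypothetical, and clock pruning is not handled.

```python
from typing import Dict

Clock = Dict[str, int]


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of clock with node's counter bumped (a write at that node)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: Clock, b: Clock) -> Clock:
    """Pointwise maximum: the clock of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def descends(a: Clock, b: Clock) -> bool:
    """True if a has seen every event b has (a[n] >= b[n] for every node n)."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def compare(a: Clock, b: Clock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"  # neither supersedes the other: keep both as siblings
```

When `compare` reports `"concurrent"`, the store cannot pick a winner by itself; a client (or a resolution rule) reads both versions, reconciles them, and writes back a value whose clock is the merge of both, incremented at the writing node.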
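Slide 12 also lists Bloom filters. Bigtable-style stores such as HBase and Cassandra use them to skip on-disk files that cannot contain a requested key: the filter may say "maybe present" for a missing key but never says "absent" for a present one. A minimal sketch using the standard sizing formulas and double hashing over a SHA-256 digest; the class name and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

For 1,000 keys at a 1% target error rate, the formulas give about 9.6 kbit (roughly 1.2 KB) and 7 hash functions, far less than storing the keys themselves.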
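Slide 19 calls Riak “CAP tunable”. In the Dynamo model that Riak follows, each value is stored on N replicas, a read waits for R replies, and a write waits for W acknowledgements. When R + W > N, every read quorum overlaps every write quorum, so a read reaches at least one replica that took the latest acknowledged write (ignoring sloppy quorums and hinted handoff). The sketch below states the rule and checks it by brute force; the function names are hypothetical.

```python
from itertools import combinations


def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    """Quorum rule: every R-replica read set meets every W-replica write set."""
    return r + w > n


def overlap_by_brute_force(n: int, r: int, w: int) -> bool:
    """Check the same property by enumerating every read set and write set."""
    replicas = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))
```

With N = 3, choosing R = W = 2 gives overlapping quorums while tolerating one unreachable replica; R = W = 1 is faster and more available but a read can miss the latest write.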