Nosql-columbia-feb2011


    1. 1. NOSQL. WTW? adicu.com February 2011 Alexander Sicular @siculars
    2. 2. Who is this blowhard?
        • Columbia University pays my mortgage
        • For the better part of a decade in Medical Informatics
        • Am not shilling for any of these companies
        • Am not a computer scientist
        • Am a computer science enthusiast, particularly in the area of Informatics
    3. 3. NoSQL or NOSQL?
        • Not Only SQL
        • Non/post relational
        • Big tent policy
        • Umbrella term
        • Fragmented
        • http://www.flickr.com/photos/morgennebel/2933723145/
    4. 4. Your Usage Patterns
        • Read vs. Write
        • Mutable vs. Immutable
        • Product Considerations:
            • In place updates
            • Write Only Logs
    5. 5. This vs. That
        • Riak wiki comparisons page: http://wiki.basho.com/Riak-Comparisons.html
        • Popular one page comparison of a number of NOSQL players by Kristof Kovacs: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
    6. 6. NOSQL concepts are Not Brand New
        • Memcached since 2003 http://memcached.org
        • Google papers 2004-2006
        • Amazon Dynamo 2007
        • Consistent Hashing 2007 http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
        • Using relational systems as a key-value blobstore 2009, FriendFeed (not the first) http://bret.appspot.com/entry/how-friendfeed-uses-mysql
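The last item on slide 6, using a relational system as a key-value blob store, can be sketched roughly as below. This is only an illustration of the general idea: it uses Python's built-in `sqlite3` instead of MySQL, and the table name `entities`, its columns, and the helpers `open_store`, `put`, and `get` are all assumptions made for the example, not FriendFeed's actual schema or code.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a database with one generic table: an opaque key and a JSON blob."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        "id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def put(conn, key, value):
    """Serialize the value to JSON and upsert it under the key."""
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (key, json.dumps(value)),
    )
    conn.commit()


def get(conn, key):
    """Return the decoded value for the key, or None if it is absent."""
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the database only ever sees an id and a blob, the shape of each stored value can change without a schema migration; the cost is that queries on fields inside the blob need separately maintained indexes.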
    7. 7. Why NOSQL
        • Support for “Very Large” data sets
        • Schemaless
        • Denormalized
        • Green field
        • New applications
        • http://www.flickr.com/photos/gailtang/1243984297/
    8. 8. Academia
        • Google:
            • Bigtable http://labs.google.com/papers/bigtable.html
            • GFS http://labs.google.com/papers/gfs.html
            • M/R http://labs.google.com/papers/mapreduce.html
        • Amazon:
            • Dynamo http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
        • NOSQL Summer http://nosqlsummer.org/papers
        • NOSQL Tapes http://nosqltapes.com
    9. 9. Under the Hood Terminology
        • Write Only Log http://en.wikipedia.org/wiki/Log-structured_file_system
        • Merkle Trees http://en.wikipedia.org/wiki/Hash_tree
        • B-trees http://en.wikipedia.org/wiki/B-tree
        • Vector clock http://en.wikipedia.org/wiki/Vector_clock
        • Bloom filters http://en.wikipedia.org/wiki/Bloom_filters
        • Big O Notation http://en.wikipedia.org/wiki/Big_o_notation
        • Consistent Hashing http://en.wikipedia.org/wiki/Consistent_hashing
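Of the terms on slide 9, consistent hashing is perhaps the easiest to show in a few lines. The following is a minimal sketch, assuming a ring of hashed positions with several virtual points per node; the class name `HashRing`, the `replicas` parameter, and the choice of SHA-256 are assumptions for the example and do not correspond to any particular product's implementation.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next node point clockwise."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed default)
        self._keys = []           # sorted ring positions
        self._owners = {}         # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # Any stable hash works; SHA-256 is used here only for convenience.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        """Place `replicas` points for the node on the ring."""
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if pos not in self._owners:
                bisect.insort(self._keys, pos)
            self._owners[pos] = node

    def remove(self, node):
        """Remove every point that still belongs to the node."""
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                self._keys.remove(pos)

    def lookup(self, key):
        """Return the node owning the first point at or after the key's hash."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._owners[self._keys[idx]]
```

The property worth noticing is that adding a node only moves keys onto that new node; every other key keeps its previous owner, which is what makes the scheme attractive for caches and distributed stores.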
    10. 10. CAP Theorem http://en.wikipedia.org/wiki/CAP_theorem
        • Consistency
        • Availability
        • Partition Tolerance
        • Pick two?
        • http://guide.couchdb.org/draft/consistency.html
    11. 11. CouchDB
        • CouchOne, Cloudant
        • Erlang
        • Extreme replication scenarios
        • Works on phones
        • Updated indexing (B-tree)
        • HTTP interface
        • Offline usage
        • Sharded scaling
    12. 12. CouchDB Internal Architecture http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG
    13. 13. MongoDB
        • 10Gen, MongoHQ, MongoLab
        • C++
        • huMONGOus
        • Sharded scaling, replicated master/slave
        • Located in NYC (go visit them)
        • Soft landing for those coming from mysql (relational databases)
        • Native javascript
        • Secondary indexes
    14. 14. MongoDB Sharding Diagram http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/
    15. 15. MySQL to Mongo Query similarity http://nosqlpedia.com/wiki/File:MongoDB.JPG
    16. 16. Riak
        • Basho, Joyent
        • Erlang
        • Distributed
        • HTTP, protobuf
        • Native javascript, erlang
        • Multiple backends
        • Homogeneous
        • CAP tunable
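The "CAP tunable" item on slide 16 is commonly explained through Dynamo-style quorum settings: with N replicas per key, a read quorum R, and a write quorum W, a strict-quorum read is guaranteed to touch at least one replica holding the latest acknowledged write when R + W > N. The sketch below only checks that arithmetic; the function and parameter names are hypothetical and are not Riak's client API.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum must intersect every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def tolerated_failures(n, r, w):
    """Replica failures that still leave both reads and writes able to reach quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return n - max(r, w)
```

For example, N=3 with R=2 and W=2 gives overlapping quorums and survives one replica failure, while R=1 and W=1 survives two failures but allows a read to miss the most recent write.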
    17. 17. Hadoop
        • Cloudera, Apache Foundation
        • Java
        • High latency
        • Batch oriented
        • HDFS is GFS based
        • Open source Google stack via the Google papers
        • Huge ecosystem
        • Yahoo, FB, Twitter, Fortune 500
        • Pig, Hive, Flume
    18. 18. HBase
        • Java
        • Low latency store, sits on top of Hadoop
        • Modeled after Google Bigtable
        • Column oriented
        • Thrift, protobuf
        • Backend for new Facebook Messaging service
    19. 19. Cassandra
        • Apache
        • Java
        • Column oriented
        • Like Bigtable and Dynamo
        • Originated at Facebook
        • At Twitter, Distributed counting
            • http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King
            • http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
    20. 20. Redis
        • OpenRedis
        • C
        • REmote DIctionary Server
        • Specific data structures
        • incredibly fast
        • memcached on steroids
        • replicated master/slave
    21. 21. Commonalities
        • Open Source
        • Adherence to common or standard:
            • data formats: json, bson, utf8, binary
            • data transport mechanisms: http, thrift, protobuf, simple wire protocols
    22. 22. Ok. So Now What?
        • Analyze your requirements
        • Mailing lists
        • IRC, twitter
        • Project pages, wiki
        • Github/Google Code/Bitbucket:
            • project page
            • specific language clients
    23. 23. Variety Pack
        • Hybrid architectures will become the norm
        • Twitter - mysql, cassandra, hadoop
        • Google - mysql, GAE (BT)
        • Facebook - mysql, cassandra, hbase, memcached
        • Yahoo - mysql, hadoop
        • LinkedIn - voldemort
        • http://www.flickr.com/photos/uncleweed/82245324/
    24. 24. Questions? adicu.com February 2011 Alexander Sicular @siculars
