High order bits from cassandra & hadoop

Published in: Technology

  1. High-order bits from Cassandra & Hadoop
     SriSatish Ambati
     @srisatish
  2. NoSQL:
     Know your queries.
  3. Points
     Use cases
     Why NoSQL?
     Why Cassandra?
     Use case: Hadoop, Brisk
     FUD: Consistency
     Why is Facebook not using Cassandra?
     Community, Code, Tools
     Q&A
  4. Users. Netflix.
     Key by Customer, read-heavy
     Key by Customer:Movie, write-heavy
  5. Time series: (several customers)
     Periodic readings: dev0, dev1…
     deviceID:metric:timestamp -> value
     Metrics are typically a far larger dataset than users.
  6. Why Cassandra?
  7. Operational simplicity
     peer-to-peer
  8. Operational simplicity
     peer-to-peer
  9. Replication:
     Multi-datacenter
     Multi-region EC2
     Multi-availability zones
  10. Reads local
      dc1
      dc2
      Replication:
      Multi-datacenter
      Multi-region EC2, AWS
      Multi-availability zones
  11. 4.21.2011, Amazon Web Services outage:
      “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  12. 4.21.2011, Amazon Web Services outage:
      Netflix was running on AWS.
  13. Fast durable writes.
      Fast reads.
  14. Writes
      Sequential, append-only.
      ~1-5 ms
  15. Writes
      Sequential, append-only.
      ~1-5 ms
      On cloud: ephemeral disks rock!
  16. Reads
      Local
      Key & row caches (also JNA-based 0xffheap)
      Indexes, materialized
  17. Reads
      Local
      Key & row caches (also JNA-based 0xffheap)
      Indexes, materialized
      SSDs, improved read performance!
  18. Clients: CQL, Thrift
      pycassa, phpcassa
      hector, pelops
      (scala, ruby, clojure)
  19. Use case #3: Hadoop
      HDFS: logs
      Cassandra: stats
      Hive: analytics
  20. Brisk
      Truly peer-to-peer Hadoop.
  21. mv computation,
      not data
  22. (no text)
  23. Parallel Execution View
  24. (no text)
  25. jobtracker, tasktracker
      HDFS: namenode, datanode
  26. Cloudera
      Amazon: Elastic MapReduce
      Hortonworks
      MapR
      Brisk
  27. Namenode decomposition, explained.
  28. (no text)
  29. (no text)
  30. Use column families (tables)
      inode
      sblock
  31. Near-real-time Hadoop
      Low latency: cassandra_dc nodes
      Batch analytics: brisk_dc nodes
  32. FUD,
      acronym: fear, uncertainty, doubt.
  33. Consistency: R + W > N
      Oracle, 2-node: R=1, W=2, N=2 (T=2)
      DNS
      * N is the replication factor. Not to be confused with T = total number of nodes.
  34. Tunable, flexible.
      For high consistency:
      read: quorum, write: quorum
      For high availability:
      high W, low R.
  35. (no text)
  36. Inbox Search:
      600+ cores, 120+ TB (2008)
      Went from 100m to 500m users.
      Average NoSQL deployment size: ~6-12 nodes.
  37. Use case #5: search
      Apache Solr + Cassandra = Solandra
      Other inbox/file searches:
      xobni, c3
      github.com/tjake/solandra
  38. “Eventual consistency is harder to program.”
      Mostly immutable data.
      Complex systems at scale.
  39. Miscellaneous,
      Myth: data loss, partial rows.
      Writes are durable.
  40. Three good reasons for Cassandra...
  41. Tools
      AMIs, OpsCenter, DataStax
      AppDynamics
  42. B e a u t i f u l C 0 d e
      = new code(); //less is more
      ~90k.java.concurrent.@annotate.
      bloomfilters, merkletrees.
      non-blocking, staged-event-driven.
      bigtable, dynamo.
  43. Current & Future Focus:
      Distributed counters, CQL.
      Simple client.
      Operational smoothing.
      Compaction.
  44. Community
      Robust. Rapid. #
      Professional support from DataStax.
      Filesystem innovation from Acunu
      Engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...
      Come join the efforts!
  45. (no text)
  46. Use case #4: first NoSQL, then scale!
      simpledb -> Cassandra
      mongodb -> Cassandra
  47. (no text)
  48. (no text)
  49. Copyright: xkcd
  50. Copyright: plantoys
      … more than one way to do it!
  51. Summary
      High-scale peer-to-peer datastore
      Best friend for
      multi-region, multi-zone availability.
      Hadoop – HDFS engulfing the DataWorld
  52. Q&A
      @srisatish
  53. NoSQL:
      Know your queries.
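
Slides 4 and 5 key data by the query that will read it: Customer, Customer:Movie, and deviceID:metric:timestamp -> value. The sketch below illustrates why that composite key layout suits "know your queries". It is a toy in-memory sorted store, not the Cassandra API; the names row_key, customer_movie_key, and SortedStore are hypothetical, and the zero-padded timestamp is an assumption made here so that string order matches time order.

```python
from bisect import bisect_left, bisect_right, insort


def customer_movie_key(customer_id: str, movie_id: str) -> str:
    """Composite key for the write-heavy Customer:Movie case (slide 4)."""
    return f"{customer_id}:{movie_id}"


def row_key(device_id: str, metric: str, timestamp: int) -> str:
    """deviceID:metric:timestamp key (slide 5).

    Assumption: the timestamp is zero-padded to 10 digits so that
    lexicographic key order equals chronological order.
    """
    return f"{device_id}:{metric}:{timestamp:010d}"


class SortedStore:
    """Toy sorted key/value store; a stand-in, not a real client library."""

    def __init__(self) -> None:
        self._keys: list[str] = []          # kept sorted at all times
        self._values: dict[str, float] = {}

    def put(self, key: str, value: float) -> None:
        if key not in self._values:
            insort(self._keys, key)         # insert while preserving order
        self._values[key] = value

    def scan(self, start: str, end: str) -> list[tuple[str, float]]:
        """Return all (key, value) pairs with start <= key <= end."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = SortedStore()
    for ts, value in [(100, 0.5), (200, 0.7), (300, 0.9)]:
        store.put(row_key("dev0", "cpu", ts), value)
    store.put(row_key("dev1", "cpu", 150), 0.1)

    # One contiguous range scan answers "dev0 cpu readings from t=100 to t=200".
    for key, value in store.scan(row_key("dev0", "cpu", 100),
                                 row_key("dev0", "cpu", 200)):
        print(key, value)
```

Because the device and metric lead the key, the readings for one device and metric sit next to each other, so the expected query becomes a single range scan rather than a filter over all data.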
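
Slide 33 gives the consistency rule R + W > N, and slide 34 pairs quorum reads with quorum writes. As a rough sketch of why the rule works, the code below checks the inequality and then confirms it by brute force: a read is guaranteed to see the latest write exactly when every possible set of R replicas shares at least one replica with every possible set of W replicas. The function names are hypothetical and none of this is Cassandra client code.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def overlaps(r: int, w: int, n: int) -> bool:
    """The slide's rule: reads and writes must overlap, i.e. R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


def read_always_sees_write(r: int, w: int, n: int) -> bool:
    """Brute-force check: every R-replica read set intersects
    every W-replica write set among N replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    # The slide's 2-node example: R=1, W=2, N=2.
    print(overlaps(1, 2, 2))

    # Quorum reads and quorum writes with replication factor N=3.
    n = 3
    print(overlaps(quorum(n), quorum(n), n))

    # R=1, W=1, N=3: a read may hit a replica the write never reached.
    print(overlaps(1, 1, n))

    # The inequality agrees with the brute-force check for small N.
    for n in range(1, 6):
        for r in range(1, n + 1):
            for w in range(1, n + 1):
                assert overlaps(r, w, n) == read_always_sees_write(r, w, n)
```

As the slide notes, N here is the replication factor, not the total number of nodes T, so the check is per key and does not depend on cluster size.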