Cassandra at no_sql

SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies - Apache Cassandra, Mongodb, Hadoop.

This talk lays out a few talking points for Apache Cassandra.

Published in: Technology

Transcript of "Cassandra at no_sql"

  1. Apache Cassandra: NoSQL, Yes to Scale!
     srisatishambati
     @srisatish
  2. NoSQL-
     Know your queries.
  3. points
     Usecases
     Why cassandra?
     Usecase: Hadoop, Brisk
     FUD:Consistency
     Why facebook is not using Cassandra?
     Community, Code, Tools
     Q&A
  4. Users. Netflix.
     Key by Customer, read-heavy
     Key by Customer:Movie, write-heavy
  5. TimeSeries: (several customers)
     periodic readings: dev0, dev1…deviceID:metric:timestamp ->value
     Metrics typically way larger dataset than users.
  6. Why Cassandra?
  7. Operational simplicity
     peer-to-peer
  8. Operational simplicity
     peer-to-peer
  9. Replication:
     Multi-datacenter
     Multi-region ec2
     Multi-availability zones
  10. reads local
      dc1
      dc2
      Replication:
      Multi-datacenter
      Multi-region ec2, aws
      Multi-availability zones
  11. 4.21.2011, Amazon Web Services outage:
      “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  12. 4.21.2011, Amazon Web Services outage:
      Netflix was running on AWS.
  13. fast durable writes.
      fast reads.
  14. Writes
      Sequential, append-only.
      ~1-5ms
  15. Reads
      Local
      Key & row caches, (also, jna-based 0xffheap)
      indexes, materialized
  16. Clients: cql, thrift
      pycassa, phpcassa
      hector, pelops
      (scala, ruby, clojure)
  17. Usecase #3: hadoop
      Hdfs  cassandra  hive
      Logs  stats      analytics
  18. Brisk
      Truly peer-to-peer hadoop.
  19. Namenode decomposition, explained.
  20. (no text in transcript)
  21. (no text in transcript)
  22. Use column families (tables)
      inode
      sblock
  23. near-real time hadoop
      Low latency: cassandra_dc nodes
      Batch Analytics: brisk_dc nodes
  24. FUD,
      acronym: fear, uncertainty, doubt.
  25. Consistency: R + W > N
      ORACLE, 2-node: R=1, W=2, N=2,(T=2)
      DNS
      * N is replication factor. Not to be confused with T=total #of nodes
  26. Tune-able, flexibility.
      For High Consistency:
      read:quorum, write:quorum
      For High Availability:
      high W, low R.
  27. (no text in transcript)
  28. Inbox Search:
      600+cores.120+TB (2008)
      Went from 100-500m users.
      Average NoSQL deployment size: ~6-12 nodes.
  29. Usecase #5: search
      Apache Solr + Cassandra = Solandra
      Other inbox/file Searches:
      xobni, c3
      github.com/tjake/solandra
  30. “Eventual consistency is harder to program.”
      mostly immutable data.
      complex systems at scale.
  31. Miscellaneous,
      Myth: data-loss, partial rows.
      writes are durable.
  32. Three more reasons for Cassandra...
  33. Tools
      AMIs, OpsCenter, DataStax
      AppDynamics
  34. B e a u t i f u l C 0 d e
      = new code(); //less is more
      ~90k.java.concurrent.@annotate.
      bloomfilters, merkletrees.
      non-blocking, staged-event-driven.
      bigtable, dynamo.
  35. Current & Future Focus:
      Distributed Counters, CQL.
      Simple client.
      operational smoothening.
      compaction.
  36. Community
      Robust. Rapid. #
      Professional support from DataStax.
      engineers: independent,startups, large companies, Rackspace, Twitter, Netflix..
      Come join the efforts!
  37. (no text in transcript)
  38. Usecase #4: first NoSQL, then scale!
      simpledb -> Cassandra
      mongodb -> Cassandra
  39. (no text in transcript)
  40. (no text in transcript)
  41. Copyright: xkcd
  42. Copyright: plantoys
      … more than one way to do it!
  43. Summary -
      high scale peer-to-peer
      distributed database.
  44. Q&A
      @srisatish
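
The consistency rule on slides 25 and 26 (R + W > N) can be checked with a little arithmetic. The sketch below is only an illustration, not code from the talk: the helper names `is_strongly_consistent` and `quorum` are hypothetical, and the replication factor of 3 is an assumed example value. It takes R as the number of replicas that must answer a read, W as the number that must acknowledge a write, and N as the replication factor, as the slide defines it.

```python
def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """Return True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


def quorum(n: int) -> int:
    """Smallest majority of N replicas."""
    return n // 2 + 1


n = 3  # replication factor, assumed for illustration (not the total node count T)

# Slide 26, "High Consistency": quorum reads plus quorum writes.
print(is_strongly_consistent(quorum(n), quorum(n), n))

# Reading and writing a single replica does not guarantee overlap.
print(is_strongly_consistent(1, 1, n))

# Slide 25's two-node example: R=1, W=2, N=2.
print(is_strongly_consistent(1, 2, 2))
```

With N = 3, a quorum is 2 replicas, so quorum reads and quorum writes give 2 + 2 > 3. Any pairing of R and W whose sum exceeds N provides the same overlap guarantee, which is the tuning freedom slide 26 refers to.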