High order bits from cassandra & hadoop
Published in: Technology
Transcript

  • 1. High-order bits from Cassandra & Hadoop<br />SriSatish Ambati<br />@srisatish<br />
  • 2. NoSQL-<br />Know your queries.<br />
  • 3. Points<br />Use cases<br />Why NoSQL?<br />Why Cassandra?<br />Use case: Hadoop, Brisk<br />FUD: Consistency<br />Why isn't Facebook using Cassandra?<br />Community, Code, Tools<br />Q&A<br />
  • 4. Users: Netflix.<br />Keyed by customer: read-heavy<br />Keyed by customer:movie: write-heavy<br />
  • 5. Time series (several customers):<br />Periodic readings from dev0, dev1, …: deviceID:metric:timestamp -> value<br />Metrics are typically a much larger dataset than users.<br />
  • 6. Why Cassandra?<br />
  • 7. Operational simplicity<br />peer-to-peer<br />
  • 8. Operational simplicity<br />peer-to-peer<br />
  • 9. Replication:<br />Multi-datacenter<br />Multi-region EC2<br />Multi-availability zones<br />
  • 10. Reads stay local<br />(dc1, dc2)<br />Replication:<br />Multi-datacenter<br />Multi-region EC2, AWS<br />Multi-availability zones<br />
  • 11. April 21, 2011, Amazon Web Services outage:<br />“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled<br />
  • 12. April 21, 2011, Amazon Web Services outage:<br />Netflix was running on AWS.<br />
  • 13. Fast, durable writes.<br />Fast reads.<br />
  • 14. Writes<br />Sequential, append-only.<br />~1-5ms<br />
  • 15. Writes<br />Sequential, append-only.<br />~1-5ms<br />On cloud: ephemeral disks rock!<br />
  • 16. Reads<br />Local<br />Key & row caches (also JNA-based off-heap)<br />Indexes, materialized<br />
  • 17. Reads<br />Local<br />Key & row caches (also JNA-based off-heap)<br />Indexes, materialized<br />SSDs: improved read performance!<br />
  • 18. Clients: CQL, Thrift<br />pycassa, phpcassa<br />Hector, Pelops<br />(Scala, Ruby, Clojure)<br />
  • 19. Use case #3: Hadoop<br />HDFS, Cassandra, Hive<br />Logs, stats, analytics<br />
  • 20. Brisk<br />Truly peer-to-peer Hadoop.<br />
  • 21. Move computation,<br />not data.<br />
  • 22.
  • 23. Parallel Execution View<br />
  • 24.
  • 25. JobTracker, TaskTracker<br />HDFS: NameNode, DataNode<br />
  • 26. Cloudera<br />Amazon: Elastic MapReduce<br />Hortonworks<br />MapR<br />Brisk<br />
  • 27. Namenode decomposition, explained.<br />
  • 28.
  • 29.
  • 30. Use column families (tables)<br />inode<br />sblock<br />
  • 31. Near-real-time Hadoop<br />Low latency: cassandra_dc nodes<br />Batch analytics: brisk_dc nodes<br />
  • 32. FUD:<br />acronym for fear, uncertainty, doubt.<br />
  • 33. Consistency: R + W > N<br />Oracle, 2-node: R=1, W=2, N=2 (T=2)<br />DNS<br />* N is the replication factor, not to be confused with T, the total number of nodes.<br />
  • 34. Tunable and flexible.<br />For high consistency:<br />read: QUORUM, write: QUORUM<br />For high availability:<br />high W, low R.<br />
  • 35.
  • 36. Inbox Search:<br />600+ cores, 120+ TB (2008)<br />Went from 100M to 500M users.<br />Average NoSQL deployment size: ~6-12 nodes.<br />
  • 37. Use case #5: search<br />Apache Solr + Cassandra = Solandra<br />Other inbox/file searches:<br />Xobni, c3<br />github.com/tjake/solandra<br />
  • 38. “Eventual consistency is harder to program.”<br />Mostly immutable data.<br />Complex systems at scale.<br />
  • 39. Miscellaneous:<br />Myth: data loss, partial rows.<br />Writes are durable.<br />
  • 40. Three good reasons for Cassandra...<br />
  • 41. Tools<br />AMIs, OpsCenter, DataStax<br />AppDynamics<br />
  • 42. Beautiful C0de<br />= new code(); //less is more<br />~90k.java.concurrent.@annotate.<br />Bloom filters, Merkle trees.<br />Non-blocking, staged event-driven.<br />Bigtable, Dynamo.<br />
  • 43. Current & future focus:<br />Distributed counters, CQL.<br />Simple client.<br />Smoother operations.<br />Compaction.<br />
  • 44. Community<br />Robust. Rapid.<br />Professional support from DataStax.<br />Filesystem innovation from Acunu.<br />Engineers: independents, startups, large companies, Rackspace, Twitter, Netflix...<br />Come join the efforts!<br />
  • 45.
  • 46. Use case #4: first NoSQL, then scale!<br />SimpleDB → Cassandra<br />MongoDB → Cassandra<br />
  • 47.
  • 48.
  • 49. Copyright: xkcd<br />
  • 50. Copyright: plantoys<br />… more than one way to do it!<br />
  • 51. Summary:<br />High-scale peer-to-peer datastore,<br />best friend for<br />multi-region, multi-zone availability.<br />Hadoop: HDFS engulfing the data world.<br />
  • 52. Q&A<br />@srisatish<br />
  • 53. NoSQL-<br />Know your queries.<br />
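Slide 4 keys the same Netflix customer data in two ways. A plain customer key serves read-heavy lookups, and a composite `customer:movie` key serves write-heavy traffic. The sketch below is minimal and assumes that Python dicts stand in for two column families. The names `profiles`, `viewing`, `record_view` and the sample IDs are hypothetical and are not Netflix's actual schema.

```python
from collections import defaultdict

# Hypothetical stand-in for a read-heavy column family: one row per customer,
# so reading a profile is a single key lookup.
profiles: dict[str, dict[str, str]] = {}

# Hypothetical stand-in for a write-heavy column family: one row per
# customer:movie pair, so each event upserts a small row of its own.
viewing: dict[str, dict[str, int]] = defaultdict(dict)


def customer_movie_key(customer_id: str, movie_id: str) -> str:
    """Build the composite row key 'customer:movie'."""
    return f"{customer_id}:{movie_id}"


def record_view(customer_id: str, movie_id: str, position_sec: int) -> None:
    # Write path: overwrite one column in one narrow row; no read is needed first.
    viewing[customer_movie_key(customer_id, movie_id)]["position_sec"] = position_sec


def get_profile(customer_id: str) -> dict[str, str]:
    # Read path: fetch the whole customer row by its key.
    return profiles.get(customer_id, {})


profiles["c42"] = {"name": "Ada", "plan": "streaming"}
record_view("c42", "m7", 1200)
record_view("c42", "m7", 1500)  # a later write simply overwrites the column
```

The composite key keeps frequent writes from contending on one large customer row. Reads of the profile still need only the customer ID.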
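Slide 5's time-series layout is `deviceID:metric:timestamp -> value`. For a range scan over such keys to return readings in time order, the key order must match time order. The sketch below assumes a sorted Python list stands in for ordered storage, and it zero-pads timestamps so that lexical order equals numeric order. `ts_key` and `TimeSeriesStore` are hypothetical names. The sketch also assumes that IDs contain no `:` and that timestamps fit in 12 digits.

```python
import bisect


def ts_key(device_id: str, metric: str, ts: int) -> str:
    # Zero-pad the timestamp so lexical key order matches numeric time order.
    return f"{device_id}:{metric}:{ts:012d}"


class TimeSeriesStore:
    """Toy ordered key-value store for deviceID:metric:timestamp keys."""

    def __init__(self) -> None:
        self._keys: list[str] = []  # kept sorted
        self._values: dict[str, float] = {}

    def put(self, device_id: str, metric: str, ts: int, value: float) -> None:
        key = ts_key(device_id, metric, ts)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, device_id: str, metric: str,
              start: int, end: int) -> list[tuple[int, float]]:
        """Return (timestamp, value) pairs with start <= ts < end, in time order."""
        lo = bisect.bisect_left(self._keys, ts_key(device_id, metric, start))
        hi = bisect.bisect_left(self._keys, ts_key(device_id, metric, end))
        out = []
        for key in self._keys[lo:hi]:
            ts = int(key.rsplit(":", 1)[1])
            out.append((ts, self._values[key]))
        return out
```

Because all readings for one device and metric share a key prefix, a time window becomes one contiguous slice of the sorted keys.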
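Slides 7 and 8 describe Cassandra's peer-to-peer design. There is no master node: each node owns a range of a hash ring, and a key's replicas are found by walking the ring. The sketch below illustrates consistent hashing with "next nodes clockwise" replica placement, roughly in the spirit of SimpleStrategy. It hashes keys with MD5, as Cassandra's RandomPartitioner does, but the token arithmetic is simplified. `TokenRing`, `token`, and the node names and tokens are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Hash the key onto a 0 .. 2**128 - 1 ring (simplified token math).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class TokenRing:
    def __init__(self, nodes: dict[str, int]) -> None:
        # nodes maps node name -> the token that node owns on the ring.
        self._ring = sorted((tok, name) for name, tok in nodes.items())
        self._tokens = [tok for tok, _ in self._ring]

    def replicas(self, key: str, rf: int) -> list[str]:
        """Primary = first node whose token >= the key's token (wrapping
        around); the remaining rf - 1 replicas are the next nodes clockwise."""
        size = len(self._ring)
        start = bisect.bisect_left(self._tokens, token(key)) % size
        return [self._ring[(start + i) % size][1] for i in range(min(rf, size))]
```

Every node can compute the same placement locally, so any node can coordinate a request. No node plays a special role.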
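Slides 9 and 10 show replicas spread across datacenters, with reads staying local. The sketch below is minimal and is written in the spirit of Cassandra's LOCAL_QUORUM consistency level. A write is sent to replicas in every datacenter. A read waits only for a majority of the replicas in the coordinator's own datacenter. The function names and node lists are hypothetical. Real Cassandra chooses among local replicas using its snitch, whereas this sketch simply takes the first ones.

```python
def local_quorum(rf_in_dc: int) -> int:
    # Majority of the replicas in a single datacenter.
    return rf_in_dc // 2 + 1


def write_targets(replicas_by_dc: dict[str, list[str]]) -> list[str]:
    # A write is sent to every replica in every datacenter; the consistency
    # level only decides how many acknowledgements the coordinator waits for.
    return [node for nodes in replicas_by_dc.values() for node in nodes]


def read_targets(replicas_by_dc: dict[str, list[str]], coordinator_dc: str) -> list[str]:
    """Replicas a local-quorum read would contact: only nodes in the
    coordinator's own datacenter, so no cross-datacenter round trip."""
    local = replicas_by_dc[coordinator_dc]
    return local[: local_quorum(len(local))]
```

With three replicas per datacenter, a read from either side waits on two local nodes. Losing the other datacenter, as in an availability-zone or region outage, does not block it.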
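Slides 14 and 15 describe writes as sequential and append-only. Each write is appended to a commit log and applied to an in-memory table. Later, the in-memory table is flushed to an immutable, sorted file. The sketch below is a toy model of that path. It syncs every write to disk for simplicity; Cassandra's commit log can instead be synced periodically or in batches (the `commitlog_sync` setting). `WritePath` and `demo` are hypothetical names.

```python
import json
import os
import tempfile


class WritePath:
    """Toy append-only write path: commit log first, then memtable."""

    def __init__(self, log_path: str) -> None:
        self._log = open(log_path, "a", encoding="utf-8")
        self.memtable: dict[str, str] = {}
        self.sstables: list[list[tuple[str, str]]] = []

    def write(self, key: str, value: str) -> None:
        # 1. Append to the commit log: sequential I/O, no seeks, no in-place update.
        self._log.write(json.dumps({"k": key, "v": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # durable before the write is acknowledged
        # 2. Apply to the in-memory table.
        self.memtable[key] = value

    def flush(self) -> None:
        # Persist the memtable as an immutable, key-sorted run, then start fresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def close(self) -> None:
        self._log.close()


def demo() -> tuple[list[list[tuple[str, str]]], int]:
    # Write two keys out of order, flush, and count commit-log entries.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "commitlog.txt")
        wp = WritePath(path)
        wp.write("b", "2")
        wp.write("a", "1")
        wp.flush()
        wp.close()
        with open(path, encoding="utf-8") as f:
            n_lines = len(f.read().splitlines())
    return wp.sstables, n_lines
```

No write ever seeks back into an existing file. This is why the write path suits both spinning disks and the ephemeral cloud disks mentioned on slide 15.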
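Slides 16 and 17 list key and row caches on the read path, and note a JNA-based off-heap option. The sketch below shows only the general idea of a row cache: a bounded, least-recently-used map placed in front of a slower read. It is not Cassandra's cache implementation. `RowCache` and its `load` callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Optional


class RowCache:
    """Toy LRU row cache in front of a slower read path."""

    def __init__(self, capacity: int, load: Callable[[str], Optional[dict]]) -> None:
        self.capacity = capacity
        self._load = load  # called on a miss, e.g. a read from disk
        self._rows: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self._rows:
            self.hits += 1
            self._rows.move_to_end(key)  # mark as most recently used
            return self._rows[key]
        self.misses += 1
        row = self._load(key)
        if row is not None:
            self._rows[key] = row
            if len(self._rows) > self.capacity:
                self._rows.popitem(last=False)  # evict the least recently used row
        return row

    def invalidate(self, key: str) -> None:
        # A write to the row must drop (or refresh) the cached copy.
        self._rows.pop(key, None)
```

A row cache pays off for small, hot rows that are read often. For rows that change constantly, every write invalidates the cached copy.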
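Slide 21's "move computation, not data" is the data-locality principle behind Hadoop scheduling. A task is placed on a node that already stores its input split, and data is read remotely only when no such node is free. The sketch below is a minimal greedy version of that idea, assuming simple per-node slot counts. It is not Hadoop's actual scheduler, and `assign_tasks` is a hypothetical name.

```python
def assign_tasks(
    split_locations: dict[str, list[str]], free_slots: dict[str, int]
) -> dict[str, str]:
    """Map each input split to a node, preferring one that already stores it.

    split_locations: split id -> nodes holding a replica of that split.
    free_slots: node -> task slots still available on that node.
    """
    slots = dict(free_slots)  # don't mutate the caller's dict
    assignment: dict[str, str] = {}
    for split, holders in sorted(split_locations.items()):
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node = local[0]  # data-local: ship the task to the data
        else:
            node = max(slots, key=slots.__getitem__)  # fall back to a remote read
            if slots[node] <= 0:
                raise RuntimeError("no free task slots")
        assignment[split] = node
        slots[node] -= 1
    return assignment
```

Shipping a few kilobytes of task code is far cheaper than moving a large input split across the network. Locality is therefore tried first.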
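Slide 19's log statistics use case and slide 23's parallel execution view follow the MapReduce pattern. Mappers process input splits in parallel, a shuffle groups the emitted pairs by key, and reducers aggregate each group. The sketch below counts HTTP status codes in this way, using a thread pool in place of a cluster. The log format (status code as the last whitespace-separated field) and all function names are hypothetical, and this is not Hadoop's API.

```python
from collections import defaultdict
from collections.abc import Iterable
from concurrent.futures import ThreadPoolExecutor


def map_phase(lines: list[str]) -> list[tuple[str, int]]:
    # Emit (status_code, 1) per non-blank line; assumes the status is the last field.
    return [(line.split()[-1], 1) for line in lines if line.strip()]


def shuffle(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    # Group every emitted value by its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def map_reduce(splits: list[list[str]]) -> dict[str, int]:
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))  # one mapper per split, in parallel
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)
```

Each phase needs only its own slice of the data. That is what lets the work spread across many nodes, each holding part of the input.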
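Slides 27 to 30 replace the HDFS NameNode by storing filesystem data in two column families, `inode` and `sblock`. File metadata (the NameNode's job) and block contents then both live in Cassandra, where they are replicated peer-to-peer instead of on a single master. The sketch below is a toy model that assumes dicts stand in for the two column families. The block size, field names, and functions are hypothetical and are not Brisk's actual schema.

```python
import uuid

BLOCK_SIZE = 4  # hypothetical, tiny so the example splits; real systems use MBs

inode: dict[str, dict] = {}     # path -> file metadata and ordered block ids
sblock: dict[str, bytes] = {}   # block id -> block contents


def write_file(path: str, data: bytes) -> None:
    # Split the file into blocks, store each block, then record the metadata.
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = uuid.uuid4().hex
        sblock[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    inode[path] = {"length": len(data), "blocks": block_ids}


def read_file(path: str) -> bytes:
    # Look up the metadata, then fetch and join the blocks in order.
    meta = inode[path]
    return b"".join(sblock[block_id] for block_id in meta["blocks"])
```

Every lookup the NameNode would answer becomes an ordinary keyed read. Any node can therefore serve filesystem metadata.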
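Slides 33 and 34 rest on a pigeonhole argument. If R + W > N, any set of R replicas and any set of W replicas must share at least one node, so every read reaches a copy of the latest acknowledged write. The sketch below checks this exhaustively for small N. It also computes a quorum as floor(N/2) + 1, which is how Cassandra sizes QUORUM from the replication factor. The function names are hypothetical.

```python
from itertools import combinations


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every r-replica read set shares a node with every w-replica
    write set, i.e. every read touches a replica holding the latest write."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


def quorum(n: int) -> int:
    # Majority of the N replicas: floor(N / 2) + 1.
    return n // 2 + 1


# Exhaustively confirm the rule R + W > N for small replication factors.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert always_overlap(n, r, w) == (r + w > n)
```

The slide's Oracle example (R=1, W=2, N=2) satisfies the rule. So do QUORUM reads with QUORUM writes, because two majorities of N always exceed N. A high-W, low-R setting such as W=3, R=1 with N=3 also satisfies it while keeping each read to a single replica.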
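Slide 42 lists Bloom filters and Merkle trees among Cassandra's building blocks. Cassandra keeps a Bloom filter per SSTable so a read can skip files that cannot contain a key. It uses Merkle trees during repair so replicas can compare hashes instead of shipping all their data. The sketch below gives generic textbook versions of both, not Cassandra's code. The double-hashing scheme and the odd-level duplication rule are illustrative design choices.

```python
import hashlib


class BloomFilter:
    """Answers 'definitely absent' or 'maybe present' for a set of keys."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> list[int]:
        # Derive k bit positions from two hashes (double hashing).
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits >> pos & 1 for pos in self._positions(key))


def merkle_root(rows: list[bytes]) -> bytes:
    """Hash the leaves, then hash adjacent pairs upward until one root remains."""
    level = [hashlib.sha256(r).digest() for r in rows] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Two replicas with equal roots hold identical data. When roots differ, comparing child hashes narrows the mismatch down to the ranges that actually need repair.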
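Slide 43 names compaction as a focus area. Because writes only ever append (slides 14 and 15), several SSTables can hold versions of the same key, and compaction merges them. The sketch below does a k-way merge of key-sorted runs and keeps the newest version of each key, with `None` standing in for a deletion marker (tombstone). It drops tombstones immediately, which is a simplification. Cassandra keeps them until `gc_grace_seconds` has passed so that deletions reach every replica. `compact` and the `Row` layout are hypothetical.

```python
import heapq
from itertools import groupby
from typing import Optional

# An SSTable here is a key-sorted list of (key, timestamp, value) tuples;
# a value of None marks a deletion (a tombstone).
Row = tuple[str, int, Optional[str]]


def compact(sstables: list[list[Row]], drop_tombstones: bool = True) -> list[Row]:
    # k-way merge of the sorted runs; all versions of a key end up adjacent.
    merged = heapq.merge(*sstables, key=lambda row: row[0])
    out: list[Row] = []
    for _key, versions in groupby(merged, key=lambda row: row[0]):
        newest = max(versions, key=lambda row: row[1])  # last write wins
        if newest[2] is None and drop_tombstones:
            continue  # the key was deleted; purge it
        out.append(newest)
    return out
```

The merge streams through sorted inputs, so compaction itself stays sequential I/O. This matches the append-only write path.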

×