High-order bits from Cassandra & Hadoop
Published in Technology
Transcript

  • 1. High-order bits from Cassandra & Hadoop
    srisatishambati
    @srisatish
  • 2. NoSQL-
    Know your queries.
  • 3. points
    Use cases
    Why NoSQL?
    Why Cassandra?
    Use case: Hadoop, Brisk
    FUD: Consistency
    Why is Facebook not using Cassandra?
    Community, Code, Tools
    Q&A
  • 4. Users. Netflix.
    Key by Customer, read-heavy
    Key by Customer:Movie, write-heavy
  • 5. Time series: (several customers)
    periodic readings: dev0, dev1… deviceID:metric:timestamp -> value
    Metrics are typically a much larger dataset than users.
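
A minimal sketch of the deviceID:metric:timestamp -> value layout from this slide, using a plain Python dict in place of a real column family. The helper names (`row_key`, `readings_between`), the zero-padded timestamp, and the assumption that device IDs and metric names contain no ":" are illustrative choices, not part of the talk.

```python
def row_key(device_id: str, metric: str, timestamp: int) -> str:
    """Build a composite key; zero-padding keeps string order equal to time order."""
    return f"{device_id}:{metric}:{timestamp:010d}"


def parse_key(key: str) -> tuple[str, str, int]:
    """Split a composite key back into its three parts."""
    device_id, metric, ts = key.split(":")
    return device_id, metric, int(ts)


def readings_between(store: dict, device_id: str, metric: str,
                     start: int, end: int) -> list:
    """Return (key, value) pairs for one device and metric with start <= ts <= end."""
    lo = row_key(device_id, metric, start)
    hi = row_key(device_id, metric, end)
    # A sorted scan stands in for an ordered range query (hypothetical stand-in).
    return [(k, store[k]) for k in sorted(store) if lo <= k <= hi]


store = {
    row_key("dev0", "temp", 100): 21.5,
    row_key("dev0", "temp", 200): 21.9,
    row_key("dev0", "temp", 300): 22.4,
    row_key("dev1", "temp", 150): 19.0,
}
recent = readings_between(store, "dev0", "temp", 100, 200)
```

Because every reading adds a new key, the metrics store grows with devices × metrics × time, which is one way to see why it ends up far larger than a per-user dataset.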
  • 6. Why Cassandra?
  • 7. Operational simplicity
    peer-to-peer
  • 8. Operational simplicity
    peer-to-peer
  • 9. Replication:
    Multi-datacenter
    Multi-region EC2
    Multi-availability zones
  • 10. reads local
    dc1
    dc2
    Replication:
    Multi-datacenter
    Multi-region EC2, AWS
    Multi-availability zones
  • 11. 4.21.2011, Amazon Web Services outage:
    “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  • 12. 4.21.2011, Amazon Web Services outage:
    Netflix was running on AWS.
  • 13. fast durable writes.
    fast reads.
  • 14. Writes
    Sequential, append-only.
    ~1-5ms
  • 15. Writes
    Sequential, append-only.
    ~1-5ms
    On cloud: ephemeral disks rock!
  • 16. Reads
    Local
    Key & row caches (also JNA-based 0xffheap)
    indexes, materialized
  • 17. Reads
    Local
    Key & row caches (also JNA-based 0xffheap)
    indexes, materialized
    SSDs: improved read performance!
  • 18. Clients: CQL, Thrift
    pycassa, phpcassa
    Hector, Pelops
    (Scala, Ruby, Clojure)
  • 19. Use case #3: Hadoop
    HDFS, Cassandra, Hive
    Logs, stats, analytics
  • 20. Brisk
    Truly peer-to-peer Hadoop.
  • 21. mv computation
    not data
  • 22.
  • 23. Parallel Execution View
  • 24.
  • 25. jobtracker, tasktracker
    hdfs: namenode, datanode
  • 26. Cloudera
    Amazon: Elastic MapReduce
    Hortonworks
    MapR
    Brisk
  • 27. Namenode decomposition, explained.
  • 28.
  • 29.
  • 30. Use column families (tables)
    inode
    sblock
  • 31. Near-real-time Hadoop
    Low latency: cassandra_dc nodes
    Batch Analytics: brisk_dc nodes
  • 32. FUD,
    acronym: fear, uncertainty, doubt.
  • 33. Consistency: R + W > N
    Oracle, 2-node: R=1, W=2, N=2 (T=2)
    DNS
    * N is the replication factor, not to be confused with T = total number of nodes.
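
The R + W > N rule can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it models just the overlap condition (every read set intersects every write set), not any real client API.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(r: int, w: int, n: int) -> bool:
    """True when any r replicas read must overlap any w replicas written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and the replication factor n")
    return r + w > n


# The 2-node example from the slide: R=1, W=2, N=2.
two_node = read_sees_latest_write(r=1, w=2, n=2)

# Quorum reads and quorum writes with replication factor 3.
q = quorum(3)
quorum_both = read_sees_latest_write(r=q, w=q, n=3)

# Reading and writing a single replica out of 3 gives no overlap guarantee.
one_one = read_sees_latest_write(r=1, w=1, n=3)
```

Here N is the replication factor for a key, so the check is independent of T, the total number of nodes in the cluster.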
  • 34. Tunable, flexible.
    For High Consistency:
    read: quorum, write: quorum
    For High Availability:
    high W, low R.
  • 35.
  • 36. Inbox Search:
    600+ cores, 120+ TB (2008)
    Went from 100M to 500M users.
    Average NoSQL deployment size: ~6-12 nodes.
  • 37. Use case #5: search
    Apache Solr + Cassandra = Solandra
    Other inbox/file Searches:
    xobni, c3
    github.com/tjake/solandra
  • 38. “Eventual consistency is harder to program.”
    mostly immutable data.
    complex systems at scale.
  • 39. Miscellaneous,
    Myth: data-loss, partial rows.
    writes are durable.
  • 40. Three good reasons for Cassandra...
  • 41. Tools
    AMIs, OpsCenter, DataStax
    AppDynamics
  • 42. B e a u t i f u l C 0 d e
    = new code(); //less is more
    ~90k.java.concurrent.@annotate.
    Bloom filters, Merkle trees.
    non-blocking, staged-event-driven.
    Bigtable, Dynamo.
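
As a rough illustration of one building block named on this slide, here is a small Bloom filter sketch. It is not Cassandra's implementation: the bit-array size, the number of hashes, and the use of SHA-256 are assumptions chosen for readability. The property it shows is the general one: no false negatives, occasional false positives.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True means it may have been.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
bf.add("customer:42")
```

A store can consult such a filter before touching disk and skip files that definitely do not hold the requested key.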
  • 43. Current & Future Focus:
    Distributed Counters, CQL.
    Simple client.
    Operational smoothing.
    compaction.
  • 44. Community
    Robust. Rapid. #
    Professional support from DataStax.
    Filesystem innovation from Acunu
    engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...
    Come join the efforts!
  • 45.
  • 46. Use case #4: first NoSQL, then scale!
    SimpleDB -> Cassandra
    MongoDB -> Cassandra
  • 47.
  • 48.
  • 49. Copyright: xkcd
  • 50. Copyright: plantoys
    … more than one way to do it!
  • 51. Summary -
    high scale peer-to-peer datastore
    best friend for
    multi-region, multi-zone availability.
    Hadoop – HDFS engulfing the DataWorld
  • 52. Q&A
    @srisatish
  • 53. NoSQL-
    Know your queries.