High-order bits from Cassandra & Hadoop
Usage Rights

© All Rights Reserved

High-order bits from Cassandra & Hadoop: Presentation Transcript

  • 1. High-order bits from Cassandra & Hadoop
    SriSatish Ambati
    @srisatish
  • 2. NoSQL-
    Know your queries.
  • 3. Points
    Use cases
    Why NoSQL?
    Why Cassandra?
    Use case: Hadoop, Brisk
    FUD: Consistency
    Why is Facebook not using Cassandra?
    Community, Code, Tools
    Q&A
  • 4. Users. Netflix.
    Key by Customer, read-heavy
    Key by Customer:Movie, write-heavy
  • 5. Time series: (several customers)
    periodic readings: dev0, dev1…
    deviceID:metric:timestamp -> value
    Metrics are typically a far larger dataset than users.
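
A minimal sketch of the deviceID:metric:timestamp -> value layout from slide 5, using a plain Python dict in place of a Cassandra column family. The helper names (`make_key`, `range_query`) and the zero-padded timestamp format are assumptions made for illustration; the talk does not specify them.

```python
def make_key(device_id, metric, timestamp):
    # Zero-pad the timestamp (assumed format) so string order matches numeric order.
    return f"{device_id}:{metric}:{timestamp:010d}"


def range_query(store, device_id, metric, start, end):
    # Return readings for one device and metric with start <= timestamp <= end.
    lo = make_key(device_id, metric, start)
    hi = make_key(device_id, metric, end)
    return [(k, store[k]) for k in sorted(store) if lo <= k <= hi]


store = {}
for ts, value in [(100, 20.5), (160, 21.0), (220, 21.7)]:
    store[make_key("dev0", "temp", ts)] = value
store[make_key("dev1", "temp", 160)] = 18.2

readings = range_query(store, "dev0", "temp", 100, 200)
```

Because the key leads with the device and metric, one device's readings sort together by time, which is what makes the "know your queries" range scan cheap.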
  • 6. Why Cassandra?
  • 7. Operational simplicity
    peer-to-peer
  • 8. Operational simplicity
    peer-to-peer
  • 9. Replication:
    Multi-datacenter
    Multi-region EC2
    Multi-availability zones
  • 10. reads local
    dc1
    dc2
    Replication:
    Multi-datacenter
    Multi-region EC2, AWS
    Multi-availability zones
  • 11. 4.21.2011, Amazon Web Services outage:
    “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  • 12. 4.21.2011, Amazon Web Services outage:
    Netflix was running on AWS.
  • 13. fast durable writes.
    fast reads.
  • 14. Writes
    Sequential, append-only.
    ~1-5ms
  • 15. Writes
    Sequential, append-only.
    ~1-5ms
    On cloud: ephemeral disks rock!
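
The "sequential, append-only" write path can be sketched as a toy log-plus-memtable store. This is an illustration under stated assumptions, not Cassandra's implementation: the class name `TinyStore`, the JSON record format, and the fsync on every write are all simplifications chosen here.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: append to a log file, then update an in-memory table."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}

    def write(self, key, value):
        # Append-only: each record goes to the end of the log, never rewritten in place.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # simplification: sync before acknowledging
        self.memtable[key] = value

    @classmethod
    def recover(cls, log_path):
        # Replay the log in order to rebuild the in-memory table after a restart.
        store = cls(log_path)
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                store.memtable[record["k"]] = record["v"]
        return store


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = TinyStore(log_path)
store.write("user:1", "alice")
store.write("user:1", "alice2")
recovered = TinyStore.recover(log_path)
```

An update is just another appended record; the latest record for a key wins when the log is replayed.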
  • 16. Reads
    Local
    Key & row caches (also JNA-based off-heap, "0xffheap")
    indexes, materialized
  • 17. Reads
    Local
    Key & row caches (also JNA-based off-heap, "0xffheap")
    indexes, materialized
    SSDs: improved read performance!
  • 18. Clients: CQL, Thrift
    pycassa, phpcassa
    Hector, Pelops
    (Scala, Ruby, Clojure)
  • 19. Use case #3: Hadoop
    HDFS, Cassandra, Hive
    Logs, stats, analytics
  • 20. Brisk
    Truly peer-to-peer Hadoop.
  • 21. mv computation
    not data
  • 22.
  • 23. Parallel Execution View
  • 24.
  • 25. JobTracker, TaskTracker
    HDFS: NameNode, DataNode
  • 26. Cloudera
    Amazon: Elastic MapReduce
    Hortonworks
    MapR
    Brisk
  • 27. Namenode decomposition, explained.
  • 28.
  • 29.
  • 30. Use column families (tables)
    inode
    sblock
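
One way to picture the NameNode decomposition is a file split into blocks, with file metadata in one table and block contents in another. The sketch below borrows the names `inode` and `sblock` from the slide but models them as plain dicts; the block size, the UUID block ids, and the helper functions are assumptions for illustration, and the actual Brisk schema is not shown in the slides.

```python
import uuid

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example splits a short file

inode = {}   # path -> ordered list of block ids (file metadata)
sblock = {}  # block id -> bytes (block contents)


def put_file(path, data):
    # Split the data into fixed-size blocks and record their order in the inode table.
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = uuid.uuid4().hex
        sblock[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    inode[path] = block_ids


def get_file(path):
    # Look up the block list, then fetch and concatenate the blocks in order.
    return b"".join(sblock[block_id] for block_id in inode[path])


put_file("/logs/a.txt", b"hello world")
```

Since both tables are ordinary keyed lookups, any node that can serve them can answer metadata requests, so no single metadata server is needed.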
  • 31. Near-real-time Hadoop
    Low latency: cassandra_dc nodes
    Batch Analytics: brisk_dc nodes
  • 32. FUD,
    acronym: fear, uncertainty, doubt.
  • 33. Consistency: R + W > N
    Oracle, 2-node: R=1, W=2, N=2 (T=2)
    DNS
    * N is the replication factor, not to be confused with T, the total number of nodes.
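
The rule R + W > N can be checked by brute force: for small N, enumerate every possible set of R replicas read and W replicas written, and confirm they always share at least one replica exactly when R + W > N. The function name `always_overlap` is hypothetical, chosen for this sketch.

```python
from itertools import combinations


def always_overlap(r, w, n):
    # True if every choice of r read replicas intersects every choice of w write replicas.
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# The two-node example from the slide: R=1, W=2, N=2.
two_node_example = always_overlap(1, 2, 2)

# R=1, W=1, N=3 does not guarantee that a read sees the latest write.
weak_example = always_overlap(1, 1, 3)
```

When the sets always overlap, at least one replica contacted by a read holds the most recent acknowledged write.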
  • 34. Tunable, flexible.
    For High Consistency:
    read:quorum, write:quorum
    For High Availability:
    high W, low R.
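
A short sketch of why quorum reads plus quorum writes give high consistency: a quorum is a majority of the N replicas, so two quorums always satisfy R + W > N. The helper names `quorum` and `tolerated_failures` are assumptions for illustration.

```python
def quorum(n):
    # Majority of n replicas: floor(n / 2) + 1.
    return n // 2 + 1


def tolerated_failures(level, n):
    # Replicas that can be down while an operation needing `level` replicas still succeeds.
    return n - level


n = 3
r = w = quorum(n)
consistent = r + w > n
down_ok = tolerated_failures(quorum(n), n)
```

With N=3, quorum is 2, so reads and writes each survive one replica being down while still overlapping on at least one replica.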
  • 35.
  • 36. Inbox Search:
    600+ cores, 120+ TB (2008)
    Went from 100M to 500M users.
    Average NoSQL deployment size: ~6-12 nodes.
  • 37. Use case #5: search
    Apache Solr + Cassandra = Solandra
    Other inbox/file searches:
    Xobni, c3
    github.com/tjake/solandra
  • 38. “Eventual consistency is harder to program.”
    mostly immutable data.
    complex systems at scale.
  • 39. Miscellaneous,
    Myth: data-loss, partial rows.
    writes are durable.
  • 40. Three good reasons for Cassandra...
  • 41. Tools
    AMIs, OpsCenter, DataStax
    AppDynamics
  • 42. Beautiful C0de
    = new code(); //less is more
    ~90k.java.concurrent.@annotate.
    Bloom filters, Merkle trees.
    non-blocking, staged-event-driven.
    Bigtable, Dynamo.
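
A Bloom filter, one of the structures named on slide 42, answers "might this key be present?" with no false negatives and a tunable rate of false positives. The sketch below is a generic illustration, not Cassandra's code; the class layout, the SHA-256 based hashing, and the default sizes are assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity over compactness

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for row_key in ["user:1", "user:2", "user:3"]:
    bf.add(row_key)
```

A read path can consult such a filter first and skip any on-disk file whose filter says the key is definitely absent.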
  • 43. Current & Future Focus:
    Distributed Counters, CQL.
    Simple client.
    Operational smoothing.
    compaction.
  • 44. Community
    Robust. Rapid. #
    Professional support from DataStax.
    Filesystem innovation from Acunu
    Engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...
    Come join the efforts!
  • 45.
  • 46. Use case #4: first NoSQL, then scale!
    SimpleDB -> Cassandra
    MongoDB -> Cassandra
  • 47.
  • 48.
  • 49. Copyright: xkcd
  • 50. Copyright: plantoys
    … more than one way to do it!
  • 51. Summary -
    high scale peer-to-peer datastore
    best friend for
    multi-region, multi-zone availability.
    Hadoop – HDFS engulfing the DataWorld
  • 52. Q&A
    @srisatish
  • 53. NoSQL-
    Know your queries.