High order bits from cassandra & hadoop
High order bits from cassandra & hadoop Presentation Transcript

  • 1. High-order bits from Cassandra & Hadoop
    srisatishambati
    @srisatish
  • 2. Thank You!
    svccg in first page of search results for “cloud” on google!
  • 3. NoSQL-
    Know your queries.
  • 4. points
    Usecases
    Why cassandra?
    Usecase: Hadoop, Brisk
    FUD: Consistency
    Why is Facebook not using Cassandra?
    Anti-patterns
    Community, Code, Tools
    Q&A
  • 5. Users. Netflix.
    Key by Customer, read-heavy
    Key by Customer:Movie, write-heavy
  • 6. TimeSeries: (several customers)
    periodic readings: dev0, dev1…
    deviceID:metric:timestamp -> value
    Metrics typically way larger dataset than users.
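The key layouts on slides 5 and 6 can be sketched as composite string keys. This is only an illustration: the function names, the sample device and metric names, and the zero-padding of the timestamp are assumptions, not something the slides specify.

```python
# Hypothetical helpers illustrating the key layouts from slides 5 and 6.

def user_key(customer_id):
    # Read-heavy data: one row per customer.
    return str(customer_id)

def viewing_key(customer_id, movie_id):
    # Write-heavy data: one row per customer:movie pair.
    return f"{customer_id}:{movie_id}"

def metric_key(device_id, metric, timestamp):
    # Zero-pad the timestamp (an assumption) so that sorting keys as
    # strings also sorts them by time.
    return f"{device_id}:{metric}:{timestamp:010d}"

def scan(store, device_id, metric):
    # Return all readings for one device and metric, oldest first.
    prefix = f"{device_id}:{metric}:"
    return [(k, store[k]) for k in sorted(store) if k.startswith(prefix)]

readings = {}
for ts, value in [(1303344060, 21.7), (1303344000, 21.5)]:
    readings[metric_key("dev0", "temp", ts)] = value

values_in_time_order = [v for _, v in scan(readings, "dev0", "temp")]
```

Knowing the queries up front ("all readings for dev0's temperature, in time order") is what drives the choice of key here.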
  • 7. Why Cassandra?
  • 8. Operational simplicity
    peer-to-peer
  • 9. Operational simplicity
    peer-to-peer
  • 10. Replication:
    Multi-datacenter
    Multi-region ec2
    Multi-availability zones
  • 11. reads local
    dc1
    dc2
    Replication:
    Multi-datacenter
    Multi-region ec2, aws
    Multi-availability zones
  • 12. 4.21.2011, Amazon Web Services outage:
    “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  • 13. 4.21.2011, Amazon Web Services outage:
    Netflix was running on AWS.
  • 14. fast durable writes.
    fast reads.
  • 15. Writes
    Sequential, append-only.
    ~1-5ms
  • 16. Writes
    Sequential, append-only.
    ~1-5ms
    On cloud: ephemeral disks rock!
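A minimal sketch of why sequential, append-only writes are both fast and durable: every write is appended to a log file and then applied to an in-memory table, and the table can be rebuilt by replaying the log. The class name, the JSON record format, and the file layout are assumptions made for illustration; this is not Cassandra's commit log format.

```python
import json
import os
import tempfile

class AppendOnlyStore:
    """Toy store: append each write to a log, keep latest values in memory."""

    def __init__(self, path):
        self.path = path
        self.memtable = {}
        # Append mode: existing records are never overwritten or seeked into.
        self.log = open(path, "a", encoding="utf-8")

    def write(self, key, value):
        # 1. Append the record to the log and force it to disk (durability).
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Apply it to the in-memory table (serves later reads).
        self.memtable[key] = value

    def close(self):
        self.log.close()

    @classmethod
    def recover(cls, path):
        # Replay the log from the start; later records win.
        store = cls(path)
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                store.memtable[record["k"]] = record["v"]
        return store

log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = AppendOnlyStore(log_path)
store.write("customer:1", "alice")
store.write("customer:1", "alice-updated")
store.close()

recovered = AppendOnlyStore.recover(log_path)
recovered.close()
```

Because the disk only ever sees appends, there is no random I/O on the write path, which is the property the slide's latency figure relies on.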
  • 17. Reads
    Local
    Key & row caches, (also, jna-based 0xffheap)
    indexes, materialized
  • 18. Reads
    Local
    Key & row caches, (also, jna-based 0xffheap)
    indexes, materialized
    ssds: improved read performance!
  • 19. Distribution between nodes
    Gossip
    Anti-entropy
    Failure-detector
    Lightweight
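The gossip idea on slide 19 can be sketched as nodes pairing up and merging what each knows about the cluster. This is a simplified illustration, not Cassandra's actual gossip protocol: the heartbeat-counter state, the function names, and the fixed pairing order are all assumptions.

```python
# Each node holds a map of node name -> highest heartbeat counter seen.

def merge(local, remote):
    # Keep the freshest (highest) heartbeat known for every node.
    merged = dict(local)
    for node, heartbeat in remote.items():
        if heartbeat > merged.get(node, -1):
            merged[node] = heartbeat
    return merged

def gossip_round(states, pairs):
    # For each pair, both sides end up with the merged view.
    for a, b in pairs:
        combined = merge(states[a], states[b])
        states[a] = dict(combined)
        states[b] = dict(combined)

states = {
    "n1": {"n1": 3},
    "n2": {"n2": 5},
    "n3": {"n3": 1},
}
# n1 talks to n2, then n2 talks to n3: n3 learns about n1 indirectly.
gossip_round(states, [("n1", "n2"), ("n2", "n3")])
```

No node needs a central coordinator to learn about the others, which is what keeps the scheme lightweight and peer-to-peer.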
  • 20. Clients: cql, thrift
    pycassa, phpcassa
    hector, pelops
    (scala, ruby, clojure)
  • 21. Usecase #3: hadoop
    Hdfs cassandra hive
    Logs stats analytics
  • 22. Brisk
    Truly peer-to-peer hadoop.
  • 23. mv computation
    not data
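"Move computation, not data" can be illustrated with a toy map/reduce over log lines: each node counts tokens in its own partition, and only the small partial counts travel to be combined. The node names, the sample log lines, and the function names are assumptions for illustration; this is plain Python, not the Hadoop API.

```python
from collections import Counter

def map_local(lines):
    # Runs where the data lives: count tokens in this node's partition.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    # Only the compact partial counts are shipped and summed.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

partitions = {
    "node1": ["GET /a", "GET /b"],
    "node2": ["GET /a"],
}
partials = [map_local(lines) for lines in partitions.values()]
totals = reduce_counts(partials)
```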
  • 24.
  • 25. Parallel Execution View
  • 26.
  • 27. jobtracker, tasktracker
    hdfs: namenode, datanode
  • 28. cloudera
    amazon: elastic map reduce
    hortonworks
    mapR
    brisk
  • 29. Tools & Analytics
    Hive, Pig, R
    Karmasphere
    Datameer
    … dozens of stealth startups!
  • 30. Namenode decomposition, explained.
  • 31.
  • 32.
  • 33. Use column families (tables)
    inode
    sblock
  • 34. near-real time hadoop
    Low latency: cassandra_dc nodes
    Batch Analytics: brisk_dc nodes
  • 35. FUD,
    acronym: fear, uncertainty, doubt.
  • 36. Consistency: R + W > N
    ORACLE, 2-node: R=1, W=2, N=2 (T=2)
    DNS
    * N is the replication factor, not to be confused with T = total # of nodes
  • 37. Tune-able, flexibility.
    For High Consistency:
    read:quorum, write:quorum
    For High Availability:
    high W, low R.
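The rule R + W > N from slide 36 can be checked with a few lines of arithmetic: if the read set and the write set together exceed the replication factor, they must share at least one replica, so a read sees the latest write. The function names below are assumptions; the quorum size is the usual majority, N // 2 + 1.

```python
def quorum(n):
    # Majority of n replicas.
    return n // 2 + 1

def reads_see_latest_write(r, w, n):
    # True when every read set overlaps every write set.
    return r + w > n

n = 3
q = quorum(n)                                   # 2 of 3 replicas
quorum_both = reads_see_latest_write(q, q, n)   # 2 + 2 > 3
one_one = reads_see_latest_write(1, 1, n)       # 1 + 1 > 3 is false
two_node = reads_see_latest_write(1, 2, 2)      # the 2-node example: 1 + 2 > 2
```

With N=3, quorum reads and quorum writes overlap, while R=1, W=1 trades that guarantee for lower latency and higher availability.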
  • 38.
  • 39. Inbox Search:
    600+cores.120+TB (2008)
    Went from 100-500m users.
    Average NoSQL deployment size: ~6-12 nodes.
  • 40. Usecase #5: search
    Apache Solr + Cassandra = Solandra
    Other inbox/file Searches:
    xobni, c3
    github.com/tjake/solandra
  • 41. “Eventual consistency is harder to program.”
    mostly immutable data.
    complex systems at scale.
  • 42. Miscellaneous,
    Myth: data-loss, partial rows.
    writes are durable.
  • 43. Anti-Patterns
    Transactions
    Joins
    Read before write
  • 44. Anti-Patterns for cloud
    ebs
    jvm, virtualized
    single region
  • 45. Three good reasons for Cassandra...
  • 46. Tools
    AMIs, OpsCenter, DataStax
    AppDynamics
    Netflix just builds AMIs for deployment!
  • 47. Beautiful C0de
    = new code(); //less is more
    ~90k.java.concurrent.@annotate.
    bloomfilters, merkletrees.
    non-blocking, staged-event-driven.
    bigtable, dynamo.
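As an illustration of the bloom filters mentioned on slide 47, here is a minimal generic sketch: a bit array plus a few hash functions that can answer "definitely not present" or "possibly present". The sizes, the SHA-256-based hashing, and the class name are assumptions for illustration; this is not Cassandra's implementation.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

bf = BloomFilter()
bf.add("customer:42")
present = bf.might_contain("customer:42")
```

The useful direction is the negative answer: a read can skip a file on disk whenever the filter says the key is definitely absent.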
  • 48. Current & Future Focus:
    Distributed Counters, CQL.
    Simple client.
    operational smoothening.
    compaction.
  • 49. Community
    Robust. Rapid. #
    Professional support from DataStax.
    Filesystem innovation from Acunu
    Engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...
    Come join the efforts!
  • 50.
  • 51. Usecase #4: first NoSQL, then scale!
    simpledb Cassandra
    mongodb Cassandra
  • 52.
  • 53.
  • 54.
  • 55. Copyright: xkcd
  • 56. Copyright: plantoys
    … more than one way to do it!
  • 57. Summary -
    high scale peer-to-peer datastore
    best friend for
    multi-region, multi-zone availability.
    Hadoop – HDFS engulfing the DataWorld
  • 58. Q&A
    @srisatish
  • 59. NoSQL-
    Know your queries.