High order bits from cassandra & hadoop
    Presentation Transcript

    • High-order bits from Cassandra & Hadoop
      srisatishambati
      @srisatish
    • NoSQL-
      Know your queries.
    • points
      Usecases
      Why NoSQL?
      Why cassandra?
      Usecase: Hadoop, Brisk
      FUD: Consistency
      Why is Facebook not using Cassandra?
      Community, Code, Tools
      Q&A
    • Users. Netflix.
      Key by Customer, read-heavy
      Key by Customer:Movie, write-heavy
    • TimeSeries: (several customers)
      periodic readings: dev0, dev1…
      deviceID:metric:timestamp -> value
      Metrics typically way larger dataset than users.
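      A minimal sketch of the deviceID:metric:timestamp -> value layout, assuming an in-memory dict stands in for the store. The helper names make_key and range_scan, the zero-padding width, and the sample readings are illustrative assumptions, not part of the talk.

```python
def make_key(device_id: str, metric: str, timestamp: int) -> str:
    """Build a deviceID:metric:timestamp key.

    The timestamp is zero-padded (assumed width: 12 digits) so that the
    lexicographic order of keys matches chronological order for one
    device and metric.
    """
    return f"{device_id}:{metric}:{timestamp:012d}"


def range_scan(store: dict, device_id: str, metric: str, start: int, end: int):
    """Return (key, value) pairs with start <= timestamp <= end, oldest first."""
    lo = make_key(device_id, metric, start)
    hi = make_key(device_id, metric, end)
    return [(k, store[k]) for k in sorted(store) if lo <= k <= hi]


# Hypothetical periodic readings from two devices.
store = {}
for ts, temp in [(1303344000, 21.5), (1303344060, 21.7), (1303344120, 22.0)]:
    store[make_key("dev0", "temp", ts)] = temp
store[make_key("dev1", "temp", 1303344060)] = 19.0

print(range_scan(store, "dev0", "temp", 1303344000, 1303344060))
```

      Because the query ("readings for one device and metric over a time window") is known up front, the key can be shaped so that the query becomes a contiguous range.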
    • Why Cassandra?
    • Operational simplicity
      peer-to-peer
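      One way to picture peer-to-peer placement is a hash ring on which any node can compute who owns a key, with no master to ask. The sketch below is an illustration under assumptions: the Ring class, the node names, the one-token-per-node layout, and the use of MD5 are all chosen here for brevity and are not taken from the talk.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 used purely for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Each node owns one token; the sorted tokens form the ring.
        self._tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key: str, n: int):
        """Walk clockwise from the key's token and collect up to n distinct nodes."""
        positions = [t for t, _ in self._tokens]
        start = bisect.bisect_left(positions, token(key)) % len(self._tokens)
        count = min(n, len(self._tokens))
        return [self._tokens[(start + i) % len(self._tokens)][1] for i in range(count)]


# Every peer that knows the membership list computes the same owners.
ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("customer:42", 3)
print(owners)
```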
    • Replication:
      Multi-datacenter
      Multi-region ec2
      Multi-availability zones
    • reads local
      dc1
      dc2
      Replication:
      Multi-datacenter
      Multi-region ec2, aws
      Multi-availability zones
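      A rough sketch of "reads local": with replicas in two datacenters, a read can be satisfied by a majority of the replicas in the caller's own datacenter. The placement table, node names, and helper functions are hypothetical; the idea corresponds to a local-quorum style read, not to any specific client API.

```python
# Hypothetical placement: replicas of one row, grouped by datacenter.
placement = {
    "dc1": ["dc1-node1", "dc1-node2", "dc1-node3"],
    "dc2": ["dc2-node1", "dc2-node2", "dc2-node3"],
}


def local_quorum(replicas_in_dc: int) -> int:
    """Majority of the replicas held in one datacenter."""
    return replicas_in_dc // 2 + 1


def replicas_to_read(placement: dict, local_dc: str) -> list:
    """Pick just enough replicas from the caller's own datacenter."""
    local = placement[local_dc]
    return local[: local_quorum(len(local))]


print(replicas_to_read(placement, "dc1"))
```

      The read never crosses the wide-area link, so its latency stays at intra-datacenter levels even though the data is replicated to both sites.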
    • 4.21.2011, Amazon Web Services outage:
      “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
    • 4.21.2011, Amazon Web Services outage:
      Netflix was running on AWS.
    • fast durable writes.
      fast reads.
    • Writes
      Sequential, append-only.
      ~1-5ms
      On cloud: ephemeral disks rock!
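      A toy illustration of a sequential, append-only write path: each write is appended to a log file and then applied to an in-memory table, and the table can be rebuilt by replaying the log. The CommitLog class, the JSON record format, and the fsync-per-write policy are assumptions made for this sketch, not a description of Cassandra's actual on-disk format.

```python
import json
import os
import tempfile


class CommitLog:
    def __init__(self, path):
        self._path = path
        # Mode "a" means every write lands at the end of the file: no seeks.
        self._file = open(path, "a", encoding="utf-8")
        self.memtable = {}

    def write(self, key, value):
        # Durability first: append the record and force it to disk,
        # then update the in-memory table that serves reads.
        self._file.write(json.dumps({"k": key, "v": value}) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())
        self.memtable[key] = value

    def close(self):
        self._file.close()

    @staticmethod
    def replay(path):
        """Rebuild the in-memory table after a crash; later records win."""
        table = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                table[record["k"]] = record["v"]
        return table


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "commit.log")
    log = CommitLog(log_path)
    log.write("user:1", "alice")
    log.write("user:2", "carol")
    log.write("user:1", "bob")  # an overwrite is just another append
    log.close()
    recovered = CommitLog.replay(log_path)

print(recovered)
```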
    • Reads
      Local
      Key & row caches, (also, jna-based 0xffheap)
      indexes, materialized
      ssds, improved read performance!
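      To make the row-cache idea concrete, here is a small sketch of a cache sitting in front of a slower store. The RowCache class and the least-recently-used eviction policy are assumptions for illustration; the talk only says that key and row caches exist.

```python
from collections import OrderedDict


class RowCache:
    def __init__(self, capacity, load_row):
        self._capacity = capacity
        self._load_row = load_row  # fallback that reads from the slow store
        self._rows = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._rows:
            self._rows.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._rows[key]
        self.misses += 1
        row = self._load_row(key)
        self._rows[key] = row
        if len(self._rows) > self._capacity:
            self._rows.popitem(last=False)  # evict the least recently used row
        return row


# Hypothetical "disk": a plain dict standing in for on-disk data files.
disk = {"a": 1, "b": 2, "c": 3}
cache = RowCache(capacity=2, load_row=disk.__getitem__)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)

print(cache.hits, cache.misses)
```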
    • Clients: cql, thrift
      pycassa, phpcassa
      hector, pelops
      (scala, ruby, clojure)
    • Usecase #3: hadoop
      Hdfs cassandra hive
      Logs stats analytics
    • Brisk
      Truly peer-to-peer hadoop.
    • mv computation
      not data
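      "Move computation, not data" can be sketched as a scheduler that prefers a node already holding a replica of the block. The function assign_tasks, the block and node names, and the greedy policy are hypothetical and much simpler than a real Hadoop scheduler.

```python
def assign_tasks(block_locations: dict, free_slots: dict) -> dict:
    """Assign each block to a node, preferring nodes that store the block.

    Returns {block: (node, is_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        # Fall back to any node with a free slot; the block must then be shipped.
        candidates = local or [n for n, s in slots.items() if s > 0]
        if not candidates:
            raise RuntimeError("no free task slots")
        node = candidates[0]
        slots[node] -= 1
        assignment[block] = (node, node in holders)
    return assignment


blocks = {"blk1": ["n1", "n2"], "blk2": ["n1"], "blk3": ["n3"]}
plan = assign_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1})
print(plan)
```

      In this run blk2 ends up on a remote node because the greedy pass already spent n1's only slot on blk1, which shows why real schedulers weigh locality across all pending tasks.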
    • Parallel Execution View
    • jobtracker, tasktracker
      hdfs: namenode, datanode
    • cloudera
      amazon: elastic map reduce
      hortonworks
      mapR
      brisk
    • Namenode decomposition, explained.
    • Use column families (tables)
      inode
      sblock
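      A sketch of the decomposition idea: file metadata lives in one table and file contents in another, so no single namenode has to hold the whole namespace. Only the names inode and sblock come from the slide; the dict-based tables, the block-id scheme, and the tiny block size are assumptions for illustration.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the example splits a short file

inode = {}   # path -> ordered list of block ids (file metadata)
sblock = {}  # block id -> bytes (file contents)


def write_file(path: str, data: bytes) -> None:
    """Split data into fixed-size blocks and record them under the path."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = f"{path}#{offset // BLOCK_SIZE}"
        sblock[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    inode[path] = block_ids


def read_file(path: str) -> bytes:
    """Look up the block list in inode, then fetch each block from sblock."""
    return b"".join(sblock[block_id] for block_id in inode[path])


write_file("/logs/a.txt", b"hello world")
print(inode["/logs/a.txt"], read_file("/logs/a.txt"))
```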
    • near-real time hadoop
      Low latency: cassandra_dc nodes
      Batch Analytics: brisk_dc nodes
    • FUD,
      acronym: fear, uncertainty, doubt.
    • Consistency: R + W > N
      ORACLE, 2-node: R=1, W=2, N=2,(T=2)
      DNS
      * N is the replication factor, not to be confused with T, the total number of nodes.
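      The R + W > N rule can be checked by brute force: enumerate every possible set of W replicas that acknowledged a write and every set of R replicas answering a read, and see whether they always share at least one replica. The function name overlaps_always is an assumption for this sketch.

```python
from itertools import combinations


def overlaps_always(n: int, r: int, w: int) -> bool:
    """True if every read set of size r intersects every write set of size w."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(overlaps_always(2, 1, 2))  # the 2-node example: R=1, W=2, N=2
print(overlaps_always(3, 2, 2))  # quorum reads and writes with N=3
print(overlaps_always(3, 1, 1))  # R + W <= N: a read can miss the latest write
```

      The overlap is what guarantees that a read contacts at least one replica holding the most recent acknowledged write.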
    • Tune-able, flexibility.
      For High Consistency:
      read:quorum, write:quorum
      For High Availability:
      high W, low R.
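      The trade-off behind the tunable settings can be put in numbers: a read needing R replicas still succeeds with N - R replicas down, and a write needing W succeeds with N - W down. The helper names below are assumptions for illustration.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def describe(n: int, r: int, w: int) -> dict:
    """Summarize what a choice of R and W buys for replication factor N."""
    return {
        "overlap": r + w > n,           # reads always see the latest write
        "reads_survive_down": n - r,    # replicas that may be down for reads
        "writes_survive_down": n - w,   # replicas that may be down for writes
    }


n = 3
print(describe(n, quorum(n), quorum(n)))  # read: quorum, write: quorum
print(describe(n, 1, n))                  # high W, low R
```

      With N=3, quorum on both sides tolerates one replica down for reads and writes alike, while high W with low R keeps reads available with two replicas down at the cost of writes needing every replica.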
    • Inbox Search:
      600+ cores, 120+ TB (2008)
      Went from 100M to 500M users.
      Average NoSQL deployment size: ~6-12 nodes.
    • Usecase #5: search
      Apache Solr + Cassandra = Solandra
      Other inbox/file Searches:
      xobni, c3
      github.com/tjake/solandra
    • “Eventual consistency is harder to program.”
      mostly immutable data.
      complex systems at scale.
    • Miscellaneous,
      Myth: data-loss, partial rows.
      writes are durable.
    • Three good reasons for Cassandra...
    • Tools
      AMIs, OpsCenter, DataStax
      AppDynamics
    • B e a u t i f u l C 0 d e
      = new code(); //less is more
      ~90k.java.concurrent.@annotate.
      bloomfilters, merkletrees.
      non-blocking, staged-event-driven.
      bigtable, dynamo.
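      As an example of one building block named above, here is a minimal Bloom filter: a compact set summary that can answer "definitely not present" without touching disk. The class, its default sizes, and the use of SHA-256 to derive bit positions are assumptions for this sketch and are not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self._size = size_bits
        self._num_hashes = num_hashes
        self._bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self._num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self._bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all((self._bits >> pos) & 1 for pos in self._positions(key))


bf = BloomFilter()
for row_key in ["customer:1", "customer:2"]:
    bf.add(row_key)

print(bf.might_contain("customer:1"))
```

      A filter like this never reports an added key as absent, but it may occasionally report an absent key as present, so a positive answer still needs a real lookup.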
    • Current & Future Focus:
      Distributed Counters, CQL.
      Simple client.
      operational smoothening.
      compaction.
    • Community
      Robust. Rapid. #
      Professional support from DataStax.
      Filesystem innovation from Acunu
      engineers: independent, startups, large companies, Rackspace, Twitter, Netflix..
      Come join the efforts!
    • Usecase #4: first NoSQL, then scale!
      simpledb Cassandra
      mongodb Cassandra
    • Copyright: xkcd
    • Copyright: plantoys
      … more than one way to do it!
    • Summary -
      high scale peer-to-peer datastore
      best friend for
      multi-region, multi-zone availability.
      Hadoop – HDFS engulfing the DataWorld
    • Q&A
      @srisatish
    • NoSQL-
      Know your queries.