High-order bits from Cassandra & Hadoop
Presentation Transcript

  • High-order bits from Cassandra & Hadoop
    srisatishambati
    @srisatish
  • Thank You!
    svccg is on the first page of Google search results for “cloud”!
  • NoSQL-
    Know your queries.
  • points
    Use cases
    Why Cassandra?
    Use case: Hadoop, Brisk
    FUD: Consistency
    Why is Facebook not using Cassandra?
    Anti-patterns
    Community, Code, Tools
    Q&A
  • Users. Netflix.
    Key by Customer, read-heavy
    Key by Customer:Movie, write-heavy
  • TimeSeries: (several customers)
    periodic readings: dev0, dev1…
    deviceID:metric:timestamp -> value
    Metrics typically way larger dataset than users.
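
The composite key on the time-series slide can be illustrated with a minimal sketch. The dictionary below is a hypothetical in-memory stand-in for a column family, not a Cassandra client, and the names `make_key`, `record`, and `series` are assumptions introduced only for this example.

```python
# Hypothetical in-memory stand-in for a column family (not a Cassandra client).
readings = {}

def make_key(device_id: str, metric: str, timestamp: int) -> str:
    """Build the composite key deviceID:metric:timestamp shown on the slide."""
    return f"{device_id}:{metric}:{timestamp}"

def record(device_id: str, metric: str, timestamp: int, value: float) -> None:
    """Store one periodic reading under its composite key."""
    readings[make_key(device_id, metric, timestamp)] = value

def series(device_id: str, metric: str):
    """Return (timestamp, value) pairs for one device and metric, oldest first."""
    prefix = f"{device_id}:{metric}:"
    points = [
        (int(key[len(prefix):]), value)
        for key, value in readings.items()
        if key.startswith(prefix)
    ]
    return sorted(points)

record("dev0", "temp", 1303380000, 21.5)
record("dev0", "temp", 1303380060, 21.7)
record("dev1", "temp", 1303380000, 19.2)
print(series("dev0", "temp"))
```

Because every reading gets its own key, the metrics dataset grows with devices × metrics × time, which is consistent with the slide's note that metrics are typically far larger than the user dataset.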
  • Why Cassandra?
  • Operational simplicity
    peer-to-peer
  • Replication:
    Multi-datacenter
    Multi-region EC2 (AWS)
    Multi-availability zones
    Reads stay local (dc1, dc2)
  • 4.21.2011, Amazon Web Services outage:
    “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  • 4.21.2011, Amazon Web Services outage:
    Netflix was running on AWS.
  • fast durable writes.
    fast reads.
  • Writes
    Sequential, append-only.
    ~1-5ms
    On cloud: ephemeral disks rock!
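
A toy version of the sequential, append-only write path might look like the sketch below. It is an illustration only: the class `TinyStore`, the tab-separated log format, and the fsync on every write are assumptions made for this example and do not describe Cassandra's actual commit log.

```python
import os
import tempfile

class TinyStore:
    """Toy write path: append to a log file, then update an in-memory table."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.memtable = {}
        self.log = open(log_path, "a", encoding="utf-8")

    def write(self, key: str, value: str) -> None:
        # Sequential append: no seek and no read-before-write.
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # simplification: sync on every write
        self.memtable[key] = value

    def close(self) -> None:
        self.log.close()

    @staticmethod
    def replay(log_path: str) -> dict:
        """Rebuild the in-memory table from the log, e.g. after a restart."""
        table = {}
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                table[key] = value  # later entries overwrite earlier ones
        return table

path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = TinyStore(path)
store.write("cust1", "a")
store.write("cust1", "b")
store.close()
recovered = TinyStore.replay(path)
print(recovered)
```

An update never modifies earlier log entries in place; the newest entry for a key simply wins on replay, which is what keeps the disk access pattern sequential.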
  • Reads
    Local
    Key & row caches (also, JNA-based 0xffheap)
    indexes, materialized
    ssds: improved read performance!
  • Distribution between nodes
    Gossip
    Anti-entropy
    Failure-detector
    Lightweight
  • Clients: cql, thrift
    pycassa, phpcassa
    hector, pelops
    (scala, ruby, clojure)
  • Usecase #3: hadoop
    Hdfs cassandra hive
    Logs stats analytics
  • Brisk
    Truly peer-to-peer hadoop.
  • mv computation
    not data
  • Parallel Execution View
  • jobtracker, tasktracker
    hdfs: namenode, datanode
  • cloudera
    amazon: elastic map reduce
    hortonworks
    mapR
    brisk
  • Tools & Analytics
    Hive, Pig, R
    Karmasphere
    Datameer
    … dozens of stealth startups!
  • Namenode decomposition, explained.
  • Use column families (tables)
    inode
    sblock
  • near-real time hadoop
    Low latency: cassandra_dc nodes
    Batch Analytics: brisk_dc nodes
  • FUD,
    acronym: fear, uncertainty, doubt.
  • Consistency: R + W > N
    ORACLE, 2-node: R=1, W=2, N=2,(T=2)
    DNS
    * N is replication factor. Not to be confused with T=total #of nodes
  • Tune-able, flexibility.
    For High Consistency:
    read:quorum, write:quorum
    For High Availability:
    high W, low R.
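
The R + W > N rule from the consistency slides can be checked with a few lines of arithmetic. This is a minimal sketch; the function names `quorum` and `overlaps` are assumptions for illustration, and N is the replication factor, as the slide notes.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1

def overlaps(r: int, w: int, n: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n

n = 3
q = quorum(n)
print(overlaps(q, q, n))  # quorum reads and quorum writes: True
print(overlaps(1, 1, n))  # R=1, W=1: False
print(overlaps(1, 2, 2))  # the 2-node example, R=1, W=2, N=2: True
```

When the check holds, at least one replica contacted by a read has seen the latest acknowledged write; when it fails, a read may return stale data until the replicas converge.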
  • Inbox Search:
    600+ cores, 120+ TB (2008)
    Went from 100M to 500M users.
    Average NoSQL deployment size: ~6-12 nodes.
  • Usecase #5: search
    Apache Solr + Cassandra = Solandra
    Other inbox/file Searches:
    xobni, c3
    github.com/tjake/solandra
  • “Eventual consistency is harder to program.”
    mostly immutable data.
    complex systems at scale.
  • Miscellaneous,
    Myth: data-loss, partial rows.
    writes are durable.
  • Anti-Patterns
    Transactions
    Joins
    Read before write
  • Anti-Patterns for cloud
    ebs
    jvm, virtualized
    single region
  • Three good reasons for Cassandra...
  • Tools
    AMIs, OpsCenter, DataStax
    AppDynamics
    Netflix just builds AMIs for deployment!
  • Beautiful C0de
    = new code(); //less is more
    ~90k.java.concurrent.@annotate.
    bloomfilters, merkletrees.
    non-blocking, staged-event-driven.
    bigtable, dynamo.
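
Of the structures named on this slide, the Bloom filter is the easiest to sketch: a compact membership test that may report false positives but never false negatives, which lets a store skip files that cannot contain a key. The sizes, the SHA-256-based hashing, and the class below are assumptions for illustration and are not taken from the Cassandra code base.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for key in ("cust1", "cust2", "cust3"):
    bf.add(key)
print(bf.might_contain("cust2"))
```

A `False` answer is definitive, so the lookup can be skipped; a `True` answer still requires checking the underlying data.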
  • Current & Future Focus:
    Distributed Counters, CQL.
    Simple client.
    operational smoothening.
    compaction.
  • Community
    Robust. Rapid. #
    Professional support from DataStax.
    Filesystem innovation from Acunu
    engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...
    Come join the efforts!
  • Usecase #4: first NoSQL, then scale!
    simpledb -> Cassandra
    mongodb -> Cassandra
  • Copyright: xkcd
  • Copyright: plantoys
    … more than one way to do it!
  • Summary -
    high scale peer-to-peer datastore
    best friend for
    multi-region, multi-zone availability.
    Hadoop – HDFS engulfing the DataWorld
  • Q&A
    @srisatish
  • NoSQL-
    Know your queries.