Cassandra at no_sql

SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies: Apache Cassandra, MongoDB, and Hadoop.

This talk lays out a few talking points for Apache Cassandra.

Presentation Transcript

  • Apache Cassandra: NoSQL, Yes to Scale!
    SriSatish Ambati
    @srisatish
  • NoSQL:
    Know your queries.
  • Talking points
    Use cases
    Why Cassandra?
    Use case: Hadoop, Brisk
    FUD: Consistency
    Why is Facebook not using Cassandra?
    Community, Code, Tools
    Q&A
  • Users: Netflix.
    Key by customer (read-heavy).
    Key by customer:movie (write-heavy).
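The following is a minimal sketch that makes the two key choices concrete. Plain Python dicts stand in for two column families (row key -> {column: value}). The names `customers`, `viewing`, `row_key`, and `record_position`, and the idea of storing a playback position, are hypothetical illustrations, not Netflix's actual schema.

```python
# Plain dicts stand in for column families: row key -> {column name: value}.
# All names here are hypothetical, chosen only to illustrate the key design.
customers = {}  # read-heavy: one row per customer
viewing = {}    # write-heavy: one row per customer:movie pair


def row_key(customer_id: str, movie_id: str) -> str:
    """Composite row key in the customer:movie form from the slide."""
    return f"{customer_id}:{movie_id}"


def record_position(customer_id: str, movie_id: str, seconds: int) -> None:
    # Each update touches one small row, so frequent writes spread over many keys.
    viewing.setdefault(row_key(customer_id, movie_id), {})["position"] = seconds


customers["cust42"] = {"name": "Ada", "plan": "streaming"}  # read often, written rarely
record_position("cust42", "movie7", 1800)
record_position("cust42", "movie9", 60)
record_position("cust42", "movie7", 1860)  # a later write replaces the column value
```

Keying the write-heavy data by `customer:movie` keeps each row small and spreads frequent writes across many keys. The customer row is read far more often than it changes.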
  • Time series (several customers):
    Periodic readings from dev0, dev1, …: deviceID:metric:timestamp -> value
    Metrics are typically a far larger dataset than users.
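The next sketch shows the `deviceID:metric:timestamp -> value` key shape, assuming the timestamp is zero-padded so that string order matches time order. The helper names and the 12-digit padding are assumptions. The sorting and range selection happen in Python here and say nothing about how Cassandra orders keys.

```python
from bisect import bisect_left, bisect_right


def ts_key(device_id: str, metric: str, timestamp: int) -> str:
    """deviceID:metric:timestamp, zero-padded so string order equals time order."""
    return f"{device_id}:{metric}:{timestamp:012d}"


readings = {}  # key -> value, a stand-in for a metrics column family
for t in range(0, 600, 60):  # one reading per minute for ten minutes
    readings[ts_key("dev0", "temp", t)] = 20.0 + t / 600
    readings[ts_key("dev1", "temp", t)] = 18.0


def range_scan(store: dict, device_id: str, metric: str, start: int, end: int):
    """(key, value) pairs for one device and metric with start <= timestamp <= end."""
    keys = sorted(store)  # sorting happens here in Python, not in the database
    lo = bisect_left(keys, ts_key(device_id, metric, start))
    hi = bisect_right(keys, ts_key(device_id, metric, end))
    return [(k, store[k]) for k in keys[lo:hi]]
```

With one reading per device per minute, the metrics data outgrows the user data quickly. That is the slide's point about dataset size.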
  • Why Cassandra?
  • Operational simplicity
    peer-to-peer
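Peer-to-peer here means there is no master node: any node can work out which nodes own a key. What follows is a minimal consistent-hashing sketch of that idea. The `Ring` class is hypothetical. Deriving key tokens from MD5 is similar in spirit to Cassandra's RandomPartitioner. Hashing node names to get node tokens is a simplification, because in a real cluster node tokens are configured or chosen at bootstrap.

```python
import hashlib
from bisect import bisect_right


def token(name: str) -> int:
    """128-bit token from an MD5 digest (illustrative only)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: every node holds the same view, no master needed."""

    def __init__(self, nodes):
        # Simplification: a node's token is the hash of its name.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str, rf: int):
        """Walk clockwise from the key's token and collect rf distinct nodes."""
        start = bisect_right(self.tokens, token(key)) % len(self.entries)
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]
```

Because placement is a pure function of the key and the shared ring, adding or removing a node only moves the keys in its neighbouring token ranges.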
  • Replication:
    Multi-datacenter
    Multi-region EC2
    Multi-availability zones
  • Reads stay local
    dc1
    dc2
    Replication:
    Multi-datacenter
    Multi-region EC2 (AWS)
    Multi-availability zones
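"Reads stay local" can be sketched as a coordinator that only waits on replicas in its own datacenter, in the style of a LOCAL_QUORUM read. The function name, the `(node, dc)` pair format, and the per-DC replication map are assumptions for illustration. This is not Cassandra's coordinator code.

```python
def local_quorum_targets(replicas, local_dc, rf_per_dc):
    """Pick the local-datacenter replicas a coordinator would wait on.

    `replicas` is a list of (node, dc) pairs; `rf_per_dc` maps a datacenter
    name to its replication factor. Both formats are hypothetical.
    """
    local = [node for node, dc in replicas if dc == local_dc]
    needed = rf_per_dc[local_dc] // 2 + 1  # majority of the local replicas
    if len(local) < needed:
        raise RuntimeError("not enough live local replicas")
    return local[:needed]
```

With three replicas in each of dc1 and dc2, a dc1 read waits on two dc1 nodes and never crosses to dc2.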
  • 4.21.2011, Amazon Web Services outage:
    “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  • 4.21.2011, Amazon Web Services outage:
    Netflix was running on AWS.
  • Fast, durable writes.
    Fast reads.
  • Writes
    Sequential, append-only.
    ~1-5 ms
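Here is a toy sketch of the sequential, append-only write path: append the write to a commit log, sync it, then update an in-memory table that is later flushed in key order. "Commit log", "memtable", and "SSTable" are Cassandra's terms. The `TinyWritePath` class is hypothetical. It syncs on every write for simplicity, whereas real Cassandra makes commit-log sync behavior configurable.

```python
import os
import tempfile


class TinyWritePath:
    """Toy write path: append to a log file, sync it, then update an in-memory table."""

    def __init__(self, log_path: str):
        self.log = open(log_path, "a", encoding="utf-8")
        self.memtable = {}  # unsorted in memory; sorted only when flushed

    def write(self, key: str, value: str) -> None:
        self.log.write(f"{key}\t{value}\n")  # sequential append, never a seek
        self.log.flush()
        os.fsync(self.log.fileno())           # on disk before we return
        self.memtable[key] = value

    def flush_sorted(self):
        """Rows in key order, as they would be written to an immutable SSTable."""
        return sorted(self.memtable.items())

    def close(self) -> None:
        self.log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commitlog.txt")
    wp = TinyWritePath(path)
    wp.write("b", "2")
    wp.write("a", "1")
    wp.close()
    print(wp.flush_sorted())  # [('a', '1'), ('b', '2')]
```

No write ever seeks back into existing data, which is why write latency stays low and predictable.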
  • Reads
    Local
    Key and row caches (also JNA-based off-heap)
    indexes, materialized
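The slide lists key and row caches on the read path. Below is a generic least-recently-used row cache built on `OrderedDict`. The `load_row` callback stands in for an on-disk read and is hypothetical, and this is not Cassandra's cache implementation.

```python
from collections import OrderedDict


class RowCache:
    """LRU cache of whole rows, consulted before touching on-disk data."""

    def __init__(self, capacity: int, load_row):
        self.capacity = capacity
        self.load_row = load_row  # hypothetical callback that reads a row from disk
        self.rows = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.rows[key]
        self.misses += 1
        row = self.load_row(key)
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
        return row
```

A hot row served from memory skips disk entirely. A key cache is the lighter-weight variant: it remembers where a row lives on disk rather than the row itself.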
  • Clients: CQL, Thrift
    pycassa, phpcassa
    Hector, Pelops
    (Scala, Ruby, Clojure)
  • Use case #3: Hadoop
    HDFS, Cassandra, Hive
    Logs, stats, analytics
  • Brisk
    Truly peer-to-peer Hadoop.
  • NameNode decomposition, explained.
  • Use column families (tables)
    inode
    sblock
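The slide's idea is to replace the NameNode's metadata role with rows in two column families, `inode` and `sblock`. The sketch below uses dicts named after them and stores file metadata separately from block data. The block size, the block-id scheme, and the helper functions are assumptions chosen for illustration, not Brisk's actual layout.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; tiny on purpose (HDFS blocks are typically many megabytes)

inode = {}   # path -> metadata row (stand-in for the "inode" column family)
sblock = {}  # block id -> bytes (stand-in for the "sblock" column family)


def put_file(path: str, data: bytes) -> None:
    """Split a file into blocks and record them; all names are hypothetical."""
    block_ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = hashlib.md5(f"{path}:{offset}".encode()).hexdigest()
        sblock[block_id] = data[offset:offset + BLOCK_SIZE]
        block_ids.append(block_id)
    inode[path] = {"length": len(data), "blocks": block_ids}


def get_file(path: str) -> bytes:
    meta = inode[path]
    return b"".join(sblock[b] for b in meta["blocks"])
```

Since both tables are ordinary replicated rows, any node can answer "where are this file's blocks?", and no single metadata server has to stay up.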
  • Near-real-time Hadoop
    Low latency: cassandra_dc nodes
    Batch analytics: brisk_dc nodes
  • FUD:
    acronym for fear, uncertainty, and doubt.
  • Consistency: R + W > N
    Oracle, 2-node: R=1, W=2, N=2 (T=2)
    DNS
    * N is the replication factor, not to be confused with T, the total number of nodes.
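The rule R + W > N guarantees that every set of R replicas read shares at least one node with every set of W replicas written. The sketch below compares the rule against a brute-force overlap check. Both function names are hypothetical. The R=1, W=1, N=3 case is just a counterexample, not a claim about DNS's parameters.

```python
from itertools import combinations


def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """The slide's rule: reads see the latest write when R + W > N."""
    return r + w > n


def overlap_always(r: int, w: int, n: int) -> bool:
    """Brute force: does every set of R replicas share a node with every set of W?"""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


print(is_strongly_consistent(1, 2, 2), overlap_always(1, 2, 2))  # True True (slide's 2-node example)
print(is_strongly_consistent(1, 1, 3), overlap_always(1, 1, 3))  # False False
```

When R + W <= N, two disjoint replica sets exist, so a read can miss the latest write entirely. That is the eventually consistent case.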
  • Tunable flexibility.
    For high consistency:
    read: QUORUM, write: QUORUM
    For high availability:
    high W, low R.
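A QUORUM is a strict majority of the N replicas, floor(N/2) + 1. Two majorities always overlap, so QUORUM reads plus QUORUM writes satisfy R + W > N. Likewise W = N with R = 1 gives R + W = N + 1 > N while letting reads use a single replica. The helper below is a hypothetical illustration of that arithmetic. For N = 3, a quorum is 2, so one replica can be down.

```python
def quorum(n: int) -> int:
    """Replicas that must answer a QUORUM read or write: a strict majority of N."""
    return n // 2 + 1


for n in (1, 2, 3, 5):
    q = quorum(n)
    # Two majorities always share a replica, so QUORUM + QUORUM satisfies R + W > N.
    print(f"N={n}: quorum={q}, overlaps={q + q > n}, replicas that may be down={n - q}")
```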
  • Inbox Search:
    600+ cores, 120+ TB (2008)
    Went from 100 million to 500 million users.
    Average NoSQL deployment size: ~6-12 nodes.
  • Use case #5: search
    Apache Solr + Cassandra = Solandra
    Other inbox/file searches:
    Xobni, c3
    github.com/tjake/solandra
  • “Eventual consistency is harder to program.”
    Mostly immutable data.
    Complex systems at scale.
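Cassandra reconciles divergent replicas per column by keeping the value with the newest write timestamp. The sketch below shows that last-write-wins merge. The `reconcile` helper and the `(value, timestamp)` tuple format are hypothetical. Ties simply keep the first version seen, which is a simplification. With mostly immutable data, such merges rarely have anything to resolve.

```python
from typing import Dict, Tuple

Column = Tuple[str, int]  # (value, write timestamp); a hypothetical representation


def reconcile(*replica_rows: Dict[str, Column]) -> Dict[str, Column]:
    """Merge versions of one row column by column, keeping the newest timestamp."""
    merged: Dict[str, Column] = {}
    for row in replica_rows:
        for name, (value, ts) in row.items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged
```

Because the merge is column by column, two replicas that each saw a different column update converge to a row containing both, regardless of merge order.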
  • Miscellaneous:
    Myth: data loss, partial rows.
    Writes are durable.
  • Three more reasons for Cassandra...
  • Tools
    AMIs, OpsCenter, DataStax
    AppDynamics
  • Beautiful Code
    = new code(); //less is more
    ~90k.java.concurrent.@annotate.
    Bloom filters, Merkle trees.
    Non-blocking, staged event-driven (SEDA).
    Bigtable, Dynamo.
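Cassandra keeps a Bloom filter for each SSTable, so a read can skip files that definitely do not contain the requested key. The sketch below is a generic Bloom filter using double hashing over one SHA-256 digest. The sizes, hash choice, and class name are illustrative assumptions, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means 'probably present'."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

A `False` answer lets the read path skip the file's index and data entirely. A rare `True` for an absent key costs only one wasted lookup.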
  • Current & Future Focus:
    Distributed counters, CQL.
    Simple client.
    Smoother operations.
    Compaction.
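A distributed counter has to absorb increments on different replicas without losing any. One common way to do that is for each replica to count only into its own shard, with the shards summed on read. The sketch below is a generic grow-only counter built that way. It is an assumption-labeled illustration with hypothetical names, not Cassandra's actual counter code.

```python
class ShardedCounter:
    """Grow-only counter: each replica increments only its own shard."""

    def __init__(self):
        self.shards = {}  # replica id -> count contributed by that replica

    def increment(self, replica: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("this sketch only supports increments")
        self.shards[replica] = self.shards.get(replica, 0) + amount

    def merge(self, other: "ShardedCounter") -> None:
        # Taking the max per shard makes merges idempotent and order-independent.
        for replica, count in other.shards.items():
            self.shards[replica] = max(self.shards.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.shards.values())
```

Merging the same state twice changes nothing, so replicas can exchange state repeatedly and still converge on the same total.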
  • Community
    Robust. Rapid.
    Professional support from DataStax.
    Engineers: independents, startups, large companies (Rackspace, Twitter, Netflix, …).
    Come join the efforts!
  • Use case #4: first NoSQL, then scale!
    SimpleDB -> Cassandra
    MongoDB -> Cassandra
  • Copyright: xkcd
  • Copyright: plantoys
    … more than one way to do it!
  • Summary:
    High-scale, peer-to-peer,
    distributed database.
  • Q&A
    @srisatish