Cassandra at no_sql
 


SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies: Apache Cassandra, MongoDB, and Hadoop.

This talk lays out a few talking points for Apache Cassandra.

    Presentation Transcript

    • Apache Cassandra: NoSQL, Yes to Scale!
      Srisatish Ambati
      @srisatish
    • NoSQL:
      Know your queries.
    • Talking points
      Use cases
      Why Cassandra?
      Use case: Hadoop, Brisk
      FUD: Consistency
      Why is Facebook not using Cassandra?
      Community, Code, Tools
      Q&A
    • Users. Netflix.
      Key by Customer, read-heavy
      Key by Customer:Movie, write-heavy
    • Time series: (several customers)
      periodic readings: dev0, dev1, …
      deviceID:metric:timestamp -> value
      Metrics are typically a much larger dataset than users.
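The deviceID:metric:timestamp -> value layout on the slide can be sketched in a few lines. This is only an illustration: `reading_key`, `scan`, and the zero-padded epoch-seconds timestamp are assumptions made here, not something the talk specifies, and a plain dict stands in for the store.

```python
from datetime import datetime, timedelta, timezone


def reading_key(device_id: str, metric: str, ts: datetime) -> str:
    """Build a 'deviceID:metric:timestamp' key (hypothetical helper).

    The timestamp is written as zero-padded epoch seconds so that keys
    for one device and metric sort chronologically as plain strings.
    """
    epoch = int(ts.timestamp())
    return f"{device_id}:{metric}:{epoch:010d}"


def scan(store: dict, prefix: str) -> list:
    """Return (key, value) pairs whose key starts with prefix, in key order."""
    return sorted((k, v) for k, v in store.items() if k.startswith(prefix))


# A dict stands in for the database; insertion order is deliberately shuffled.
t0 = datetime(2011, 4, 21, 12, 0, tzinfo=timezone.utc)
readings = {
    reading_key("dev0", "temp", t0 + timedelta(minutes=1)): 21.7,
    reading_key("dev0", "temp", t0): 21.5,
    reading_key("dev1", "temp", t0): 19.0,
}

# All temperature readings for dev0, oldest first.
dev0_temps = [value for _, value in scan(readings, "dev0:temp:")]
```

Putting the timestamp last keeps all readings for one device and metric adjacent, which is what makes the "know your queries" advice concrete: the key is designed around the read pattern.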
    • Why Cassandra?
    • Operational simplicity
      peer-to-peer
    • Replication:
      Multi-datacenter
      Multi-region EC2
      Multi-availability zones
    • Replication: reads stay local (dc1, dc2)
      Multi-datacenter
      Multi-region EC2 (AWS)
      Multi-availability zones
    • 4.21.2011, Amazon Web Services outage:
      “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
    • 4.21.2011, Amazon Web Services outage:
      Netflix was running on AWS.
    • fast durable writes.
      fast reads.
    • Writes
      Sequential, append-only.
      ~1-5ms
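A rough sketch of why sequential, append-only writes can be both fast and durable: each write is appended to a log file and then applied to an in-memory table, so nothing is updated in place on disk. `TinyWriteLog` is a hypothetical toy, not Cassandra's commit log or memtable code, and syncing to disk on every write is a simplification chosen here for clarity.

```python
import os
import tempfile


class TinyWriteLog:
    """Toy append-only log plus in-memory table (illustration only)."""

    def __init__(self, path: str):
        self.path = path
        self.memtable = {}
        # Append mode: every write lands at the end of the file.
        self._log = open(path, "a", encoding="utf-8")

    def write(self, key: str, value: str) -> None:
        # Keys and values must not contain tabs or newlines in this toy format.
        self._log.write(f"{key}\t{value}\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # force the appended record to disk
        self.memtable[key] = value    # then make it visible to reads

    def close(self) -> None:
        self._log.close()

    @classmethod
    def recover(cls, path: str) -> "TinyWriteLog":
        """Rebuild the in-memory table by replaying the log from the start."""
        store = cls(path)
        with open(path, encoding="utf-8") as log_file:
            for line in log_file:
                key, value = line.rstrip("\n").split("\t", 1)
                store.memtable[key] = value  # later records win
        return store


# Usage: write, "crash" (close), then replay the log.
log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
log = TinyWriteLog(log_path)
log.write("dev0:temp:1", "21.5")
log.write("dev0:temp:1", "22.0")
log.close()
recovered = TinyWriteLog.recover(log_path)
recovered.close()
```

An overwrite is just another appended record, so the disk only ever sees sequential I/O on the write path.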
    • Reads
      Local
      Key & row caches (also a JNA-based off-heap cache)
      indexes, materialized
    • Clients: CQL, Thrift
      pycassa, phpcassa
      Hector, Pelops
      (Scala, Ruby, Clojure)
    • Use case #3: Hadoop
      HDFS, Cassandra, Hive
      Logs, stats, analytics
    • Brisk
      Truly peer-to-peer Hadoop.
    • Namenode decomposition, explained.
    • Use column families (tables)
      inode
      sblock
    • Near-real-time Hadoop
      Low latency: cassandra_dc nodes
      Batch Analytics: brisk_dc nodes
    • FUD,
      acronym: fear, uncertainty, doubt.
    • Consistency: R + W > N
      Oracle, 2-node: R=1, W=2, N=2 (T=2)
      DNS
      * N is the replication factor, not to be confused with T, the total number of nodes.
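The R + W > N rule is a pigeonhole argument: if a read touches R replicas and a write touched W replicas out of N, the two sets must share at least one replica whenever R + W exceeds N. The sketch below checks the rule and then confirms it by brute force over every possible replica choice. The function names are made up for this illustration.

```python
from itertools import combinations


def quorums_overlap(r: int, w: int, n: int) -> bool:
    """Apply the slide's rule: reads see the latest write when R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= R <= N and 1 <= W <= N")
    return r + w > n


def always_intersect(r: int, w: int, n: int) -> bool:
    """Brute force: does every set of r replicas share a member with every set of w?"""
    replicas = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(replicas, r)
        for writes in combinations(replicas, w)
    )


# The slide's 2-node example: R=1, W=2, N=2.
two_node_example = quorums_overlap(1, 2, 2)

# R=1, W=1 with three replicas: a read may land on a replica the write missed.
weak_example = quorums_overlap(1, 1, 3)
```

Note that only N, the replication factor, appears in the check; T, the total number of nodes, plays no part.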
    • Tunable, flexible.
      For High Consistency:
      read: quorum, write: quorum
      For High Availability:
      high W, low R.
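To make the tuning trade-off concrete, the sketch below turns a read level and a write level into replica counts and reports how many replica failures each side can tolerate. ONE, QUORUM, and ALL are Cassandra consistency level names; the helper functions and the dictionary layout are assumptions made for this illustration.

```python
def replicas_required(level: str, n: int) -> int:
    """Replicas that must respond at a given level, for replication factor n."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1  # strict majority
    if level == "ALL":
        return n
    raise ValueError(f"unknown level: {level}")


def summarize(read_level: str, write_level: str, n: int) -> dict:
    """Hypothetical helper: what a read/write level pair costs and buys."""
    r = replicas_required(read_level, n)
    w = replicas_required(write_level, n)
    return {
        "R": r,
        "W": w,
        "overlap": r + w > n,         # the R + W > N rule
        "read_tolerates": n - r,      # replicas that may be down for reads
        "write_tolerates": n - w,     # replicas that may be down for writes
    }


# Quorum reads and writes with replication factor 3.
quorum_both = summarize("QUORUM", "QUORUM", 3)

# High W, low R: reads survive two failures, writes survive none.
high_w_low_r = summarize("ONE", "ALL", 3)
```

With quorum on both sides, reads and writes each tolerate one failed replica out of three. With high W and low R, reads stay available through more failures, at the price of writes that need every replica.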
    • Inbox Search:
      600+ cores, 120+ TB (2008)
      Went from 100M to 500M users.
      Average NoSQL deployment size: ~6-12 nodes.
    • Use case #5: search
      Apache Solr + Cassandra = Solandra
      Other inbox/file Searches:
      xobni, c3
      github.com/tjake/solandra
    • “Eventual consistency is harder to program.”
      mostly immutable data.
      complex systems at scale.
    • Miscellaneous,
      Myth: data-loss, partial rows.
      writes are durable.
    • Three more reasons for Cassandra...
    • Tools
      AMIs, OpsCenter, DataStax
      AppDynamics
    • Beautiful C0de
      = new code(); //less is more
      ~90k.java.concurrent.@annotate.
      Bloom filters, Merkle trees.
      Non-blocking, staged event-driven.
      Bigtable, Dynamo.
    • Current & Future Focus:
      Distributed Counters, CQL.
      Simple client.
      Operational smoothing.
      compaction.
    • Community
      Robust. Rapid. #
      Professional support from DataStax.
      Engineers: independents, startups, large companies, Rackspace, Twitter, Netflix..
      Come join the efforts!
    • Use case #4: first NoSQL, then scale!
      SimpleDB -> Cassandra
      MongoDB -> Cassandra
    • Copyright: xkcd
    • Copyright: plantoys
      … more than one way to do it!
    • Summary -
      high scale peer-to-peer
      distributed database.
    • Q&A
      @srisatish