Cassandra at no_sql


SFJava, SFNoSQL, SFMySQL, Marakana, and Microsoft come together for an evening of presentations on three NoSQL technologies: Apache Cassandra, MongoDB, and Hadoop.

This talk lays out a few talking points for Apache Cassandra.


Transcript

  • 1. Apache Cassandra: NoSQL, Yes to Scale!
    srisatishambati
    @srisatish
  • 2. NoSQL-
    Know your queries.
  • 3. Talking points
    Use cases
    Why Cassandra?
    Use case: Hadoop, Brisk
    FUD: Consistency
    Why is Facebook not using Cassandra?
    Community, Code, Tools
    Q&A
  • 4. Users. Netflix.
    Key by Customer, read-heavy
    Key by Customer:Movie, write-heavy
  • 5. TimeSeries: (several customers)
    Periodic readings from dev0, dev1, …: deviceID:metric:timestamp -> value
    Metrics are typically a much larger dataset than users.
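The key layout on slide 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for a column family, and the helper names `make_key` and `scan`, as well as the zero-padded epoch-seconds timestamp format, are hypothetical choices that do not come from the talk.

```python
from datetime import datetime, timedelta, timezone

def make_key(device_id, metric, ts):
    """Build a 'deviceID:metric:timestamp' key.

    Assumption: the timestamp is written as zero-padded epoch seconds so
    that lexical key order matches time order.
    """
    return f"{device_id}:{metric}:{int(ts.timestamp()):010d}"

def scan(store, device_id, metric):
    """Return (key, value) pairs for one device and metric, oldest first."""
    prefix = f"{device_id}:{metric}:"
    return [(key, store[key]) for key in sorted(store) if key.startswith(prefix)]

# A plain dict stands in for a column family in this sketch.
readings = {}
t0 = datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc)
readings[make_key("dev0", "temp", t0 + timedelta(seconds=60))] = 21.7
readings[make_key("dev0", "temp", t0)] = 21.5
readings[make_key("dev1", "temp", t0)] = 19.0

for key, value in scan(readings, "dev0", "temp"):
    print(key, "->", value)
```

Putting the device and metric first keeps all readings for one series adjacent, which fits the "know your queries" advice from slide 2.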
  • 6. Why Cassandra?
  • 7. Operational simplicity
    peer-to-peer
  • 8. Operational simplicity
    peer-to-peer
  • 9. Replication:
    Multi-datacenter
    Multi-region EC2
    Multi-availability zones
  • 10. reads local
    dc1
    dc2
    Replication:
    Multi-datacenter
    Multi-region EC2, AWS
    Multi-availability zones
  • 11. 4.21.2011, Amazon Web Services outage:
    “Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
  • 12. 4.21.2011, Amazon Web Services outage:
    Netflix was running on AWS.
  • 13. fast durable writes.
    fast reads.
  • 14. Writes
    Sequential, append-only.
    ~1-5ms
  • 15. Reads
    Local
    Key and row caches (also a JNA-based off-heap cache)
    indexes, materialized
  • 16. Clients: CQL, Thrift
    pycassa, phpcassa
    Hector, Pelops
    (Scala, Ruby, Clojure)
  • 17. Use case #3: Hadoop
    HDFS, Cassandra, Hive
    Logs, stats, analytics
  • 18. Brisk
    Truly peer-to-peer Hadoop.
  • 19. Namenode decomposition, explained.
  • 20.
  • 21.
  • 22. Use column families (tables)
    inode
    sblock
  • 23. Near-real-time Hadoop
    Low latency: cassandra_dc nodes
    Batch Analytics: brisk_dc nodes
  • 24. FUD,
    acronym: fear, uncertainty, doubt.
  • 25. Consistency: R + W > N
    Oracle, 2-node: R=1, W=2, N=2 (T=2)
    DNS
    * N is the replication factor, not to be confused with T, the total number of nodes.
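The rule R + W > N from slide 25 can be checked mechanically. The sketch below is an illustration, not Cassandra client code: the function names `overlaps` and `quorum` are hypothetical, and `quorum` uses the usual majority formula floor(N / 2) + 1.

```python
def overlaps(r: int, w: int, n: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n

def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1

# The two-node example from the slide: R=1, W=2, N=2.
print(overlaps(r=1, w=2, n=2))

# Quorum reads and quorum writes with a replication factor of 3.
n = 3
print(overlaps(r=quorum(n), w=quorum(n), n=n))

# Reading and writing a single replica out of 3 does not guarantee overlap.
print(overlaps(r=1, w=1, n=3))
```

When the check holds, at least one replica contacted by a read has seen the latest acknowledged write, which is why quorum reads combined with quorum writes give consistent results.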
  • 26. Tunable, flexible.
    For High Consistency:
    read: quorum, write: quorum
    For High Availability:
    high W, low R.
  • 27.
  • 28. Inbox Search:
    600+ cores, 120+ TB (2008)
    Went from 100M to 500M users.
    Average NoSQL deployment size: ~6-12 nodes.
  • 29. Use case #5: search
    Apache Solr + Cassandra = Solandra
    Other inbox/file Searches:
    xobni, c3
    github.com/tjake/solandra
  • 30. “Eventual consistency is harder to program.”
    mostly immutable data.
    complex systems at scale.
  • 31. Miscellaneous,
    Myth: data-loss, partial rows.
    writes are durable.
  • 32. Three more reasons for Cassandra...
  • 33. Tools
    AMIs, OpsCenter, DataStax
    AppDynamics
  • 34. B e a u t i f u l C 0 d e
    = new code(); //less is more
    ~90k.java.concurrent.@annotate.
    bloomfilters, merkletrees.
    non-blocking, staged-event-driven.
    bigtable, dynamo.
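Slide 34 lists Bloom filters among the building blocks. As a reminder of the idea, here is a minimal sketch of a Bloom filter. It is not Cassandra's implementation: the class name, the sizes, and the use of SHA-256 for hashing are assumptions made for illustration only.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per bit position, for readability rather than compactness.
        self.bits = bytearray(size_bits)

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was definitely never added.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("customer42")
print(bf.might_contain("customer42"))
```

A store can consult such a filter before touching disk: a negative answer means the key is certainly absent, so the read of that file can be skipped.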
  • 35. Current & Future Focus:
    Distributed Counters, CQL.
    Simple client.
    Operational smoothing.
    compaction.
  • 36. Community
    Robust. Rapid.
    Professional support from DataStax.
    Engineers: independents, startups, large companies (Rackspace, Twitter, Netflix, ...)
    Come join the efforts!
  • 37.
  • 38. Use case #4: first NoSQL, then scale!
    SimpleDB -> Cassandra
    MongoDB -> Cassandra
  • 39.
  • 40.
  • 41. Copyright: xkcd
  • 42. Copyright: plantoys
    … more than one way to do it!
  • 43. Summary -
    high scale peer-to-peer
    distributed database.
  • 44. Q&A
    @srisatish