UDA SE Tech Talk, Sep 20 2012: CASSANDRA

Cassandra tech talk

Cassandra Tech Talk that I gave for the UDA SE and DBA teams on Sep 20 2012. The presentation was followed by a demo.

  1. UDA SE Tech Talk, Sep 20 2012: CASSANDRA
  2. Agenda
     • What is Cassandra?
     • Architecture
     • Why use Cassandra?
     • Current consumers/use cases
     • Limitations
     • Demo
  3. CASSANDRA
  4. What is Cassandra?
     - Free, open source, distributed database
     - Written by 2 Facebook engineers
     - Hybrid of
       - BigTable from Google
       - Dynamo from Amazon
     - For structured, semi-structured, and unstructured data
     - Designed to scale across commodity servers
     - Assures AP out of CAP
       - (Consistency, Availability, Partition Tolerance)
  5. Architectural Overview
     • Independent nodes form a cluster
     • All nodes are peers
     • Gossip protocol is used to discover and connect nodes
     • Gossip process runs every second
     • Each node exchanges state messages with up to 3 other nodes
     • Nodes exchange info about themselves and about other nodes
     • Seed nodes are configured in the cassandra.yaml file
     • All nodes have the same seed nodes in their config file
     • Nodes remember all gossip info since their last restart
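The gossip bullets above can be illustrated with a toy simulation. This is a minimal sketch, not Cassandra's implementation: the names `gossip_round` and `simulate` are hypothetical, each node is assumed to already know every peer's address, and peers are picked uniformly at random. Only the "up to 3 nodes per round" fan-out and the exchange of versioned state come from the slide.

```python
import random


def gossip_round(states, rng, fanout=3):
    """Run one gossip round: every node syncs with up to `fanout` random peers.

    `states` maps node name -> that node's view, {node name: heartbeat version}.
    """
    for node in sorted(states):
        states[node][node] += 1  # the node bumps its own heartbeat version
        peers = [p for p in states if p != node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            # Merge the two views, keeping the highest version seen per node.
            merged = dict(states[node])
            for name, version in states[peer].items():
                if version > merged.get(name, -1):
                    merged[name] = version
            states[node] = merged
            states[peer] = dict(merged)


def simulate(node_names, rounds, seed=0):
    """Start with each node knowing only itself, then gossip for `rounds` rounds."""
    rng = random.Random(seed)
    states = {name: {name: 0} for name in node_names}
    for _ in range(rounds):
        gossip_round(states, rng)
    return states


if __name__ == "__main__":
    for node, view in simulate(["a", "b", "c", "d"], rounds=1).items():
        print(node, view)
```

With four nodes and a fan-out of 3, every node talks to all of its peers in each round, so one round is enough for every view to contain every node. Larger clusters need more rounds, which is why state spreads gradually rather than instantly.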
  6. Data Partitioning
     • Should be decided when setting up the cluster
     • Total data managed by Cassandra is laid out like a ring
     • Ring is divided into ranges
     • Each node is responsible for one or more ranges
     • Before a node joins, it is given a token
     • Token determines
       • Node's position on the ring
       • Range of data it is responsible for
     • Column family is partitioned based on row key
     • For a given row key, the ring is walked clockwise until the node whose range contains the key's token is found
     • 2 high-level partitioning schemes:
       - Random Partitioner
       - Ordered Partitioner
     • Random Partitioner uses consistent hashing
     • Ordered Partitioner keeps rows in sorted order
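The token ring described above can be sketched in a few lines. This is an illustration under stated assumptions: the `Ring` class, `token_for`, and `evenly_spaced_tokens` are hypothetical names, and the MD5-modulo hashing only approximates how the Random Partitioner maps a row key into the 0 to 2**127 token space. A node is taken to own the range from the previous node's token (exclusive) up to its own token (inclusive).

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the token space assumed for this sketch


def token_for(row_key):
    """Hash a row key to a token (approximation: MD5 digest modulo ring size)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_tokens(node_names):
    """Assign node i the token i * RING_SIZE / N so ranges are equal in size."""
    count = len(node_names)
    return {name: i * RING_SIZE // count for i, name in enumerate(node_names)}


class Ring:
    def __init__(self, tokens):
        # Sort nodes by token so the list order is the clockwise ring order.
        self._entries = sorted((token, name) for name, token in tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner_of_token(self, token):
        """Walk clockwise to the first node whose token is >= the given token."""
        index = bisect.bisect_left(self._tokens, token)
        return self._entries[index % len(self._entries)][1]  # wrap past the end

    def owner(self, row_key):
        return self.owner_of_token(token_for(row_key))


if __name__ == "__main__":
    ring = Ring(evenly_spaced_tokens(["n1", "n2", "n3", "n4"]))
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.owner(key))
```

Because the hash spreads keys roughly uniformly, evenly spaced tokens give each node a similar share of the data. An ordered partitioner would use the key itself as the token, which keeps rows sorted but can load nodes unevenly.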
  7. Data Partitioning (Contd): Partitioning in Multi-Data Center Clusters
  8. Replication
     • Replication: the process of storing copies of data
     • Replication strategy
       • Number of replicas
       • Distribution of replicas over the nodes
     • Relies on the cluster's configured snitch
     • SimpleStrategy (default)
     • NetworkTopologyStrategy
       • Takes rack and data center into consideration
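The difference between the two strategies can be sketched as follows. This is a simplified illustration, not Cassandra's placement code: the function names and the `topology` mapping (node to data center and rack, standing in for what a snitch would report) are assumptions. The simple variant takes the next nodes clockwise on the ring; the topology-aware variant picks a per-data-center number of replicas and prefers nodes on distinct racks.

```python
def simple_strategy(ring_order, primary_index, replication_factor):
    """Place replicas on the primary node and the next nodes clockwise.

    `ring_order` lists node names in clockwise token order.
    """
    size = len(ring_order)
    count = min(replication_factor, size)
    return [ring_order[(primary_index + i) % size] for i in range(count)]


def network_topology_strategy(ring_order, topology, primary_index, replicas_per_dc):
    """Pick replicas per data center, preferring distinct racks.

    `topology` maps node -> (data_center, rack).
    `replicas_per_dc` maps data_center -> number of replicas wanted there.
    """
    size = len(ring_order)
    # Clockwise walk of the whole ring, starting at the primary node.
    walk = [ring_order[(primary_index + i) % size] for i in range(size)]
    chosen = []
    for dc, wanted in replicas_per_dc.items():
        in_dc = [node for node in walk if topology[node][0] == dc]
        picked, racks_used = [], set()
        # First pass: at most one node per rack.
        for node in in_dc:
            rack = topology[node][1]
            if len(picked) < wanted and rack not in racks_used:
                picked.append(node)
                racks_used.add(rack)
        # Second pass: reuse racks only if more replicas are still needed.
        for node in in_dc:
            if len(picked) < wanted and node not in picked:
                picked.append(node)
        chosen.extend(picked)
    return chosen


if __name__ == "__main__":
    ring = ["a", "b", "c", "d", "e", "f"]
    topology = {
        "a": ("DC1", "r1"), "b": ("DC1", "r1"), "c": ("DC1", "r2"),
        "d": ("DC2", "r1"), "e": ("DC2", "r1"), "f": ("DC2", "r2"),
    }
    print(simple_strategy(ring, 0, 3))
    print(network_topology_strategy(ring, topology, 0, {"DC1": 2, "DC2": 2}))
```

In this example the simple strategy puts all three copies in the same data center and two of them on the same rack, while the topology-aware variant spreads copies across both data centers and both racks in each.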
  9. Snitches
     • The snitch is a configurable component of the cluster
     • Defines how the nodes are grouped together
     • Types of snitches:
       • SimpleSnitch
       • BriskSimpleSnitch
       • RackInferringSnitch
       • PropertyFileSnitch
       • EC2Snitch
       • Dynamic Snitching
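As one concrete example of "how the nodes are grouped together", PropertyFileSnitch reads a properties file whose lines map a node address to a data center and rack in the form `address=DC:RACK`, with an optional `default` entry. The parser below is a hypothetical sketch of that idea only; the function names, addresses, and DC/rack labels are made up for illustration, and it is not the snitch's actual code.

```python
SAMPLE_TOPOLOGY = """
# address=data_center:rack (all values here are illustrative)
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC2
10.1.0.1=DC2:RAC1
default=DC1:RAC1
"""


def parse_topology(text):
    """Parse `address=DC:RACK` lines into {address: (data_center, rack)}."""
    topology = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        address, _, location = line.partition("=")
        data_center, _, rack = location.partition(":")
        topology[address.strip()] = (data_center.strip(), rack.strip())
    return topology


def locate(topology, address):
    """Return (data_center, rack) for a node, falling back to the default entry."""
    return topology.get(address, topology.get("default"))


if __name__ == "__main__":
    topology = parse_topology(SAMPLE_TOPOLOGY)
    print(locate(topology, "10.1.0.1"))
    print(locate(topology, "10.9.9.9"))  # unknown node uses the default
```

A replication strategy that is aware of racks and data centers would consume exactly this kind of node-to-location mapping.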
  10. Snitches (contd)
  11. Why Use Cassandra?
      • Very high volume of writes/reads
      • All writes HAVE to succeed
      • Horizontal scalability
      • Commodity hardware
      • Integration with Hadoop/HBase/Hive
      • SQL-like usage
      • No single point of failure
      • Powerful dynamic schema data model
        • Maximum flexibility
        • Performance at scale
  12. Some Well-Known Current Customers
      WebEx, Ooyala, Clearspring, Openwave, Cloudkick, OpenX, Cloudtalk, Plaxo,
      connex.io, Rackspace, Constant Contact, Reddit, Digg, SimpleGeo, Facebook,
      SoundCloud, IBM, Twitter, Netflix, Walmart Labs, Formspring, Yakaz, Mahalo.com
  13. Limitations
      Be aware of these differences when you move from a relational database to Cassandra.
      • No transactions
      • No JOINs
      • No foreign keys, and keys are immutable
      • Keys have to be unique
      • Failed operations may leave changes
      • Searching is complicated
      • Super columns and order-preserving partitioners are discouraged
      • Healing from failure is manual
      • It remembers deletes (until v0.8, at least)
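The "remembers deletes" point refers to tombstones: a delete is stored as a timestamped marker rather than removing the data outright, so that a replica that missed the delete cannot bring the value back later. The toy model below is an assumption-laden sketch of that idea; the `Replica` class and `repair` function are hypothetical and much simpler than Cassandra's storage engine.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


class Replica:
    def __init__(self):
        self.cells = {}  # key -> (timestamp, value or TOMBSTONE)

    def write(self, key, value, timestamp):
        """Keep the write only if it is newer than what is already stored."""
        current = self.cells.get(key)
        if current is None or timestamp > current[0]:
            self.cells[key] = (timestamp, value)

    def delete(self, key, timestamp):
        """A delete is just a write of the tombstone marker."""
        self.write(key, TOMBSTONE, timestamp)

    def read(self, key):
        cell = self.cells.get(key)
        if cell is None or cell[1] is TOMBSTONE:
            return None
        return cell[1]


def repair(source, target):
    """Copy every cell from source to target; the newest timestamp wins."""
    for key, (timestamp, value) in source.cells.items():
        target.write(key, value, timestamp)


if __name__ == "__main__":
    a, b = Replica(), Replica()
    a.write("k", "v1", timestamp=1)
    b.write("k", "v1", timestamp=1)
    a.delete("k", timestamp=2)  # replica b misses this delete
    repair(b, a)                # b's stale value does not resurrect the key
    repair(a, b)                # the tombstone propagates to b
    print(a.read("k"), b.read("k"))
```

If the delete had simply removed the entry from `a`, the first `repair(b, a)` call would have copied the old value back, which is the failure the tombstone prevents.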
  14. DEMO
  15. Questions?