Learn Cassandra at edureka!


This Cassandra training course is designed to provide the knowledge and skills needed to become a successful Cassandra developer. It covers, in depth, concepts such as Clusters, Keyspaces, Column Families, Replication, Cassandra's Data Model, Cassandra's Architecture, Performance Tuning, how to read and write data, and finally how to integrate Cassandra with Hadoop.

Published in: Education, Technology

  • On this foil, we explain how, with the advent of distributed systems, no single solution can solve all the problems stated in the preceding foils. Cassandra suits Twitter and Expedia, which need high scale and availability and can compromise on consistency. These use cases also don't have dynamic queries, so Cassandra fits very well. The BookMyShow use case requires consistency along with scale. Availability can be traded off in that case, so MongoDB can be used. In the case of Facebook Messenger, consistency is essential along with massive scale. The data is short, temporal, and forms a large set that is rarely accessed, so HBase can be used.
  • Another Classification of NoSQL DBs based on implementation
  • Let's take the scenario of a post office with three counters: currency exchange, stationery, and letters and couriers. In the centralized approach, a router (or front counter) forwards each customer to the respective counter. Drawback: the system fails if the router fails. In the decentralized approach, all the counters are identical and there is no router in between.
  • If any node goes down, another node can do its job, since every node is identical.
  • The client can control the number of replicas to block on for all updates. This is done by setting the consistency level against the replication factor. Strong consistency is the ability to guarantee that an update is propagated to all locations where that piece of data resides. In a single data centre setup, this guarantees that all of the servers that should have a copy of the data will have it before the client is acknowledged with a success. In terms of performance, this usually means a cost of a few extra milliseconds to write data to several servers. Eventual consistency means that the client is acknowledged as soon as part of the cluster acknowledges the write. In one case, a single server could acknowledge receiving the data and begin propagating the data to the other servers immediately. This option is best when application performance matters the most.
  • We can explain some of these terms briefly here without going into detail. They are covered in depth in the course.
  • Learn Cassandra at edureka!
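
The note above on setting the consistency level against the replication factor can be made concrete with a small sketch. It is not Cassandra's implementation and does not talk to a cluster. The names `ConsistencyLevel`, `replicas_required`, and `is_strongly_consistent` are hypothetical helpers introduced here for illustration. The sketch assumes a single data centre and uses the common rule of thumb that reads see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """A subset of the consistency levels named in the slides (hypothetical helper)."""
    ONE = "ONE"
    TWO = "TWO"
    THREE = "THREE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level: ConsistencyLevel, replication_factor: int) -> int:
    """Return how many replicas must respond before the client is acknowledged."""
    fixed = {
        ConsistencyLevel.ONE: 1,
        ConsistencyLevel.TWO: 2,
        ConsistencyLevel.THREE: 3,
    }
    if level in fixed:
        needed = fixed[level]
    elif level is ConsistencyLevel.QUORUM:
        # A quorum is a strict majority of the replicas.
        needed = replication_factor // 2 + 1
    else:  # ConsistencyLevel.ALL
        needed = replication_factor
    if needed > replication_factor:
        raise ValueError(f"{level.value} needs {needed} replicas, only {replication_factor} exist")
    return needed


def is_strongly_consistent(
    write_level: ConsistencyLevel,
    read_level: ConsistencyLevel,
    replication_factor: int,
) -> bool:
    """Assumed rule of thumb: read replicas + write replicas > replication factor."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    # QUORUM writes and QUORUM reads overlap on at least one replica: 2 + 2 > 3.
    print(is_strongly_consistent(ConsistencyLevel.QUORUM, ConsistencyLevel.QUORUM, rf))  # True
    # ONE/ONE is the fast, eventually consistent choice: 1 + 1 is not > 3.
    print(is_strongly_consistent(ConsistencyLevel.ONE, ConsistencyLevel.ONE, rf))  # False
```

With a replication factor of 3, writing at ALL and reading at ONE also satisfies the rule, at the cost of writes failing whenever any replica is down. That is the tradeoff the note describes between blocking on more replicas and acknowledging the client sooner.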

    1. What are we going to learn today?
        • New problems which can't be handled by traditional RDBMS
        • Tradeoff between Consistency, Availability, and Partition Tolerance (CAP theorem)
        • What are the different solutions available?
        • What is Cassandra?
        • Use cases for Cassandra
        • Cassandra features: Tunable Consistency, P2P Architecture, Elastic Scalability, Column Orientation
        • Demo application using Cassandra
    2. Twitter: Massive Scale, High Availability
    3. Travel Booking: Scale and Availability
    4. Movie Booking: Consistency and Scale
    5. Facebook Graph Search: Fast, Complex Querying
    6. Facebook Messenger: Consistency and Scale
    7. So, What Is Common?
        • Huge Data
        • Fast Random Access
        • Variable Schema
        • Need for Compression
        • High Availability
        • Need for Consistency
        • Need for Distribution (Sharding)
    8. Brewer's CAP Theorem (diagram source: http://www.w3resource.com/mongodb/nosql.php)
        • CA (Consistency + Availability): RDBMS
        • CP (Consistency + Partition Tolerance): MongoDB, HBase, Redis
        • AP (Availability + Partition Tolerance): CouchDB, Cassandra, DynamoDB, Riak
    9. NoSQL Landscape (chart axes: Scalability & Speed, Query and Navigational Complexity, Performance)
        • Key-Value Stores: Dynamo (Amazon), Voldemort (LinkedIn), Citrusleaf, Membase, Riak, Tokyo Cabinet
        • Big Table Clones: BigTable (Google), Cassandra, HBase, Hypertable
        • Document Databases: CouchOne, MongoDB, Terrastore, OrientDB
        • Graph Databases: FlockDB (Twitter), AllegroGraph, DEX, InfoGrid, Neo4J, Sones
    10. Cassandra Use Case: Deep Dive
        [Diagram labels: Web Application, Caching Layer, RDBMS1, Applications Changing Data, Elastic Scale, 5000 TPS, 1000 TPS, 300 ~ 500 SQL Transaction, 100 ~ 200 SQL Transaction]
    11. Using Cassandra
        [Diagram labels: Web Application, Cassandra, Applications Changing Data, Elastic Scale, 5000 TPS, 1000 TPS, 300 ~ 500 SQL Transaction, 100 ~ 200 SQL Transaction]
    12. Cassandra Use Case: Summary
        • eCommerce (Travel Portal)
            • Both B2B & B2C consumers
            • High volume of shopping transactions (> 500 million visits / day)
            • High volume of supply changes (manual & system) generated
            • Huge inventory database (millions of hotels)
            • High read/write load (thousands of reads & writes / second)
            • Application has to be 99.99% available
            • Fault tolerant & reliable
            • Fast & quick shopping experience
            • Elastic scale
            • Innovative recommendations & algorithms
            • Should be fast for new changes
            • Should be cost effective for maintenance
        • Development Approaches
            • Legacy way (pure RDBMS)
            • Augmented (RDBMS + caching, heavy database hardware)
            • Using Cassandra
    13. What is Apache Cassandra
        Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database.
        Cassandra Features
        • Open Source
        • Distributed
        • Decentralized
        • Elastically Scalable
        • Highly Available
        • Fault Tolerant
        • Tunably Consistent
        • Column Oriented
    14. Distributed And Decentralised
        [Diagram: a centralised post office with separate CCY Exchange, Stationery, and Letter/Couriers counters, next to a decentralised post office in which every counter handles CCY, Stationery, and Letter/Couriers]
    15. Distributed And Decentralised
        • Every node is identical.
        • Nodes communicate peer to peer and use the Gossip protocol to keep the list of nodes in sync.
        • No single point of failure.
        • No special host to coordinate activities.
        • Easier to operate and maintain because all nodes are the same.
        [Diagram: decentralised post office, every counter handling CCY, Stationery, and Letter/Couriers]
    16. Elastic Scalability
        • Types of scalability
            • Vertical scalability
            • Horizontal scalability
        • What is elastic scalability?
            • It is a special property of horizontal scalability.
            • The cluster can seamlessly scale up and scale back down without major disruption.
    17. Elastic Scalability
        • The cluster must accept new nodes without major disruption or reconfiguration. ADD A NODE AND MOVE ON!!
        • The process should not need to be restarted.
        • No application changes are required.
        • Don't have to rebalance data.
        [Diagram: decentralised post office with one more identical counter added, each handling CCY, Stationery, and Letter/Couriers]
    18. High Availability And Fault Tolerance
        • Highly available
        • No downtime
        [Diagram: decentralised post office, every counter handling CCY, Stationery, and Letter/Couriers]
    19. Tunable Consistency
        • Strong Consistency
        • Eventual Consistency
        Cassandra enables us to tune the consistency based on the application requirement.
    20. High Performance
        • Cassandra was designed from the ground up to take full advantage of multiprocessor/multicore machines, and to run across many dozens of these machines housed in multiple data centres.
        • It scales consistently and seamlessly to hundreds of terabytes.
        • It shows exceptional performance under heavy loads.
        • It consistently shows very fast write throughput on a basic commodity workstation.
    21. Cassandra Terminologies
        • Cluster / Server (Datacenters, Racks, Nodes & Virtual Nodes)
        • Client (Thrift, CQL)
        • Data Model
            • Keyspaces
            • Column Families / Super Column Families / System Keyspaces
            • Primary & Secondary Indexes
        • Fault Tolerance / High Availability
            • Replication (Simple, Network)
            • Partitioning (Token Ring, Token Ranges, Random, Ordered, Murmur3)
            • Snitches (Simple, EC2, etc.)
            • Cluster Communications (Gossip, Seed Nodes)
        • Consistency & Reliability
            • Any, One, Two, Three, QUORUM, Hinted Handoff
            • Strong Consistency (Read vs Write)
            • Anti-Entropy / Read Repairs & Hinted Handoffs
            • Commit Log, Bloom Filter, MemTable, SSTable
            • Compaction (SSTable, Snappy)
            • Tombstones, Row & Key Caches
    22. Where to use Cassandra
        Use it if your application has:
        • Big data (billions of records, rows & columns)
        • Very high velocity random reads & writes
        • Flexible sparse / wide column requirements
        • No multiple secondary index needs
        • Low latency use cases
        • eCommerce inventory cache use cases
        • Time series / events use cases
        • Feed-based activities / use cases
    23. Where NOT to use Cassandra
        Don't use it if your application has:
        • Secondary indexes
        • Relational data
        • Transactional needs (rollback, commit)
        • Primary & financial records
        • Stringent security & authorization needs on data
        • Dynamic queries on columns
        • Searching column data
        • Low latency
    24. Application Demo
        • Cassandra Installation & Configuration
            • conf/cassandra.yaml
            • Tools
        • Keyspace Setup
        • Column Family / Data Model Setup
            • Key
            • Columns & Data Types
            • Indexes (Primary & Secondary)
            • Programmatic Consistency
        • Thrift Hector API
        • CQL3 API
    25. Questions?
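
As a rough illustration of the token ring and virtual nodes listed under Cassandra Terminologies, and of the "add a node and move on" idea from the Elastic Scalability slides, here is a minimal consistent-hashing sketch. It is not Cassandra's partitioner: the `TokenRing` class, the `token` function, and the node names are hypothetical, MD5 stands in for the real hash function, and replication is not modelled. It only shows that when a node joins the ring, the keys that change owner all move to the new node, and the rest stay where they were.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, vnodes_per_node: int = 64) -> None:
        self.vnodes_per_node = vnodes_per_node
        self._tokens = []   # sorted ring positions
        self._owners = {}   # ring position -> node name

    def add_node(self, name: str) -> None:
        # Each physical node claims several positions (virtual nodes) on the ring.
        for i in range(self.vnodes_per_node):
            t = token(f"{name}#{i}")
            self._owners[t] = name
            bisect.insort(self._tokens, t)

    def owner(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its token.
        idx = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


ring = TokenRing()
for node in ("node-a", "node-b", "node-c"):
    ring.add_node(node)

keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.owner(k) for k in keys}

# Add a fourth node without touching the existing ones.
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d:",
      all(after[k] == "node-d" for k in moved))
```

With four equally weighted nodes, roughly a quarter of the keys end up on the new node. The existing nodes only hand data over and never exchange keys among themselves, which is why a node can join without a full reshuffle of the cluster.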