Transcript

  • 1. www.edureka.in/cassandra
  • 2. Course Structure
      • Module 1: Getting Started With Cassandra
      • Module 2: Understanding Cassandra Data Model
      • Module 3: Understanding Cassandra Architecture
      • Module 4: Creating Sample Application
      • Module 5: Configuring, Monitoring, Maintenance and Tuning Cassandra
      • Module 6: Integrating Cassandra With Hadoop
      • Module 7: CRUD Operations in Cassandra
      • Module 8: Live Project
  • 3. How It Works
      • Live Classes
      • Class Recordings
      • Module-wise Quizzes, Coding Assignments
      • 24x7 On-demand Technical Support
      • Sample Application and Live Project
      • Online Certification Exam
      • Lifetime Access to the Learning Management System
  • 4. Module 1: Getting Started With Cassandra
      • New problems which can’t be handled by a traditional RDBMS
      • Tradeoff between Consistency, Availability and Partition Tolerance (CAP theorem)
      • What are the different solutions available?
      • What is Cassandra?
      • Use cases for Cassandra
      • Cassandra features – Tunable Consistency, P2P Architecture, Elastic Scalability, Column Orientation
      • Demo application using Cassandra
      • Questions?
  • 5. Module 2: Understanding Cassandra Data Model
      • Understand what a database model is.
      • Understand the analogy between the RDBMS and Cassandra data models.
      • Understand the following Cassandra database elements: Cluster, Keyspaces, Column Families, Columns, Super Columns, Rows
      • Indexes in Cassandra
      • Primary and Composite Keys and their limitations
      • Design differences between RDBMS and Cassandra
      • Materialized Views
      • Valueless Columns
      • Aggregate Keys
  • 6. Module 3: Understanding Cassandra Architecture
      • Learn about the System Keyspaces
      • Learn about internode communication: the peer-to-peer structure and the Gossip Protocol
      • Learn how Cassandra detects node failures and repairs them
      • Learn about Anti-Entropy and Read Repair
      • Learn about Memtables, SSTables and Commit Logs
      • Hinted Handoffs
      • Compaction
      • Bloom Filters
      • Tombstones
      • SEDA
      • Managers and Services
  • 7. Module 4: Creating Sample Application
      • Identify challenges faced by RDBMS
      • Identify the various available solutions
      • Identify the rationale behind choosing Cassandra
      • Understand how data modelling in Cassandra differs from traditional relational databases
      • Understand how queries are used to design a Cassandra data model
      • Apply Cassandra data modelling to various use cases
      • Create an application that uses the data elements you learned about in Module 2
      • Perform batch updates and search column families
      • Overview of the whole project, specifying how Cassandra solved the problem laid out at the beginning
  • 8. Module 5: Configuring, Monitoring, Maintenance and Tuning Cassandra
      • Learn about the various options for configuring Keyspaces and Column Families
      • Learn about the various Cassandra Replica Placement Strategies
      • Learn about Replication
      • Learn about Partitioners
      • Learn about Snitches
      • Learn about configuring a Cluster
      • Learn about Security
      • Learn about monitoring a Cassandra Cluster
      • Learn about Cassandra Maintenance
          • Getting Ring information
          • Basic Maintenance
          • Snapshots
          • Load Balancing
          • Decommissioning and Updating nodes
      • Learn about Performance Tuning
          • Data storage, Reply timeouts
          • Commit Logs, Memtables, Caching and Buffer sizes
  • 9. Module 6: Integrating Cassandra with Hadoop
      • Learn what Hadoop is
      • Learn the Hadoop Distributed File System
      • Learn how to work with MapReduce
      • Learn tools like Pig and Hive
      • Learn how Pig and Hive interact with Cassandra
  • 10. Module 7: CRUD Operations in Cassandra
      • Learn about reading and writing data in Cassandra
      • Learn about the Cassandra API (Thrift)
      • Learn about Slice Predicates
      • Learn the Data Definition Language (DDL) in Cassandra
      • Learn the Data Manipulation Language (DML) statements within Cassandra
      • Learn to execute CQL scripts from within CQL and from the command prompt
      • Learn to create and modify users
      • Learn about Batch Mutates and Batch Deletes
      • Learn various security configurations in Cassandra
      • Learn to capture CQL output to a file
      • Learn to import and export data with CQL
  • 11. Module 8: Live Project!
  • 12. What are we going to learn today?
      • New problems which can’t be handled by a traditional RDBMS
      • Tradeoff between Consistency, Availability and Partition Tolerance (CAP theorem)
      • What are the different solutions available?
      • What is Cassandra?
      • Use cases for Cassandra
      • Cassandra features – Tunable Consistency, P2P Architecture, Elastic Scalability, Column Orientation
      • Demo application using Cassandra
      • Questions
  • 13. Twitter – Massive Scale, High Availability
  • 14. Travel Booking – Scale and Availability
  • 15. Movie Booking – Consistency and Scale
  • 16. Facebook Graph Search – Fast, Complex Querying
  • 17. Facebook Messenger – Consistency and Scale
  • 18. So, What Is Common?
      • Huge Data
      • Fast Random Access
      • Variable Schema
      • Need for Compression
      • High Availability
      • Need for Consistency
      • Need for Distribution (Sharding)
  • 19. Features of NoSQL Database
      • Non-Relational
      • Distributed
      • Open Source
      • Horizontally Scalable
  • 20. NoSQL Database Types
  • 21. NoSQL Database Types
      • Document Data Store Databases
          • Data Model: Collection of key-value connections
          • Example: CouchDB, MongoDB
          • Strength: Tolerant of incomplete data
          • Weakness: Query performance, no standard query syntax
      • Columnar NoSQL Databases
          • Data Model: Column families
          • Example: HBase, Cassandra
          • Strength: Fast look-ups
          • Weakness: Very low-level API
      • Key Value Databases
          • Data Model: Collection of key-value pairs
          • Example: Amazon SimpleDB, Redis
          • Strength: Fast look-ups
          • Weakness: Stored data has no schema
      • Graph NoSQL Databases
          • Data Model: “Property Graph” – nodes
          • Example: InfoGrid, Infinite Graph
          • Strength: Graph algorithms – shortest path, connectedness, etc.
          • Weakness: Not easy to cluster; must traverse the whole graph to get an answer
  • 22. Welcome To Cassandra!
  • 23. The Story Behind Cassandra’s Name (diagram labels: Troy, Destruction, King Priam, Hecuba, Cassandra, Greek God Apollo)
  • 24. Why Use Cassandra When There Is RDBMS?
  • 25. Drawbacks of RDBMS
      • Scalability
      • Joins Slow Down
      • Non-Availability of Data
      • Queuing
  • 26. Solutions… Vertical Scaling
      • More Memory
      • Faster Processor
      • Upgrading Disks
  • 27. Further Steps…
      • Replication, or even adding boxes to the database cluster…
      • Leading to new problems: Consistency, Failover Scenario
      • What can go wrong?
  • 28. More Steps…
      • Database Configuration: might mean manipulating the write path, for example turning write logs off, which is not a desirable situation
      • Caching Layer: consistency problem between updates in the cache and updates in the databases; the problem gets more complex over clusters
  • 29. Current Data Challenges
      • Massive Data Growth and Scalability
      • 100% Availability
      • Quick Real-Time Analytics
      • No Failures!
  • 30. Why Use Cassandra?
      • For High Velocity Data
      • Writing Data Anywhere, Everywhere
      • Scaling Writes and Reads
      • No Downtime
      • Scaling Out Strategy
      • Scaling for both Reads and Writes
      • Voluminous Data
      • Data Originating from Multiple Locations
      • Retaining Data for Long
      • Storing All Types of Data
      • Delivering Fast Response Time
      • Keeping Business Online and Serving Customers
  • 31. Cassandra Characteristics… For more details, visit our blog post: http://www.edureka.in/blog/cassandra-advantages/
  • 32. Column Oriented
      Source table:
        Emp_no  Dept_id  Hire_date   Emp_ln     Emp_fn
        1       2        2010-08-05  Teresa     Annie
        2       4        2012-03-10  Ronald     Susane
        3       3        2012-11-06  Brown      Donald
        4       3        2011-07-03  Ruth       David
        5       1        2010-09-12  Stancy     Elizabeth
        6       2        2012-10-03  Catherine  Amelia
      Row-Oriented Database (values stored record by record):
        1, 2, 2010-08-05, Teresa, Annie | 2, 4, 2012-03-10, Ronald, Susane | 3, 3, 2012-11-06, Brown, Donald | …
      Column-Oriented Database (values stored column by column):
        Emp_no:    1, 2, 3, 4, 5
        Hire_date: 2010-08-05, 2012-03-10, 2012-11-06, 2011-07-03, 2010-09-12
        Dept_id:   2, 4, 3, 3, 1
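
The difference between the two layouts on slide 32 can be sketched in a few lines of Python. This is only an illustration of the row-versus-column idea using the slide's first three employees; it is not Cassandra's on-disk format, and the names `rows`, `to_columns` and `emp_ln` are made up for the example.

```python
# Row-oriented layout: one record per employee (first three rows of the slide's table).
rows = [
    {"emp_no": 1, "dept_id": 2, "hire_date": "2010-08-05", "emp_ln": "Teresa", "emp_fn": "Annie"},
    {"emp_no": 2, "dept_id": 4, "hire_date": "2012-03-10", "emp_ln": "Ronald", "emp_fn": "Susane"},
    {"emp_no": 3, "dept_id": 3, "hire_date": "2012-11-06", "emp_ln": "Brown", "emp_fn": "Donald"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict holding one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one attribute sit together.
columns = to_columns(rows)

# Reading a single attribute touches one list in the columnar layout...
hire_dates = columns["hire_date"]
# ...but has to visit every record in the row layout.
hire_dates_from_rows = [row["hire_date"] for row in rows]

print(hire_dates)
```

Both reads return the same values; what changes is how much unrelated data has to be walked to get them.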
  • 33. Schema Free
      Schema-Based Table:
        Primary Key  First Name  Last Name  E-mail ID
        1            Avril       D’Souza    NULL
        2            David       Gomes      davidgomes1@yahoo.com
        3            Susane      NULL       NULL
      Schema Free (each row stores only the columns it has):
        Row 1: First Name = Avril, Last Name = D’Souza
        Row 2: First Name = David, Last Name = Gomes, E-mail ID = davidgomes1@yahoo.com
        Row 3: First Name = Susane
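
As a rough sketch of the schema-free idea on slide 33, the fixed-width rows can be turned into sparse rows that simply omit missing columns instead of storing NULL. The names `COLUMNS`, `to_sparse` and `schema_free` are illustrative only and do not correspond to any Cassandra API.

```python
# Schema-based rows from the slide: every row carries every column, using None for NULL.
schema_based = [
    (1, "Avril", "D'Souza", None),
    (2, "David", "Gomes", "davidgomes1@yahoo.com"),
    (3, "Susane", None, None),
]

COLUMNS = ("first_name", "last_name", "email")


def to_sparse(record):
    """Return (key, columns) keeping only the columns that actually have a value."""
    key, *values = record
    present = {name: value for name, value in zip(COLUMNS, values) if value is not None}
    return key, present


# Schema-free rows: each key maps to just the columns that exist for that row.
schema_free = dict(to_sparse(record) for record in schema_based)

for key, cols in schema_free.items():
    print(key, cols)
```

Row 3 ends up with a single column, which is the point of the slide: rows in the same column family need not share the same set of columns.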
  • 34. Brewer’s CAP Theorem (source: http://www.w3resource.com/mongodb/nosql.php)
      • CA (Consistency + Availability): RDBMS
      • CP (Consistency + Partition Tolerance): MongoDB, HBase, Redis
      • AP (Availability + Partition Tolerance): CouchDB, Cassandra, DynamoDB, Riak
  • 35. NoSQL Landscape (chart axes: Scalability & Speed, Query and Navigational Complexity, Performance)
      • Key-Value Stores: Dynamo (Amazon), Voldemort (LinkedIn), Citrusleaf, Membase, Riak, Tokyo Cabinet
      • Big Table Clones: BigTable (Google), Cassandra, HBase, Hypertable
      • Document Databases: CouchOne, MongoDB, Terrastore, OrientDB
      • Graph Databases: FlockDB (Twitter), AllegroGraph, DEX, InfoGrid, Neo4J, Sones
  • 36. Cassandra Use Case – Deep Dive (diagram labels: Web Application, 5000 TPS; Applications Changing Data, 1000 TPS; Caching Layer; RDBMS, 300 ~ 500 SQL Transactions; RDBMS, 100 ~ 200 SQL Transactions; Elastic Scale)
  • 37. Using Cassandra (diagram labels: Web Application, 5000 TPS; Applications Changing Data, 1000 TPS; Cassandra; 300 ~ 500 SQL Transactions; 100 ~ 200 SQL Transactions; Elastic Scale)
  • 38. Cassandra Use Case – Summary
      • E-Commerce (Travel Portal)
          • Both B2B and B2C consumers
          • High volume of shopping transactions (> 500 million visits / day)
          • High volume of supply changes (manual and system generated)
          • Huge inventory database (millions of hotels)
          • High read/write load (thousands of reads and writes per second)
          • Application has to be 99.99% available
          • Fault tolerant and reliable
          • Fast and quick shopping experience
          • Elastic scale
          • Innovative recommendations and algorithms
          • Should be fast to adapt to new changes
          • Should be cost effective to maintain
      • Development Approaches
          • Legacy Way (Pure RDBMS)
          • Augmented (RDBMS + Caching, Heavy Database Hardware)
          • Using Cassandra
  • 39. What is Apache Cassandra?
      Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database.
      Cassandra Features:
      • Open Source
      • Distributed
      • Decentralized
      • Elastically Scalable
      • Highly Available
      • Fault Tolerant
      • Tuneably Consistent
      • Column Oriented
  • 40. Distributed and Decentralized (diagram: a centralised post office with separate counters for CCY exchange, stationery and letters/couriers, compared with decentralised post offices where every counter handles CCY, stationery and letters/couriers)
  • 41. Distributed and Decentralized
      • Every node is identical.
      • Nodes communicate peer to peer and use the Gossip Protocol to keep the list of nodes in sync.
      • No single point of failure.
      • No special host to coordinate activities.
      • Easier to operate and maintain because all nodes are the same.
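
How a gossip-style exchange keeps the node list in sync without a coordinator can be sketched with a toy simulation. This is a deliberately simplified model under stated assumptions: each node only tracks a set of known peers and merges it with one randomly chosen peer per round. Cassandra's real gossip also carries versioned state and feeds a failure detector, none of which is modelled here, and all names (`gossip_round`, `run_until_converged`, `n1`…`n5`) are hypothetical.

```python
import random


def gossip_round(views, rng):
    """Every node merges its membership view with one randomly chosen known peer."""
    for node in sorted(views):
        peers = sorted(p for p in views[node] if p != node)
        if not peers:
            continue  # this node does not know anyone else yet
        peer = rng.choice(peers)
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)


def run_until_converged(views, rng, max_rounds=100):
    """Gossip until every node knows every other node; return the number of rounds used."""
    everyone = set(views)
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return round_no
    return None


nodes = ["n1", "n2", "n3", "n4", "n5"]
# Assumption for the sketch: each node starts out knowing only itself and one seed node, n1.
views = {node: {node, "n1"} for node in nodes}
rounds = run_until_converged(views, random.Random(42))
print("converged after", rounds, "round(s)")
```

No node plays a special role after start-up: any node can learn the full membership from any other, which is the property the slide describes.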
  • 42. Elastic Scalability
      Types of Scalability:
      • Vertical Scalability
      • Horizontal Scalability
      What is Elastic Scalability?
      • It is a special property of Horizontal Scalability.
      • The cluster can seamlessly scale up and scale back down without major disruption.
  • 43. Elastic Scalability
      • The cluster must accept new nodes without major disruption or reconfiguration. ADD A NODE AND MOVE ON!
      • The process does not have to be restarted.
      • You do not have to change application queries.
      • You do not have to rebalance the data yourself.
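
One common way to get the "add a node and move on" behaviour is consistent hashing on a token ring, sketched below. This is a generic illustration under assumptions, not Cassandra's implementation: it uses MD5 and one token per node, whereas Cassandra's partitioners and virtual nodes are more elaborate. The `Ring` class, node names and `booking-…` keys are invented for the example.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 chosen only for the sketch)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A minimal token ring with a single token per node."""

    def __init__(self, nodes):
        self._tokens = sorted((token(node), node) for node in nodes)

    def add(self, node):
        bisect.insort(self._tokens, (token(node), node))

    def owner(self, key):
        # The first node token at or after the key's token owns the key; wrap around at the end.
        index = bisect.bisect_left(self._tokens, (token(key), ""))
        return self._tokens[index % len(self._tokens)][1]


keys = [f"booking-{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {key: ring.owner(key) for key in keys}

ring.add("node-d")  # scale out by one node
after = {key: ring.owner(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

The only keys that change owner are the ones the new node takes over; every other key stays where it was, so existing nodes are not reshuffled among themselves.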
  • 44. High Availability and Fault Tolerance
      • Highly Available
      • No Downtime
  • 45. Tunable Consistency
      • The setting ranges from Eventual Consistency to Strong Consistency.
      • Cassandra enables us to tune the consistency based on the application requirement.
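
The usual rule of thumb behind tunable consistency is that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below just encodes that arithmetic; the function names are illustrative and are not part of any Cassandra client API.

```python
def quorum(replication_factor):
    """Size of a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """Reads overlap the latest write on at least one replica when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)

# QUORUM reads with QUORUM writes always overlap on at least one replica.
print("QUORUM/QUORUM strong:", is_strongly_consistent(q, q, rf))
# ONE read with ONE write may miss the latest value: eventual consistency.
print("ONE/ONE strong:", is_strongly_consistent(1, 1, rf))
# ONE read with ALL writes is strong again, at the cost of write availability.
print("ONE/ALL strong:", is_strongly_consistent(1, rf, rf))
```

Choosing lower read and write levels trades consistency for latency and availability, which is the tuning the slide refers to.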
  • 46. High Performance
      • Cassandra was designed from the ground up to take full advantage of multiprocessor/multicore machines, and to run across many dozens of these machines housed in multiple data centres.
      • It scales consistently and seamlessly to hundreds of terabytes.
      • It shows exceptional performance under heavy loads.
      • It consistently shows very fast throughput for writes per second on a basic commodity workstation.
  • 47. Where to Use Cassandra?
      Use if your application has:
      • Big Data (billions of records, rows and columns)
      • Very high velocity random reads and writes
      • Flexible sparse / wide column requirements
      • No multiple secondary index needs
      • Low Latency
      Use Cases:
      • eCommerce inventory cache use cases
      • Time series / events use cases
      • Feed-based activities / use cases
  • 48. Where NOT to Use Cassandra?
      Don’t use if your application has:
      • Secondary Indexes
      • Relational Data
      • Transactional needs (Rollback, Commit)
      • Primary and Financial Records
      • Stringent Security and Authorization Needs on Data
      • Dynamic Queries on Columns
      • Searching Column Data
      • Low Latency
  • 49. Application Demo
      • Cassandra Installation & Configuration
      • conf/cassandra.yaml
      • Tools
      • Keyspace Setup
      • Column Family / Data Model Setup
      • Key
      • Columns & Data Types
      • Indexes (Primary & Secondary)
      • Programmatic Consistency
      • Thrift Hector API
      • CQL3 API
  • 50–56. Application Demo
  • 57. Module 2: Understanding Cassandra Data Model
      • Understand what a database model is.
      • Understand the analogy between the RDBMS and Cassandra data models.
      • Understand the following Cassandra database elements: Cluster, Keyspaces, Column Families, Columns, Super Columns, Rows
      • Indexes in Cassandra
      • Primary and Composite Keys and their limitations
      • Design differences between RDBMS and Cassandra
      • Materialized Views
      • Valueless Columns
      • Aggregate Keys
  • 58. Hands On
  • 59. Questions?
  • 60. Thank You. See You in Class for the Next Module.