Jesus Alberto Guzmán Polanco
jguzman@datum.com.gt
Apache Cassandra Certified @Datum
• Cassandra Overview
• Cassandra Architecture
• Data Modeling
• Datastax Enterprise
"Apache Cassandra is an open source, distributed, decentralized, elastically
scalable, highly available, fault-tolerant, tuneably consistent, column-
oriented database that bases its distribution design on Amazon's Dynamo and
its data model on Google's Bigtable. Created at Facebook, it is now used at
some of the most popular sites on the Web."
Cassandra: The Definitive Guide.
Bigtable (data model) + Dynamo (distribution design)
• Must always be available
• 100% uptime
• Must be easy to manage and maintain
• Linear scalability at lowest cost
• Big Data
• Operational (OLTP) Data Store
• Masterless - No single point of failure
• Always on
• Linear scale performance
• Fast response times
• Always on reliability
• Data replication across multiple data centers and the cloud
• Large amounts of structured, semi-structured, and unstructured data
• Designed expecting failure
• Data partitioned among all nodes in the cluster
• Configurable data replication to ensure uptime
• Linear scalability (performance / storage)
• Keyspace
• Identified by name
• Contains tables ("column families")
• Determines replication factor
• Table
• Identified by name
• Has rows
• Row
• Contains columns (up to 2 billion!)
• Can have different number of columns
• Column
• Identified by name
• Has data type
• Node: A single instance of Cassandra
• Rack: A logical grouping of nodes (optional)
• Data Center: A logical grouping of racks or nodes
• Cluster: A logical grouping of data centers (1 to N)
• Required for each table
• Uniquely identifies row
• Partition Key
• Determines node
• Has one or more columns
• Clustering Key
• Determines disk location (order)
• Has zero or more columns
• Binary search
• Search by: >, >=, <=, <, =
Three key concepts
• Partitioning (data distribution)
• Replication (fault tolerance)
• Consistency (tunable against performance)
• Partitioner
• Generates tokens
• Determines data distribution
• Partition keys are hashed into 64-bit tokens
• Murmur3Partitioner is the default
[Figure: token ring with Node 1 to Node 4; tokens range from -2^63 to +2^63 - 1]
• Simplified Token Range: Integers from 0 -> 100
[Figure: ring of four nodes with tokens at 0/100, 25, 50, 75]
ID   NAME            DOB         HASH  NODE
AB1  John Smith      10/11/1972  17    Node 2
AB2  Bob Jones       3/1/1964    79    Node 1
ZZ3  Mike West       4/22/1968   14    Node 2
WX2  Sally Thompson  10/15/1969  32    Node 3
MNZ  Bill Wright     6/6/1966    51    Node 4
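The token lookup in the simplified 0 -> 100 example can be sketched in a few lines of Python. This is only an illustration: the assignment of tokens to nodes (Node 2 at 25, Node 3 at 50, Node 4 at 75, Node 1 at 100) is inferred from the hash values and nodes listed in the example, and the `owner` function is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left

# Simplified ring: each node owns the tokens above the previous node's token,
# up to and including its own. The token-to-node assignment is an assumption
# inferred from the slide's example, not taken from a real cluster.
RING = [(25, "Node 2"), (50, "Node 3"), (75, "Node 4"), (100, "Node 1")]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the node whose token range contains the given token."""
    index = bisect_left(TOKENS, token)
    # Wrap around: anything past the last token goes back to the first node.
    return RING[index % len(RING)][1]


# Hash values from the example table (the real partitioner would compute them).
for row_id, token in [("AB1", 17), ("AB2", 79), ("ZZ3", 14), ("WX2", 32), ("MNZ", 51)]:
    print(row_id, token, owner(token))
```

A sorted token list plus binary search is enough here because every node owns one contiguous range; clusters that use virtual nodes would simply have many more entries in the list.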
• Provides fault tolerance
• Provides geographic distribution
• Copies of each partition are distributed to data centers
• Defined at the keyspace (schema) level as the Replication Factor (RF)
[Figure: sample rows replicated with RF = 1, RF = 2, and RF = 3]
A123 | JOHN SMITH | 11234
A147 | BOB MARTIN | 32235
B212 | JEN JONES | 43323
• Higher Replication Factor = Greater Fault Tolerance
[Figure: rings with RF = 1, RF = 2, and RF = 3]
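How the replication factor turns into copies on nodes can be sketched as follows. The sketch assumes the simplest placement rule in a single data center: the owning node keeps the first copy and the next nodes clockwise on the ring keep the others. Rack-aware and multi-data-center placement are not modeled, and `replicas` is a hypothetical helper name.

```python
def replicas(ring, primary_index, replication_factor):
    """Return the nodes holding a partition: the owner plus the next nodes clockwise."""
    # A cluster cannot hold more copies than it has nodes.
    count = min(replication_factor, len(ring))
    return [ring[(primary_index + step) % len(ring)] for step in range(count)]


RING = ["Node 1", "Node 2", "Node 3", "Node 4"]

# A partition owned by Node 2 (index 1) under RF = 1, 2, and 3.
for rf in (1, 2, 3):
    print(rf, replicas(RING, 1, rf))
```

With RF = 3 on four nodes, any single node can fail and two copies of every partition remain reachable.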
• Assign Replication Factor for each Data Center and schema
APP {
Toronto : 3
San Francisco : 3
Dubai : 3
New York : 3
}
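The per-data-center map above corresponds to the replication options of a CQL `CREATE KEYSPACE` statement using `NetworkTopologyStrategy`. The Python sketch below only builds that statement as a string. The keyspace name `app` and the data center names are taken from the slide and are assumptions: in a real cluster the names must match those reported by the snitch. `create_keyspace_cql` is a hypothetical helper.

```python
def create_keyspace_cql(keyspace, replication_by_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per data center."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in replication_by_dc.items()]
    body = ", ".join(options)
    return "CREATE KEYSPACE IF NOT EXISTS " + keyspace + " WITH replication = {" + body + "};"


# Data center names as shown on the slide (assumed, not verified against a cluster).
statement = create_keyspace_cql(
    "app",
    {"Toronto": 3, "San Francisco": 3, "Dubai": 3, "New York": 3},
)
print(statement)
```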
• It is the number of REPLICAS that need to respond for a request to be
considered complete (reads and write/updates)
• Consistency Level can be set on every request (normally the default is used)
Some Consistency Levels
• ANY – writes only; a stored hint is enough
• ONE – one replica must respond
• QUORUM – a majority of replicas (RF / 2 + 1, rounded down) must respond
• LOCAL_QUORUM – a majority of replicas in the local data center must respond
• ALL – all replicas must respond
[Figure: two data centers, DC 1 and DC 2, each with RF=3]
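The quorum size is easier to reason about as a formula than as a percentage. The sketch below computes it and applies the common rule of thumb that a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number read exceeds the replication factor. Function names are hypothetical; for LOCAL_QUORUM the same arithmetic applies to the replication factor of the local data center.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when the write set and the read set must overlap in at least one replica."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print("QUORUM for RF=3:", quorum(rf))
print("QUORUM write + QUORUM read:", read_sees_latest_write(rf, quorum(rf), quorum(rf)))
print("ONE write + ONE read:", read_sees_latest_write(rf, 1, 1))
```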
How it works in Cassandra: writing data
[Figure: client writes at consistency level LOCAL_QUORUM; two data centers, RF=3 each]
How it works in Cassandra: reading data
[Figure: client reads at consistency level ONE]
Common:
• One
• Local_Quorum Reads / Writes
• Light Weight Transactions (LWT)
• Application Level Locking (ING*)
• Operation = Write/Read
• Hints
• Coordinator stores missed mutations for later replay
• Hints time out after 3 hours
• Read Repair
• Mismatched results at read trigger a repair for that partition
• Read Repair Chance setting triggers validation of all replicas on small
percentage of reads
• Repair
• Process run per node / keyspace to bring replicas back in sync
• Can be run automatically via OpsCenter in DSE
• Ensures tombstones are properly evicted during compaction
• Snapshots
• By table, keyspace, node, cluster
• Very fast, because a snapshot is a set of hard links to the existing data files
• Do you need backups?
• Data replication
• Data across all nodes
• Cassandra is not an RDBMS
• Distributed changes the rules
• OLTP (not Analytics / Search / ad hoc query)
• Rows are accessed by Partition Key
• De-normalization (No joins)
• Multiple query tables
• Use Solr for Search, Hadoop/Spark for Analytics
• Cassandra Query Language (CQL) is a SQL-like query language for
communicating with the Cassandra database
• cqlsh (the CQL shell)
• No Joins
• JSON support
• Upserts
• TTL
• Timestamps
• Collections:
• Set
• List
• Map
• User-defined types (UDT)
• Tuples
Track customer transactions by type
[Diagram: PRIMARY KEY = PARTITION KEY + CLUSTERING COLUMNS]
DATE      CUST_ID  TYPE      TIME         CUST NAME   LOCATION  AMOUNT
10/15/14  A11      DEPOSIT   09:24:33.55  JOHN SMITH  30132     252.50
10/15/14  A11      DEPOSIT   09:25:53.21  JOHN SMITH  30132     63.49
10/15/14  A11      WITHDRAW  12:45:22.23  JOHN SMITH  30060     -300.00
10/15/14  B23      DEPOSIT   08:12:22.32  BOB BARKER  94123     500.00
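The effect of splitting the primary key into a partition key and clustering columns can be imitated in plain Python. The text above does not preserve which columns play which role, so the sketch assumes a partition key of (DATE, CUST_ID) and clustering columns (TYPE, TIME). It groups rows by partition key, keeps each partition sorted by the clustering columns, and answers "transactions of one type" with a binary search over that order. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

ROWS = [
    ("10/15/14", "A11", "DEPOSIT", "09:24:33.55", "JOHN SMITH", "30132", 252.50),
    ("10/15/14", "A11", "DEPOSIT", "09:25:53.21", "JOHN SMITH", "30132", 63.49),
    ("10/15/14", "A11", "WITHDRAW", "12:45:22.23", "JOHN SMITH", "30060", -300.00),
    ("10/15/14", "B23", "DEPOSIT", "08:12:22.32", "BOB BARKER", "94123", 500.00),
]

# ASSUMPTION: partition key = (date, cust_id); clustering columns = (type, time).
partitions = defaultdict(list)
for date, cust_id, tx_type, time, name, location, amount in ROWS:
    partitions[(date, cust_id)].append(((tx_type, time), (name, location, amount)))

# Within a partition, rows are kept in clustering-column order.
for rows in partitions.values():
    rows.sort(key=lambda row: row[0])


def by_type(date, cust_id, tx_type):
    """Return the rows of one partition whose first clustering column equals tx_type."""
    rows = partitions.get((date, cust_id), [])
    keys = [key for key, _ in rows]
    # Binary search for the contiguous slice that starts with tx_type.
    start = bisect_left(keys, (tx_type,))
    end = bisect_right(keys, (tx_type, "\uffff"))
    return rows[start:end]


for key, values in by_type("10/15/14", "A11", "DEPOSIT"):
    print(key, values)
```

Because the rows of one type sit next to each other inside the partition, the query touches one partition and one contiguous slice, which is the access pattern this kind of table is designed for.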
Partition size considerations
• Defines transitions between models
• Query-driven methodology
• Formal analysis and validation
• Defines a scientific approach to data
modeling
• Modeling rules
• Mapping patterns
• Schema optimization techniques
• ER diagram (Chen notation)
• Describes entities, relationships, roles, keys, cardinalities
• What is possible and what is not in existing or future data
Simple Order Management (queries)
• Q1: Customers by Customer ID
• Q2: Customer by email
• Q3: Product by Product ID
• Q4: Product by Name
• Q5: Product By Category
• Q6: Order Details by Order ID
• Q7: Order Details by Customer / Date
• Logical-level shows column names and properties
• Physical-level also shows the column data type
Founded in April 2010
Santa Clara, Austin, New York, London, Paris
[Infographic figures: ~40, 600+, 480+; labels: Employees, Percent, Customers]
• Certified Production Cassandra
• Enterprise Security Options
• Integrated Search
• Integrated Analytics (Spark)
• DSE Graph
• Workload Segregation
• In Memory
• OpsCenter
• Management Services
• MDM: Customer 360, Product Catalog
• Personalization and Recommendation
• Internet of Things and Time Series
• Fraud Detection
• List Management
• Messaging
• Inventory Management
• Authentication
• Visual, browser-based user interface.
• Installation, configuration, and administration tasks
carried out in point-and-click fashion.
• Visually supports DataStax Automatic Management
Services.
Introduction to Apache Cassandra

Introduction to Apache Cassandra

  • 1.
  • 2. Jesus Alberto Guzmán Polanco jguzman@datum.com.gt Apache Cassandra Certified @Datum
  • 3. • Cassandra Overview • Cassandra Architecture • Data Modeling • Datastax Enterprise
  • 4.
  • 5.
  • 6.
  • 7. "Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column- oriented database that bases its distribution design on Amazon's Dynamo and its data model on Google's Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web." Cassandra: The Definitive Guide.
  • 9. • Must always be available • 100% uptime • Must be easy to manage and maintain • Linear scalability at lowest cost • Big Data
  • 10. • Operational (OLTP) Data Store • Masterless - No single point of failure • Always on • Linear scale performance • Fast response times • Always on reliability • Data replication across multiple data centers and the cloud • Large amounts of structured, semi-structured, and unstructured data
  • 11. • Designed expecting failure • Data partitioned among all nodes in the cluster • Configurable data replication to ensure uptime • Linear scalability (performance / storage)
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. • Keyspace • Identified by name • Contains tables ("column families") • Determines replication factor • Table • Identified by name • Has rows • Row • Contains columns (up to 2 billion!) • Can have different number of columns • Column • Identified by name • Has data type
  • 17. • Node: A single instance of Cassandra • Rack: A logical grouping of nodes (optional) • Data Center: A logical grouping of racks or nodes • Cluster: A logical grouping of data centers (1 to N)
  • 18.
  • 19. • Required for each table • Uniquely identifies row • Partition Key • Determines node • Has one or more columns • Cluster Key • Determines disk location (order) • Has zero or more columns • Binary search • Search by: >, >=, <=, <, =
  • 20. Three Key concepts • Partitioning (data distribution) • Replication (fault tolerance) • Consistency (performance tunable)
  • 21. • Partitioner • Generates tokens • Data distribution • Partition keys are hashed with 128-bit Murmur3 (the default partitioner); the resulting tokens are 64-bit • Ring diagram: Node 1, Node 2, Node 3, Node 4; token range -2^63 to +2^63 - 1
  • 22. • Simplified token range: integers from 0 to 100 • Ring diagram: Node 1, Node 2, Node 3, Node 4; token boundaries at 0/100, 25, 50, 75
      ID  | NAME           | DOB        | HASH | NODE
      AB1 | John Smith     | 10/11/1972 | 17   | Node 2
      AB2 | Bob Jones      | 3/1/1964   | 79   | Node 1
      ZZ3 | Mike West      | 4/22/1968  | 14   | Node 2
      WX2 | Sally Thompson | 10/15/1969 | 32   | Node 3
      MNZ | Bill Wright    | 6/6/1966   | 51   | Node 4
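The token-to-node lookup on the simplified 0 to 100 ring can be sketched in Python. This is only an illustration: the hash values are copied from the slide rather than computed with Murmur3, and the ownership rule (each node owns the range that ends at its token, inclusive) is an assumption chosen because it reproduces the hash-to-node pairs shown above.

```python
from bisect import bisect_left

# Simplified ring from the slide: tokens are integers from 0 to 100.
# Assumption: a node owns the range (previous token, own token].
RING = [(25, "Node 2"), (50, "Node 3"), (75, "Node 4"), (100, "Node 1")]
TOKENS = [token for token, _ in RING]


def owner(token):
    """Return the node that owns a token on the simplified ring."""
    index = bisect_left(TOKENS, token) % len(RING)
    return RING[index][1]


# Hash values copied from the slide (not computed with Murmur3).
ROW_HASHES = {"AB1": 17, "AB2": 79, "ZZ3": 14, "WX2": 32, "MNZ": 51}
placement = {row_id: owner(h) for row_id, h in ROW_HASHES.items()}

for row_id, node in placement.items():
    print(row_id, "->", node)
```

A real cluster uses the same idea with 64-bit tokens, which is why adding a node only moves the rows whose tokens fall in the range the new node takes over.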
  • 23. • Provides fault tolerance • Provides geographic distribution • Copies of each partition are distributed to data centers • Defined at the schema level (Replication Factor) • Diagram: sample rows (A123 | JOHN SMITH | 11234, A147 | BOB MARTIN | 32235, B212 | JEN JONES | 43323) replicated with RF = 1, RF = 2, RF = 3
  • 24. • Higher Replication Levels = Greater Fault Tolerance • Diagram: RF = 1, RF = 2, RF = 3
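A minimal Python sketch of why a higher replication factor tolerates more failures. It assumes the simplified 0 to 100 ring from slide 22 and a SimpleStrategy-like placement in which the extra copies go to the next nodes clockwise; the `replicas` and `readable` helpers are hypothetical names used only for illustration.

```python
from bisect import bisect_left

# Assumed simplified ring (token, node), as on slide 22.
RING = [(25, "Node 2"), (50, "Node 3"), (75, "Node 4"), (100, "Node 1")]


def replicas(token, rf):
    """Return the nodes holding a token: the owner plus the next nodes clockwise."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    count = min(rf, len(RING))  # cannot place more copies than there are nodes
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


def readable(token, rf, down_nodes):
    """True if at least one replica of the token is on a node that is still up."""
    return any(node not in down_nodes for node in replicas(token, rf))


print(replicas(17, 1))  # one copy
print(replicas(17, 3))  # three copies on consecutive nodes
print(readable(17, 1, {"Node 2"}), readable(17, 3, {"Node 2"}))
```

With RF = 1 the loss of the owning node makes the row unavailable, while with RF = 3 two of the three replica nodes can be down and the row is still reachable.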
  • 25. • Assign a replication factor for each data center and schema • Example: APP { Toronto: 3, San Francisco: 3, Dubai: 3, New York: 3 }
  • 26.
  • 27. • Consistency level is the number of replicas that need to respond for a request to be considered complete (reads and writes/updates) • Consistency level can be set on every request (normally the default is used) • Diagram: DC 1, DC 2
  • 28. Some Consistency Levels • ANY (writes only; a stored hint is enough) • ONE: one replica must respond • QUORUM: a majority of all replicas must respond (floor(N / 2) + 1 of N replicas, not literally 51%) • LOCAL_QUORUM: a majority of the replicas in the local data center • ALL: all replicas must respond • Diagram: DC 1 and DC 2, RF = 3 each
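The number of replica responses each level requires can be worked out with a short Python sketch. It assumes the two data centers from the diagram with RF = 3 each; the names `DC1` and `DC2` and the `required_responses` helper are illustrative, and ANY is left out because a stored hint can satisfy it without any replica responding.

```python
# Assumed topology from the slide diagram: two data centers, RF = 3 each.
REPLICATION = {"DC1": 3, "DC2": 3}


def quorum(replica_count):
    """A majority of the replicas: floor(N / 2) + 1."""
    return replica_count // 2 + 1


def required_responses(level, local_dc="DC1"):
    """Return how many replicas must respond for the given consistency level."""
    total = sum(REPLICATION.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total)  # majority across all data centers
    if level == "LOCAL_QUORUM":
        return quorum(REPLICATION[local_dc])  # majority in the local data center only
    if level == "ALL":
        return total
    raise ValueError("unsupported consistency level: " + level)


for level in ("ONE", "QUORUM", "LOCAL_QUORUM", "ALL"):
    print(level, required_responses(level))
```

With this topology QUORUM needs 4 of 6 replicas and therefore has to wait on the remote data center, while LOCAL_QUORUM needs only 2 of the 3 local replicas.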
  • 29. How it works in Cassandra: writing data • Client writes at consistency level LOCAL_QUORUM • Diagram: two data centers, RF = 3 each
  • 30. How it works in Cassandra: reading data • Client reads at consistency level ONE
  • 31. Common: • ONE • LOCAL_QUORUM reads / writes • Lightweight Transactions (LWT) • Application-level locking (ING*) • Diagram: DC 1 and DC 2, RF = 3 each
  • 32. • Operation = Write/Read
  • 33. • Operation = Write/Read
  • 34.
  • 35.
  • 36.
  • 37. • Hints • Coordinator stores missed mutations for later replay • Hints time out after 3 hours • Read Repair • Mismatched results at read trigger a repair for that partition • Read Repair Chance setting triggers validation of all replicas on a small percentage of reads • Repair • Process run on a node / keyspace to true up data • Can be run automatically via OpsCenter in DSE • Ensures tombstones are properly evicted during compaction
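Read repair can be illustrated with a small Python sketch. It assumes that each replica returns a value together with its write timestamp and that the newest timestamp wins; the replica names and the `read_repair` helper are hypothetical, and the sketch ignores tie-breaking between equal timestamps.

```python
def read_repair(responses):
    """Pick the newest value and list the replicas that returned older data.

    responses maps a replica name to a (value, write_timestamp) pair.
    """
    newest_value, newest_ts = max(responses.values(), key=lambda r: r[1])
    stale = sorted(name for name, (_, ts) in responses.items() if ts < newest_ts)
    return newest_value, stale


# Hypothetical responses: replica_b missed the latest write.
responses = {
    "replica_a": ("balance=200", 1700000002),
    "replica_b": ("balance=150", 1700000001),
    "replica_c": ("balance=200", 1700000002),
}

value, to_repair = read_repair(responses)
print(value, to_repair)
```

The client receives the newest value, and the replicas listed as stale are the ones that would be sent the up-to-date data for that partition.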
  • 38.
  • 39. • Snapshots • By table, keyspace, node, cluster • Fast, because they are hard links • Do you need backups? • Data replication • Data across all nodes
  • 40.
  • 41. • Cassandra is not an RDBMS • Distributed changes the rules • OLTP (not Analytics / Search / ad hoc query) • Rows are accessed by Partition Key • De-normalization (No joins) • Multiple query tables • Use Solr for Search, Hadoop/Spark for Analytics
  • 42. • Cassandra Query Language (CQL) is a query language for the Cassandra database. • A SQL-like query language for communicating with Cassandra • CQLSH • No Joins • JSON support • Upserts • TTL • Timestamps
  • 43.
  • 44. • Collections: • Set • List • Map • User-defined types (UDT) • Tuples
  • 45. Track customer transactions by type • Columns: DATE, CUST_ID, TYPE, TIME, CUST NAME, LOCATION, AMOUNT • Diagram labels: PARTITION KEY, CLUSTERING COLUMNS, PRIMARY KEY
  • 46. Track customer transactions by type • Partition size considerations
      DATE     | CUST_ID | TYPE     | TIME        | CUST NAME  | LOCATION | AMOUNT
      10/15/14 | A11     | DEPOSIT  | 09:24:33.55 | JOHN SMITH | 30132    | 252.50
      10/15/14 | A11     | DEPOSIT  | 09:25:53.21 | JOHN SMITH | 30132    | 63.49
      10/15/14 | A11     | WITHDRAW | 12:45:22.23 | JOHN SMITH | 30060    | -300.00
      10/15/14 | B23     | DEPOSIT  | 08:12:22.32 | BOB BARKER | 94123    | 500.00
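How the primary key shapes storage for this table can be sketched in Python. The split used here is an assumption, since the flattened slide does not state it explicitly: (DATE, CUST_ID) is treated as the partition key and (TYPE, TIME) as the clustering columns. The `build_partitions` and `query` helpers are illustrative names, not part of any Cassandra API.

```python
from collections import defaultdict

# Rows from the slide, deliberately out of order to show the clustering sort.
ROWS = [
    ("10/15/14", "B23", "DEPOSIT", "08:12:22.32", "BOB BARKER", "94123", 500.00),
    ("10/15/14", "A11", "WITHDRAW", "12:45:22.23", "JOHN SMITH", "30060", -300.00),
    ("10/15/14", "A11", "DEPOSIT", "09:25:53.21", "JOHN SMITH", "30132", 63.49),
    ("10/15/14", "A11", "DEPOSIT", "09:24:33.55", "JOHN SMITH", "30132", 252.50),
]


def build_partitions(rows):
    """Group rows by the assumed partition key and sort by the clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        date, cust_id = row[0], row[1]
        partitions[(date, cust_id)].append(row)
    for stored in partitions.values():
        stored.sort(key=lambda r: (r[2], r[3]))  # order by TYPE, then TIME
    return dict(partitions)


def query(partitions, date, cust_id, txn_type):
    """Read one partition and keep the rows of a single transaction type."""
    return [r for r in partitions.get((date, cust_id), []) if r[2] == txn_type]


partitions = build_partitions(ROWS)
deposits = query(partitions, "10/15/14", "A11", "DEPOSIT")
print(len(partitions), [r[6] for r in deposits])
```

Under this assumption all of a customer's transactions for one day live in a single partition, which is what the "partition size considerations" note is about: a very active customer makes that partition grow.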
  • 47. • Defines transitions between models • Query-driven methodology • Formal analysis and validation • Defines a scientific approach to data modeling • Modeling rules • Mapping patterns • Schema optimization techniques
  • 48. • ER diagram (Chen notation) • Describes entities, relationships, roles, keys, cardinalities • What is possible and what is not in existing or future data
  • 49. Simple Order Management (queries) • Q1: Customers by Customer ID • Q2: Customer by email • Q3: Product by Product ID • Q4: Product by Name • Q5: Product By Category • Q6: Order Details by Order ID • Q7: Order Details by Customer / Date
  • 50. • Logical-level shows column names and properties • Physical-level also shows the column data type
  • 51.
  • 52.
  • 53.
  • 54. Founded in April 2010 • Santa Clara, Austin, New York, London, Paris • 480+ Employees • Other infographic figures: ~40, 600+ (labels: Percent, Customers)
  • 55. • Certified Production Cassandra • Enterprise Security Options • Integrated Search • Integrated Analytics (Spark) • DSE Graph • Workload Segregation • In Memory • OpsCenter • Management Services
  • 56. • MDM: Customer 360, Product Catalog • Personalization and Recommendation • Internet of Things and Time Series • Fraud Detection • List Management • Messaging • Inventory Management • Authentication
  • 57. • Visual, browser-based user interface. • Installation, configuration, and administration tasks carried out in point-and-click fashion. • Visually supports DataStax Automatic Management Services.