SlideShare a Scribd company logo
1 of 18
Download to read offline
Cassandra
part 2
diegopacheco
@diego_pacheco
Diego Pacheco
@diego_pacheco
❏ Cat's Father
❏ Principal Software Architect
❏ Agile Coach
❏ SOA Expert
❏ DevOps Practitioner
❏ Speaker
❏ Author
diegopacheco
http://diego-pacheco.blogspot.com.br/
https://goo.gl/eEqvzl
About me...
Agenda
❏ RE-CAP
❏ Cassandra Write Path
❏ Tombstones
❏ Compaction Strategies
❏ Row Cache
❏ Bloom Filter
❏ SASI Index
❏ Materialized Views
❏ Counter Families
❏ Anti-Patterns
❏ Cassandra running at UBER in MESOS use case
❏ Q&A
http://cassandra.apache.org/
RE-CAP: Partition Strategy
Cassandra Write Path
❏ SSTable => Sorted Array of Strings.
❏ Write to Disk: Merges and Pre-sorts
happens.
❏ SSTables are IMMUTABLE.
❏ Compaction happens:
❏ Time to time
❏ Prune deleted data
❏ Has thread-offs
Tombstones
❏ Deleted data is MARKED as Removed == Tombstone
❏ Data is deleted and removed during compaction
❏ Compaction can happen in few days depending of the
configs.
❏ Queries on partition with lots of tombstones requires lots of
filtering which can slow down the CASS performance.
❏ Collections operations can lead to tombstones depending
on what you do.
❏ There are Compaction Trade-Offs.
Compaction Strategies
❏ STCS
❏ Default
❏ Insert-Heavy
❏ General Workloads
❏ LCS
❏ Read Heavy
❏ More Updates than
Inserts
❏ DTCS
❏ Time Series
❏ Inserts out of order
❏ Updates for old data
Cassandra ROW CACHE
❏ Buffer FULL merged row into memory
❏ Increase a lot the throughput
❏ Row Cache works with Key Cache
❏ Key Cache = Where the partition is on DISK.
CREATE TABLE status (
user text,
status_id timeuuid,
status text,
PRIMARY KEY (user, status_id))
WITH CLUSTERING ORDER BY (status_id DESC)
AND caching = '{"keys":"ALL", "rows_per_partition":"10"}'
Cassandra Bloom Filter
❏ Bloom Filter: Technique created on the 70s to filter db matches.
❏ Space Efficient
❏ Probabilistic Data Structures
❏ For each SSTable there is a Bloom Filter
❏ Used for Index scans - not used to range scans
❏ Stored OFF HEAP
❏ Tunable per TABLE
❏ Cassandra uses bloom filters to know if the data is on the ROW or not.
Cassandra READ Path
SASI
❏ Secondary Index: Not the primary key.
❏ Lookup tables: bySomething
❏ Distributed Index
❏ Search Like Capabilities: %diego%
❏ Great when:
❏ Multi fields Search
❏ You know the partition key
❏ Indexing static columns
❏ Issues:
❏ More than 1000 rows returned
❏ Searching in Large Partitions
❏ Aggressive Read SLOs
❏ Search for Analytics(Use Spark/Flink)
❏ Ordering Search is important
SASI
Samples
❏ SELECT * FROM users WHERE firstname LIKE 'Die%';
❏ SELECT * FROM users WHERE lastname LIKE '%ie%';
❏ SELECT * FROM users WHERE
created_date > '2015-01-02' AND created_date < '2017-01-02';
Materialized Views
❏ Automated - Table managed for you, Denormalization
❏ Copies of the data in different partitions / replicas
❏ Some Write penalty but acceptable performance
❏ Store results in table which can be indexed
❏ Update ASYNC
❏ Great For:
❏ Caching
❏ Result Sets
❏ Dashbaords
SAMPLE
CREATE MATERIALIZED VIEW all_time_high AS
SELECT user FROM scores WHERE
game IS NOT NULL AND
score IS NOT NULL
PRIMARY KEY (game,score) WITH CLUSTERING ORDER BY (score DESC)
Cassandra Counter Family
❏ Static VS Dynamic Column families
❏ Dynamic Column families A.K.A Wide Rows
❏ Wide Rows is good for: Ordering,Grouping and Filtering.
❏ Wide Rows are not split into NODES.
❏ Counters Internally:
❏ Calculated and sum of all replicas
❏ Split into fragments called SHARDs.
❏ Logical clock monotonically increased
❏ 3 tuple = { NODE_COUNTER_ID, SHARD_LOGICAL_CLOCK, SHARD_VALUE }
Anti-Patterns
❏ Using Cassandra as a queue or queue-like table
❏ Tombstones
❏ Lots of deleted columns(expiry) and slice-queries don't play well
❏ http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
❏ CQL Nulls
❏ Reading Tombstones
❏ Write NULL create tombstones
❏ Intensive Updates on SAME column
❏ Sensor table (ID,VALUE)
❏ Physical Limits
❏ Solution: Timestamp as cluster key.
Cassandra at UBER using MESOS (2016 data)
Cassandra
part 2
diegopacheco
@diego_pacheco
Diego Pacheco

More Related Content

Viewers also liked

Lean/Agile/DevOps 2016 part 3
Lean/Agile/DevOps 2016 part 3Lean/Agile/DevOps 2016 part 3
Lean/Agile/DevOps 2016 part 3Diego Pacheco
 
Microservices reativos usando a stack do Netflix na AWS
Microservices reativos usando a stack do Netflix na AWSMicroservices reativos usando a stack do Netflix na AWS
Microservices reativos usando a stack do Netflix na AWSDiego Pacheco
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Diego Pacheco
 
Lean/Agile/DevOps 2016 part 1
Lean/Agile/DevOps 2016  part 1Lean/Agile/DevOps 2016  part 1
Lean/Agile/DevOps 2016 part 1Diego Pacheco
 
DevOps: Plain English Business Benefits
DevOps: Plain English Business BenefitsDevOps: Plain English Business Benefits
DevOps: Plain English Business BenefitsDiego Pacheco
 
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014Amazon Web Services
 
TI na ERA DEVOPS
TI na ERA DEVOPSTI na ERA DEVOPS
TI na ERA DEVOPSilegra
 
Stream Processing with Kafka and Samza
Stream Processing with Kafka and SamzaStream Processing with Kafka and Samza
Stream Processing with Kafka and SamzaDiego Pacheco
 
Spring framework 2.5
Spring framework 2.5Spring framework 2.5
Spring framework 2.5Diego Pacheco
 

Viewers also liked (20)

Elassandra
ElassandraElassandra
Elassandra
 
Lean/Agile/DevOps 2016 part 3
Lean/Agile/DevOps 2016 part 3Lean/Agile/DevOps 2016 part 3
Lean/Agile/DevOps 2016 part 3
 
Dev opsdaykeynote
Dev opsdaykeynoteDev opsdaykeynote
Dev opsdaykeynote
 
Microservices reativos usando a stack do Netflix na AWS
Microservices reativos usando a stack do Netflix na AWSMicroservices reativos usando a stack do Netflix na AWS
Microservices reativos usando a stack do Netflix na AWS
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
 
Lean/Agile/DevOps 2016 part 1
Lean/Agile/DevOps 2016  part 1Lean/Agile/DevOps 2016  part 1
Lean/Agile/DevOps 2016 part 1
 
Microservices
MicroservicesMicroservices
Microservices
 
Pattern matchind and case classes
Pattern matchind and case classesPattern matchind and case classes
Pattern matchind and case classes
 
Apresentação play framework
Apresentação play frameworkApresentação play framework
Apresentação play framework
 
Play Framework
Play FrameworkPlay Framework
Play Framework
 
Pattern matching and case classes
Pattern matching and case classesPattern matching and case classes
Pattern matching and case classes
 
Scala
ScalaScala
Scala
 
Highorderfunctions
HighorderfunctionsHighorderfunctions
Highorderfunctions
 
Apresentação angular js
Apresentação angular jsApresentação angular js
Apresentação angular js
 
DevOps: Plain English Business Benefits
DevOps: Plain English Business BenefitsDevOps: Plain English Business Benefits
DevOps: Plain English Business Benefits
 
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
(ARC312) Processing Money in the Cloud | AWS re:Invent 2014
 
TI na ERA DEVOPS
TI na ERA DEVOPSTI na ERA DEVOPS
TI na ERA DEVOPS
 
Stream Processing with Kafka and Samza
Stream Processing with Kafka and SamzaStream Processing with Kafka and Samza
Stream Processing with Kafka and Samza
 
Spring framework 2.5
Spring framework 2.5Spring framework 2.5
Spring framework 2.5
 
Cassandra
CassandraCassandra
Cassandra
 

Similar to Apache Cassandra - part 2

Experiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on awsExperiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on awsDiego Pacheco
 
Cloud-Native DevOps Engineering
Cloud-Native DevOps EngineeringCloud-Native DevOps Engineering
Cloud-Native DevOps EngineeringDiego Pacheco
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the ServerColdFusionConference
 
My SQL Skills Killed the Server
My SQL Skills Killed the ServerMy SQL Skills Killed the Server
My SQL Skills Killed the ServerdevObjective
 
Don't you (forget about me) - PHP Meetup Lisboa 2023
Don't you (forget about me) - PHP Meetup Lisboa 2023Don't you (forget about me) - PHP Meetup Lisboa 2023
Don't you (forget about me) - PHP Meetup Lisboa 2023Bernd Alter
 
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...Gianmario Spacagna
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaJose Mº Muñoz
 
Lunch & Learn BigQuery & Firebase from other Google Cloud customers
Lunch & Learn BigQuery & Firebase from other Google Cloud customersLunch & Learn BigQuery & Firebase from other Google Cloud customers
Lunch & Learn BigQuery & Firebase from other Google Cloud customersDaniel Zivkovic
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupGianmario Spacagna
 
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times FasterScylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times FasterScyllaDB
 
Dynomite Eureka Registry With Prana
Dynomite Eureka Registry With PranaDynomite Eureka Registry With Prana
Dynomite Eureka Registry With PranaDiego Pacheco
 
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)Richard Low
 
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark DataStax Academy
 
Using cassandra as a distributed logging to store pb data
Using cassandra as a distributed logging to store pb dataUsing cassandra as a distributed logging to store pb data
Using cassandra as a distributed logging to store pb dataRamesh Veeramani
 

Similar to Apache Cassandra - part 2 (20)

NoSQL
NoSQLNoSQL
NoSQL
 
Experiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on awsExperiences building a multi region cassandra operations orchestrator on aws
Experiences building a multi region cassandra operations orchestrator on aws
 
Cloud-Native DevOps Engineering
Cloud-Native DevOps EngineeringCloud-Native DevOps Engineering
Cloud-Native DevOps Engineering
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the Server
 
My SQL Skills Killed the Server
My SQL Skills Killed the ServerMy SQL Skills Killed the Server
My SQL Skills Killed the Server
 
Sql killedserver
Sql killedserverSql killedserver
Sql killedserver
 
Don't you (forget about me) - PHP Meetup Lisboa 2023
Don't you (forget about me) - PHP Meetup Lisboa 2023Don't you (forget about me) - PHP Meetup Lisboa 2023
Don't you (forget about me) - PHP Meetup Lisboa 2023
 
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ...
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Cassandra Redis
Cassandra RedisCassandra Redis
Cassandra Redis
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
 
Lunch & Learn BigQuery & Firebase from other Google Cloud customers
Lunch & Learn BigQuery & Firebase from other Google Cloud customersLunch & Learn BigQuery & Firebase from other Google Cloud customers
Lunch & Learn BigQuery & Firebase from other Google Cloud customers
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
 
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times FasterScylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
 
Dynomite Eureka Registry With Prana
Dynomite Eureka Registry With PranaDynomite Eureka Registry With Prana
Dynomite Eureka Registry With Prana
 
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013)
 
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark
 
Real world capacity
Real world capacityReal world capacity
Real world capacity
 
Using cassandra as a distributed logging to store pb data
Using cassandra as a distributed logging to store pb dataUsing cassandra as a distributed logging to store pb data
Using cassandra as a distributed logging to store pb data
 

More from Diego Pacheco

Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!Diego Pacheco
 
Continuous Discovery Habits Book Review.pdf
Continuous Discovery Habits  Book Review.pdfContinuous Discovery Habits  Book Review.pdf
Continuous Discovery Habits Book Review.pdfDiego Pacheco
 
Encryption Deep Dive
Encryption Deep DiveEncryption Deep Dive
Encryption Deep DiveDiego Pacheco
 
Management: Doing the non-obvious! III
Management: Doing the non-obvious! IIIManagement: Doing the non-obvious! III
Management: Doing the non-obvious! IIIDiego Pacheco
 
Design is not Subjective
Design is not SubjectiveDesign is not Subjective
Design is not SubjectiveDiego Pacheco
 
Architecture & Engineering : Doing the non-obvious!
Architecture & Engineering :  Doing the non-obvious!Architecture & Engineering :  Doing the non-obvious!
Architecture & Engineering : Doing the non-obvious!Diego Pacheco
 
Management doing the non-obvious II
Management doing the non-obvious II Management doing the non-obvious II
Management doing the non-obvious II Diego Pacheco
 
Testing in production
Testing in productionTesting in production
Testing in productionDiego Pacheco
 
Nine lies about work
Nine lies about workNine lies about work
Nine lies about workDiego Pacheco
 
Management: doing the nonobvious!
Management: doing the nonobvious!Management: doing the nonobvious!
Management: doing the nonobvious!Diego Pacheco
 
Dealing with dependencies
Dealing  with dependenciesDealing  with dependencies
Dealing with dependenciesDiego Pacheco
 
Dealing with dependencies in tests
Dealing  with dependencies in testsDealing  with dependencies in tests
Dealing with dependencies in testsDiego Pacheco
 

More from Diego Pacheco (20)

Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!
 
Continuous Discovery Habits Book Review.pdf
Continuous Discovery Habits  Book Review.pdfContinuous Discovery Habits  Book Review.pdf
Continuous Discovery Habits Book Review.pdf
 
Holacracy
HolacracyHolacracy
Holacracy
 
AWS IAM
AWS IAMAWS IAM
AWS IAM
 
Encryption Deep Dive
Encryption Deep DiveEncryption Deep Dive
Encryption Deep Dive
 
Sec 101
Sec 101Sec 101
Sec 101
 
Management: Doing the non-obvious! III
Management: Doing the non-obvious! IIIManagement: Doing the non-obvious! III
Management: Doing the non-obvious! III
 
Design is not Subjective
Design is not SubjectiveDesign is not Subjective
Design is not Subjective
 
Architecture & Engineering : Doing the non-obvious!
Architecture & Engineering :  Doing the non-obvious!Architecture & Engineering :  Doing the non-obvious!
Architecture & Engineering : Doing the non-obvious!
 
Management doing the non-obvious II
Management doing the non-obvious II Management doing the non-obvious II
Management doing the non-obvious II
 
Testing in production
Testing in productionTesting in production
Testing in production
 
Nine lies about work
Nine lies about workNine lies about work
Nine lies about work
 
Management: doing the nonobvious!
Management: doing the nonobvious!Management: doing the nonobvious!
Management: doing the nonobvious!
 
AI and the Future
AI and the FutureAI and the Future
AI and the Future
 
Dealing with dependencies
Dealing  with dependenciesDealing  with dependencies
Dealing with dependencies
 
Dealing with dependencies in tests
Dealing  with dependencies in testsDealing  with dependencies in tests
Dealing with dependencies in tests
 
Kanban 2020
Kanban 2020Kanban 2020
Kanban 2020
 
Lean 2020
Lean 2020Lean 2020
Lean 2020
 
Hardening
HardeningHardening
Hardening
 
Design 101
Design 101Design 101
Design 101
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Apache Cassandra - part 2

  • 2. @diego_pacheco ❏ Cat's Father ❏ Principal Software Architect ❏ Agile Coach ❏ SOA Expert ❏ DevOps Practitioner ❏ Speaker ❏ Author diegopacheco http://diego-pacheco.blogspot.com.br/ https://goo.gl/eEqvzl About me...
  • 3. Agenda ❏ RE-CAP ❏ Cassandra Write Path ❏ Tombstones ❏ Compaction Strategies ❏ Row Cache ❏ Bloom Filter ❏ SASI Index ❏ Materialized Views ❏ Counter Families ❏ Anti-Patterns ❏ Cassandra running at UBER in MESOS use case ❏ Q&A
  • 6. Cassandra Write Path ❏ SSTable => Sorted Array of Strings. ❏ Write to Disk: Merges and Pre-sorts happens. ❏ SSTables are IMMUTABLE. ❏ Compaction happens: ❏ Time to time ❏ Prune deleted data ❏ Has thread-offs
  • 7. Tombstones ❏ Deleted data is MARKED as Removed == Tombstone ❏ Data is deleted and removed during compaction ❏ Compaction can happen in few days depending of the configs. ❏ Queries on partition with lots of tombstones requires lots of filtering which can slow down the CASS performance. ❏ Collections operations can lead to tombstones depending on what you do. ❏ There are Compaction Trade-Offs.
  • 8. Compaction Strategies ❏ STCS ❏ Default ❏ Insert-Heavy ❏ General Workloads ❏ LCS ❏ Read Heavy ❏ More Updates than Inserts ❏ DTCS ❏ Time Series ❏ Inserts out of order ❏ Updates for old data
  • 9. Cassandra ROW CACHE ❏ Buffer FULL merged row into memory ❏ Increase a lot the throughput ❏ Row Cache works with Key Cache ❏ Key Cache = Where the partition is on DISK. CREATE TABLE status ( user text, status_id timeuuid, status text, PRIMARY KEY (user, status_id)) WITH CLUSTERING ORDER BY (status_id DESC) AND caching = '{"keys":"ALL", "rows_per_partition":"10"}'
  • 10. Cassandra Bloom Filter ❏ Bloom Filter: Technique created on the 70s to filter db matches. ❏ Space Efficient ❏ Probabilistic Data Structures ❏ For each SSTable there is a Bloom Filter ❏ Used for Index scans - not used to range scans ❏ Stored OFF HEAP ❏ Tunable per TABLE ❏ Cassandra uses bloom filters to know if the data is on the ROW or not.
  • 12. SASI ❏ Secondary Index: Not the primary key. ❏ Lookup tables: bySomething ❏ Distributed Index ❏ Search Like Capabilities: %diego% ❏ Great when: ❏ Multi fields Search ❏ You know the partition key ❏ Indexing static columns ❏ Issues: ❏ More than 1000 rows returned ❏ Searching in Large Partitions ❏ Aggressive Read SLOs ❏ Search for Analytics(Use Spark/Flink) ❏ Ordering Search is important
  • 13. SASI Samples ❏ SELECT * FROM users WHERE firstname LIKE 'Die%'; ❏ SELECT * FROM users WHERE lastname LIKE '%ie%'; ❏ SELECT * FROM users WHERE created_date > '2015-01-02' AND created_date < '2017-01-02';
  • 14. Materialized Views ❏ Automated - Table managed for you, Denormalization ❏ Copies of the data in different partitions / replicas ❏ Some Write penalty but acceptable performance ❏ Store results in table which can be indexed ❏ Update ASYNC ❏ Great For: ❏ Caching ❏ Result Sets ❏ Dashbaords SAMPLE CREATE MATERIALIZED VIEW all_time_high AS SELECT user FROM scores WHERE game IS NOT NULL AND score IS NOT NULL PRIMARY KEY (game,score) WITH CLUSTERING ORDER BY (score DESC)
  • 15. Cassandra Counter Family ❏ Static VS Dynamic Column families ❏ Dynamic Column families A.K.A Wide Rows ❏ Wide Rows is good for: Ordering,Grouping and Filtering. ❏ Wide Rows are not split into NODES. ❏ Counters Internally: ❏ Calculated and sum of all replicas ❏ Split into fragments called SHARDs. ❏ Logical clock monotonically increased ❏ 3 tuple = { NODE_COUNTER_ID, SHARD_LOGICAL_CLOCK, SHARD_VALUE }
  • 16. Anti-Patterns ❏ Using Cassandra as a queue or queue-like table ❏ Tombstones ❏ Lots of deleted columns(expiry) and slice-queries don't play well ❏ http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets ❏ CQL Nulls ❏ Reading Tombstones ❏ Write NULL create tombstones ❏ Intensive Updates on SAME column ❏ Sensor table (ID,VALUE) ❏ Physical Limits ❏ Solution: Timestamp as cluster key.
  • 17. Cassandra at UBER using MESOS (2016 data)