This document discusses replacing external caching layers with ScyllaDB's built-in cache. It gives examples of companies that improved performance and reduced cost and complexity by moving from setups such as Redis with Elasticsearch, or a database fronted by an external cache, to ScyllaDB's embedded cache. It also outlines the advantages of ScyllaDB's cache over external caching layers, such as lower latency, coherency with the database, and observability.
3. Tomasz Grabiec (Tomek), Distinguished Engineer at ScyllaDB
Felipe Mendes, Solution Architect at ScyllaDB
Replacing Your Cache with ScyllaDB
4. The Database for Gamechangers
+ For data-intensive applications that require high throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source solutions
“ScyllaDB stands apart...It’s the rare product that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
5. +400 Gamechangers Leverage ScyllaDB
Seamless experiences
across content + devices
Digital experiences at
massive scale
Corporate fleet
management
Real-time analytics
2,000,000 SKU e-commerce
management
Video recommendation
management
Threat intelligence service
using JanusGraph
Real time fraud detection
across 6M
transactions/day
Uber scale, mission critical
chat & messaging app
Network security threat
detection
Power ~50M X1 DVRs with
billions of reqs/day
Precision healthcare via
Edison AI
Inventory hub for retail
operations
Property listings and
updates
Unified ML feature store
across the business
Cryptocurrency exchange
app
Geography-based
recommendations
Global operations- Avon,
Body Shop + more
Predictable performance
for on sale surges
GPS-based exercise
tracking
Serving dynamic live
streams at scale
Powering India's top
social media platform
Personalized
advertising to players
Distribution of game
assets in Unreal Engine
6. Introductions
Tomasz Grabiec, Distinguished Engineer at ScyllaDB
+ Core engineer and maintainer at ScyllaDB since its inception
+ Started coding when Commodore 64 was still a thing
+ Lives in Cracow, Poland
Felipe Mendes, Solution Architect at ScyllaDB
+ Published Author on Linux and Databases
+ Helps teams solve their most challenging problems
+ Years of experience with Linux and distributed systems
7. Agenda
+ Why Cache?
+ How can ScyllaDB help?
+ Caching Strategies
+ ScyllaDB Cache design
+ External Cache Hiccups
+ ScyllaDB as a Cache Replacement
9. Our unique technology
Horizontal & Vertical Scaling
Unique Close-to-Metal Architecture
Built in C++
(no Java overhead)
Everything
Asynchronous
Shared Nothing Shard per Core Specialized Cache
Network
Processor NUMA
Storage
10. Lower Consistent Latency -> Higher Revenue
Reducing load times on the insideline.com site from nine seconds to 1.4 seconds increased ad revenue by three percent and page views per session by 17 percent.
https://www.thinkwithgoogle.com/future-of-marketing/digital-transformation/the-google-gospel-of-speed-urs-hoelzle/
https://www.globaldots.com/resources/blog/latency-is-having-a-huge-negative-impact-on-ecommerce-companies
https://www.fastcompany.com/1825005/how-one-second-could-cost-amazon-16-billion-sales
14. 962 C* nodes to 78
>60% lower TCO
>95% lower latency
“By moving to ScyllaDB Enterprise software running on AWS EC2 infrastructure and on-premises, Comcast improved P99 latency by more than 95% and were able to rip out a UI cache layer.”
15. From Redis + Elasticsearch to ScyllaDB
<1ms P99
Zero downtime
Lower TCO
33. Why not buffer cache?
Inefficient use of memory:
+ Need to cache whole buffers to cache a single row
+ Access locality not likely if data set >> RAM
[Diagram: a 300 B row inside a 4 KB SSTable page]
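A rough way to see the cost is to compare how many rows fit in a fixed memory budget at each granularity. The sketch below uses the slide's figures (4 KB page, 300 B row). The 1 GiB budget, the helper name rows_cached, and the assumption that every hot row sits on a different page (no locality) are illustrative only.

```python
PAGE_SIZE = 4096        # SSTable page size from the slide (4K)
ROW_SIZE = 300          # row size from the slide (300B)
CACHE_BYTES = 1 << 30   # hypothetical 1 GiB cache budget


def rows_cached(bytes_per_cached_row: int, budget: int = CACHE_BYTES) -> int:
    """Number of hot rows that fit when each one costs this many bytes."""
    return budget // bytes_per_cached_row


# Worst case for a buffer cache: every hot row sits on a different page,
# so caching one row costs a whole page.
page_granularity = rows_cached(PAGE_SIZE)
# A row cache pays only for the row itself (per-entry overhead ignored).
row_granularity = rows_cached(ROW_SIZE)

print(page_granularity, row_granularity)
```

Under these assumptions the row-granularity cache holds more than 13 times as many hot rows in the same memory.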
34. Why not buffer cache?
Poor negative caching:
+ Need to cache whole data buffer to indicate absent data
[Diagram: a 4 KB SSTable page with the looked-up row missing]
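By contrast, a row-granularity cache can record absence with a tiny marker. The following is a minimal sketch of that idea, not ScyllaDB's implementation; the RowCache class, the ABSENT sentinel, and the dictionary standing in for disk are all hypothetical.

```python
from typing import Dict, Optional

ABSENT = object()  # sentinel marking "known not to exist"


class RowCache:
    """Toy row-granularity cache that also remembers missing keys."""

    def __init__(self, disk: Dict[str, str]):
        self.disk = disk          # stand-in for SSTables on disk
        self.entries: Dict[str, object] = {}
        self.disk_reads = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.entries:
            value = self.entries[key]
            return None if value is ABSENT else value
        self.disk_reads += 1
        value = self.disk.get(key)
        # Store a tiny marker instead of a whole data buffer for absent keys.
        self.entries[key] = ABSENT if value is None else value
        return value


cache = RowCache({"alice": "engineer"})
cache.get("bob")   # miss: goes to disk and records the absence
cache.get("bob")   # served from the negative entry, no disk read
print(cache.disk_reads)
```

The second lookup of the missing key never reaches the simulated disk, and the cache spends only one small entry to remember that.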
35. Why not buffer cache?
Inefficient use of memory:
+ Redundant buffers due to LSM
+ Read may touch multiple SSTables
+ Memory waste more pronounced
[Diagram: a single read touching three SSTables]
36. Why not buffer cache?
High CPU overhead for reads:
+ Reads need to merge data from multiple sstables
[Diagram: a single read touching three SSTables]
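The merge work can be pictured with a simplified last-write-wins read. This sketch assumes each toy sstable maps a key to a (timestamp, value) pair and ignores tombstones and per-cell reconciliation; the function name read_merged and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each toy "sstable" maps key -> (write timestamp, value).
SSTable = Dict[str, Tuple[int, str]]


def read_merged(key: str, sstables: List[SSTable]) -> Optional[str]:
    """Check every sstable and keep the version with the newest timestamp."""
    newest: Optional[Tuple[int, str]] = None
    for table in sstables:
        version = table.get(key)
        if version is not None and (newest is None or version[0] > newest[0]):
            newest = version
    return None if newest is None else newest[1]


sstables = [
    {"user:1": (10, "Paris")},
    {"user:1": (30, "Lyon"), "user:2": (5, "Nice")},
    {"user:1": (20, "Lille")},
]
print(read_merged("user:1", sstables))
```

Every read pays for this comparison across all candidate sstables, even when their pages are already in a buffer cache. An object cache can store the merged result once.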
37. Why not buffer cache?
High CPU overhead for reads:
+ SSTable format optimized for compact storage, not read speed
+ Parsing overhead:
+ Need to parse index buffers sequentially
+ Need to parse the data file
38. Why not buffer cache?
Premature cache eviction due to SSTable compaction:
+ SSTable compaction removes old files => buffer invalidation
+ Hurts read performance by incurring misses
[Diagram: SSTables being compacted]
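A toy model makes the invalidation visible. It assumes a buffer cache keyed by (file, page) and, for comparison, a cache keyed by the row's own key; the file names, the compact helper, and the data are hypothetical.

```python
from typing import Dict, Set, Tuple

# Buffer cache: keyed by (sstable file name, page number).
buffer_cache: Dict[Tuple[str, int], bytes] = {
    ("sstable-1", 0): b"row:a",
    ("sstable-2", 0): b"row:b",
}
# Object cache: keyed by the row's own key, independent of any file.
row_cache: Dict[str, bytes] = {"a": b"row:a", "b": b"row:b"}


def compact(cache: Dict[Tuple[str, int], bytes], removed_files: Set[str]) -> None:
    """Drop every cached page that belonged to a file removed by compaction."""
    for file_page in [k for k in cache if k[0] in removed_files]:
        del cache[file_page]


# Compaction rewrites sstable-1 and sstable-2 into sstable-3 and deletes them.
compact(buffer_cache, {"sstable-1", "sstable-2"})

print(len(buffer_cache), len(row_cache))
```

In this model the buffer cache is emptied even though no row changed, while the row-keyed entries are untouched because they do not depend on file identity.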
39. Cache structure
+ Object cache
+ Like memtable
+ Optimized for low CPU overhead
+ Fast reads
+ Row-granularity caching
+ Reflects data in all relevant SSTables for a given object (e.g. row)
40. Memory management
+ ScyllaDB reserves and manages most of the memory on a node
+ Small reserve for the OS
+ No use of Linux page cache (only direct I/O)
+ Cache uses all available free memory
+ Shrinks under pressure from memtable and other allocations
[Diagram: node memory divided among memtable, cache, and other allocations]
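The "use everything, shrink on pressure" behavior can be sketched as a cache whose byte budget is lowered at runtime. This is an illustration only: the ShrinkableCache class, the eviction order (oldest entry first), and the byte figures are assumptions, not ScyllaDB internals.

```python
from collections import OrderedDict


class ShrinkableCache:
    """Toy cache whose byte budget can be reduced at runtime."""

    def __init__(self, budget: int):
        self.budget = budget
        self.used = 0
        self.rows: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, key: str, row: bytes) -> None:
        if key in self.rows:
            self.used -= len(self.rows.pop(key))
        self.rows[key] = row
        self.used += len(row)
        self._evict()

    def shrink_to(self, budget: int) -> None:
        """Called when memtables or other allocations need memory back."""
        self.budget = budget
        self._evict()

    def _evict(self) -> None:
        while self.used > self.budget and self.rows:
            _, row = self.rows.popitem(last=False)  # oldest entry first
            self.used -= len(row)


cache = ShrinkableCache(budget=1000)
for i in range(10):
    cache.put(f"row{i}", b"x" * 100)
cache.shrink_to(300)   # other allocations claim 700 bytes
print(list(cache.rows))
```

After the budget drops, only the three most recently inserted rows remain, and the freed memory is available to the memtable.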
42. Thread-per-core architecture
[Diagram: a sequence of short tasks running on one CPU]
+ All processing in a single thread per CPU
+ Short tasks executed serially
+ Cooperative preemption
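Cooperative preemption on a single thread can be sketched with generators: each task runs a short step and then yields voluntarily. This is a conceptual model only, not how ScyllaDB's scheduler is written; the task names and the run loop are hypothetical.

```python
from collections import deque
from typing import Deque, Generator, List

log: List[str] = []


def task(name: str, steps: int) -> Generator[None, None, None]:
    """A task that does a little work, then voluntarily yields the CPU."""
    for step in range(steps):
        log.append(f"{name}:{step}")
        yield  # cooperative preemption point


def run(tasks: List[Generator[None, None, None]]) -> None:
    """Single-threaded loop: run each task until it yields, round-robin."""
    queue: Deque[Generator[None, None, None]] = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
            queue.append(current)   # not finished: back of the queue
        except StopIteration:
            pass                    # finished: drop it


run([task("read", 2), task("compaction", 3)])
print(log)
```

Because tasks give up the CPU themselves, no locks are needed between them, and a long job such as compaction cannot starve a short read for more than one step.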
48. ScyllaDB cache highlights
+ ScyllaDB has a fast cache
+ Efficient access & maintenance
+ Thanks to collocation with replica and design
+ Takes care of consistency guarantees
+ Handles complexities of data and query model
49. External Cache Hiccups
+ Increased latency
+ Elevated costs
+ Decreased availability
+ Increased complexity
+ Ruins the DB's caching
+ Ignores the DB's own cache
+ Reduced security
54. Ignores the database's knowledge
Databases hold a lot of context about the data:
+ ScyllaDB is wide-column (Key-Key-Value), while a cache might be Key-Value only
+ Structured data: Tables, User Defined Types…
+ Cache settings and hit rates per table
+ Time To Live (TTL)
+ Materialized Views and Secondary Indexes
+ Much more…
55. Ruins the database's own cache
An external caching layer introduces noise:
+ Ignores built-in RBAC
+ Ineffective caching
+ Data consistency concerns
+ Data availability concerns
+ Scan-resistant caching
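The data consistency concern can be shown with a toy cache-aside setup in which a write reaches the database but not the external cache. The classes Database and CacheAsideClient, and the sample key, are hypothetical and stand in for any database and any external cache.

```python
from typing import Dict, Optional


class Database:
    """Toy database; reads always see the latest write."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.rows.get(key)


class CacheAsideClient:
    """Application that checks an external cache before the database."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.external_cache: Dict[str, Optional[str]] = {}

    def read(self, key: str) -> Optional[str]:
        if key not in self.external_cache:
            self.external_cache[key] = self.db.read(key)
        return self.external_cache[key]


db = Database()
db.write("price:42", "10 USD")
app = CacheAsideClient(db)
app.read("price:42")                 # populates the external cache

db.write("price:42", "12 USD")       # another writer updates the database only
print(app.read("price:42"), db.read("price:42"))
```

The application keeps serving the old value until something invalidates the external entry. A cache that lives inside the database sees every write and does not have this gap.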
56. ScyllaDB as a Cache Replacement
The features you are already familiar with, embedded in your database
58. CQL Extension – BYPASS CACHE
SELECT * FROM users BYPASS CACHE;
SELECT name, occupation FROM users WHERE userid IN (199, 200, 207) BYPASS CACHE;
SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING BYPASS CACHE;
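Why a bypass flag helps can be seen in a small simulation: a one-off scan through an LRU cache pushes out the hot rows unless the scan skips the cache. The model below is a sketch under the assumption that a bypassing read neither consults nor populates the cache; the LruCache class and the hot_rows_left helper are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class LruCache:
    """Toy LRU row cache in front of a dictionary standing in for disk."""

    def __init__(self, capacity: int, disk: Dict[int, str]):
        self.capacity = capacity
        self.disk = disk
        self.rows: "OrderedDict[int, str]" = OrderedDict()

    def read(self, key: int, bypass_cache: bool = False) -> str:
        if not bypass_cache and key in self.rows:
            self.rows.move_to_end(key)
            return self.rows[key]
        value = self.disk[key]
        if not bypass_cache:            # a bypassing read does not populate
            self.rows[key] = value
            if len(self.rows) > self.capacity:
                self.rows.popitem(last=False)
        return value


def hot_rows_left(bypass: bool) -> int:
    disk = {k: f"row{k}" for k in range(1000)}
    cache = LruCache(capacity=10, disk=disk)
    for k in range(10):                 # warm the cache with hot rows 0..9
        cache.read(k)
    for k in range(10, 1000):           # one-off full scan of cold rows
        cache.read(k, bypass_cache=bypass)
    return sum(1 for k in range(10) if k in cache.rows)


print(hot_rows_left(bypass=False), hot_rows_left(bypass=True))
```

Without the bypass the scan evicts every hot row. With it, all ten stay cached for the latency-sensitive queries.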
59. SSTable index caching
■ The whole index can now be cached in memory
■ Populated on access (read-through)
■ Evicted on memory pressure
■ Partition index summary still non-evictable and always resident
[Diagram: SSTable index in RAM and on disk]
60. SSTable indexing - large partition example
Partition size: 10 GB, Rows: 10 M, Index file size: 5 MB
scylla-5.0 -c1 -m4G
scylla-bench -workload uniform -mode read -limit 1 -concurrency 100 -partition-count 1 -clustering-row-count 10000000 -duration 60m
Before: 2,011 rows/s
After: 6,191 rows/s
(the node was bound by disk bandwidth, ~530 MB/s)
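As a quick check of what these figures imply, the arithmetic below derives the speedup and the average row size from the slide's numbers. It assumes decimal units (10 GB = 10 × 10^9 bytes, 10 M = 10 × 10^6 rows); the variable names are illustrative.

```python
before_rows_per_s = 2011
after_rows_per_s = 6191
speedup = after_rows_per_s / before_rows_per_s

partition_bytes = 10 * 10**9      # 10 GB partition, decimal units assumed
row_count = 10 * 10**6            # 10 M rows
avg_row_bytes = partition_bytes // row_count

print(f"{speedup:.2f}x", avg_row_bytes)
```

That is roughly a threefold gain in single-row reads from a large partition, with rows of about 1 KB each.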
61. Summary
+ Placing a cache in front of your database can backfire
+ A cache lacks the context the DB has about the workload
+ ScyllaDB Cache is optimized to work with zero overhead
+ Multiple users have replaced their cache with ScyllaDB
+ ScyllaDB's cache implementation includes several optimizations
63. Thank you for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/
Editor's Notes
PRESENTER - Felipe
9:59:45 AM PT – Marisa, Cynthia, Julia mute themselves. Then Marisa to START WEBINAR IN ZOOM.
Felipe starts talking at 10:00AM PT
Good morning everyone and welcome to our webinar. We are going to give people a few more seconds as they funnel in and we will begin shortly.
Felipe to wait 30 seconds as people join the webinar.
Felipe to start talking again at 10:00:30 AM PT
Hi everyone and welcome! Before we get started, I’d like to quickly review a couple of housekeeping items.
We welcome your questions. Please use the Q&A button, located at the bottom of your screen to ask your questions. Remember, you can enter them any time during the webinar -- you don’t have to wait till the end. We will answer as many questions as we can get to at the end of the presentation.
Also, please note that today’s webinar is being recorded. We will email you a link to the recording and the slides following the event.
PRESENTER - Felipe
Before we begin we are pushing a quick poll question.
PRESENTER - Felipe
For those of you who are not familiar with ScyllaDB yet, it is the database behind gamechangers - organizations whose success depends upon delivering engaging experiences with impressive speed.
ScyllaDB was built with a close-to-the-metal design that squeezes every possible ounce of performance out of modern infrastructure.
This translates to predictable low latency even at high throughputs.
With such consistent innovation, the adoption of our database technology has grown to over 400 key players worldwide.
PRESENTER - Felipe
Many of you will recognize some of the companies pictured here, such as Starbucks, which leverages ScyllaDB for inventory management, Zillow for real-time property listings and updates, and Comcast Xfinity, which powers all DVR scheduling with ScyllaDB.
As you can see, ScyllaDB is used across many different industries and for entirely different types of use cases. More often than not, your company has a use case that is a perfect fit for ScyllaDB, and you may not know it yet!
SHARE LINKS IN CHAT (Marisa)
Learn more about ScyllaDB Architecture at https://www.scylladb.com/product/technology/
Purpose: Customer case study (Recommendation/Personalization - Media Streaming; Media & Entertainment)
Audience: Mixed
“Comcast Cable Communications which many know as Xfinity, is a telecommunications giant headquartered in the US that provides cable TV, internet, telephone, and wireless services
“The Comcast X1 platform is a cable TV and streaming video service that incorporates a cloud DVR scheduling system for 15 million households, with 2B+ RESTful calls (reads/writes) and 200+M new objects per day.
“Beginning first with Oracle and later moving to Cassandra, the scheduler engineering team struggled with database latency at scale.
(click) “By moving to ScyllaDB Enterprise software running on AWS EC2 infrastructure and on-premises, Comcast improved P99, P999, and P9999 latency by more than 95% and were able to rip out a UI cache layer
(click) “They dramatically reduced their total database infrastructure from 962 Cassandra nodes (across multiple clusters) to 78 ScyllaDB nodes.
(click) “and they reduced total costs by more than 60%, saving Comcast over $2.5M annually in infrastructure costs and staff overhead.
Note
Philip Zimich, featured in the blog and the recorded Summit presentation, leads the architecture, development and operations of Comcast’s X1 Scheduler system, which powers the DVR and program reminder experience for the X1 platform.
Blog/recorded presentation: 78 nodes is total for 6 clusters across 3 data centers using Enterprise subscriptions with AWS infrastructure and on-premises
Salesforce: today 5 clusters, 4 in production (2 on EC2, 2 on premises) and totaling 100+ nodes
Purpose: Customer case study - (Recommendation/Personalization - Media Streaming; Media & Entertainment)
Audience: Mixed
“Based in India, Disney+ Hotstar provides on-demand streaming services to more than 18 million paid subscribers and 300 million monthly active users.
“Disney+ Hotstar’s “Continue Watching” feature tracks every show for every user, capturing timestamps when last watched so users can pick up where they left off on any device, to prompt users to watch next episodes, and alert users to new episodes of favorite shows.
“Using Kafka for streaming data and Redis (500GB) coupled with Elasticsearch (20TB) for their 20+TB data environment, the engineering team was running into scaling, data complexity, and cost issues. They considered a number of alternatives, from Cassandra and Apache HBase to DynamoDB, ultimately selecting our database-as-a-service, ScyllaDB Cloud. The gains were compelling with Disney+ Hotstar…,
(click) “achieving sub-millisecond p99 latency at scale
(click) “a simplified data architecture with significantly lower TCO
Note
Blog: calls out 20TB, sub millisecond P99.
Purpose: Customer case study (Recommendation/Personalization - Media Streaming; Media & Entertainment)
Audience: Mixed
“HQ’d in Singapore, Grab is an on-demand transportation company - whether for personal rides or food or package delivery - and one of the most used mobile apps in Southeast Asia.
Grab relies on Kafka to stream data for a variety of business use cases. To read the streams they needed a powerful, low-latency metadata store to aggregate the streams and initially used Redis - but it couldn’t keep up with the load. So Grab looked at Cassandra, ScyllaDB, and other NoSQL solutions, and after extensive testing, selected ScyllaDB.
(click) ScyllaDB performance was on par with Redis…
(click) …but without the scalability and related cost challenges. It also proved much easier than managing Cassandra.
Grab now uses ScyllaDB for a variety of use cases including fraud detection, ad targeting, and data store for their front end UI.
Purpose: Customer case study (Recommendation/Personalization - Media Streaming; Media & Entertainment)
Audience: Mixed
“Now part of Fox, Tubi is an ad-supported media streaming service with over 50 million active users.
“Tubi uses ML and an innovative experimentation process to personalize movie recommendations.
“Tubi initially used Redis for the recommendation database, but later moved to Cassandra. As their environment grew, so did the need for better latency, throughput, fault tolerance, and maintainability.
“So they moved to ScyllaDB Cloud running on AWS. In addition to eliminating JVM tuning, (click) average read latency during peak times was reduced to sub-millisecond (click) and P99 was reduced to 4-8ms.
And yes, we write to the commitlog for crash recovery
Cache is inserted like this
Represents subset of data in sstables
An improvement would be to manage most of memory inside Scylla. Still..
… and this is enabled by the fact that cache is collocated with the replica
Repeated scans never go to disk
Mention HWLB
When a cache node fails, latency jumps because the DB cache is cold. This ruins the database caching!
This is not the case for ScyllaDB! Since each piece of data is replicated (usually 3 times), there are at least two nodes with a hot cache.
ScyllaDB has a heat-weighted load balancing (HWLB) feature that allows it to warm up the node gradually.
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
URL
ScyllaDB Cloud: https://www.scylladb.com/product/scylla-cloud/
Database Performance at Scale Masterclass: https://lp.scylladb.com/database-performance-scale-masterclass-register
ScyllaDB University Live: https://lp.scylladb.com/university-live-2023-12-registration
Contact Us:
Tomasz Grabiec: tgrabiec@scylladb.com
Tzach Livyatan: tzach@scylladb.com
Join our Slack Channel ScyllaDB Slack
Ask your questions on our user forum ScyllaDB Community NoSQL Forum