Many NoSQL DBaaS vendors limit which cloud platforms you can run on and how much data you can store, and require you to over-provision cloud infrastructure resources, while still failing to deliver performance and low latency at scale.
In this session, we will compare the performance and Total Cost of Ownership (TCO) of competing NoSQL DBaaS offerings. We will also review how to migrate to Scylla Cloud, our fully managed database service.
You will learn:
- The true cost of ownership for selected NoSQL DBaaS offerings
- The 8 essentials for selecting a NoSQL DBaaS
- Migration options from Apache Cassandra, DynamoDB and other databases
The True Cost of NoSQL DBaaS Options
1. The True Cost of NoSQL DBaaS Options
Eyal Gutkind — VP of Solutions
2. Presenter
Eyal Gutkind
Eyal Gutkind is VP of Solutions at ScyllaDB. Prior to joining
ScyllaDB, Eyal held product management roles at Mirantis
and DataStax, and spent 12 years with Mellanox
Technologies in various engineering management and
product marketing roles.
5. DBaaS Trends
“The Future of Database Management System is Cloud”
- Gartner 2019
$10.4B: DBMS cloud services
18.4%: DBMS market growth from 2017 to 2018
6. DBaaS Essentials
Full Ecosystem
It needs to integrate seamlessly with Kafka, Spark,
JanusGraph, Presto and many more.
Elastic
Must be resilient to deal with huge spikes. It needs to scale both
up and out and support any number of operations. Don’t leave
your customers hanging on Black Friday!
7. DBaaS Essentials
Zero Downtime
Only databases with masterless architecture can
guarantee no outages. Pick a DBaaS that’s purpose-built
for HA/DR.
Real-time All the Time
Latency spikes can lead to poor user experience, missed
SLAs and customer abandonment. Your DBaaS needs to
reliably deliver low-single-digit millisecond latencies.
8. DBaaS Essentials
Cost Effective at Scale
Expect to get more for less. Choose a DBaaS that
delivers high OPS for a cost that doesn't break the bank.
No Lock-In
Use common APIs such as Cassandra and DynamoDB
and you’ll have freedom to migrate to any database or
compute platform — cloud or on-prem.
9. DBaaS Essentials
No Limits!
Got large partitions? Wide rows? Huge payloads?
Multi-terabyte or petabytes of data? No problem!
Observability and Auto-tuning
An intuitive dashboard UI for cluster monitoring,
with 360-degree visibility and insight into critical DB metrics.
12. TCO - From Idea to Implementation
Item | Expected Duration | Involved Stakeholders | Target | Success Criteria
Use case review | 2-3 weeks | Business, Application, Infrastructure, Evaluators | Proof of Concept document | Buy-in from all stakeholders on PoC document
System installation | 1-2 weeks | Infrastructure, Evaluators | Working setup, well-configured loaders and monitoring | Database installation completion
….. | ….. | ….. | …. | …..
Resilience tests | 1 week | Infrastructure, Evaluators | Create disaster events, measure cluster availability | Application layer stays uninterrupted during stress
Re-tune systems | 1 week | Application, Infrastructure, Evaluators | Improve settings based on benchmark learnings | Optimized deployment
14. TCO - Who Controls the System?
+ Infrastructure vendor selection leaves buying power in your hands
+ Multiple and adjustable workloads, versatile deployment
15. TCO - What is the Transformation Goal?
+ Users will see different TCO when migrating from different systems
+ Relational → NoSQL
+ NoSQL → NoSQL
+ Cloud native implementation
+ Efficient DevOps
+ Time-to-market
+ High availability
+ Scale
+ Resiliency
16. TCO - Hidden or Obvious Costs!?
Storage, Operations/Sec, traffic
Additional indexes
Changes in payload
Latency budget
High Availability - Multi-Datacenter, Multi-Cloud
Scaling
17. Scylla Cloud vs. C* DBaaS Solutions
Metric | AWS Keyspaces | Azure Cosmos | DataStax Astra | Scylla Cloud
Storage cost [$/month/GB] | 0.3 | 0.25 | - | -
Unit read cost [$] | 0.1095 | 0.0496 | - | -
Unit write cost [$] | 0.5475 | 0.0496 | - | -
Total storage cost [$/month] | $900.00 | $750 | - | -
Total write cost [$/month] | $32,850.00 | $8,614 | - | -
Total read cost [$/month] | $54,750.00 | $25,842 | - | -
Total [$/month] | $88,500 | $35,206 | $28,224 | $9,450
+ DataStax Astra: 8 x C40 Capacity units
+ Hassle free
+ Scylla Cloud: 3 x i3.8xlarge instances
Use case: 400,000 operations per second peak, latency guarantee is <20ms for read/writes. 1KB payload, 75:25 Read:Write ratio, 3TB of unreplicated data, annually provisioned, monthly pay.
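The monthly totals in the comparison can be re-derived from the listed components. The sketch below is only a sanity check on the slide's arithmetic: all dollar figures are copied from the table, while the function and variable names are illustrative assumptions. DataStax Astra and Scylla Cloud list no per-component breakdown, so their quoted totals are used as given.

```python
# Monthly cost components in USD, copied from the comparison table.
# Only the per-unit-priced offerings list a breakdown.
breakdown = {
    "AWS Keyspaces": {"storage": 900.00, "write": 32_850.00, "read": 54_750.00},
    "Azure Cosmos": {"storage": 750.00, "write": 8_614.00, "read": 25_842.00},
}

# Totals quoted directly on the slide (no breakdown given).
quoted_totals = {"DataStax Astra": 28_224.00, "Scylla Cloud": 9_450.00}


def monthly_totals(breakdown, quoted_totals):
    """Sum storage + write + read where available, else use the quoted total."""
    totals = {name: sum(parts.values()) for name, parts in breakdown.items()}
    totals.update(quoted_totals)
    return totals


def cost_ratios(totals, baseline):
    """Express each monthly total as a multiple of the baseline offering."""
    return {name: round(total / totals[baseline], 2) for name, total in totals.items()}


totals = monthly_totals(breakdown, quoted_totals)
ratios = cost_ratios(totals, "Scylla Cloud")
for name, total in totals.items():
    print(f"{name}: ${total:,.0f}/month ({ratios[name]}x Scylla Cloud)")
```

The same structure can be reused with your own workload numbers; the ratios are only as meaningful as the use-case assumptions stated on the slide.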
23. Online Migration
+ Migrate online from Cassandra and DynamoDB compatible solutions
+ No downtime required
24. Migration and Data Enrichment
+ Add attributes to data during migrations
+ Migrate from relational databases
+ Migrate from other NoSQL technologies
32. Summary
+ Using DBaaS is the way forward
+ Collect your current and future application needs
+ Incorporate all metrics into your TCO
+ Scaling can be costly
+ Do not give away your buying power
+ Migration with zero downtime is possible
+ Migration from one NoSQL/RDBMS to NoSQL is possible, depending on the use case
33. Resources
Explore Scylla Cloud for Free:
https://www.scylladb.com/product/scylla-cloud/
Benchmarks Scylla Cloud vs DynamoDB:
https://www.scylladb.com/product/benchmarks/dynamodb-benchmark/
35. Thank You!
United States
2445 Faber St, Suite #200
Palo Alto, CA USA 94303
Israel
Maskit 4
Herzliya, Israel 4673304
www.scylladb.com
@scylladb
42. Finding TCO
+ No apples to apples
+ Workload type and size are the main drivers
+ Serverless DBaaS - zero visibility, measure by throughput
+ Performance matters
+ Ramp up cost
+ Management cost/DevOps
Other points:
- What is your starting point?
- You will have a different TCO if you come from an RDBMS, another NoSQL solution, or a fresh install.
- What is your goal?
- You will have a different TCO depending on goals such as performance, future scalability, and high availability.
43. DynamoDB
+ Scenarios
  + Heavy reads, up to 1M ops/sec
  + Heavy writes, up to 1M ops/sec
  + 50/50
+ Data set
  + Small
  + Medium
  + Large
+ Item size 1-4KB
+ Scylla sizing calculations
+ Pricing comparison
44. Keyspaces
+ Scenarios
  + Heavy reads, up to 1M ops/sec
  + Heavy writes, up to 1M ops/sec
  + 50/50
+ Data set
  + Small
  + Medium
  + Large
+ Item size 1-4KB
+ Scylla sizing calculations
+ Pricing comparison
45. DataStax Astra
+ Use sizing calculation to derive ops/sec
+ Match instances by
  + Total available storage
  + vCPU/RAM
  + OPS
+ Compare i3 to CXX and i3en to DXX; we should aim at D40
+ Assumptions
  + 1 CPU - 12,500 operations per second (based on benchmarks, field experience)
  + Cassandra is 2x-4x slower than Scylla (based on benchmarks)
  + In CQL mode, each vCPU gives ~6,250 reads, 8,000 writes sustained, after compactions, repairs, etc.
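The sizing assumptions above can be turned into a rough CPU estimate. This is a minimal sketch: the 12,500 ops/sec per CPU figure and the 2x-4x Cassandra slowdown come from the slide, while the function names and the reading of "2x-4x slower" as "needs 2x-4x the CPUs" are assumptions made for illustration.

```python
import math

OPS_PER_CPU = 12_500  # slide assumption: 1 CPU sustains 12,500 ops/sec


def cpus_needed(peak_ops, ops_per_cpu=OPS_PER_CPU):
    """CPUs required to sustain peak_ops, rounded up to a whole CPU."""
    return math.ceil(peak_ops / ops_per_cpu)


def cassandra_cpu_range(peak_ops, slowdown=(2, 4)):
    """Assumed interpretation: a system N times slower needs N times the ops budget."""
    return tuple(cpus_needed(peak_ops * factor) for factor in slowdown)


peak = 400_000  # peak ops/sec from the comparison use case
print("Scylla CPUs:", cpus_needed(peak))
print("Cassandra CPU range:", cassandra_cpu_range(peak))
```

The result is a lower bound on compute only; storage capacity and RAM still have to be matched separately, as the slide notes.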
46. Existing data migration strategies
▪ CQL COPY
▪ SSTableloader
▪ Spark Migrator
When performing an online migration, always use a strategy
that preserves timestamps, unless all keys are unique
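Why timestamps matter can be shown with a toy model of last-write-wins reconciliation, which is how Cassandra and Scylla resolve conflicting writes to the same cell. The sketch below uses a plain dictionary as a stand-in for the destination table; the function names are hypothetical and tie-breaking is simplified.

```python
def apply_write(table, key, value, timestamp):
    """Last-write-wins: keep the value carrying the highest write timestamp."""
    current = table.get(key)
    if current is None or timestamp > current[1]:
        table[key] = (value, timestamp)


def migrate(table, rows, preserve_timestamps, now):
    """Copy historical rows, either with their original timestamp or stamped 'now'."""
    for key, value, original_ts in rows:
        apply_write(table, key, value, original_ts if preserve_timestamps else now)


historical = [("user:1", "old", 100)]  # row written to the source at t=100

# The application already wrote a newer value to the destination at t=200.
safe = {}
apply_write(safe, "user:1", "new", 200)
migrate(safe, historical, preserve_timestamps=True, now=300)

unsafe = {}
apply_write(unsafe, "user:1", "new", 200)
migrate(unsafe, historical, preserve_timestamps=False, now=300)

print(safe["user:1"])    # newer application write survives
print(unsafe["user:1"])  # stale migrated value overwrote it
```

If every key is written only once, there is nothing to overwrite, which is why the rule is relaxed when all keys are unique.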
47. Databases: Under the hood
+ Native CQL to native CQL
  + Scylla Spark Migrator
+ SSTable files to CQL
  + SSTableloader
+ Arbitrary data files to CQL
  + COPY
48. Highly volatile data with low TTL
▪ Establish dual writes
▪ Keep running until last record in the old DB is expired
▪ Turn off dual writes
▪ Phase out old DB
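The dual-write steps above can be sketched as follows. This is an illustration only: dictionaries stand in for the old and new databases, the `DualWriter` class is hypothetical, and it assumes every record shares one maximum TTL, so that all data written before dual writes began has expired once one full TTL has elapsed.

```python
import time


class DualWriter:
    """Writes every record to both stores until the old store can be retired."""

    def __init__(self, old_db, new_db, ttl_seconds, clock=time.time):
        self.old_db = old_db
        self.new_db = new_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.started_at = clock()  # moment dual writes were established

    def write(self, key, value):
        expires_at = self.clock() + self.ttl
        self.old_db[key] = (value, expires_at)
        self.new_db[key] = (value, expires_at)

    def safe_to_cut_over(self):
        # Anything that exists only in the old store was written before
        # started_at, so it has expired after one full TTL.
        return self.clock() - self.started_at >= self.ttl


fake_now = [1000.0]  # controllable clock for the demonstration
writer = DualWriter({}, {}, ttl_seconds=60, clock=lambda: fake_now[0])
writer.write("session:42", "payload")
print(writer.safe_to_cut_over())  # still inside the TTL window
fake_now[0] = 1060.0
print(writer.safe_to_cut_over())  # one TTL later, the old store can be phased out
```

No bulk copy is needed in this scenario because the old data ages out on its own.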
52. COPY and CSV files
Pros
+ Simplicity
+ CSV transparency
+ Easy to validate
+ Destination schema can have fewer columns than the original
+ Can be tweaked, plenty of language support
+ Can be used for any data ingestion, not necessarily from Scylla/Cassandra (incompatible DBs)
+ Compatible with Cassandra, Scylla and Scylla Cloud
Cons
+ Not for large data sizes
+ Timestamps not preserved - be careful with online migrations
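Because CSV is plain text, the export can be reshaped with a few lines of code before loading, for example to drop columns the destination schema no longer has. The sketch below uses only Python's standard `csv` module; the column names and the `project_columns` helper are hypothetical.

```python
import csv
import io


def project_columns(src, dst, keep):
    """Copy CSV rows from src to dst, keeping only the named columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)  # columns not listed in `keep` are ignored
        rows += 1
    return rows


# In-memory stand-ins for the exported and trimmed CSV files.
exported = io.StringIO("id,name,legacy_flag\n1,Ada,x\n2,Grace,y\n")
trimmed = io.StringIO()
count = project_columns(exported, trimmed, keep=["id", "name"])
print(count, "rows written")
print(trimmed.getvalue())
```

The trimmed file could then be loaded with the cqlsh `COPY ... FROM` command, listing the same columns and setting `HEADER = true`.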
54. SSTableloader
▪ Create a snapshot of each Cassandra node
▪ Run sstableloader from each Cassandra node, or from intermediate servers
▪ Use throttling (-t) to limit the loader throughput if needed
▪ Run several sstableloaders in parallel
Both Cassandra and Scylla ship with an sstableloader utility.
While they are similar, there are differences between the two:
▪ You MUST use Scylla’s sstableloader to migrate to Scylla
[Diagram: SSTables -> Scylla’s SSTableloader -> CQL]
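Running several loaders in parallel with throttling can be scripted. The sketch below is an assumption-laden illustration: it builds `sstableloader` command lines using the `-t` throttle flag mentioned above and a `-d` flag for the destination nodes (verify both against `sstableloader --help` for your version), and the snapshot paths and addresses are hypothetical. The `runner` parameter exists so the logic can be exercised without executing anything.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(snapshot_dir, nodes, throttle_mbits=None):
    """Assemble one sstableloader invocation for a snapshot directory."""
    cmd = ["sstableloader", "-d", ",".join(nodes)]
    if throttle_mbits is not None:
        cmd += ["-t", str(throttle_mbits)]  # limit loader throughput
    cmd.append(snapshot_dir)
    return cmd


def run_parallel(snapshot_dirs, nodes, throttle_mbits=None, workers=4,
                 runner=subprocess.run):
    """Run one loader per snapshot directory, up to `workers` at a time."""
    commands = [build_command(d, nodes, throttle_mbits) for d in snapshot_dirs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))


# Dry run: print the commands instead of executing them.
run_parallel(
    ["/snapshots/my_ks/table_a", "/snapshots/my_ks/table_b"],  # hypothetical paths
    nodes=["10.0.0.1", "10.0.0.2"],                            # hypothetical nodes
    throttle_mbits=100,
    runner=lambda cmd: print(" ".join(cmd)),
)
```

Replacing the `runner` with the default `subprocess.run` would execute the loaders for real, so check each command in a dry run first.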
55. SSTableloader continued
+ No schema updates during forklifting
+ Scylla’s sstableloader has support for simple column renames
+ Assuming RF=3, you end up with 9 copies of the data until compaction happens (each of the 3 source replicas is loaded and written to 3 destination replicas)
Failure handling:
+ What should I do if sstableloader fails?
+ What should I do if a source node fails?
+ What should I do if a destination node fails?
+ How to rollback and start from scratch?
https://docs.scylladb.com/operating-scylla/procedures/cassandra_to_scylla_migration_process
57. Scylla Spark Migrator
▪ Highly resilient to failures, and will retry reads and writes throughout the job
  • Continuously writes savepoint files, which can be used to resume the transfer from the point at which it stopped
▪ Accesses compatible databases using a native connector
▪ High-performance parallelized reads and writes
▪ Unlimited streaming power
▪ Reduces data transfer costs
▪ Can be configured to preserve the WRITETIME and TTL attributes of the fields that are copied
▪ Can handle column renames as part of the transfer
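The savepoint idea, recording which token ranges have been copied so that a restarted job skips them, can be illustrated generically. This is not the migrator's implementation or file format: the JSON file, the function names, and the `copy_range` callback are all assumptions for the sketch.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of token ranges recorded as completed, if any."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {tuple(r) for r in json.load(f)}


def save_done(path, done):
    """Write the savepoint atomically so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def transfer(ranges, copy_range, savepoint_path):
    """Copy each range once; ranges already in the savepoint are skipped."""
    done = load_done(savepoint_path)
    for token_range in ranges:
        if token_range in done:
            continue
        copy_range(token_range)  # may raise; the savepoint keeps earlier progress
        done.add(token_range)
        save_done(savepoint_path, done)
    return done


savepoint = os.path.join(tempfile.mkdtemp(), "savepoint.json")
ranges = [(0, 10), (10, 20), (20, 30)]
transfer(ranges, lambda r: print("copying", r), savepoint)
transfer(ranges, lambda r: print("copying", r), savepoint)  # second run copies nothing
```

Saving after every range trades some write overhead for a smaller amount of repeated work after a failure.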
58. Scylla Spark Migrator
A very simple and easy to use tool
▪ Install the standard Spark stack (Java JRE and JDK, SBT)
▪ Edit configuration file
▪ Run
Links:
▪ https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/
▪ https://github.com/scylladb/scylla-migrator/
59. Settings to take into consideration
● Spark 2.3.1 or later
● Cassandra-Spark connector 2.3
--conf spark.scylla.source.connections=CONNECTION_COUNT
--conf spark.scylla.source.keyspace=SOURCE_KEYSPACE
--conf spark.scylla.source.table=SOURCE_TABLE
--conf spark.scylla.source.splitCount=SOURCE_SPLIT
--conf spark.scylla.dest.connections=CONNECTION_COUNT
--conf spark.scylla.dest.keyspace=DEST_KEYSPACE
--conf spark.scylla.dest.table=DEST_TABLE
https://github.com/scylladb/scylla-migrator
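The `--conf` flags above can be generated from a single settings mapping, which keeps source and destination values in one place. In this sketch the property names are the ones listed on the slide, while the values and the `conf_flags` helper are hypothetical; the rest of the `spark-submit` command line (class, jar, master) is deliberately left out.

```python
def conf_flags(settings):
    """Turn a {property: value} mapping into spark-submit --conf arguments."""
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


settings = {
    "spark.scylla.source.connections": 4,         # hypothetical value
    "spark.scylla.source.keyspace": "source_ks",  # hypothetical value
    "spark.scylla.source.table": "events",        # hypothetical value
    "spark.scylla.source.splitCount": 256,        # hypothetical value
    "spark.scylla.dest.connections": 4,           # hypothetical value
    "spark.scylla.dest.keyspace": "dest_ks",      # hypothetical value
    "spark.scylla.dest.table": "events",          # hypothetical value
}

print(" ".join(conf_flags(settings)))
```

Check the project repository for the configuration method your migrator version expects before relying on these property names.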
61. Scylla Spark Migrator
STDOUT:
2019-03-25 20:30:04 INFO migrator:405 -
Created a savepoint config at
/tmp/savepoints/savepoint_1553545804.yaml
due to schedule. Ranges added:
Set((49660176753484882,50517483720003777),
(1176795029308651734,1264410883115973030),
(-246387809075769230,-238284145977950153),
(-735372055852897323,-726956712682417148),
(6875465462741850487,6973045836764204908),
(-467003452415310709,-4589291437737669003)
...
62. Takeaways: Existing Data Migration
+ If the source database is SQL, MongoDB, etc.:
  + Use the COPY command
+ If you have access to the original SSTable files:
  + Use SSTableloader
+ If you want a fully flexible streaming solution and can afford the extra load on the source:
  + Use the Scylla Spark Migrator