Expedia Group is a multibillion-dollar online travel company. Learn why they decided to move from Apache Cassandra to Scylla to further their corporate growth, what they learned during the migration, and how Scylla improved their operations.
Hello everyone! It gives us immense pleasure to share our journey to ScyllaDB at Scylla Summit 2021. In this presentation, we will take you through who we are, what we do, why we chose ScyllaDB, and finally the outcome.
I'm Singa, and I will be joined by my co-presenter Dilip for this talk. We are both passionate database engineers at Expedia Group, working with multiple NoSQL technologies and striving to match each use case with the datastore that serves it best.
Expedia Group, Inc. is one of the world’s largest travel platforms. At Expedia Group, our mission is to bring the world within reach. We firmly believe that travel has the power to change lives!
We do that through the power of our brands.
Alright, let's get into the nitty-gritty of why we picked ScyllaDB and how it helped our developer journey. Currently at Expedia Group, there are multiple applications built on top of Apache Cassandra, which comes with its own set of challenges. We will go through some of them throughout this deck.
Apache Cassandra, written in Java, brings with it the onus of managing garbage collection (GC) and making sure it is appropriately tuned for the workload at hand. Though GC is tunable, it takes a significant amount of time, effort, and expertise to tune GC pauses for every specific use case. With burst traffic or a sudden peak in the workload, there is significant disturbance to the P99 response time, so we end up adding buffer nodes to handle that peak capacity, which drives up infrastructure costs. Another significant worry: over the past four years, the pace of Apache Cassandra releases has slowed considerably.
Here we would like to compare the open-source commits in Cassandra versus ScyllaDB and highlight the number of releases ScyllaDB has shipped over the same three-year period. As you can see, this gives us confidence that if there is an issue or bug in a specific ScyllaDB release, it will soon be addressed with a patch; with Apache Cassandra, one might have to wait much longer.
So why did we end up with ScyllaDB? Coming from an Apache Cassandra codebase, switching over to ScyllaDB is frictionless for developers. For the use cases we tried, no data model changes were necessary, and the ScyllaDB driver was fully compatible, a drop-in replacement for the Cassandra driver dependency. With a few tweaks to the automation framework that provisions our Apache Cassandra clusters, we were able to provision a ScyllaDB open-source cluster. Thanks to ScyllaDB's C++ backend, we no longer have to worry about stop-the-world GC pauses. We were also able to store more data per node and achieve more throughput per node, saving the company significant money. Finally, a clear roadmap and support from the ScyllaDB Slack community come in very handy.
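To give a sense of how small the switch is in application code, here is a minimal sketch using the Java CQL driver (a hedged illustration: we assume the 4.x driver, and the contact point, datacenter, keyspace, and table names are made up). The point is that the same code runs unchanged against Cassandra and ScyllaDB; only the endpoint and the driver dependency change, not the application logic.

```java
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class GeoLookup {
    public static void main(String[] args) {
        // The identical session-building code works whether the contact
        // point is a Cassandra node or a ScyllaDB node.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))
                .withLocalDatacenter("dc1")
                .build()) {
            ResultSet rs = session.execute(
                "SELECT name FROM geo.entities WHERE id = 42");
            Row row = rs.one();
            System.out.println(row == null ? "not found" : row.getString("name"));
        }
    }
}
```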
The candidate application chosen for this POC is our geo system, which provides information about geographical entities and the relationships between them. It aggregates data from multiple systems, such as hotel location info, third-party data, and so on. This rich geography dataset enables different types of data searches through a simple REST API while guaranteeing single-digit-millisecond P99 read response times. To speed up API responses, we use a multi-layered cache, with Redis as the first layer and Cassandra as the second (see the read-path sketch below). With ScyllaDB as a drop-in replacement for Cassandra, I'm handing it over to Dilip to go over the infra setup, benchmark results, and next steps.
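Before Dilip takes over, here is a minimal sketch of that layered read path, assuming a Jedis client for Redis; the key scheme, TTL, and table schema are hypothetical, and the L2 query is identical whether the store behind it is Cassandra or ScyllaDB.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import redis.clients.jedis.Jedis;

public class CachedGeoReader {
    private final Jedis redis;       // L1: in-memory cache
    private final CqlSession scylla; // L2: persistent store
    private final PreparedStatement byId;

    public CachedGeoReader(Jedis redis, CqlSession scylla) {
        this.redis = redis;
        this.scylla = scylla;
        this.byId = scylla.prepare("SELECT name FROM geo.entities WHERE id = ?");
    }

    public String lookup(long entityId) {
        String key = "geo:" + entityId;   // hypothetical key scheme
        String cached = redis.get(key);   // 1. try the L1 cache first
        if (cached != null) {
            return cached;
        }
        // 2. on a miss, fall back to the L2 store
        Row row = scylla.execute(byId.bind(entityId)).one();
        if (row == null) {
            return null;
        }
        String name = row.getString("name");
        redis.setex(key, 300, name);      // 3. repopulate L1 with a TTL
        return name;
    }
}
```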
Thank you, Singa. Our ScyllaDB POC cluster was sized to store around 25 TB of data, exactly like our existing production Cassandra cluster. To begin with, we provisioned the same total number of instances for ScyllaDB as for Cassandra, but chose the i3.2xlarge instance type, which is 35% cheaper than the i3en.2xlarge.
The use case demands high read throughput with only a tiny write throughput. As shown in the first graph, whether it's ScyllaDB or Cassandra, the writes are almost negligible, a flat line at the bottom. The real winner is on reads: Cassandra's P99 read response times are flaky, as shown by the spikes, while ScyllaDB's P99 read response times are relatively flat. This is a significant advantage for our read-heavy application. In terms of throughput, as shown in the second graph, we were able to push almost double the TPS with ScyllaDB compared with Cassandra, while keeping the P99 SLA flat.
Here are some of the facts that made the ScyllaDB benchmark stand out: triple the throughput with flat, single-digit-millisecond P99 read response times, along with an over 35% reduction in total cost of ownership. At that point, switching this application's production workload to ScyllaDB was a no-brainer.
A huge shoutout to our automation team, which made provisioning the ScyllaDB cluster a breeze via our internal tool called Cerebro. We use this same tool to manage over seven different NoSQL technologies, with the aim of enabling our application teams to focus on bringing great products to market without having to worry about managing databases.
This application currently uses an L1 cache (Redis) in front of its backend persistent store, ScyllaDB. Since ScyllaDB supports a Redis-compatible API and its P99 has proven to stay under single-digit milliseconds, we are considering turning off the in-memory cache engine and relying on ScyllaDB as the application's only database backend. This would bring significant additional cost advantages, both in infrastructure and in application code. We also recently learned about Scylla Alternator and are currently evaluating whether it is a viable alternative to DynamoDB, as advertised.
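To sketch what that simplification could look like: since the Redis-compatible API speaks the Redis wire protocol, existing client code should in principle only need a new endpoint. This is a hedged illustration, assuming the Redis API is enabled on the ScyllaDB node and listening on the conventional Redis port; the hostname and key are made up.

```java
import redis.clients.jedis.Jedis;

public class ScyllaRedisExample {
    public static void main(String[] args) {
        // Point the existing Redis client at a ScyllaDB node instead of a
        // dedicated Redis server; no other client-side changes assumed.
        try (Jedis client = new Jedis("scylla-node-1", 6379)) {
            client.set("geo:42", "Seattle");
            System.out.println(client.get("geo:42")); // prints "Seattle"
        }
    }
}
```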
One note on logging: logs are pushed to syslog, and there isn't a configuration option to route them to a custom folder of your choice. The CDC functionality is significantly better than Apache Cassandra's, which might entice applications that rely on change streams. Another nice property is that ScyllaDB node replacements, whether during scale-up or scale-down, are resumable. Finally, use caution with large partitions: performance can vary depending on how large the partitions get.
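For those who want to experiment with those last two items, here is a hedged sketch (the geo.entities table is hypothetical; the ALTER TABLE syntax and the system.large_partitions table reflect our understanding of ScyllaDB's CDC and large-partition tooling):

```java
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class ScyllaOpsExamples {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))
                .withLocalDatacenter("dc1")
                .build()) {
            // Enable CDC on a (hypothetical) table; ScyllaDB then maintains
            // a companion log table of changes for consumers to read.
            session.execute(
                "ALTER TABLE geo.entities WITH cdc = {'enabled': true}");

            // ScyllaDB records oversized partitions in a system table,
            // which is one way to keep an eye on the large-partition caveat.
            for (Row row : session.execute(
                    "SELECT keyspace_name, table_name, partition_size " +
                    "FROM system.large_partitions")) {
                System.out.printf("%s.%s -> %d bytes%n",
                    row.getString("keyspace_name"),
                    row.getString("table_name"),
                    row.getLong("partition_size"));
            }
        }
    }
}
```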
If you are interested in what you heard and want to build great products with us, please come join us!
Thank you for this opportunity to present to ScyllaDB enthusiasts all over the world. We enjoyed every moment of putting this together and hope you did too.