Expedia Group is a multibillion-dollar online travel company. Learn why they decided to move from Apache Cassandra to Scylla to further their corporate growth, what they learned during the migration, and how Scylla improved their operations.
Hello everyone! It gives us immense pleasure to share our journey to ScyllaDB at Scylla Summit 2021. In this presentation, we will take you through who we are, what we do, why we chose ScyllaDB, and finally the outcome.
I'm Singa, and I will be joined by my co-presenter Dilip for this talk. We are both passionate database engineers at Expedia Group, working with multiple NoSQL technologies and striving to match each use case with the datastore that serves it best.
Expedia Group, Inc. is one of the world’s largest travel platforms. At Expedia Group, our mission is to bring the world within reach. We firmly believe that travel has the power to change lives!
We do that through the power of our brands.
Alright, let's get into the nitty-gritty of why we picked ScyllaDB and how it helped our developer journey. Currently at Expedia Group, there are multiple applications built on top of Apache Cassandra, which comes with its own set of challenges. We will go through some of them throughout this deck.
Apache Cassandra, written in Java, brings with it the onus of managing garbage collection (GC) and making sure it is appropriately tuned for the workload at hand. Though GC is tunable, it takes a significant amount of time, effort, and expertise to tune GC pauses for every specific use case. With burst traffic or a sudden peak in the workload, there is significant disturbance to the P99 response time, so we end up adding buffer nodes to handle that peak capacity, which drives up infrastructure costs. Another significant worry: over the past four years, the pace of Apache Cassandra releases has slowed considerably.
Here we would like to compare the open-source commits in Cassandra versus ScyllaDB and highlight the number of releases ScyllaDB has shipped over the same three-year period. As you can see, this gives us confidence that if there is an issue or bug in a specific ScyllaDB release, it will soon be addressed with a patch; with Apache Cassandra, one might have to wait much longer.
So why did we end up with ScyllaDB? Coming from an Apache Cassandra codebase, switching over to ScyllaDB is frictionless for developers. For the use cases we tried, no data model changes were necessary, and the ScyllaDB driver was fully compatible, a drop-in replacement for the Cassandra driver dependency. With a few tweaks to the automation framework that provisions our Apache Cassandra clusters, we were able to provision a ScyllaDB open-source cluster. Thanks to ScyllaDB's C++ backend, we no longer have to worry about stop-the-world GC pauses. We were also able to store more data per node and achieve more throughput per node, saving the company significant money. Finally, a clear roadmap and support from the ScyllaDB Slack community come in very handy.
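To give a sense of how small the switch is in application code, here is a minimal sketch using the Java CQL driver (a hedged illustration: we assume the 4.x driver, and the contact point, datacenter, keyspace, and table names are made up). The point is that the same code runs unchanged against Cassandra and ScyllaDB; only the endpoint and the driver dependency change, not the application logic.

```java
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class GeoLookup {
    public static void main(String[] args) {
        // The identical session-building code works whether the contact
        // point is a Cassandra node or a ScyllaDB node.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))
                .withLocalDatacenter("dc1")
                .build()) {
            ResultSet rs = session.execute(
                "SELECT name FROM geo.entities WHERE id = 42");
            Row row = rs.one();
            System.out.println(row == null ? "not found" : row.getString("name"));
        }
    }
}
```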
The candidate application chosen for this POC is our geo system, which provides information about geographical entities and the relationships between them. It aggregates data from multiple systems, such as hotel location info, third-party data, and so on. This rich geography dataset enables different types of data searches through a simple REST API while guaranteeing single-digit-millisecond P99 read response times. To speed up API responses, we use a multi-layered cache, with Redis as the first layer and Cassandra as the second (see the read-path sketch below). With ScyllaDB as a drop-in replacement for Cassandra, I'm handing it over to Dilip to go over the infra setup, benchmark results, and next steps.
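Before Dilip takes over, here is a minimal sketch of that layered read path, assuming a Jedis client for Redis; the key scheme, TTL, and table schema are hypothetical, and the L2 query is identical whether the store behind it is Cassandra or ScyllaDB.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import redis.clients.jedis.Jedis;

public class CachedGeoReader {
    private final Jedis redis;       // L1: in-memory cache
    private final CqlSession scylla; // L2: persistent store
    private final PreparedStatement byId;

    public CachedGeoReader(Jedis redis, CqlSession scylla) {
        this.redis = redis;
        this.scylla = scylla;
        this.byId = scylla.prepare("SELECT name FROM geo.entities WHERE id = ?");
    }

    public String lookup(long entityId) {
        String key = "geo:" + entityId;   // hypothetical key scheme
        String cached = redis.get(key);   // 1. try the L1 cache first
        if (cached != null) {
            return cached;
        }
        // 2. on a miss, fall back to the L2 store
        Row row = scylla.execute(byId.bind(entityId)).one();
        if (row == null) {
            return null;
        }
        String name = row.getString("name");
        redis.setex(key, 300, name);      // 3. repopulate L1 with a TTL
        return name;
    }
}
```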
Thank you, Singa. Our ScyllaDB POC cluster was sized to store around 25 TB of data, exactly like our existing production Cassandra cluster. To begin with, we provisioned the same total number of instances for ScyllaDB as for Cassandra, but chose the i3.2xlarge instance type, which is 35% cheaper than the i3en.2xlarge.
The use case demands high read throughput with only a tiny write throughput. As shown in the first graph, whether it's ScyllaDB or Cassandra, the writes are almost negligible, a flat line at the bottom. The real winner is on reads: Cassandra's P99 read response times are flaky, as shown by the spikes, while ScyllaDB's P99 read response times are relatively flat. This is a significant advantage for our read-heavy application. In terms of throughput, as shown in the second graph, we were able to push almost double the TPS with ScyllaDB compared with Cassandra, while keeping the P99 SLA flat.
Here are some of the facts that made the ScyllaDB benchmark stand out: triple the throughput with flat, single-digit-millisecond P99 read response times, along with an over 35% reduction in total cost of ownership. At that point, switching this application's production workload to ScyllaDB was a no-brainer.
A huge shoutout to our automation team, which made provisioning the ScyllaDB cluster a breeze via our internal tool called Cerebro. We use this same tool to manage over seven different NoSQL technologies, with the aim of enabling our application teams to focus on bringing great products to market without having to worry about managing databases.
This application currently uses an L1 cache (Redis) in front of its backend persistent store, ScyllaDB. Since ScyllaDB supports a Redis-compatible API and its P99 has proven to stay under single-digit milliseconds, we are considering turning off the in-memory cache engine and relying on ScyllaDB as the application's only database backend. This would bring significant additional cost advantages, both in infrastructure and in application code. We also recently learned about Scylla Alternator and are currently evaluating whether it is a viable alternative to DynamoDB, as advertised.
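To sketch what that simplification could look like: since the Redis-compatible API speaks the Redis wire protocol, existing client code should in principle only need a new endpoint. This is a hedged illustration, assuming the Redis API is enabled on the ScyllaDB node and listening on the conventional Redis port; the hostname and key are made up.

```java
import redis.clients.jedis.Jedis;

public class ScyllaRedisExample {
    public static void main(String[] args) {
        // Point the existing Redis client at a ScyllaDB node instead of a
        // dedicated Redis server; no other client-side changes assumed.
        try (Jedis client = new Jedis("scylla-node-1", 6379)) {
            client.set("geo:42", "Seattle");
            System.out.println(client.get("geo:42")); // prints "Seattle"
        }
    }
}
```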
One note on logging: logs are pushed to syslog, and there isn't a configuration option to route them to a custom folder of your choice. The CDC functionality is significantly better than Apache Cassandra's, which might entice applications that rely on change streams. Another nice property is that ScyllaDB node replacements, whether during scale-up or scale-down, are resumable. Finally, use caution with large partitions: performance can vary depending on how large the partitions get.
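For those who want to experiment with those last two items, here is a hedged sketch (the geo.entities table is hypothetical; the ALTER TABLE syntax and the system.large_partitions table reflect our understanding of ScyllaDB's CDC and large-partition tooling):

```java
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class ScyllaOpsExamples {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))
                .withLocalDatacenter("dc1")
                .build()) {
            // Enable CDC on a (hypothetical) table; ScyllaDB then maintains
            // a companion log table of changes for consumers to read.
            session.execute(
                "ALTER TABLE geo.entities WITH cdc = {'enabled': true}");

            // ScyllaDB records oversized partitions in a system table,
            // which is one way to keep an eye on the large-partition caveat.
            for (Row row : session.execute(
                    "SELECT keyspace_name, table_name, partition_size " +
                    "FROM system.large_partitions")) {
                System.out.printf("%s.%s -> %d bytes%n",
                    row.getString("keyspace_name"),
                    row.getString("table_name"),
                    row.getLong("partition_size"));
            }
        }
    }
}
```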
If you are interested in what you heard and want to build great products with us, please come join us!
Thank you for this opportunity to present to ScyllaDB enthusiasts all over the world. We enjoyed every moment of putting this together and hope you did too.