Grab at Scale with Scylla
Chao Wang, Arun Sandu
Chao Wang
Engineer Dev Manager
Arun Sandu
Engineer
■ Chao works on Grab’s Trust team, contributing to the ML-driven fraud detection platform
■ Has spent the last 5+ years moving engineering and machine learning challenges forward
■ Arun works on the design and automation of reliable, scalable NoSQL datastores operating in a cloud environment across Grab
■ Previously worked at Starbucks, building highly resilient, scalable infrastructure and a distributed datastore for the Rewards program in the North American and Japanese markets
Agenda
▪ What is Grab?
▪ Fraud detection at Grab
▪ Use cases
▪ Optimizations
Impacting Lives in SEA
■ 339 cities in 8 countries: Vietnam, Myanmar, Thailand, Malaysia, Singapore, Indonesia, Philippines, Cambodia
■ >205 million mobile downloads
■ #1 consumer platform
■ #1 transportation platform
■ #1 fintech platform
■ Fastest-growing regional food delivery service
■ All in 8 years
Delivering everyday services to improve the quality of life for Southeast Asians: transport, food delivery, rewards, mobile wallet, financial services and groceries.
Protect users and enhance trust in the Grab ecosystem
■ Protected flows: bookings, cashless funding, payouts and app login
■ Touchpoints: add card, e-wallet top-up, promo usage, pre-ride, pre-allocation, pre-charge, top-up with driver, mid-ride changes, driver cashout, driver app login, incentive payout
■ Modes: offline (periodic) and online (real-time)
Feature creation
■ Booking characteristics
■ Promo usage patterns
■ High-risk card transactions
■ Driver fraud score
■ Passenger fraud score
Fraud verdicts
■ Booking fraud score
■ Card fraud score
■ Driver fraud score
■ Passenger fraud score
What do we protect?
■ Examples: the number of rides a passenger completes within X hours, or the volume of declined transactions for a driver-passenger pair within X hours
■ We use various types of counters to detect potential fraud, identity and safety risks. For example, if passenger A and driver B take more than 10 rides together in the last hour, that is highly suspicious, and we may take action on further rides
■ Example counter: booking_count:passenger_id:71008:driver_id:3546, value = 20
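A minimal in-memory sketch of the counter idea above, assuming the colon-joined key format from the `booking_count:passenger_id:71008:driver_id:3546` example and the "more than 10 rides in the last hour" rule. The helpers `counter_key`, `SlidingWindowCounter` and `is_suspicious` are hypothetical; a service at Grab's scale would persist aggregates in a datastore rather than keep raw events in process memory.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def counter_key(metric: str, *dims: tuple[str, object]) -> str:
    """Join a metric name and (dimension, value) pairs into a colon-separated key."""
    parts = [metric]
    for name, value in dims:
        parts.extend([name, str(value)])
    return ":".join(parts)


class SlidingWindowCounter:
    """Counts events per key within a trailing time window (hypothetical helper)."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self.events: defaultdict[str, deque[datetime]] = defaultdict(deque)

    def add(self, key: str, ts: datetime) -> None:
        # Assumes events arrive in timestamp order.
        self.events[key].append(ts)

    def count(self, key: str, now: datetime) -> int:
        q = self.events[key]
        # Drop events that have fallen out of the trailing window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)


def is_suspicious(rides_in_window: int, threshold: int = 10) -> bool:
    """Flag a passenger-driver pair whose ride count exceeds the threshold."""
    return rides_in_window > threshold
```

Keeping raw timestamps gives an exact count, but memory grows with traffic, which is one reason to pre-aggregate into time buckets instead.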
Counter Service – Real-time Aggregation
The conventional method
▪ Offline big data processing
▪ Data analysts and engineers work on the scripts
Bottleneck
▪ Not real-time, and real-time matters!
▪ Long development life cycle for new data points
Challenges
▪ Scalability
▪ Self-service
▪ Manageable and extendable
Counter Write Workflow
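The write workflow is shown only as a diagram. As a hedged sketch, assuming every counter event increments one row per bucket granularity (the deck lists daily, hourly and 15-minute buckets), the write side could look like the following. `BucketedCounterStore` and its in-memory dicts are hypothetical stand-ins for tables keyed by (key, bucket timestamp); they do not model Scylla counter columns, consistency levels or TTLs.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical bucket granularities, mirroring the daily/hourly/15-minute buckets in the deck.
GRANULARITIES = {
    "15min": timedelta(minutes=15),
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
}


def bucket_start(ts: datetime, step: timedelta) -> datetime:
    """Round a timestamp down to the start of its bucket (buckets aligned to midnight)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // step) * step


class BucketedCounterStore:
    """In-memory stand-in for one counter table per granularity, keyed by (key, bucket)."""

    def __init__(self) -> None:
        self.tables = {name: defaultdict(int) for name in GRANULARITIES}

    def increment(self, key: str, ts: datetime, delta: int = 1) -> None:
        # Each event updates every granularity, so reads can use the coarsest bucket that fits.
        for name, step in GRANULARITIES.items():
            self.tables[name][(key, bucket_start(ts, step))] += delta
```

Writing to several granularities multiplies write volume by the number of tables, in exchange for reading only a handful of rows per query.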
Counter Read Workflow
Improve the Aggregation Performance
Query: 3 hours ago to now (now = 10:35am)
Buckets: daily, hourly, 15 mins

Min table (15-minute buckets)
key     timestamp             value
pax_1   2020-10-11 07:30:00   1
pax_1   2020-10-11 07:45:00   5

Hourly table
key     timestamp             value
pax_1   2020-10-11 08:00:00   6
pax_1   2020-10-11 09:00:00   1
pax_1   2020-10-11 10:00:00   8
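The multi-resolution read can be sketched as follows, assuming 15-minute and hourly tables keyed by (key, bucket start) as in the example. The function names are hypothetical, daily buckets are left out for brevity, and, as in the example, the first 15-minute bucket (07:30) may include a few minutes before the exact window start (07:35).

```python
from datetime import datetime, timedelta

FIFTEEN_MIN = timedelta(minutes=15)
HOUR = timedelta(hours=1)


def floor_to(ts: datetime, step: timedelta) -> datetime:
    """Round a timestamp down to a multiple of step since midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // step) * step


def plan_buckets(start: datetime, end: datetime) -> list[tuple[str, datetime]]:
    """Cover [start, end] with 15-minute buckets up to the first full hour, then hourly buckets."""
    plan: list[tuple[str, datetime]] = []
    cursor = floor_to(start, FIFTEEN_MIN)
    # Fine-grained buckets until the cursor lands on an hour boundary.
    while cursor.minute != 0 and cursor <= end:
        plan.append(("15min", cursor))
        cursor += FIFTEEN_MIN
    # Coarse buckets for the remainder, including the partial current hour.
    while cursor <= end:
        plan.append(("hourly", cursor))
        cursor += HOUR
    return plan


def read_counter(tables: dict, key: str, start: datetime, end: datetime) -> int:
    """Sum the planned buckets; missing rows count as zero."""
    return sum(tables[gran].get((key, bucket), 0) for gran, bucket in plan_buckets(start, end))


if __name__ == "__main__":
    day = datetime(2020, 10, 11)
    tables = {
        "15min": {("pax_1", day + timedelta(hours=7, minutes=30)): 1,
                  ("pax_1", day + timedelta(hours=7, minutes=45)): 5},
        "hourly": {("pax_1", day + timedelta(hours=8)): 6,
                   ("pax_1", day + timedelta(hours=9)): 1,
                   ("pax_1", day + timedelta(hours=10)): 8},
    }
    now = day + timedelta(hours=10, minutes=35)
    print(read_counter(tables, "pax_1", now - timedelta(hours=3), now))  # 21
```

With the example rows, the five buckets sum to 1 + 5 + 6 + 1 + 8 = 21; the query touches five rows regardless of how many raw events fell inside the window.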
Use cases at Grab
■ Ads
■ KairosDB
■ Stream processing
■ Segmentation platform
Ads
Supports logging every user event (such as clicks), reporting impressions and statistics, capping, etc.
KairosDB
A distributed, scalable time series database that uses Scylla as its storage backend for metrics data.
Stream Processing
Supports real-time processing of time series data for millions of transactions, with data streaming through Apache Kafka.
Segmentation Platform
■ Experiments on targeted segments
■ Driver loyalty rewards
■ Real-time user eligibility checks and promo application
■ Targeted ads and campaigns
■ Promo recommendations using ML models
■ Target customers for any communication
Frontend UI
■ Create, delete and refresh segments.
■ Schedule jobs for segment creation
■ Passenger lookup in a segment
Optimizations
Latencies
The Datadog agent was hogging CPU, which degraded Scylla performance.

                    Before    After
P99 read latency    100 ms    25 ms
Error rate          1%        0%
Cost savings
■ TTL with a default expiry
■ Rate limiters for reads and writes, based on the desired QPS
■ New mc SSTable storage format
■ Delete unused segments
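The deck does not say how its rate limiter works. One common way to cap reads and writes at a desired QPS is a token bucket; the sketch below is that generic technique with hypothetical names and parameters, not Grab's implementation. The injectable `clock` is an assumption added so the limiter can be tested deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits the budget, else False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Separate instances for reads and writes (for example `TokenBucket(rate=read_qps, capacity=read_qps)`, where `read_qps` is a hypothetical setting) let each path be throttled independently.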
Tombstones
■ A scheduled major compaction helped achieve better latencies.
Large Partitions
■ Dynamically create partitions based on the size of the segments.
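A hedged sketch of sizing partitions by segment size, assuming each segment is split into N sub-partitions keyed by (segment_id, bucket) and members are assigned by a stable hash. `MAX_ROWS_PER_PARTITION`, `partition_count` and `partition_key` are hypothetical; the deck only states that partitions are created dynamically from segment size. `zlib.crc32` is used instead of Python's `hash()` because string hashing is randomized per process, and N must be stored with the segment metadata so that lookups compute the same bucket.

```python
import math
import zlib

# Hypothetical cap on rows per partition; leave headroom, since hashing balances only approximately.
MAX_ROWS_PER_PARTITION = 100_000


def partition_count(segment_size: int, max_rows: int = MAX_ROWS_PER_PARTITION) -> int:
    """Number of sub-partitions needed to keep each one near max_rows (at least one)."""
    return max(1, math.ceil(segment_size / max_rows))


def partition_key(segment_id: str, member_id: str, num_partitions: int) -> tuple[str, int]:
    """Stable (segment_id, bucket) key for a segment member."""
    bucket = zlib.crc32(member_id.encode("utf-8")) % num_partitions
    return segment_id, bucket
```

A passenger lookup in a segment then reads a single (segment_id, bucket) partition instead of scanning one oversized partition.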
Thank You
Arun Sandu: iamarunsandu@gmail.com
Chao Wang: chao.wang@grab.com

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 

Grab at Scale with Scylla

  • 1. Grab at Scale with Scylla Chao Wang, Arun Sandu
  • 2. Chao Wang, Engineer Dev Manager; Arun Sandu, Engineer ■ Chao works on Grab’s Trust team, contributing to the ML-driven fraud detection platform. ■ He has worked on engineering and machine learning challenges over the last 5+ years. ■ Arun works on the design and automation of reliable, scalable NoSQL datastores operating in a cloud environment across Grab. ■ He previously worked at Starbucks, building highly resilient, scalable infrastructure and a distributed datastore for the Rewards program in the North American and Japanese markets.
  • 3. Agenda ▪ What is Grab? ▪ Fraud detection at Grab ▪ Use cases ▪ Optimizations
  • 4. Impacting Lives in SEA ■ 339 cities in 8 countries: Vietnam, Myanmar, Thailand, Malaysia, Singapore, Indonesia, Philippines, Cambodia ■ >205 million mobile downloads ■ #1 transportation platform ■ #1 fintech platform ■ Fastest-growing regional food delivery service ■ #1 consumer platform ■ All in 8 years
  • 5. Delivering everyday services to improve the quality of life for Southeast Asians: Transport, Food Delivery, Rewards, Mobile Wallet, Financial Services, Groceries
  • 6. What do we protect? Protect users and enhance trust in the Grab ecosystem. ■ Protected touchpoints, checked offline (periodic) and online (real-time): Add Card, E-wallet Top-up, Promo usage, Pre-Ride, Pre-Allocation, Pre-Charge, Top-up with driver, Mid-ride changes, Driver Cashout, Driver app login, Incentive Payout ■ Grouped under: Bookings, Cashless funding, Payouts, App login ■ Feature creation: booking characteristics, promo usage patterns, high-risk card transactions, driver fraud score, passenger fraud score ■ Fraud verdicts: booking fraud score, card fraud score, driver fraud score, passenger fraud score
  • 7. Counter Service: Real-time Aggregation ■ Examples: the number of rides a passenger completes within X hours, or the volume of declined transactions for a driver-passenger pair within X hours. ■ We use various types of counters to detect potential fraud, identity, and safety risks. For example, if passenger A and driver B take more than 10 rides together in the last hour, that is highly suspicious, and we may take action on further rides. ■ Example counter: booking_count:passenger_id:71008:driver_id:3546, value = 20
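The counter idea above can be sketched as a sliding-window count per passenger-driver pair. This is a minimal in-memory illustration, assuming a single process and events recorded in timestamp order; `SlidingWindowCounter`, `pair_key`, and `is_suspicious` are hypothetical names, and only the key format and the "more than 10 rides in the last hour" example come from the slide. The deck's own design pre-aggregates counts into time buckets stored in Scylla (slide 11) rather than keeping every raw event.

```python
from collections import defaultdict, deque

# Threshold taken from the slide's example: more than 10 rides in the last hour.
SUSPICIOUS_RIDES_PER_HOUR = 10


def pair_key(passenger_id: int, driver_id: int) -> str:
    """Build a counter key in the format shown on the slide (hypothetical helper)."""
    return f"booking_count:passenger_id:{passenger_id}:driver_id:{driver_id}"


class SlidingWindowCounter:
    """Hypothetical in-memory sliding-window counter, not Grab's implementation.

    Assumes events for each key are recorded in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> deque of event timestamps

    def incr(self, key: str, ts: float) -> None:
        self.events[key].append(ts)

    def count(self, key: str, now: float) -> int:
        q = self.events[key]
        # Evict events outside the (now - window, now] range.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)


def is_suspicious(counter: SlidingWindowCounter, passenger_id: int, driver_id: int, now: float) -> bool:
    return counter.count(pair_key(passenger_id, driver_id), now) > SUSPICIOUS_RIDES_PER_HOUR


counter = SlidingWindowCounter(window_seconds=3600)
for i in range(11):  # 11 rides by the same pair, one minute apart
    counter.incr(pair_key(71008, 3546), ts=i * 60.0)
print(is_suspicious(counter, 71008, 3546, now=11 * 60.0))  # True
```

Keeping raw timestamps makes the window exact but grows with traffic; bucketed counters, as on slide 11, bound the storage per key at the cost of coarser windows.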
  • 8. The conventional method ▪ Offline big data processing ▪ Data analysts and engineers maintain the scripts. Bottlenecks ▪ Not real-time, and real-time matters ▪ Long development life cycle for new data points. Challenges ▪ Scalability ▪ Self-service ▪ Manageable and extensible
  • 11. Improve the Aggregation Performance
Query: 3 hours ago to now (now = 10:35am). Buckets: daily, hourly, 15 mins. The range is covered by 15-minute buckets up to the first full hour (07:30am, 07:45am) and by hourly buckets after that (08:00am, 09:00am, 10:00am).
Min table (15-minute buckets):
key    timestamp            value
pax_1  2020-10-11 07:30:00  1
pax_1  2020-10-11 07:45:00  5
Hourly table:
key    timestamp            value
pax_1  2020-10-11 08:00:00  6
pax_1  2020-10-11 09:00:00  1
pax_1  2020-10-11 10:00:00  8
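The bucket selection on slide 11 can be sketched as follows. This is a minimal illustration, assuming in-memory dicts stand in for the Scylla min and hourly tables and leaving out daily buckets for brevity; `floor_to`, `plan_buckets`, and `count_since` are hypothetical names. It reproduces the slide's example: a 3-hour query at 10:35am reads two 15-minute rows and three hourly rows.

```python
from datetime import datetime, timedelta

FIFTEEN_MIN = timedelta(minutes=15)
ONE_HOUR = timedelta(hours=1)

# Hypothetical stand-ins for the "Min table" and "Hourly table" on the slide.
MIN_TABLE = {
    ("pax_1", datetime(2020, 10, 11, 7, 30)): 1,
    ("pax_1", datetime(2020, 10, 11, 7, 45)): 5,
}
HOURLY_TABLE = {
    ("pax_1", datetime(2020, 10, 11, 8, 0)): 6,
    ("pax_1", datetime(2020, 10, 11, 9, 0)): 1,
    ("pax_1", datetime(2020, 10, 11, 10, 0)): 8,
}


def floor_to(ts: datetime, step: timedelta) -> datetime:
    """Round ts down to a multiple of step, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // step) * step


def plan_buckets(start: datetime, now: datetime) -> list:
    """Cover [start, now] with 15-minute buckets until the first full hour, then hourly ones.

    The last hourly bucket may be only partly elapsed; it holds data up to now only.
    """
    plan = []
    t = floor_to(start, FIFTEEN_MIN)
    while t <= now:
        if t.minute == 0:
            plan.append(("hourly", t))
            t += ONE_HOUR
        else:
            plan.append(("min", t))
            t += FIFTEEN_MIN
    return plan


def count_since(key: str, start: datetime, now: datetime) -> int:
    total = 0
    for granularity, bucket in plan_buckets(start, now):
        table = HOURLY_TABLE if granularity == "hourly" else MIN_TABLE
        total += table.get((key, bucket), 0)
    return total


now = datetime(2020, 10, 11, 10, 35)
print(count_since("pax_1", now - timedelta(hours=3), now))  # 1 + 5 + 6 + 1 + 8 = 21
```

In this sketch a 3-hour query touches five rows instead of every raw event; assuming each event increments one row per granularity at write time, the design trades a few extra writes for much cheaper reads.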
  • 12. Use cases at Grab ■ Ads ■ KairosDB ■ Stream Processing ■ Segmentation Platform
  • 13. Ads: supports logging every user event and click, reporting impressions and statistics, capping, etc. KairosDB: a distributed, scalable time series database that uses Scylla as the storage backend for metrics data.
  • 14. Stream Processing: supports real-time data processing, processes time series data for millions of transactions, and streams data with Apache Kafka.
  • 15. Segmentation Platform ■ Experiments on targeted segments ■ Driver loyalty rewards ■ Real-time user eligibility checks and promo application ■ Targeted ads and campaigns ■ Promo recommendations using ML models ■ Targeting customers for any communication
  • 16. Frontend UI ■ Create, delete, and refresh segments ■ Schedule segment-creation jobs ■ Look up a passenger in a segment
  • 19. Latencies: the Datadog agent was hogging CPU, which degraded Scylla performance.
Metric            Before   After
P99 read latency  100 ms   25 ms
Error rate        1%       0%
  • 20. Cost savings ■ TTL with default expiry ■ Rate limiter for reads and writes based on the desired QPS ■ New mc storage format ■ Delete unused segments. Tombstones ■ A scheduled major compaction helped achieve better latencies. Large Partitions ■ Dynamically create partitions based on segment size.
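The "dynamically create partitions based on segment size" idea can be sketched as follows: a segment's members are spread across a number of sub-partitions that grows with the segment's size, so no single `(segment_id, bucket)` partition grows unbounded, and a membership lookup hashes the user ID to read exactly one partition. This is a minimal in-memory illustration, not Grab's schema; `TARGET_ROWS_PER_PARTITION`, `num_buckets`, `bucket_for`, and `SegmentStore` are hypothetical, and in Scylla the `(segment_id, bucket)` pair would presumably form a composite partition key.

```python
import hashlib
import math

# Hypothetical upper bound on members per partition; tune to the row size.
TARGET_ROWS_PER_PARTITION = 100_000


def num_buckets(segment_size: int, target: int = TARGET_ROWS_PER_PARTITION) -> int:
    """Sub-partitions needed so that each holds roughly `target` members or fewer."""
    return max(1, math.ceil(segment_size / target))


def bucket_for(user_id: str, buckets: int) -> int:
    """Stable hash of user_id into [0, buckets).

    md5 is used only for stability: Python's built-in hash() of str is
    randomized per process, which would break lookups across services.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


class SegmentStore:
    """Hypothetical in-memory stand-in for a segments table keyed by (segment_id, bucket)."""

    def __init__(self) -> None:
        self.bucket_count = {}  # segment_id -> number of sub-partitions
        self.partitions = {}    # (segment_id, bucket) -> set of member user_ids

    def write_segment(self, segment_id: str, members: list) -> None:
        # A refresh may change the bucket count, so drop the old partitions first.
        for b in range(self.bucket_count.get(segment_id, 0)):
            self.partitions.pop((segment_id, b), None)
        buckets = num_buckets(len(members))
        self.bucket_count[segment_id] = buckets
        for uid in members:
            self.partitions.setdefault((segment_id, bucket_for(uid, buckets)), set()).add(uid)

    def is_member(self, segment_id: str, user_id: str) -> bool:
        buckets = self.bucket_count.get(segment_id)
        if buckets is None:
            return False
        # Single-partition read: the hash names the only bucket the user can be in.
        return user_id in self.partitions.get((segment_id, bucket_for(user_id, buckets)), set())


store = SegmentStore()
store.write_segment("promo_eligible", [f"pax_{i}" for i in range(250_000)])
print(store.bucket_count["promo_eligible"])             # 3
print(store.is_member("promo_eligible", "pax_42"))      # True
print(store.is_member("promo_eligible", "pax_999999"))  # False
```

Storing the bucket count with the segment metadata lets readers compute the right partition without scanning; the trade-off in this sketch is that refreshing a segment whose size changes reshuffles its members across buckets.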