SlideShare a Scribd company logo
1 of 36
Download to read offline
Scalable
Experimentation &
Data Capture
at Grab
Dr. Roman Atachiants // June 2019
Southeast Asia’s Super App
Our Services: Mobility, Fintech & Lifestyle
Product Insights & Experimentation Team
Product Insights & Experimentation at Grab
Experimentation Platform aims to
accelerate innovation, by allowing
us to find and focus on key ideas
evaluated through rigorous regime
of online controlled experiments. It
also promotes the culture of data
driven decision making through
experimentation.
Scribe is our in house mobile and
backend event collection solution. It
helps us track, monitor and analyse
any mobile event such as user clicks,
booking rides as well as mobile logs,
data usage and more.
Experimentation
Platform
Experimentation Platform: Mission
The Grab Experimentation Platform aims to accelerate innovation, by allowing
us to find and focus on key ideas evaluated through rigorous regime of online
controlled experiments. It also promotes the culture of data driven decision
making through experimentation.
To get there, we must lower the cost of experimentation significantly.
Experimentation Platform
Feature Toggles
Our Product Insights & Experimentation platform provides a dynamic feature
toggle capability to our engineering, data, product and business teams.
Feature toggles also let teams modify system behavior without changing
code. At Grab, we use feature toggles to:
1. Gate feature deployment in production for keeping new features hidden
until product and marketing teams are ready to share.
2. Run experiments (A/B tests) by dynamically changing feature toggles for
specific users, rides, etc. For example, a feature can only appear to a
particular group of people while experiment is running (treatment group).
Experimentation Platform
A/B testing is a method of comparing
two or more versions of a system
against each other to determine which
one performs best.
Example: an experiment where two
variants of a page are shown to users
at random, and statistical analysis is
used to determine which variation
performs better for a given conversion
goal (metric).
Experimentation Platform
Experimentation Process Prior to ExP
Example:
Automated Message
Experiment
Automated Message Experiment
Problem: We wanted to determine the best delay to send an
automated message to passengers asking for extra details.
● 70% of bookings had no message
● 10% of bookings got a message after 30s
● 10% of bookings got a message after 60s
● 10% of bookings got a message after 90s
The experiment ran on Singapore GC, GC+ and JustGrab for a
period of one week.
The hypothesis was that sending a message sooner will
consistently reduce cancellations.
Experiment Definition
"domain": "primary",
"name": "primary.automatedChatMessage",
"variables": [{
"name": "automatedMessageDelay",
"strategy": "weightedChoice",
"facets": ["bkg"],
"choices": [
{ "value": 0, "weight": 0.70 },
{ "value": 30, "weight": 0.10 },
{ "value": 60, "weight": 0.10 },
{ "value": 90, "weight": 0.10 }]
}],"constraints": [
"ts > 1400113600",
"ts < 1400718400",
"svc in [1,2,3]"]
Experiment Definition
In Grab Chat Service
SDK is used to retrieve a value for the variable
based on the configuration. Default value will
be used if no experiment is running or
constraints do not match.
In Booking Service
SDK is used to instrument required cancellation
metrics.
facets := sdk.NewFacets().
City(city).
Vehicle(vehicleType).
Passenger(passenger).
Booking(booking)
delay := sdk.GetVariable(ctx,
"automatedMessageDelay",facets).Int64(0)
sendMessage(delay) // logic
facets := sdk.NewFacets().
City(req.CityID).
Vehicle(vehicleType).
Passenger(passenger).
Booking(booking)
sdk.Track(ctx,"booking.cancelled",
reason, facets)
SDK and Architecture
Golang SDK
Our Go SDK provides a very
simple client which takes
care of everything.
GetVariable returns within
5-10 microseconds as there
no network calls in the hot
path.
Track buffers locally and
periodically flushes batches
of events to our ingestion
service (McD).
type SDK interface {
// GetVariable with name of the variable
GetVariable(ctx Context, name string,
facets *Facets) Variable
// Track an event with a value and metadata
Track(ctx Context, eventName string, value float64
facets *Facets)
// InjectLocally injects the facets to the local golang
// context. This will only be applied to this context and
// will not be propagated throughout the service stack.
InjectLocally(ctx Context, values Facets) Context
// InjectGlobally injects the facets to the global
// tracing baggage. This allows you to propagate
// facet values across services.
InjectGlobally(span opentracing.Span, values Facets)
}
Facets
Facets represent the context
describing an event. Each
GetVariable() call generates
an event as well and facets
are used to resolve the final
value.
We encourage our engineers
to provide as much context
as possible and we also have
automated enrichment in
place.
Facets {
Passenger int64 // The passenger identifier
Driver int64 // The driver identifier
Service int64 // The equivalent to a vehicle type
Booking string // The booking code
Location string // The location geohash
Session string // The session identifier
Request string // The request identifier
Device string // The device identifier
Tag string // The user-defined tag
...
}
Event Tracking
In order to track an event,
engineers are required to add
the name of the event, value
and facets as well.
facets := sdk.NewFacets().
Booking(req.BookingCode).
Custom(“key1”, “some value”).
Custom(“key2”, “some other value”)
X.Track(ctx,"customEvent", 42, facets)
SELECT avg(val) mean, city
FROM eventlog
WHERE event = 'myservice.customEvent'
GROUP BY city
ORDER BY 1 DESC
LIMIT 100
Architectural Overview
Kafka + Presto
Backend Services
The API allows to configure
experiments and includes a
planner which schedules
experiments into segments
based on the facets. The
planner ensures that there’s no
spillover between experiments.
Variable assignments
and metrics are sent to
ingestor service
PlannerDashboard
McD Gateway
SDK
The SDK retrieves the
experiment definitions and
autonomously makes
assignments.
UCM S3 Bucket
Reliability
We designed the platform to
be as reliable as possible,
making sure that we do not
affect our customers in any
way if we’re down.
Our data is stored in S3, and
if McD is down the data
simply won’t be written but
none of backend services will
be affected.
Event Tracking
McD - Event Ingestion System
Problem statement
To enable the construction of
platforms that will provide us
valuable business insight and
improve the quality of our products
and services.
What we built
McD - a general-purpose event
ingestion system to capture useful
mobile and backend events such as
impressions, user interactions (eg.
clicks), performance metrics (eg.
battery life).
McD - Event Ingestion System
Funnels
Need to capture unified funnels
(backend and mobile), especially
with proliferation of Grab Apps.
Experimentation
Need of fully-fledged and unified
experimentation platform for our
mobile and backend.
McD - Event Ingestion System
Monitoring and Logging
Need of simple monitoring and
debuggability tool for our mobile
apps (passenger, driver, food…)
User Profiles
Need to capture comprehensive
click streams and user profiles.
SDKs for Every Platform at Grab
● Available for Go, JS, Android and iOS
● Efficient, batched, binary event
transmission over HTTPS
● Dynamic reconfiguration of the SDK on
a per device level
● Built-in persistence & QoS in order to
prevent data loss for critical events
● Automatic tracking of common events
● Integrated feature flags & A/B testing
Scaling our Data Pipeline
Ingesting Billions of Events, Daily
our kafka broke
SDKs for Every Platform at Grab
Data Collection Architecture
Android SDK
McD
Analytics
Experiments
Alerting Service
Logging Service
User Profile
Other Consumers
Go SDK
iOS SDK
JS SDK Kafka Presto
S3 + Hive
(Batched)
TalariaDB
(Realtime)
ETL
Querying with SQL thanks to Presto
Querying Terabytes in Milliseconds
One of the goals of
TalariaDB is simplicity. We
achieve that by making sure
that system itself is not
responsible for things like
data transformation and
re-partitioning of data but
simply ingests and serves
data to Presto, nothing else.
How Ingestion Happens
// Append appends a set of events to the storage. It needs ingestion time and an event name to create
// a key which will then be used for retrieval.
func (s *Server) Append(payload []byte) error {
schema, err := orc.SplitByColumn(payload, column, func(event string, columnChunk []byte) bool {
_, err := orc.SplitBySize(columnChunk, 25000, func(chunk []byte) bool {
// Create a new block to store from orc buffer
blk, _ := block.FromOrc(chunk)
// Encode the block
buffer, _ := blk.Encode()
// Append this block to the store
s.store.Append( newKey(event, time.Unix(0, tsi)), buffer, time.Hour)
return false
})
return err != nil
})
return nil
}
Presto Integration
In order for Presto to retrieve
data from TalariaDB, we have
used thrift connector
provided. This takes care of
most of the heavy lifting.
// PrestoGetSplits returns a batch of splits.
func (s *Server) PrestoGetSplits(...) {
// Create a new query and validate it
...
// We need to generate as many splits as we have nodes in our cluster. Each
// split needs to contain the IP address of the node containing that split,
// so Presto can reach it and request the data.
batch = new(presto.PrestoThriftSplitBatch)
for _, m := range s.cluster.Members() {
for _, q := range queries {
batch.Splits = append(batch.Splits, &presto.PrestoThriftSplit{
SplitId: q.Encode(),
Hosts: []*presto.PrestoThriftHostAddress{{
Host: m, Port: s.port}},
})
}
}
return
}
Combining LSTM with Columnar Zero-Copy Codec
The keys are lexicographically
ordered and when a query
comes, it essentially seeks to
the first key for that metric
and stops iterating when
either next metric is found or
time bound is reached.
Decoding is efficient, without
any unnecessary memory
copy.
Read more on engineering.grab.com
● Querying Big Data in Real-Time with Presto & Grab's TalariaDB
● Building Grab’s Experimentation Platform
● Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
● Orchestrating Chaos using Grab's Experimentation Platform
● A Lean and Scalable Data Pipeline To Capture Large Scale Events and
Support Experimentation Platform
Q&A
https://grab.careers
https://engineering.grab.com
Thanks!

More Related Content

Similar to Scaling Experimentation & Data Capture at Grab

Large scale data capture and experimentation platform at Grab
Large scale data capture and experimentation platform at GrabLarge scale data capture and experimentation platform at Grab
Large scale data capture and experimentation platform at GrabRoman
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandMaarten Balliauw
 
Optimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdfStephanie Locke
 
Full accesspolicyconsolidation for event processing systems
Full accesspolicyconsolidation for event processing systemsFull accesspolicyconsolidation for event processing systems
Full accesspolicyconsolidation for event processing systemsviswanadhamsatish
 
Microservices with .Net - NDC Sydney, 2016
Microservices with .Net - NDC Sydney, 2016Microservices with .Net - NDC Sydney, 2016
Microservices with .Net - NDC Sydney, 2016Richard Banks
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...confluent
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appNeil Avery
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkShashank Gautam
 
Observability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesObservability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesBoyan Dimitrov
 
WSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product OverviewWSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product OverviewWSO2
 
Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...
Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...
Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...Flink Forward
 
Sessionizing Uber Trips in Realtime - Flink Forward '18, Berlin
Sessionizing Uber Trips in Realtime  - Flink Forward '18, BerlinSessionizing Uber Trips in Realtime  - Flink Forward '18, Berlin
Sessionizing Uber Trips in Realtime - Flink Forward '18, BerlinAmey Chaugule
 
Testing in the IoT Era
Testing in the IoT EraTesting in the IoT Era
Testing in the IoT EraTechWell
 
Stream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysStream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysSmartNews, Inc.
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data cultureSimon Dittlmann
 
Pune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetupPune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetupratneshsinghparihar
 
How to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based WorldHow to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based WorldKen Owens
 

Similar to Scaling Experimentation & Data Capture at Grab (20)

Large scale data capture and experimentation platform at Grab
Large scale data capture and experimentation platform at GrabLarge scale data capture and experimentation platform at Grab
Large scale data capture and experimentation platform at Grab
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
 
Async
AsyncAsync
Async
 
Optimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature Delivery
 
Working with data using Azure Functions.pdf
Working with data using Azure Functions.pdfWorking with data using Azure Functions.pdf
Working with data using Azure Functions.pdf
 
Full accesspolicyconsolidation for event processing systems
Full accesspolicyconsolidation for event processing systemsFull accesspolicyconsolidation for event processing systems
Full accesspolicyconsolidation for event processing systems
 
Microservices with .Net - NDC Sydney, 2016
Microservices with .Net - NDC Sydney, 2016Microservices with .Net - NDC Sydney, 2016
Microservices with .Net - NDC Sydney, 2016
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming app
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing framework
 
Observability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesObservability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architectures
 
WSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product OverviewWSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product Overview
 
Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...
Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...
Flink Forward Berlin 2018: Amey Chaugule - "Threading Needles in a Haystack: ...
 
Sessionizing Uber Trips in Realtime - Flink Forward '18, Berlin
Sessionizing Uber Trips in Realtime  - Flink Forward '18, BerlinSessionizing Uber Trips in Realtime  - Flink Forward '18, Berlin
Sessionizing Uber Trips in Realtime - Flink Forward '18, Berlin
 
Testing in the IoT Era
Testing in the IoT EraTesting in the IoT Era
Testing in the IoT Era
 
Stream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysStream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdays
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data culture
 
Pune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetupPune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetup
 
How to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based WorldHow to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based World
 
Resume
ResumeResume
Resume
 

Recently uploaded

Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 

Recently uploaded (20)

Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 

Scaling Experimentation & Data Capture at Grab

  • 1. Scalable Experimentation & Data Capture at Grab Dr. Roman Atachiants // June 2019
  • 2.
  • 4. Our Services: Mobility, Fintech & Lifestyle
  • 5. Product Insights & Experimentation Team
  • 6. Product Insights & Experimentation at Grab Experimentation Platform aims to accelerate innovation, by allowing us to find and focus on key ideas evaluated through rigorous regime of online controlled experiments. It also promotes the culture of data driven decision making through experimentation. Scribe is our in house mobile and backend event collection solution. It helps us track, monitor and analyse any mobile event such as user clicks, booking rides as well as mobile logs, data usage and more.
  • 8. Experimentation Platform: Mission The Grab Experimentation Platform aims to accelerate innovation, by allowing us to find and focus on key ideas evaluated through rigorous regime of online controlled experiments. It also promotes the culture of data driven decision making through experimentation. To get there, we must lower the cost of experimentation significantly.
  • 9. Experimentation Platform Feature Toggles Our Product Insights & Experimentation platform provides a dynamic feature toggle capability to our engineering, data, product and business teams. Feature toggles also let teams modify system behavior without changing code. At Grab, we use feature toggles to: 1. Gate feature deployment in production for keeping new features hidden until product and marketing teams are ready to share. 2. Run experiments (A/B tests) by dynamically changing feature toggles for specific users, rides, etc. For example, a feature can only appear to a particular group of people while experiment is running (treatment group).
  • 10. Experimentation Platform A/B testing is a method of comparing two or more versions of a system against each other to determine which one performs best. Example: an experiment where two variants of a page are shown to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal (metric).
  • 13. Automated Message Experiment Problem: We wanted to determine the best delay to send an automated message to passengers asking for extra details. ● 70% of bookings had no message ● 10% of bookings got a message after 30s ● 10% of bookings got a message after 60s ● 10% of bookings got a message after 90s The experiment ran on Singapore GC, GC+ and JustGrab for a period of one week. The hypothesis was that sending a message sooner will consistently reduce cancellations.
  • 14. Experiment Definition "domain": "primary", "name": "primary.automatedChatMessage", "variables": [{ "name": "automatedMessageDelay", "strategy": "weightedChoice", "facets": ["bkg"], "choices": [ { "value": 0, "weight": 0.70 }, { "value": 30, "weight": 0.10 }, { "value": 60, "weight": 0.10 }, { "value": 90, "weight": 0.10 }] }],"constraints": [ "ts > 1400113600", "ts < 1400718400", "svc in [1,2,3]"]
  • 15.
  • 16. Experiment Definition In Grab Chat Service SDK is used to retrieve a value for the variable based on the configuration. Default value will be used if no experiment is running or constraints do not match. In Booking Service SDK is used to instrument required cancellation metrics. facets := sdk.NewFacets(). City(city). Vehicle(vehicleType). Passenger(passenger). Booking(booking) delay := sdk.GetVariable(ctx, "automatedMessageDelay",facets).Int64(0) sendMessage(delay) // logic facets := sdk.NewFacets(). City(req.CityID). Vehicle(vehicleType). Passenger(passenger). Booking(booking) sdk.Track(ctx,"booking.cancelled", reason, facets)
  • 18. Golang SDK Our Go SDK provides a very simple client which takes care of everything. GetVariable returns within 5-10 microseconds as there no network calls in the hot path. Track buffers locally and periodically flushes batches of events to our ingestion service (McD). type SDK interface { // GetVariable with name of the variable GetVariable(ctx Context, name string, facets *Facets) Variable // Track an event with a value and metadata Track(ctx Context, eventName string, value float64 facets *Facets) // InjectLocally injects the facets to the local golang // context. This will only be applied to this context and // will not be propagated throughout the service stack. InjectLocally(ctx Context, values Facets) Context // InjectGlobally injects the facets to the global // tracing baggage. This allows you to propagate // facet values across services. InjectGlobally(span opentracing.Span, values Facets) }
  • 19. Facets Facets represent the context describing an event. Each GetVariable() call generates an event as well and facets are used to resolve the final value. We encourage our engineers to provide as much context as possible and we also have automated enrichment in place. Facets { Passenger int64 // The passenger identifier Driver int64 // The driver identifier Service int64 // The equivalent to a vehicle type Booking string // The booking code Location string // The location geohash Session string // The session identifier Request string // The request identifier Device string // The device identifier Tag string // The user-defined tag ... }
  • 20. Event Tracking In order to track an event, engineers are required to add the name of the event, value and facets as well. facets := sdk.NewFacets(). Booking(req.BookingCode). Custom(“key1”, “some value”). Custom(“key2”, “some other value”) X.Track(ctx,"customEvent", 42, facets) SELECT avg(val) mean, city FROM eventlog WHERE event = 'myservice.customEvent' GROUP BY city ORDER BY 1 DESC LIMIT 100
  • 21. Architectural Overview Kafka + Presto Backend Services The API allows to configure experiments and includes a planner which schedules experiments into segments based on the facets. The planner ensures that there’s no spillover between experiments. Variable assignments and metrics are sent to ingestor service PlannerDashboard McD Gateway SDK The SDK retrieves the experiment definitions and autonomously makes assignments. UCM S3 Bucket Reliability We designed the platform to be as reliable as possible, making sure that we do not affect our customers in any way if we’re down. Our data is stored in S3, and if McD is down the data simply won’t be written but none of backend services will be affected.
  • 23. McD - Event Ingestion System Problem statement To enable the construction of platforms that will provide us valuable business insight and improve the quality of our products and services. What we built McD - a general-purpose event ingestion system to capture useful mobile and backend events such as impressions, user interactions (eg. clicks), performance metrics (eg. battery life).
  • 24. McD - Event Ingestion System Funnels Need to capture unified funnels (backend and mobile), especially with proliferation of Grab Apps. Experimentation Need of fully-fledged and unified experimentation platform for our mobile and backend.
  • 25. McD - Event Ingestion System Monitoring and Logging Need of simple monitoring and debuggability tool for our mobile apps (passenger, driver, food…) User Profiles Need to capture comprehensive click streams and user profiles.
  • 26. SDKs for Every Platform at Grab ● Available for Go, JS, Android and iOS ● Efficient, batched, binary event transmission over HTTPS ● Dynamic reconfiguration of the SDK on a per device level ● Built-in persistence & QoS in order to prevent data loss for critical events ● Automatic tracking of common events ● Integrated feature flags & A/B testing
  • 27. Scaling our Data Pipeline
  • 28. Ingesting Billions of Events, Daily our kafka broke
  • 29. SDKs for Every Platform at Grab Data Collection Architecture Android SDK McD Analytics Experiments Alerting Service Logging Service User Profile Other Consumers Go SDK iOS SDK JS SDK Kafka Presto S3 + Hive (Batched) TalariaDB (Realtime) ETL
  • 30. Querying with SQL thanks to Presto
  • 31. Querying Terabytes in Milliseconds One of the goals of TalariaDB is simplicity. We achieve that by making sure that system itself is not responsible for things like data transformation and re-partitioning of data but simply ingests and serves data to Presto, nothing else.
  • 32. How Ingestion Happens // Append appends a set of events to the storage. It needs ingestion time and an event name to create // a key which will then be used for retrieval. func (s *Server) Append(payload []byte) error { schema, err := orc.SplitByColumn(payload, column, func(event string, columnChunk []byte) bool { _, err := orc.SplitBySize(columnChunk, 25000, func(chunk []byte) bool { // Create a new block to store from orc buffer blk, _ := block.FromOrc(chunk) // Encode the block buffer, _ := blk.Encode() // Append this block to the store s.store.Append( newKey(event, time.Unix(0, tsi)), buffer, time.Hour) return false }) return err != nil }) return nil }
  • 33. Presto Integration In order for Presto to retrieve data from TalariaDB, we have used thrift connector provided. This takes care of most of the heavy lifting. // PrestoGetSplits returns a batch of splits. func (s *Server) PrestoGetSplits(...) { // Create a new query and validate it ... // We need to generate as many splits as we have nodes in our cluster. Each // split needs to contain the IP address of the node containing that split, // so Presto can reach it and request the data. batch = new(presto.PrestoThriftSplitBatch) for _, m := range s.cluster.Members() { for _, q := range queries { batch.Splits = append(batch.Splits, &presto.PrestoThriftSplit{ SplitId: q.Encode(), Hosts: []*presto.PrestoThriftHostAddress{{ Host: m, Port: s.port}}, }) } } return }
  • 34. Combining LSTM with Columnar Zero-Copy Codec The keys are lexicographically ordered and when a query comes, it essentially seeks to the first key for that metric and stops iterating when either next metric is found or time bound is reached. Decoding is efficient, without any unnecessary memory copy.
  • 35. Read more on engineering.grab.com ● Querying Big Data in Real-Time with Presto & Grab's TalariaDB ● Building Grab’s Experimentation Platform ● Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab ● Orchestrating Chaos using Grab's Experimentation Platform ● A Lean and Scalable Data Pipeline To Capture Large Scale Events and Support Experimentation Platform