Large scale data capture and experimentation platform at Grab
September 2018
Product Insights & Experimentation at Grab
The Experimentation Platform aims to accelerate innovation by letting us find and focus on key ideas, evaluated through a rigorous regime of online controlled experiments. It also promotes a culture of data-driven decision making through experimentation.
Scribe is our in-house mobile and backend event collection solution. It helps us track, monitor and analyse any mobile event, such as user clicks and ride bookings, as well as mobile logs, data usage and more.
Team
Experimentation Platform
Feature Toggles
Grab’s Product Insights & Experimentation platform provides a dynamic feature toggle capability to our engineering, data, product and even business teams. Feature toggles also let teams modify system behavior without changing code.
At Grab, we use feature toggles to:
1. Gate feature deployment in production, keeping new features hidden until product and marketing teams are ready to share them.
2. Run experiments (A/B tests) by dynamically changing feature toggles for specific users, rides, etc. For example, a feature can be shown only to a particular group of people (the treatment group) while the experiment is running.
More on Feature Toggles: https://martinfowler.com/articles/feature-toggles.html
Our Mission
The Grab Experimentation Platform aims to accelerate innovation by letting us find and focus on key ideas, evaluated through a rigorous regime of online controlled experiments. It also promotes a culture of data-driven decision making through experimentation.
To get there, we must significantly lower the cost of experimentation.
What is Experimentation or Testing
Testing is a method of comparing two or more versions of a system against each other to determine which one performs best.
For example, two or more variants of a page are shown to users at random, and statistical analysis is used to determine which variant performs better for a given conversion goal.
Well-known types of testing: A/B, multivariate, factorial, etc.
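The "statistical analysis" step can take many forms. As a minimal sketch, assuming a plain conversion-rate comparison between two variants, the snippet below runs a two-proportion z-test. The function name zTest and the sample counts are hypothetical, and this is one common technique rather than necessarily the one used by the platform.

```go
package main

import (
	"fmt"
	"math"
)

// zTest returns the z statistic and the two-sided p-value for the difference
// between two conversion rates (two-proportion z-test with pooled variance).
// It assumes both groups are non-empty and the pooled rate is not 0 or 1.
func zTest(convA, totalA, convB, totalB int) (z, p float64) {
	pA := float64(convA) / float64(totalA)
	pB := float64(convB) / float64(totalB)
	pooled := float64(convA+convB) / float64(totalA+totalB)
	se := math.Sqrt(pooled * (1 - pooled) * (1/float64(totalA) + 1/float64(totalB)))
	z = (pB - pA) / se
	// Two-sided p-value under the standard normal distribution.
	p = math.Erfc(math.Abs(z) / math.Sqrt2)
	return z, p
}

func main() {
	// Hypothetical counts: control converts 200 of 1000, treatment 250 of 1000.
	z, p := zTest(200, 1000, 250, 1000)
	fmt.Printf("z=%.3f p=%.4f\n", z, p)
}
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone; the threshold to act on is a decision for the experiment owner.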
Long Time Ago… Prior to Experimentation Platform
Example:
Automated Message Experiment
Problem: We wanted to determine the best delay before sending an automated message to passengers asking for extra details. Bookings were split as follows:
● 70% of bookings had no message
● 10% of bookings got a message after 30s
● 10% of bookings got a message after 60s
● 10% of bookings got a message after 90s
The experiment ran on Singapore GC, GC+ and JustGrab for a
period of one week.
The hypothesis was that sending a message sooner will
consistently reduce cancellations.
Experiment Definition
{
  "domain": "primary",
  "name": "primary.automatedChatMessage",
  "variables": [{
    "name": "automatedMessageDelay",
    "strategy": "weightedChoice",
    "facets": ["bkg"],
    "choices": [
      { "value": 0, "weight": 0.70 },
      { "value": 30, "weight": 0.10 },
      { "value": 60, "weight": 0.10 },
      { "value": 90, "weight": 0.10 }
    ]
  }],
  "constraints": [
    { "target": "ts", "op": ">", "value": "1400113600" },
    { "target": "ts", "op": "<", "value": "1400718400" },
    { "target": "svc", "op": "in", "value": "[1,2,3]" }
  ]
}
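To illustrate what a "weightedChoice" strategy keyed on the booking facet could look like, here is a minimal sketch. It assumes the assignment is a deterministic hash of the variable name and booking code mapped onto cumulative weights; the use of SHA-256, the salt format, and the names Choice, assign and distribution are all assumptions of this sketch, since the slides do not describe the actual scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Choice mirrors one entry of the "choices" array in the definition.
type Choice struct {
	Value  int64
	Weight float64
}

// assign deterministically maps a facet value (here a booking code) to a choice.
// The hash function and the "variable:facet" salt are assumptions of this sketch.
func assign(variable, facet string, choices []Choice) int64 {
	sum := sha256.Sum256([]byte(variable + ":" + facet))
	n := binary.BigEndian.Uint64(sum[:8])
	// Keep 53 bits so the value is exact in a float64, then scale to [0, 1).
	point := float64(n>>11) / float64(1<<53)
	cumulative := 0.0
	for _, c := range choices {
		cumulative += c.Weight
		if point < cumulative {
			return c.Value
		}
	}
	// Guard against rounding when the weights sum to slightly less than 1.
	return choices[len(choices)-1].Value
}

// distribution assigns n synthetic booking codes and counts each outcome.
func distribution(n int, choices []Choice) map[int64]int {
	counts := make(map[int64]int)
	for i := 0; i < n; i++ {
		booking := fmt.Sprintf("BKG-%d", i)
		counts[assign("automatedMessageDelay", booking, choices)]++
	}
	return counts
}

func main() {
	choices := []Choice{{0, 0.70}, {30, 0.10}, {60, 0.10}, {90, 0.10}}
	// Roughly 70% of bookings should land on the 0 delay (no message) choice.
	fmt.Println(distribution(100000, choices))
}
```

Hashing rather than drawing a random number means the same booking always receives the same delay, without any shared state between services.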
Experimentation Platform Portal
Engineering: Using the SDK
In Grab Chat Service
The SDK is used to retrieve a value for the variable based on the configuration. The default value is used if no experiment is running or the constraints do not match.
    facets := sdk.NewFacets().
        City(city).
        Vehicle(vehicleType).
        Passenger(passenger).
        Booking(booking)
    delay := X.GetVariable(ctx, "automatedMessageDelay", facets).Int64(0)
    sendMessage(delay) // logic
In Booking Service
The SDK is used to instrument the required cancellation metrics.
    facets := sdk.NewFacets().
        City(req.CityID).
        Vehicle(vehicleType).
        Passenger(passenger).
        Booking(booking)
    X.Track(ctx, "booking.cancelled", reason, facets)
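The fallback behaviour described for the Chat Service (the default is returned when the constraints do not match) can be sketched as follows. This is an illustration only: the Constraint type, the matches and variable helpers, and the idea of passing attributes as a map are hypothetical, and the constraint values are assumed to be already parsed into integers.

```go
package main

import "fmt"

// Constraint mirrors one entry of "constraints" in the experiment definition,
// with the value already parsed into integers (parsing is omitted here).
type Constraint struct {
	Target string
	Op     string
	Values []int64
}

// matches reports whether every constraint holds for the given attributes.
func matches(cs []Constraint, attrs map[string]int64) bool {
	for _, c := range cs {
		v, ok := attrs[c.Target]
		if !ok || len(c.Values) == 0 {
			return false
		}
		switch c.Op {
		case ">":
			if v <= c.Values[0] {
				return false
			}
		case "<":
			if v >= c.Values[0] {
				return false
			}
		case "in":
			found := false
			for _, x := range c.Values {
				if x == v {
					found = true
					break
				}
			}
			if !found {
				return false
			}
		default:
			return false // unknown operator: fail closed
		}
	}
	return true
}

// variable returns the assigned value when the constraints match and the
// caller's default otherwise.
func variable(cs []Constraint, attrs map[string]int64, assigned, def int64) int64 {
	if matches(cs, attrs) {
		return assigned
	}
	return def
}

func main() {
	cs := []Constraint{
		{Target: "ts", Op: ">", Values: []int64{1400113600}},
		{Target: "ts", Op: "<", Values: []int64{1400718400}},
		{Target: "svc", Op: "in", Values: []int64{1, 2, 3}},
	}
	inWindow := map[string]int64{"ts": 1400200000, "svc": 2}
	otherService := map[string]int64{"ts": 1400200000, "svc": 7}
	fmt.Println(variable(cs, inWindow, 30, 0))     // prints 30
	fmt.Println(variable(cs, otherService, 30, 0)) // prints 0
}
```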
Feature Toggles
How we did Feature Toggles before...
Our legacy feature toggling system was essentially a library, shipped with all of our Golang services, that wrapped calls to a shared Redis. Retrieving values involved network calls and local caching to support our scale, but it slowly became a single point of failure for the company as the number of backend microservices grew.
// Retrieve a feature flag using our legacy system
flag := sitevar.GetFeatureFlag("myFeature", 10, false)
Golang SDK
Our Go SDK provides a simple client that handles variable retrieval, event tracking and facet propagation.
type Client interface {
    // GetVariable retrieves the variable with the given name
    GetVariable(ctx context.Context, name string, facets *Facets) Variable
    // Track an event with a value and metadata
    Track(ctx context.Context, eventName string, value float64, facets *Facets)
    // InjectLocally injects the facets to the local golang context. This will only be
    // applied to this context and will not be propagated throughout the service stack.
    InjectLocally(ctx context.Context, values Facets) context.Context
    // InjectGlobally injects the facets to the global tracing baggage. This allows you
    // to propagate facet values across services.
    InjectGlobally(span opentracing.Span, values Facets)
}
Events and Facets
type Facets struct {
    Passenger int64  // The passenger identifier
    Driver    int64  // The driver identifier
    Service   int64  // The equivalent to a vehicle type
    Booking   string // The booking code
    Location  string // The location (geohash or coordinates)
    Session   string // The session identifier
    Request   string // The request identifier
    Device    string // The device identifier
    Tag       string // The user-defined tag
    // ...
}
Fill in as many facets as possible.
The facets are set using a 3-step override (i.e. global implicit, local implicit, explicit):
● Step 1 - Global Implicit. If facets are found in the OpenTracing baggage of the span, set those facet values.
● Step 2 - Local Implicit. If facets are found in the local Golang context, set them, overriding the values from step 1.
● Step 3 - Explicit. Facets provided explicitly in the GetVariable/Track call override all implicit ones.
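The three override steps can be sketched as successive overlays. The snippet below is a simplified illustration: the Facets struct is trimmed to three fields, the overlay and resolve helpers are hypothetical names, and treating a zero value as "not set" is an assumption of this sketch rather than documented SDK behaviour.

```go
package main

import "fmt"

// Facets is a trimmed-down version of the SDK struct, kept to three fields
// so the sketch stays short.
type Facets struct {
	Passenger int64
	Booking   string
	Tag       string
}

// overlay copies every non-zero field of src onto dst. Treating the zero
// value as "not set" is an assumption of this sketch.
func overlay(dst *Facets, src Facets) {
	if src.Passenger != 0 {
		dst.Passenger = src.Passenger
	}
	if src.Booking != "" {
		dst.Booking = src.Booking
	}
	if src.Tag != "" {
		dst.Tag = src.Tag
	}
}

// resolve applies the three steps in order: global implicit (tracing baggage),
// local implicit (Go context), then explicit (call arguments).
func resolve(global, local, explicit Facets) Facets {
	var out Facets
	overlay(&out, global)
	overlay(&out, local)
	overlay(&out, explicit)
	return out
}

func main() {
	global := Facets{Passenger: 1, Booking: "B1"}
	local := Facets{Booking: "B2", Tag: "checkout"}
	explicit := Facets{Passenger: 9}
	// prints {Passenger:9 Booking:B2 Tag:checkout}
	fmt.Printf("%+v\n", resolve(global, local, explicit))
}
```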
Architectural Overview
[Diagram] Components: Backend Services, SDK, Planner, Dashboard, McD Gateway, UCM S3 Bucket, Kafka + Presto.
● The API lets you configure experiments and includes a planner which schedules experiments into segments based on the facets. The planner ensures that there’s no spillover between experiments.
● The SDK retrieves the experiment definitions and autonomously makes assignments.
● Variable assignments and metrics are sent to the ingestor service.
Reliability
We designed the platform to be as reliable as possible, making sure that we do not affect our customers (you guys) in any way if we’re down.
Our data is stored in the UCM S3 bucket, and if McD is down the data simply won’t be written, but none of the backend services will be affected.
Tracking Events with Scribe
McD - Data is Lovin’ It
Problem statement
To enable the construction of platforms that will provide us with valuable business insight and improve the quality of our products and services.
What we built
McD - a general-purpose event ingestion system to capture useful mobile and backend events such as impressions, user interactions (e.g. clicks) and performance metrics (e.g. battery life).
Funnels and Experimentation
Funnels
We need to capture unified funnels (backend and mobile), especially with the proliferation of Grab apps.
Experimentation
We need a fully fledged, unified experimentation platform for mobile and backend.
Logging, Monitoring and Profiles
Monitoring and Logging
We need a simple monitoring and debugging tool for our mobile apps (passenger, driver, food…).
User Profiles
We need to capture comprehensive clickstreams and user profiles.
SDKs for every platform
● Available for Golang, Android and iOS
● Efficient, batched, binary event transmission
● Dynamic configuration of the SDK (per device)
● Built-in persistence & QoS to avoid data loss
● Automatic tracking of common events (e.g. “app start”)
● Integrated feature flags & A/B testing
● Dart and JavaScript SDKs are planned
Reliably Track Anything
facets := sdk.NewFacets().
    Booking(req.BookingCode).
    Custom("key1", "some value").
    Custom("key2", "some other value")
X.Track(ctx, "customEvent", 42, facets)
Our SDK can be used to track any event. Each event carries:
● The name of the event, which gets prefixed by the service name.
● The numeric value of the event.
● The facets representing contextual information about this event. Users are encouraged to provide as much information as possible, for example passenger ID, booking code and driver ID.
Tracked events can then be queried with SQL:
SELECT avg(val) mean, city
FROM eventlog
WHERE event = 'myservice.customEvent'
GROUP BY city
ORDER BY 1 DESC
LIMIT 100
Scaling our Data Pipeline
Data Collection Architecture
[Diagram] Producers: Android SDK, iOS SDK, Go SDK, JS SDK, Dart SDK. Ingestion: McD. Pipeline and storage: Kafka, S3 + Hive (batched), TalariaDB (realtime), Presto, Spark. Consumers: Analytics, Experiments, Alerting Service, Logging Service, User Profile, other consumers.
Querying with SQL
Querying Terabytes in Milliseconds
Simple and Reliable
One of the goals of TalariaDB is simplicity. We achieve that by making sure that the system itself is not responsible for things like data transformation and re-partitioning of data; it simply ingests data and serves it to Presto, nothing else.
How Ingestion Happens
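// Append appends a set of events to the storage. It needs the ingestion time and an event name to
// create a key which will then be used for retrieval.
func (s *Server) Append(payload []byte) error {
    // Ingestion time in nanoseconds. Assumption: the slide does not show where tsi is set.
    tsi := time.Now().UnixNano()
    // Split the payload by event name; column (defined elsewhere) names the column to split by.
    _, err := orc.SplitByColumn(payload, column, func(event string, columnChunk []byte) bool {
        // Split each per-event chunk into pieces of size 25000.
        _, err := orc.SplitBySize(columnChunk, 25000, func(chunk []byte) bool {
            // Create a new block to store from the ORC buffer
            blk, _ := block.FromOrc(chunk)
            // Encode the block
            buffer, _ := blk.Encode()
            // Append this block to the store
            s.store.Append(newKey(event, time.Unix(0, tsi)), buffer, time.Hour)
            return false
        })
        return err != nil
    })
    // Propagate a split failure instead of always returning nil.
    return err
}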
Presto Interaction
// PrestoGetSplits returns a batch of splits.
func (s *Server) PrestoGetSplits(...) {
    // Create a new query and validate it
    ...
    // We need to generate as many splits as we have nodes in our cluster. Each
    // split needs to contain the IP address of the node containing that split,
    // so Presto can reach it and request the data.
    batch = new(presto.PrestoThriftSplitBatch)
    for _, m := range s.cluster.Members() {
        for _, q := range queries {
            batch.Splits = append(batch.Splits, &presto.PrestoThriftSplit{
                SplitId: q.Encode(),
                Hosts:   []*presto.PrestoThriftHostAddress{{Host: m, Port: s.port}},
            })
        }
    }
    return
}
Combining LSMT with Columnar Zero-Copy Codec
TBs in Milliseconds
The keys are lexicographically ordered. When a query comes in, it seeks to the first key for that metric and stops iterating when either the next metric is found or the time bound is reached.
Decoding is efficient, without any unnecessary memory copies.
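The seek-and-stop behaviour over ordered keys can be sketched with a sorted slice standing in for the LSM tree. The key layout used here (event name, a zero byte, then a big-endian timestamp) is an assumption made for illustration, as are the helper names newKey, sortedKeys and scan; the slides do not show TalariaDB's actual key format.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
	"strings"
)

// newKey builds "<event>\x00<big-endian timestamp>". Big-endian encoding makes
// byte-wise order match time order within one event. This layout is an
// assumption of the sketch and requires event names without zero bytes.
func newKey(event string, ts uint64) string {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], ts)
	return event + "\x00" + string(buf[:])
}

// sortedKeys returns the keys in lexicographic order, as an ordered store would.
func sortedKeys(keys ...string) []string {
	sort.Strings(keys)
	return keys
}

// scan seeks to the first key of the event at or after from, then iterates
// until the next event starts or the time bound is passed.
func scan(keys []string, event string, from, until uint64) []uint64 {
	prefix := event + "\x00"
	end := newKey(event, until)
	var out []uint64
	for i := sort.SearchStrings(keys, newKey(event, from)); i < len(keys); i++ {
		k := keys[i]
		if !strings.HasPrefix(k, prefix) || k > end {
			break
		}
		out = append(out, binary.BigEndian.Uint64([]byte(k[len(prefix):])))
	}
	return out
}

func main() {
	keys := sortedKeys(
		newKey("booking.created", 15),
		newKey("booking.cancelled", 10),
		newKey("booking.cancelled", 30),
		newKey("booking.created", 25),
		newKey("booking.cancelled", 20),
	)
	fmt.Println(scan(keys, "booking.cancelled", 15, 30)) // prints [20 30]
}
```

Because the scan touches only the contiguous key range for one event and one time window, its cost depends on the size of the result rather than on the total amount of stored data.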
Related Posts
Related engineering blog posts:
● Querying Big Data in Real-Time with Presto & Grab's TalariaDB
● Building Grab’s Experimentation Platform
● Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
● Orchestrating Chaos using Grab's Experimentation Platform
● A Lean and Scalable Data Pipeline To Capture Large Scale Events and Support
Experimentation Platform
Q&A
https://grab.careers
https://engineering.grab.com

More Related Content

What's hot

Version control
Version controlVersion control
Version controlvisual28
 
Back to [Jaspersoft] basics: visualize.js 101
Back to [Jaspersoft] basics: visualize.js 101Back to [Jaspersoft] basics: visualize.js 101
Back to [Jaspersoft] basics: visualize.js 101TIBCO Jaspersoft
 
Migrating to Java 11
Migrating to Java 11Migrating to Java 11
Migrating to Java 11Arto Santala
 
Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실Taewan Kim
 
Developing RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDBDeveloping RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDBNicola Iarocci
 
Interactive Financial Exchange (IFX)
Interactive Financial Exchange (IFX)Interactive Financial Exchange (IFX)
Interactive Financial Exchange (IFX)Pratheeban Rajendran
 
Continuous integration
Continuous integrationContinuous integration
Continuous integrationamscanne
 
Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners HubSpot
 
Behavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using CucumberBehavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using CucumberKMS Technology
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbtFabio Fumarola
 
Version control system and Git
Version control system and GitVersion control system and Git
Version control system and Gitramubonkuri
 
SpringOne Tour: Spring Boot 3 and Beyond
SpringOne Tour: Spring Boot 3 and BeyondSpringOne Tour: Spring Boot 3 and Beyond
SpringOne Tour: Spring Boot 3 and BeyondVMware Tanzu
 
EAP6 performance Tuning
EAP6 performance TuningEAP6 performance Tuning
EAP6 performance TuningPraveen Adupa
 
Spring Framework Petclinic sample application
Spring Framework Petclinic sample applicationSpring Framework Petclinic sample application
Spring Framework Petclinic sample applicationAntoine Rey
 
HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법
HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법
HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법Young D
 

What's hot (20)

BDD with Cucumber
BDD with CucumberBDD with Cucumber
BDD with Cucumber
 
Migrating To GitHub
Migrating To GitHub  Migrating To GitHub
Migrating To GitHub
 
Version control
Version controlVersion control
Version control
 
Back to [Jaspersoft] basics: visualize.js 101
Back to [Jaspersoft] basics: visualize.js 101Back to [Jaspersoft] basics: visualize.js 101
Back to [Jaspersoft] basics: visualize.js 101
 
Migrating to Java 11
Migrating to Java 11Migrating to Java 11
Migrating to Java 11
 
Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실
 
Spring GraphQL
Spring GraphQLSpring GraphQL
Spring GraphQL
 
Developing RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDBDeveloping RESTful Web APIs with Python, Flask and MongoDB
Developing RESTful Web APIs with Python, Flask and MongoDB
 
Version control
Version controlVersion control
Version control
 
Interactive Financial Exchange (IFX)
Interactive Financial Exchange (IFX)Interactive Financial Exchange (IFX)
Interactive Financial Exchange (IFX)
 
Continuous integration
Continuous integrationContinuous integration
Continuous integration
 
Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners
 
Behavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using CucumberBehavior Driven Development and Automation Testing Using Cucumber
Behavior Driven Development and Automation Testing Using Cucumber
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbt
 
Version control system and Git
Version control system and GitVersion control system and Git
Version control system and Git
 
SpringOne Tour: Spring Boot 3 and Beyond
SpringOne Tour: Spring Boot 3 and BeyondSpringOne Tour: Spring Boot 3 and Beyond
SpringOne Tour: Spring Boot 3 and Beyond
 
Zabbix
ZabbixZabbix
Zabbix
 
EAP6 performance Tuning
EAP6 performance TuningEAP6 performance Tuning
EAP6 performance Tuning
 
Spring Framework Petclinic sample application
Spring Framework Petclinic sample applicationSpring Framework Petclinic sample application
Spring Framework Petclinic sample application
 
HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법
HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법
HAProxy TCP 모드에서 내부 서버로 Source IP 전달 방법
 

Similar to Large scale data capture and experimentation platform at Grab

Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabRoman
 
Optimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely
 
WSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product OverviewWSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product OverviewWSO2
 
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesHow To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesLightbend
 
How to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based WorldHow to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based WorldKen Owens
 
Building Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache KafkaBuilding Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache KafkaGuido Schmutz
 
Shuvam dutta | Performance tester
Shuvam dutta | Performance testerShuvam dutta | Performance tester
Shuvam dutta | Performance testerShuvam Dutta
 
Pune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetupPune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetupratneshsinghparihar
 
Bridge Your Kafka Streams to Azure Webinar
Bridge Your Kafka Streams to Azure WebinarBridge Your Kafka Streams to Azure Webinar
Bridge Your Kafka Streams to Azure Webinarconfluent
 
Technology Overview
Technology OverviewTechnology Overview
Technology OverviewLiran Zelkha
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandMaarten Balliauw
 
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di PalmaEvolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di PalmaMongoDB
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowDaniel Zivkovic
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics PlatformSrinath Perera
 
Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...
Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...
Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...Amazon Web Services
 
MongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB
 
Qtp interview questions
Qtp interview questionsQtp interview questions
Qtp interview questionsRamu Palanki
 

Similar to Large scale data capture and experimentation platform at Grab (20)

Scaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at GrabScaling Experimentation & Data Capture at Grab
Scaling Experimentation & Data Capture at Grab
 
Optimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature Delivery
 
Resume
ResumeResume
Resume
 
WSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product OverviewWSO2 Complex Event Processor - Product Overview
WSO2 Complex Event Processor - Product Overview
 
Async
AsyncAsync
Async
 
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesHow To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
 
How to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based WorldHow to Monitor Application Performance in a Container-Based World
How to Monitor Application Performance in a Container-Based World
 
Shuvam dutta
Shuvam duttaShuvam dutta
Shuvam dutta
 
Building Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache KafkaBuilding Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache Kafka
 
Shuvam dutta | Performance tester
Shuvam dutta | Performance testerShuvam dutta | Performance tester
Shuvam dutta | Performance tester
 
Pune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetupPune microsoft azure developers 2nd meetup
Pune microsoft azure developers 2nd meetup
 
Bridge Your Kafka Streams to Azure Webinar
Bridge Your Kafka Streams to Azure WebinarBridge Your Kafka Streams to Azure Webinar
Bridge Your Kafka Streams to Azure Webinar
 
Technology Overview
Technology OverviewTechnology Overview
Technology Overview
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
 
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di PalmaEvolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...
Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...
Implementare e gestire soluzioni per l'Internet of Things (IoT) in modo rapid...
 
MongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDB
 
Qtp interview questions
Qtp interview questionsQtp interview questions
Qtp interview questions
 

Recently uploaded

Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 

Recently uploaded (20)

Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Large scale data capture and experimentation platform at Grab

  • 1. Large scale data capture and experimentation platform at Grab September 2018
  • 2. Product Insights & Experimentation at Grab The Experimentation Platform aims to accelerate innovation by allowing us to find and focus on key ideas, evaluated through a rigorous regime of online controlled experiments. It also promotes a culture of data-driven decision making through experimentation. Scribe is our in-house mobile and backend event collection solution. It helps us track, monitor and analyse any mobile event, such as user clicks and ride bookings, as well as mobile logs, data usage and more.
  • 5. Feature Toggles Grab’s Product Insights & Experimentation platform provides a dynamic feature toggle capability to our engineering, data, product and even business teams. Feature toggles also let teams modify system behavior without changing code. At Grab, we use feature toggles to:
      1. Gate feature deployment in production, keeping new features hidden until product and marketing teams are ready to share them.
      2. Run experiments (A/B tests) by dynamically changing feature toggles for specific users, rides, etc. For example, a feature can appear only to a particular group of people (the treatment group) while the experiment is running.
      More on Feature Toggles: https://martinfowler.com/articles/feature-toggles.html
  • 6. Our Mission The Grab Experimentation Platform aims to accelerate innovation by allowing us to find and focus on key ideas, evaluated through a rigorous regime of online controlled experiments. It also promotes a culture of data-driven decision making through experimentation. To get there, we must lower the cost of experimentation significantly.
  • 7. What is Experimentation or Testing Testing is a method of comparing two or more versions of a system against each other to determine which one performs best. For example, in an experiment, two or more variants of a page are shown to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal. Well-known types of testing: A/B, multivariate, factorial, etc.
  • 8. Long Time Ago… Prior to Experimentation Platform
  • 10. Automated Message Experiment Problem: We wanted to determine the best delay before sending an automated message to passengers asking for extra details.
      ● 70% of bookings had no message
      ● 10% of bookings got a message after 30s
      ● 10% of bookings got a message after 60s
      ● 10% of bookings got a message after 90s
      The experiment ran on Singapore GC, GC+ and JustGrab for a period of one week. The hypothesis was that sending a message sooner would consistently reduce cancellations.
  • 11. Experiment Definition
          "domain": "primary",
          "name": "primary.automatedChatMessage",
          "variables": [{
            "name": "automatedMessageDelay",
            "strategy": "weightedChoice",
            "facets": ["bkg"],
            "choices": [
              { "value": 0, "weight": 0.70 },
              { "value": 30, "weight": 0.10 },
              { "value": 60, "weight": 0.10 },
              { "value": 90, "weight": 0.10 }]
          }],
          "constraints": [
            {"target": "ts", "op": ">", "value": "1400113600"},
            {"target": "ts", "op": "<", "value": "1400718400"},
            {"target": "svc", "op": "in", "value": "[1,2,3]"}
          ]
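The slides do not show how the SDK turns the "weightedChoice" strategy into an assignment. The following is a minimal sketch under one common assumption: the facet value (here a booking code) is hashed to a point in [0, 1), and that point is matched against the cumulative weights. The names `Choice` and `weightedChoice` and the use of FNV hashing are illustrative, not Grab's implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Choice mirrors one entry of the "choices" array in the experiment definition.
type Choice struct {
	Value  int64
	Weight float64
}

// weightedChoice deterministically maps a facet value (for example a booking
// code) to one of the choices, in proportion to the weights.
// Hypothetical helper: the real SDK's assignment logic is not shown in the slides.
func weightedChoice(experiment, facet string, choices []Choice) int64 {
	if len(choices) == 0 {
		return 0
	}
	h := fnv.New64a()
	h.Write([]byte(experiment + ":" + facet))
	// Use the top 53 bits of the hash to get a point in [0, 1).
	point := float64(h.Sum64()>>11) / float64(uint64(1)<<53)

	var total float64
	for _, c := range choices {
		total += c.Weight
	}
	cumulative := 0.0
	for _, c := range choices {
		cumulative += c.Weight / total
		if point < cumulative {
			return c.Value
		}
	}
	// Floating-point rounding can leave point just above the last boundary.
	return choices[len(choices)-1].Value
}

func main() {
	choices := []Choice{{0, 0.70}, {30, 0.10}, {60, 0.10}, {90, 0.10}}
	delay := weightedChoice("primary.automatedChatMessage", "BKG-12345", choices)
	fmt.Println("automatedMessageDelay =", delay)
}
```

Because the assignment depends only on the experiment name and the facet value, the same booking always receives the same delay, with no shared state between services.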
  • 13. Engineering: Using the SDK
      In Grab Chat Service, the SDK is used to retrieve a value for the variable based on the configuration. The default value is used if no experiment is running or the constraints do not match.
          facets := sdk.NewFacets().
              City(city).
              Vehicle(vehicleType).
              Passenger(passenger).
              Booking(booking)
          delay := X.GetVariable(ctx, "automatedMessageDelay", facets).Int64(0)
          sendMessage(delay) // logic
      In Booking Service, the SDK is used to instrument the required cancellation metrics.
          facets := sdk.NewFacets().
              City(req.CityID).
              Vehicle(vehicleType).
              Passenger(passenger).
              Booking(booking)
          X.Track(ctx, "booking.cancelled", reason, facets)
  • 15. How we did Feature Toggles before... Our legacy feature toggling system was essentially a library shipped with all of our Golang services that wrapped calls to a shared Redis. Retrieving values involved network calls and local caching to support our scale, but as the number of backend microservices grew, it gradually became a single point of failure for the company.
          // Retrieve a feature flag using our legacy system
          flag := sitevar.GetFeatureFlag("myFeature", 10, false)
  • 16. Golang SDK Our Go SDK provides a very simple client which will take care of everything.
          type Client interface {
              // GetVariable with name of the variable
              GetVariable(ctx context.Context, name string, facets *Facets) Variable

              // Track an event with a value and metadata
              Track(ctx context.Context, eventName string, value float64, facets *Facets)

              // InjectLocally injects the facets to the local golang context. This will only be
              // applied to this context and will not be propagated throughout the service stack.
              InjectLocally(ctx context.Context, values Facets) context.Context

              // InjectGlobally injects the facets to the global tracing baggage. This allows you
              // to propagate facet values across services.
              InjectGlobally(span opentracing.Span, values Facets)
          }
  • 17. Events and Facets
          type Facets struct {
              Passenger int64  // The passenger identifier
              Driver    int64  // The driver identifier
              Service   int64  // The equivalent to a vehicle type
              Booking   string // The booking code
              Location  string // The location (geohash or coordinates)
              Session   string // The session identifier
              Request   string // The request identifier
              Device    string // The device identifier
              Tag       string // The user-defined tag
              ...
          }
      The facets should be filled in as completely as possible. They are set using a 3-step override (i.e. implicit global, implicit local, explicit).
      ● Step 1 - Global Implicit. If some facets are found in the OpenTracing baggage of the span, set those facet values.
      ● Step 2 - Local Implicit. If some facets are found in the local golang context, set and override those facet values.
      ● Step 3 - Explicit. The facets provided explicitly in the GetVariable/Track call override all implicit ones.
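The 3-step override can be sketched as two successive overlays. This is a minimal illustration, assuming that a zero value means "facet not set" and using a trimmed-down `Facets` struct; `overlay` and `resolve` are hypothetical names, not part of the SDK.

```go
package main

import "fmt"

// Facets is a trimmed-down version of the struct on the slide.
type Facets struct {
	Passenger int64
	Driver    int64
	Booking   string
}

// overlay copies every non-zero field of src over dst.
// Assumption: a zero value means the facet was not provided.
func overlay(dst, src Facets) Facets {
	if src.Passenger != 0 {
		dst.Passenger = src.Passenger
	}
	if src.Driver != 0 {
		dst.Driver = src.Driver
	}
	if src.Booking != "" {
		dst.Booking = src.Booking
	}
	return dst
}

// resolve applies the three steps in order: global implicit (tracing baggage),
// local implicit (golang context), then explicit (call arguments).
func resolve(global, local, explicit Facets) Facets {
	return overlay(overlay(global, local), explicit)
}

func main() {
	global := Facets{Passenger: 42, Booking: "ADR-1"}
	local := Facets{Driver: 7, Booking: "ADR-2"}
	explicit := Facets{Passenger: 99}
	fmt.Printf("%+v\n", resolve(global, local, explicit))
	// prints: {Passenger:99 Driver:7 Booking:ADR-2}
}
```

The later a source appears in the chain, the higher its precedence, so a caller can always correct an inherited facet by passing it explicitly.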
  • 18. Architectural Overview
      Diagram labels: Backend Services, SDK, Gateway, Planner, Dashboard, McD, Kafka + Presto, UCM S3 Bucket.
      The API lets teams configure experiments and includes a planner, which schedules experiments into segments based on the facets. The planner ensures that there is no spillover between experiments. Variable assignments and metrics are sent to the ingestor service.
      The SDK retrieves the experiment definitions and autonomously makes assignments.
      Reliability: We designed the platform to be as reliable as possible, making sure that we do not affect our customers (you guys) in any way if we’re down. Our data is stored in the UCM S3 bucket, and if McD is down the data simply won’t be written, but none of the backend services will be affected.
  • 20. McD - Data is Lovin’ It Problem statement: To enable the construction of platforms that will provide us with valuable business insight and improve the quality of our products and services. What we built: McD, a general-purpose event ingestion system that captures useful mobile and backend events such as impressions, user interactions (e.g. clicks) and performance metrics (e.g. battery life).
  • 21. Funnels and Experimentation Funnels: We need to capture unified funnels (backend and mobile), especially with the proliferation of Grab apps. Experimentation: We need a fully fledged, unified experimentation platform for our mobile apps and backend.
  • 22. Logging, Monitoring and Profiles Monitoring and Logging: We need a simple monitoring and debugging tool for our mobile apps (passenger, driver, food…). User Profiles: We need to capture comprehensive click streams and user profiles.
  • 23. SDKs for every platform
      ● Available for Golang, Android and iOS
      ● Efficient, batched, binary event transmission
      ● Dynamic configuration of the SDK (per device)
      ● Built-in persistence & QoS to avoid data loss
      ● Automatic tracking of common events (e.g. “app start”)
      ● Integrated feature flags & A/B testing
      ● Dart and JavaScript SDKs are planned
  • 24. Reliably Track Anything
          facets := sdk.NewFacets().
              Booking(req.BookingCode).
              Custom("key1", "some value").
              Custom("key2", "some other value")
          X.Track(ctx, "customEvent", 42, facets)
      Our SDK can be used to track any event. Each call takes:
      ● The name of the event, which gets prefixed by the service name.
      ● The numeric value of the event.
      ● The facets representing contextual information about this event. Users are encouraged to provide as much information as possible, for example passenger ID, booking code and driver ID.
          SELECT avg(val) mean, city
          FROM eventlog
          WHERE event = 'myservice.customEvent'
          GROUP BY city
          ORDER BY 1 DESC
          LIMIT 100
  • 25. Scaling our Data Pipeline
  • 26. Data Collection Architecture Diagram components: SDKs (Android SDK, iOS SDK, Go SDK, JS SDK, Dart SDK); McD; Kafka; S3 + Hive (Batched); TalariaDB (Realtime); Presto; Spark; consumers (Analytics, Experiments, Alerting Service, Logging Service, User Profile, Other Consumers).
  • 28. Querying Terabytes in Milliseconds Simple and Reliable: One of the goals of TalariaDB is simplicity. We achieve that by making sure that the system itself is not responsible for things like data transformation and re-partitioning of data; it simply ingests data and serves it to Presto, nothing else.
  • 29. How Ingestion Happens
          // Append appends a set of events to the storage. It needs ingestion time and an event name to create
          // a key which will then be used for retrieval.
          func (s *Server) Append(payload []byte) error {
              schema, err := orc.SplitByColumn(payload, column, func(event string, columnChunk []byte) bool {
                  _, err := orc.SplitBySize(columnChunk, 25000, func(chunk []byte) bool {
                      // Create a new block to store from orc buffer
                      blk, _ := block.FromOrc(chunk)

                      // Encode the block
                      buffer, _ := blk.Encode()

                      // Append this block to the store
                      s.store.Append(
                          newKey(event, time.Unix(0, tsi)),
                          buffer,
                          time.Hour)
                      return false
                  })
                  return err != nil
              })
              return nil
          }
  • 30. Presto Interaction
          // PrestoGetSplits returns a batch of splits.
          func (s *Server) PrestoGetSplits(...) {
              // Create a new query and validate it
              ...

              // We need to generate as many splits as we have nodes in our cluster. Each
              // split needs to contain the IP address of the node containing that split,
              // so Presto can reach it and request the data.
              batch = new(presto.PrestoThriftSplitBatch)
              for _, m := range s.cluster.Members() {
                  for _, q := range queries {
                      batch.Splits = append(batch.Splits, &presto.PrestoThriftSplit{
                          SplitId: q.Encode(),
                          Hosts:   []*presto.PrestoThriftHostAddress{{Host: m, Port: s.port}},
                      })
                  }
              }
              return
          }
  • 31. Combining LSMT with Columnar Zero-Copy Codec TBs in Milliseconds: The keys are lexicographically ordered. When a query arrives, it seeks to the first key for that metric and stops iterating when either the next metric is found or the time bound is reached. Decoding is efficient, without any unnecessary memory copies.
  • 32. Related Posts Related engineering blog posts:
      ● Querying Big Data in Real-Time with Presto & Grab's TalariaDB
      ● Building Grab’s Experimentation Platform
      ● Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
      ● Orchestrating Chaos using Grab's Experimentation Platform
      ● A Lean and Scalable Data Pipeline To Capture Large Scale Events and Support Experimentation Platform