2. Motivation
• Innovation iteration -> correct evaluation
– Blindingly obvious
– Clear, but requires involved deductive reasoning
– A/B Testing
• Segment-based optimization
• Multi-dimensional and stochastic impact
• Incremental Radicalism
• Disclaimer: Some parts of this platform already exist; more will come to life, and we will solicit more inputs and involvement
3. Experimentation Platform
Components
• Bucketing (A or B), sketched in code after this list
– Web Bucketing on User Cohorts
– Supply Chain Bucketing on Order Basket or Warehouse (e.g. Packing)
• Control variables – what is being tested
– Price
– Gift Wrap
– Position on Web Page
– Recommendation Positioning
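Returning to the bucketing bullet above: a minimal sketch of deterministic, hash-based bucket assignment. The function name and hashing scheme are illustrative assumptions, not the platform's actual library.

    import hashlib

    def bucket(user_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
        # Hash (experiment, user) so assignment is sticky per user and
        # independent across experiments.
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        point = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
        return "B" if point < treatment_fraction else "A"

    print(bucket("user-42", "gift-wrap-test"))  # same bucket on every call

Hashing the (experiment, user) pair rather than the user alone keeps one experiment's split uncorrelated with another's.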
4. Experimentation Platform
• Result variables (often studied for a week to a month)
– Repeat Visit
– Repeat Buy
– Repeat Engagement
– Spend
• Result interpretation
– Z-test
– T-test
– Chi Squared
5. Bucketing (Web)
• Bucketing: Declarative Common Cohorts
– User (sync): Cohorts are complex queries, run async when sufficiently complex, e.g.
• Users who bought books with increasing spend but did not buy electronics
• User Activity Store: searches, clicks, views, etc.
• Cached and hit at web scale
• Cohorts can be selected declaratively (see the sketch after this list), e.g.
– Category Purchased
– Search Ranking
– Email Marketing
– Spend slope
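A minimal sketch of what declarative cohort selection could look like: the spec is data, so no code is needed per cohort. Field names and operators here are illustrative assumptions, not the platform's actual schema.

    # Hypothetical declarative cohort spec; evaluated against a user profile.
    cohort = {
        "all": [
            {"field": "category_purchased", "op": "contains", "value": "Books"},
            {"field": "category_purchased", "op": "not_contains", "value": "Electronics"},
            {"field": "spend_slope", "op": ">", "value": 0},
        ]
    }

    OPS = {
        ">": lambda a, b: a > b,
        "contains": lambda a, b: b in a,
        "not_contains": lambda a, b: b not in a,
    }

    def matches(user: dict, spec: dict) -> bool:
        # A user is in the cohort when every predicate holds.
        return all(OPS[c["op"]](user[c["field"]], c["value"]) for c in spec["all"])

    user = {"category_purchased": ["Books"], "spend_slope": 0.3}
    print(matches(user, cohort))  # True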
6. Bucketing (Fulfilment)
– Order Fulfilment (async): Rules
• RETE evaluation of rules: predicates are evaluated a minimal number of times, even across ~1000 rules
• Async process => on-the-fly evaluation
– Interaction Plots need to be looked into for multiple experiments
– Exclusive buckets on control variables (sketch below)
• e.g. 2 experiments cannot both decide on gift wrap
• Price cannot be influenced by 2 different experiments
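A minimal sketch of enforcing that exclusivity, assuming a central registry that experiments claim control variables from; class and method names are illustrative.

    class ControlVariableRegistry:
        def __init__(self):
            self._owner = {}  # control variable -> owning experiment

        def claim(self, experiment: str, variable: str) -> None:
            holder = self._owner.get(variable)
            if holder and holder != experiment:
                raise ValueError(f"{variable!r} is already controlled by {holder!r}; "
                                 f"{experiment!r} cannot also decide it")
            self._owner[variable] = experiment

    registry = ControlVariableRegistry()
    registry.claim("exp-12", "gift_wrap")
    registry.claim("exp-45", "price")
    try:
        registry.claim("exp-99", "gift_wrap")  # conflict: rejected
    except ValueError as e:
        print(e)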
7. Control Variables
• Control Variables: configuration-based delta
– Price elasticity
– Position on page
– Recommendation
– Gift Wrap
– Business Flow (e.g. in Mumbai, a new packing technique) => BPM
8. Execution
– Client Library to evaluate
– if (experiment45) { ….. }
– Configuration-based deviators (sketch below)
• Better still, evaluate an experiment deviator, e.g.
• SLA = SLA - experimentDelta (experimenting with early delivery)
– experimentDelta comes from config service
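A minimal sketch of such a deviator, assuming the delta is read from a config service; fetch_config is a hypothetical stand-in for that lookup.

    def fetch_config(key: str, default: float = 0.0) -> float:
        # Stand-in for the config service call.
        return {"sla.experimentDelta": 1.0}.get(key, default)

    def effective_sla(base_sla_days: float, in_treatment: bool) -> float:
        # Apply the experiment's delta only to bucketed (treatment) traffic.
        if not in_treatment:
            return base_sla_days
        return base_sla_days - fetch_config("sla.experimentDelta")

    print(effective_sla(4.0, in_treatment=True))   # 3.0: early-delivery experiment
    print(effective_sla(4.0, in_treatment=False))  # 4.0: control unchanged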
• Multi-armed bandit to apply the changes?
– 90% greedy and 10% random (epsilon-greedy, epsilon = 0.1)
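A minimal epsilon-greedy bandit sketch matching the 90/10 split above; the reward bookkeeping is illustrative.

    import random

    class EpsilonGreedy:
        def __init__(self, arms, epsilon=0.1):
            self.epsilon = epsilon
            self.counts = {a: 0 for a in arms}
            self.values = {a: 0.0 for a in arms}  # running mean reward per arm

        def select(self):
            if random.random() < self.epsilon:            # 10%: explore at random
                return random.choice(list(self.counts))
            return max(self.values, key=self.values.get)  # 90%: exploit the best arm

        def update(self, arm, reward):
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    bandit = EpsilonGreedy(["control", "treatment"])
    arm = bandit.select()
    bandit.update(arm, reward=1.0)  # e.g. 1.0 if the order converted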
9. Binomial at Large # -> Normal
• Binomial (most human decisions) -> Normal for large n:
(p + q)^n = Σ_r C(n,r) p^r q^(n-r)
Y_r = C(n,r) p^r q^(n-r)
Consider (Y_{r+1} - Y_r)/Y_r for large n; in the limit
dY/Y = -(x/σ²) dx => Y ∝ exp(-x²/(2σ²)), the normal curve with σ² = npq
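A quick numerical check of that limit using scipy's standard distributions: for large n the binomial pmf closely tracks the normal density with μ = np and σ² = npq.

    from scipy.stats import binom, norm

    n, p = 10_000, 0.3
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5  # sigma^2 = npq

    for r in (2900, 3000, 3100):
        exact = binom.pmf(r, n, p)       # binomial probability
        approx = norm.pdf(r, mu, sigma)  # normal approximation
        print(f"r={r}: binomial={exact:.6f}  normal={approx:.6f}")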
10. Interaction Plot
– From Peltier Stats on OKCupid data
– Smile: no interaction with eye contact
– Flirty face: significant interaction
Beware of interaction between experiments
11. Result Interpretation
– T-test: samples of fewer than 30 [fatter tails]
– Z-test: z = (x - μ)/σ, significant at |z| > 1.96 [Normal]
– Paired t-test: Return/Refund-> Gift -> Repeat Buys
– Chi Squared
– F test
• Do we lose anything by repeated testing until test convergence?
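A minimal sketch of the tests above with scipy, on synthetic data (the numbers are illustrative, not real results):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    a = rng.normal(100.0, 15.0, 25)  # spend in bucket A; n < 30 -> t-test
    b = rng.normal(108.0, 15.0, 25)  # spend in bucket B

    t, p = stats.ttest_ind(a, b)
    print(f"t-test: t={t:.2f}, p={p:.4f}")

    # Chi-squared on a 2x2 table: [converted, not converted] per bucket
    table = [[120, 880], [150, 850]]
    chi2, p, dof, _ = stats.chi2_contingency(table)
    print(f"chi-squared: chi2={chi2:.2f}, p={p:.4f}")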
12. Development Paradigm
– Simplify during experiment
– Scalability: Build experiment to work out of memory
– Availability: Fail-Open (sketch after this list)
– Sharding and Database: Not big scale
– Performance: In Memory for a few nodes
– Figure out control variables
Upper bound of expected results -> 90% of experiments may not need to be scaled out
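A minimal sketch of fail-open behaviour: if the experiment lookup fails, serve the control experience rather than failing the request. lookup_bucket simulates an outage here and is illustrative.

    def in_experiment(user_id: str, experiment: str) -> bool:
        try:
            return lookup_bucket(user_id, experiment) == "B"  # remote call
        except Exception:
            return False  # fail-open: fall back to control, never break the page

    def lookup_bucket(user_id: str, experiment: str) -> str:
        raise TimeoutError("experiment service unavailable")  # simulated outage

    print(in_experiment("user-42", "gift-wrap-test"))  # False: control served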
13. Decision Paradigm
– No code needed to test an idea
– Experiments run in parallel
– Need to test for interaction and main effects
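A minimal sketch of testing main effects and their interaction across two concurrent experiments with a two-way ANOVA; the data is synthetic and the column names are illustrative.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "expA": rng.choice(["control", "treatment"], 400),
        "expB": rng.choice(["control", "treatment"], 400),
    })
    # Simulate a main effect of expA only; no real interaction.
    df["spend"] = 100 + 5 * (df.expA == "treatment") + rng.normal(0, 10, 400)

    model = ols("spend ~ C(expA) * C(expB)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # rows for expA, expB, and expA:expB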
15. Summary
• A/B testing platform becomes key beyond the trivially obvious
• Configuration-based A/B tests (trivial to check a curiosity)
• Result interpretation is non-trivial and varies
Editor's Notes
Don’t be brave, you will be wrong. Your predecessors were bright too; a good breakfast enhanced your mood more than your IQ. Experiment without fear. Free to experiment, but not free to put things into production until sure that it will help. Try every experiment. Enable everyone in the company to experiment.