A/B testing, i.e., measuring the impact of proposed variants of e.g. e-commerce websites, is fundamental for increasing conversion rates and other key business metrics.

We have developed a solution that makes it possible to run dozens of simultaneous A/B tests, obtain conclusive results sooner, and report results that are more interpretable than bare statistical significance: the probability that a change has a positive effect, how much revenue is at risk, and so on.
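
Running many tests at once relies on a factorial design, in which every visitor is assigned a variant in every test. The sketch below shows one possible way to do that; the hash-based assignment, the function name `assign_variants`, and the test names are assumptions for illustration, not the method described in the talk.

```python
import hashlib

def assign_variants(visitor_id, tests):
    """Assign one visitor to a variant in every test, independently per test."""
    assignment = {}
    for test_name, variants in tests.items():
        # Hash the visitor together with the test name so that the tests are
        # crossed with each other instead of sharing a single split.
        digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
        assignment[test_name] = variants[int(digest, 16) % len(variants)]
    return assignment

# Hypothetical tests, each with a control (A) and an experimental (B) version.
tests = {"T1": ["A", "B"], "T2": ["A", "B"]}
assignment = assign_variants("visitor-42", tests)
```

Because the assignment is a pure function of the visitor and the test name, the same visitor always lands in the same cell of the design.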

To compute those metrics, we need to estimate the posterior distributions of the effects, which we model with Generalized Linear Models (GLMs). Since we process gigabytes of data, we use a PySpark implementation, which, however, does not provide standard errors of the coefficients. We therefore use bootstrapping to estimate the distributions.
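
A minimal pure-Python sketch of the bootstrapping idea, assuming that a simple difference in conversion rates stands in for the GLM coefficients and that the data are made up. The names `bootstrap_uplift` and `summarize` are illustrative and do not come from the talk.

```python
import random

def bootstrap_uplift(control, variant, n_iterations=500, seed=0):
    """Bootstrap the difference in conversion rate (variant - control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_iterations):
        # Resample each group with replacement, same size as the original.
        c = rng.choices(control, k=len(control))
        v = rng.choices(variant, k=len(variant))
        diffs.append(sum(v) / len(v) - sum(c) / len(c))
    return diffs

def summarize(diffs, level=0.95):
    """Share of positive resampled effects and a percentile interval."""
    ordered = sorted(diffs)
    n = len(ordered)
    lo = ordered[int((1 - level) / 2 * n)]
    hi = ordered[min(n - 1, int((1 + level) / 2 * n))]
    p_positive = sum(d > 0 for d in ordered) / n
    return {"p_positive": p_positive, "ci": (lo, hi)}

# Toy data: 1 = conversion, 0 = no conversion (made-up numbers).
control = [1] * 100 + [0] * 900
variant = [1] * 120 + [0] * 880
diffs = bootstrap_uplift(control, variant)
summary = summarize(diffs)
```

The share of resampled effects above zero plays the role of the "probability of a positive effect", and the percentile interval gives the uncertainty range.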

In this talk, I’ll describe how we’ve implemented parallelization of an already parallelized GLM computation to be able to scale this computation horizontally over a large cluster in Databricks and describe various tweaks and how they’ve improved the performance.
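
The batching idea can be sketched without a cluster. In the sketch below, `fit_once` is a hypothetical stand-in for one resample-and-fit iteration; in the setting described here each iteration would instead trigger Spark jobs, so the threads would mostly wait on the cluster. Spreading the remainder over the first batches is an assumption, made so that no iterations are dropped.

```python
import concurrent.futures
import random

def fit_once(data, rng):
    """Stand-in for one bootstrap iteration (resample, then 'fit')."""
    sample = rng.choices(data, k=len(data))
    return sum(sample) / len(sample)

def run_batch(data, reps, seed):
    """Run `reps` iterations sequentially inside one thread."""
    rng = random.Random(seed)
    return [fit_once(data, rng) for _ in range(reps)]

def split_iterations(n_iterations, n_threads):
    """Batch sizes that sum to n_iterations (remainder goes to the first batches)."""
    base, extra = divmod(n_iterations, n_threads)
    return [base + (1 if i < extra else 0) for i in range(n_threads)]

def bootstrap_parallel(data, n_iterations=100, n_threads=4):
    """Submit one batch per thread and collect all iteration results."""
    sizes = split_iterations(n_iterations, n_threads)
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as executor:
        futures = [executor.submit(run_batch, data, reps, seed)
                   for seed, reps in enumerate(sizes)]
        for future in concurrent.futures.as_completed(futures):
            results.extend(future.result())
    return results

data = [1] * 30 + [0] * 70  # made-up conversions
estimates = bootstrap_parallel(data, n_iterations=102, n_threads=4)
```

Each thread gets its own seeded random generator, so batches do not share mutable state.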

- 1. Bootstrapping of (Py)Spark models for factorial A/B tests Ondrej Havlicek Data Scientist
- 2. Ondřej Havlíček • Senior data scientist • Background • Computer science, psychology, neuroscience • Focus • Inferential statistics, machine learning, ETL • Spark, Python, R • A/B testing, recommendation, search, ... • e-Commerce, social media, ...
- 3. We are DataSentics: making data science and machine learning have a real impact on organizations. • Units: PX (Personalization for Banking and Insurance), DS Innovate (AI/ML driven innovation & startups), DS TechScale (Platforms for AI-intensive applications), DS InRetail (Improving the customer experience in Retail/FMCG) • Partners & Awards: Gold partner & Partner of the Year 2020, Professional partner, 4th fastest growing in CE, Rising stars award • Selected Customers • Team: data science and machine learning specialists, data engineering and cloud specialists, 10+ product owners (the slide also shows the counts 50+ and 30+) • Products: Optimize and automate the thousands/millions of small decisions you make every day • Analyse positioning, out-of-stock, pricing and more from a photo • AI choice assistant for e-commerce • AI extension for your adform
- 4. Agenda 1. Factorial A/B testing 2. Analysis of results 3. Bootstrapping 4. Performance tuning
- 5. A/B testing • What • A: Control version • B: Experimental version • Why • The only way to improve KPIs consistently • Evidence > HIPPO • Most tested ideas turn out to be wrong • How • Usually isolated tests, in parallel or one after another • Wikipedia: "a user experience research methodology ... consist of a randomized experiment with two variants, A and B. It includes application of statistical hypothesis testing ... and determining which of the two variants is more effective."
- 6. Why factorial A/B testing? • Isolated tests are limiting • Few concurrent experiments or very long durations • Solution: Factorial design • Cross multiple tests orthogonally • Each visitor is assigned to a variant in every test • Allows running dozens of simultaneous tests • Each test runs on all traffic • Faster results • Source: https://hbr.org/2017/09/the-surprising-power-of-online-experiments
- 7. Analysis of results • What you often get • Version B has a statistically significant effect on CR, p = 0.04 • What we ideally want • Version B increases CR with 92.5% probability • most likely by 1.8%, 95% CI: [-0.3; 3.9] • [Figure: Results of Test 1]
- 8. Analysis of results • How: effect size • Big data: Spark GLM, e.g.: • `is_conversion ~ T1 + T2 + T1 * T2` • `family = "binomial"` • `link = "logit"` • How: uncertainty • Std. errors generally not provided by Spark GLMs • Bootstrapping • A way to estimate the distribution of some statistic • “Poor man’s Bayes”, noninformative prior • [Figure: Results of Test 1]
- 9. Bootstrapping • Iterate many times (hundreds): • Randomly resample the data with replacement • Compute the statistics of interest: GLM coefficients

      # one bootstrap iteration; extract_stats is not defined on the slide
      df_resample = df.sample(withReplacement=True, fraction=1.0)
      fitted_model = model.fit(df_resample)
      stats = extract_stats(fitted_model)

  • How in Spark? • Bootstrapping is embarrassingly parallel • Spark parallelizes the tasks of model fitting, i.e. within one iteration • How to scale? • Need to run many instances of model fitting in parallel
- 10. Bootstrapping of GLM in Spark in a parallel fashion • Multithreading • Group the bootstrap iterations into batches: • Each batch contains sequential iterations • Each iteration performs a Spark action • Stages have fewer tasks than cores • Submit the batches in parallel using multithreading • Tasks get scheduled to the executors in FIFO / FAIR fashion • [Diagram: two workers with four cores each; Iterations 1 to 12 and onward grouped into Batches 1 to 4; Iteration 1, Stage 1 shown as Tasks 1 to 4 on Cores 1 and 2]
- 11. Bootstrapping of GLM in Spark • Multithreading • [Diagram: same workers, cores, iterations and batches layout as on slide 10]

      import concurrent.futures
      import math

      ret_vals = []
      # floor: any remainder of n_iterations / n_threads is dropped
      batch_size = math.floor(n_iterations / n_threads)
      batches = [{'batchnum': i + 1, 'reps': batch_size} for i in range(n_threads)]
      with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as executor:
          # submit one run_batch call per batch; each batch runs its iterations sequentially
          future_run = {
              executor.submit(run_batch, df, model, batch['reps']): batch
              for batch in batches
          }
          for future in concurrent.futures.as_completed(future_run):
              try:
                  batch_result = future.result()
                  ret_vals.append(batch_result)
              ...
- 12. Performance: don’t waste resources • How many parallel batches (threads)? • `n_threads = n_cores / n_tasks * n_tasks_per_core` • n_tasks: number of tasks per stage; repartition so that partitions are ~100 to 200 MB • n_tasks_per_core: empirical question, ca. 2 to 4 • Check cluster utilization in the Ganglia UI • [Diagram: same workers, cores, iterations and batches layout as on slide 10]
- 13. Performance test
- 14. Lessons learned • Spark is better suited for ML than for inferential stats • Bootstrapping helps • You can do parallelization^2 in Spark • Business users understand & like the outputs • Core of factorial A/B testing is simple • Many interesting challenges in reality :) • Overlaps, interactions, funnels, outliers, zero-inflated metrics, variance reduction, ...
- 15. Thank you! Want to know more? Drop me a line ondrej.havlicek@datasentics.com
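
The thread-count rule of thumb from slide 12 can be written as a small helper. This is a sketch: the function name `n_parallel_batches`, rounding down, and the lower bound of one thread are assumptions, and the cluster numbers are hypothetical.

```python
def n_parallel_batches(n_cores, n_tasks, n_tasks_per_core=2):
    """Threads to use: n_cores / n_tasks * n_tasks_per_core, at least 1."""
    # Rounding down and never going below one thread are assumptions.
    return max(1, int(n_cores / n_tasks * n_tasks_per_core))

# Hypothetical cluster: 2 workers x 4 cores, 4 tasks per stage.
n_threads = n_parallel_batches(n_cores=8, n_tasks=4, n_tasks_per_core=2)
```

With these numbers the helper suggests four parallel batches; the slide treats `n_tasks_per_core` as something to tune empirically.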
