Bootstrapping of PySpark Models for Factorial A/B Tests


A/B testing, i.e., measuring the impact of proposed variants of, for example, e-commerce websites, is fundamental for increasing conversion rates and other key business metrics.

We have developed a solution that makes it possible to run dozens of simultaneous A/B tests, obtain conclusive results sooner, and produce results that are more interpretable than mere statistical significance: the probability that a change has a positive effect, how much revenue is at risk, and so on.

To compute those metrics, we need to estimate the posterior distributions of the metrics, which are computed using Generalized Linear Models (GLMs). Since we process gigabytes of data, we use a PySpark implementation, which, however, does not provide standard errors for the coefficients. We therefore use bootstrapping to estimate the distributions.
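As a rough illustration (the helper below is illustrative, not from the talk): once bootstrapping has produced many samples of a GLM coefficient, the business-facing numbers are simple summaries of that empirical distribution.

    import numpy as np

    def summarize_effect(coef_samples):
        """coef_samples: 1-D array of bootstrapped GLM coefficient estimates."""
        p_positive = float(np.mean(coef_samples > 0))                 # probability the change helps
        ci_low, ci_high = np.percentile(coef_samples, [2.5, 97.5])    # 95% percentile interval
        return p_positive, (ci_low, ci_high)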

In this talk, I’ll describe how we’ve implemented parallelization of an already parallelized GLM computation so that it scales horizontally over a large cluster in Databricks, and cover various tweaks and how they’ve improved performance.


  1. Bootstrapping of (Py)Spark models for factorial A/B tests. Ondrej Havlicek, Data Scientist
  2. Ondřej Havlíček
     • Senior data scientist
     • Background: computer science, psychology, neuroscience
     • Focus: inferential statistics, machine learning, ETL
       • Spark, Python, R
       • A/B testing, recommendation, search, ...
       • e-Commerce, social media, ...
  3. We are DataSentics: making data science and machine learning have a real impact on organizations. Optimize and automate the thousands/millions of small decisions you make every day.
     • Products: PX (personalization for banking and insurance), DS Innovate (AI/ML-driven innovation & startups), DS TechScale (platforms for AI-intensive applications), DS InRetail (improving the customer experience in retail/FMCG); an AI choice assistant for e-commerce, an AI extension for your adform, photo-based analysis of positioning, out-of-stock, pricing and more
     • Team: 50+ data science and machine learning specialists, 30+ data engineering and cloud specialists, 10+ product owners
     • Partners & awards: Gold partner & Partner of the Year 2020, Professional partner, 4th fastest growing in CE, Rising stars award
  4. Agenda
     1. Factorial A/B testing
     2. Analysis of results
     3. Bootstrapping
     4. Performance tuning
  5. A/B testing
     • What: A is the control version, B the experimental version
     • Why: the only way to improve KPIs consistently; evidence > HiPPO (the highest-paid person's opinion); most tested ideas are actually incorrect
     • How: usually isolated tests, run in parallel or one after another
     • Wikipedia: "a user experience research methodology ... consist of a randomized experiment with two variants, A and B. It includes application of statistical hypothesis testing ... and determining which of the two variants is more effective."
  6. Why factorial A/B testing?
     • Isolated tests are limiting: few concurrent experiments, or very long durations
     • Solution: factorial design
       • Cross multiple tests orthogonally: each visitor is assigned to a variant in every test (see the sketch below)
       • Allows running dozens of simultaneous tests, each test running on all traffic
       • Faster results
     • https://hbr.org/2017/09/the-surprising-power-of-online-experiments
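A minimal sketch of one way such orthogonal assignment can be done (the helper below is hypothetical, not from the deck): hash the visitor id together with a per-test name, so each visitor lands in an independent variant of every concurrently running test.

    import hashlib

    def assign_variants(visitor_id, tests):
        """tests: mapping of test name -> number of variants."""
        assignments = {}
        for test_name, n_variants in tests.items():
            # a stable hash of (visitor, test) gives an orthogonal bucket per test
            digest = hashlib.md5(f"{visitor_id}:{test_name}".encode()).hexdigest()
            assignments[test_name] = int(digest, 16) % n_variants
        return assignments

    # e.g. assign_variants("visitor-42", {"T1": 2, "T2": 2}) -> {"T1": 0, "T2": 1}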
  7. Analysis of results
     • What you often get: "Version B has a statistically significant effect on CR, p = 0.04"
     • What we ideally want: "Version B increases CR with 92.5% probability, most likely by 1.8%, 95% CI: [-0.3; 3.9]"
     (figure: results of Test 1)
  8. Analysis of results
     • How: effect size
       • Big data: Spark GLM, e.g. is_conversion ~ T1 + T2 + T1 * T2 with family = "binomial", link = "logit" (PySpark sketch below)
     • How: uncertainty
       • Standard errors are generally not provided by Spark GLMs
       • Bootstrapping: a way to estimate the distribution of some statistic; a "poor man's Bayes" with a noninformative prior
     (figure: results of Test 1)
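A minimal PySpark sketch of this model specification, assuming a DataFrame df with columns T1, T2 and is_conversion. Note that Spark's RFormula supports ':' but not '*', so the interaction is spelled out: T1 + T2 + T1:T2 is equivalent to R's T1 * T2.

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import RFormula
    from pyspark.ml.regression import GeneralizedLinearRegression

    formula = RFormula(formula="is_conversion ~ T1 + T2 + T1:T2")
    glm = GeneralizedLinearRegression(family="binomial", link="logit")
    model = Pipeline(stages=[formula, glm])
    # fitted = model.fit(df)  # df: one row per visitor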
  9. Bootstrapping
     • Iterate many times (hundreds):
       • Randomly resample the data with replacement
       • Compute the statistics of interest: GLM coefficients

         df_resample = df.sample(withReplacement=True, fraction=1.0)
         fitted_model = model.fit(df_resample)
         stats = extract_stats(fitted_model)

       (a complete loop is sketched below)
     • How in Spark?
       • Bootstrapping is embarrassingly parallel
       • Spark parallelizes the tasks of model fitting, i.e., within one iteration
     • How to scale? We need to run many instances of model fitting in parallel
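Putting the snippet together into a complete (sequential) bootstrap loop, assuming df, model and n_iterations as on the surrounding slides. extract_stats is not shown in the deck, so the version below is an assumption that simply pulls the fitted coefficients out of the pipeline model.

    def extract_stats(fitted_model):
        # assumed helper: the last pipeline stage is the fitted GLM model
        glm_model = fitted_model.stages[-1]
        return [float(c) for c in glm_model.coefficients]

    coef_samples = []
    for _ in range(n_iterations):  # n_iterations: hundreds
        df_resample = df.sample(withReplacement=True, fraction=1.0)
        coef_samples.append(extract_stats(model.fit(df_resample)))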
  10. Bootstrapping of GLM in Spark in a parallel fashion
      • Multithreading
      • Prepare the bootstrap iterations in batches:
        • Each batch contains sequential iterations
        • Each iteration performs a Spark action
        • Stages have fewer tasks than cores
      • Submit the batches in parallel using multithreading
      • Tasks get scheduled to the executors in FIFO/FAIR fashion (see the scheduler sketch below)
      (diagram: iterations 1-12+ grouped into batches 1-4 across the cores of two 4-core workers; one iteration = one stage whose tasks run on several cores)
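The FIFO/FAIR scheduling mentioned here is a standard Spark setting; a sketch of how FAIR mode is typically enabled so that jobs submitted from different threads share executor slots instead of queueing behind each other:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.scheduler.mode", "FAIR")
             .getOrCreate())
    # optionally, per thread (pool name is illustrative):
    # spark.sparkContext.setLocalProperty("spark.scheduler.pool", "bootstrap")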
  11. Bootstrapping of GLM in Spark
      • Multithreading (run_batch is sketched below)
      (diagram: iterations grouped into batches across worker cores, as on the previous slide)

          import concurrent.futures
          import math

          ret_vals = []
          batch_size = math.floor(n_iterations / n_threads)
          batches = [{'batchnum': i + 1, 'reps': batch_size}
                     for i in range(n_threads)]
          with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as executor:
              future_run = {
                  executor.submit(run_batch, df, model, batch['reps']): batch
                  for batch in batches
              }
              for future in concurrent.futures.as_completed(future_run):
                  try:
                      batch_result = future.result()
                      ret_vals.append(batch_result)
                  except Exception:  # error handling elided on the slide
                      ...
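run_batch is referenced but not defined on the slide; a hypothetical implementation consistent with the rest of the deck, where each thread runs its batch of bootstrap iterations sequentially and each fit triggers one Spark job (extract_stats as sketched earlier):

    def run_batch(df, model, reps):
        stats = []
        for _ in range(reps):
            df_resample = df.sample(withReplacement=True, fraction=1.0)
            stats.append(extract_stats(model.fit(df_resample)))
        return stats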
  12. Performance: don't waste resources
      • How many parallel batches (threads)?
        • n_threads = n_cores / n_tasks * n_tasks_per_core (worked example below)
        • n_tasks: repartition the data to ~100-200 MB per partition
        • n_tasks_per_core: an empirical question, ca. 2-4
      • Check the Ganglia UI
      (diagram: iterations grouped into batches across worker cores, as on the previous slides)
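A worked example of the sizing rule (the numbers are illustrative, not from the deck):

    n_cores = 16           # total executor cores in the cluster
    n_tasks = 4            # partitions after repartitioning to ~100-200 MB each
    n_tasks_per_core = 2   # empirically ca. 2-4 concurrent tasks per core
    n_threads = n_cores // n_tasks * n_tasks_per_core   # -> 8 parallel batches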
  13. Performance test
  14. Lessons learned
      • Spark is better suited for ML than for inferential statistics
      • Bootstrapping helps
      • You can do parallelization^2 in Spark
      • Business users understand & like the outputs
      • The core of factorial A/B testing is simple, but reality brings many interesting challenges :)
        • Overlaps, interactions, funnels, outliers, zero-inflated metrics, variance reduction, ...
  15. Thank you! Want to know more? Drop me a line: ondrej.havlicek@datasentics.com