Using machine learning to determine drivers of bounce and conversion

Recently, Google partnered with SOASTA to train a machine-learning model on a large sample of real-world performance, conversion, and bounce data. In this talk at Velocity 2016 Santa Clara, Pat Meenan and I gave an overview of the resulting model, which can predict the impact of performance work and other site metrics on conversion and bounce rates.

Using machine learning to determine drivers of bounce and conversion

  1. Using machine learning to determine drivers of bounce and conversion (Velocity 2016 Santa Clara)
  2. Pat Meenan @patmeenan, Tammy Everts @tameverts
  3. What we did (and why we did it)
  4. Get the code: https://github.com/WPO-Foundation/beacon-ml
  5. Deep learning weights
  6. Random forest: lots of random decision trees
  7. Vectorizing the data • Everything needs to be numeric • Strings converted to several inputs as yes/no (1/0) • e.g. device manufacturer • “Apple” would be a discrete input • Watch out for input explosion (UA string) (see the vectorization sketch after the slide list)
  8. Balancing the data • 3% conversion rate • 97% accurate by always guessing no • Subsample the data for a 50/50 mix (see the balancing and validation-split sketch after the slide list)
  9. Validation data • Train on 80% of the data • Validate on 20% to prevent overfitting
  10. Smoothing the data • ML works best on normally distributed data • scaler = StandardScaler() x_train = scaler.fit_transform(x_train) x_val = scaler.transform(x_val) (see the scaling sketch after the slide list)
  11. Input/output relationships • SSL highly correlated with conversions • Long sessions highly correlated with not bouncing • Remove correlated features from training (see the correlation sketch after the slide list)
  12. Training deep learning: model = Sequential() model.add(...) model.compile(optimizer='adagrad', loss='binary_crossentropy', metrics=["accuracy"]) model.fit(x_train, y_train, nb_epoch=EPOCH_COUNT, batch_size=32, validation_data=(x_val, y_val), verbose=2, shuffle=True) (see the Keras sketch after the slide list)
  13. Training random forest: clf = RandomForestClassifier(n_estimators=FOREST_SIZE, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=12, random_state=None, verbose=2, warm_start=False, class_weight=None) clf.fit(x_train, y_train) (see the random forest sketch after the slide list)
  14. Feature importances: clf.feature_importances_
  15. What we learned
  16. What’s in our beacon? • Top-level – domain, timestamp, SSL • Session – start time, length (in pages), total load time • User agent – browser, OS, mobile ISP • Geo – country, city, organization, ISP, network speed • Bandwidth • Timers – base, custom, user-defined • Custom metrics • HTTP headers • Etc.
  17. Conversion rate
  18. Conversion rate
  19. Bounce rate
  20. Bounce rate
  21. Finding 1: Number of scripts was a predictor… but not in the way we expected
  22. Number of scripts per page (median)
  23. Finding 2: When entire sessions were more complex, they converted less
  24. Finding 3: Sessions that converted had 38% fewer images than sessions that didn’t
  25. Number of images per page (median)
  26. Finding 4: DOM ready was the greatest indicator of bounce rate
  27. DOM ready (median)
  28. Finding 5: Full load time was the second greatest indicator of bounce rate
  29. timers_loaded (median)
  30. Finding 6: Mobile-related measurements weren’t meaningful predictors of conversions
  31. Conversions
  32. Finding 7: Some conventional metrics were (almost) meaningless, too
  33. Feature importance rank (out of 93): DNS lookup 79, start render 69
  34. Takeaways
  35. 1. YMMV 2. Do this with your own data 3. Gather your RUM data 4. Run the machine learning against it
  36. Thanks!
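
A minimal sketch of the vectorization step from slide 7, assuming the beacon data has been loaded into a pandas DataFrame. The column names here are hypothetical, not the actual beacon field names.

    import pandas as pd

    # Hypothetical beacon rows; the real beacon carries many more fields.
    df = pd.DataFrame({
        "device_manufacturer": ["Apple", "Samsung", "Apple"],
        "ssl": [1, 0, 1],
        "session_pages": [3, 1, 7],
        "converted": [1, 0, 0],
    })

    # Each string value becomes its own yes/no (1/0) column, e.g.
    # device_manufacturer_Apple and device_manufacturer_Samsung.
    df = pd.get_dummies(df, columns=["device_manufacturer"])

    # High-cardinality strings such as the raw UA string would explode into
    # thousands of columns, so drop or bucket those before encoding.
    print(df.columns.tolist())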
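
Slides 8 and 9 cover rebalancing to a 50/50 mix and holding out 20% for validation. A sketch continuing from the DataFrame above, where converted is the hypothetical 0/1 label column:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # With a ~3% conversion rate, always guessing "no" is ~97% accurate,
    # so downsample the majority class to a 50/50 mix.
    positives = df[df["converted"] == 1]
    negatives = df[df["converted"] == 0].sample(n=len(positives), random_state=42)
    balanced = pd.concat([positives, negatives]).sample(frac=1, random_state=42)

    x = balanced.drop(columns=["converted"])
    y = balanced["converted"]

    # Train on 80% of the data, validate on the held-out 20% to catch overfitting.
    x_train, x_val, y_train, y_val = train_test_split(
        x, y, test_size=0.2, random_state=42)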
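
The scaling code on slide 10 is complete apart from its import. Fitting the scaler on the training split only, then reusing it on the validation split, keeps validation data out of the preprocessing statistics:

    from sklearn.preprocessing import StandardScaler

    # Standardize each feature to zero mean and unit variance using
    # statistics computed from the training data only.
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_val = scaler.transform(x_val)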
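
Slide 11 notes that features strongly correlated with the label (SSL with conversion, session length with not bouncing) effectively leak the answer, so they are removed before training. One way to spot them, run on the labelled DataFrame before the balancing and splitting above; the 0.5 cutoff is an arbitrary illustration, not a value from the talk:

    # Correlation of each numeric feature with the conversion label.
    correlations = df.corr(numeric_only=True)["converted"].drop("converted")

    # Drop features whose correlation with the label is suspiciously high.
    too_correlated = correlations[correlations.abs() > 0.5].index.tolist()
    print("dropping:", too_correlated)
    df = df.drop(columns=too_correlated)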
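
Slide 12 elides the network definition (model.add(...)). The sketch below fills it in with a small fully connected network purely as an illustration; the layer sizes are assumptions, not the architecture used in the talk, and it uses the current Keras argument name epochs where the slide shows the older nb_epoch:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    EPOCH_COUNT = 10  # illustrative value

    # Hypothetical architecture: two hidden layers and a sigmoid output
    # for the binary converted / bounced label.
    model = Sequential()
    model.add(Dense(64, activation="relu", input_shape=(x_train.shape[1],)))
    model.add(Dense(64, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))

    model.compile(optimizer="adagrad",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])

    model.fit(x_train, y_train,
              epochs=EPOCH_COUNT,
              batch_size=32,
              validation_data=(x_val, y_val),
              verbose=2,
              shuffle=True)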
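
The random forest call on slide 13 spells out mostly default arguments; the version below trims it to the non-default ones, adds the import, and pairs the slide 14 feature importances with column names. FOREST_SIZE is left as a constant on the slide, so the value here is an assumption:

    from sklearn.ensemble import RandomForestClassifier

    FOREST_SIZE = 500  # illustrative value

    clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                                 criterion="gini",
                                 n_jobs=12,
                                 verbose=2)
    clf.fit(x_train, y_train)

    # Rank features by how much they contribute to the forest's splits.
    # x.columns comes from the unscaled feature DataFrame; scaling preserves
    # column order, so the names still line up.
    ranked = sorted(zip(x.columns, clf.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    for name, score in ranked:
        print(f"{name}: {score:.4f}")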
