Using machine learning to determine drivers of bounce and conversion (part 2)
Velocity 2016 New York
Pat Meenan (@patmeenan) and Tammy Everts (@tameverts)

[2016 Velocity NY] There has been a lot of historical work that looks at the relationship between performance and conversions, but most of it has been after the fact or relied on linear models. Google partnered with SOASTA to train a machine-learning model on a large sample of real-world performance, conversion, and bounce data. Patrick Meenan and Tammy Everts offer an overview of the resulting model, able to predict the impact of performance work and other site metrics on conversion and bounce rates. The code used to generate the model is freely available.

Published in: Technology

Using machine learning to determine drivers of bounce and conversion (part 2)

  1. Using machine learning to determine drivers of bounce and conversion (part 2)
     Velocity 2016 New York
  2. Pat Meenan @patmeenan, Tammy Everts @tameverts
  3. What we did (and why we did it)
  4. Get the code
     https://github.com/WPO-Foundation/beacon-ml
  5. Deep learning weights
  6. Random forest
     Lots of random decision trees
  7. Vectorizing the data
     • Everything needs to be numeric
     • Strings converted to several inputs as yes/no (1/0)
     • e.g. Device manufacturer
     • “Apple” would be a discrete input
     • Watch out for input explosion (UA string)
  8. Balancing the data
     • 3% conversion rate
     • 97% accurate by always guessing no
     • Subsample the data for 50/50 mix
  9. Smoothing the data
     ML works best on normally distributed data
         scaler = StandardScaler()
         x_train = scaler.fit_transform(x_train)
         x_val = scaler.transform(x_val)
  10. Validation data
      • Train on 80% of the data
      • Validate on 20% to prevent overfitting
        – Training accuracy from validation set
  11. Input/output relationships
      • SSL highly correlated with conversions
      • Long sessions highly correlated with not bouncing
      • Remove correlated features from training
  12. Training random forest
          clf = RandomForestClassifier(n_estimators=FOREST_SIZE, criterion='gini',
                                       max_depth=None, min_samples_split=2,
                                       min_samples_leaf=1, min_weight_fraction_leaf=0.0,
                                       max_features='auto', max_leaf_nodes=None,
                                       bootstrap=True, oob_score=False, n_jobs=12,
                                       random_state=None, verbose=2, warm_start=False,
                                       class_weight=None)
          clf.fit(x_train, y_train)
  13. Feature importances
          clf.feature_importances_
  14. Training deep learning
          model = Sequential()
          model.add(...)
          model.compile(optimizer='adagrad', loss='binary_crossentropy',
                        metrics=["accuracy"])
          model.fit(x_train, y_train, nb_epoch=EPOCH_COUNT, batch_size=32,
                    validation_data=(x_val, y_val), verbose=2, shuffle=True)
  15. Understanding deep learning
  16. Brute force FTW
      • 93 input “features”
      • Train 93 models with 1 input
        – Measure the prediction accuracy of each
      • Train 92 models with 2 inputs
        – Top feature from first round
        – Measure combined prediction accuracy
      • Lather, rinse, repeat…
  17. Visualizing the model
      • Take trained model (X inputs)
      • Vary inputs
        – 100ms to 20 seconds in 100ms intervals
      • Apply the data smoothing from training set
      • model.predict_proba
  18. What we learned
  19. What’s in our beacon?
      • Top-level – domain, timestamp, SSL
      • Session – start time, length (in pages), total load time
      • User agent – browser, OS, mobile ISP
      • Geo – country, city, organization, ISP, network speed
      • Bandwidth
      • Timers – base, custom, user-defined
      • Custom metrics
      • HTTP headers
      https://docs.soasta.com/whatsinbeacon/
  20. Finding 1
      Maybe everything doesn’t matter after all
  21. Bounce rate
  22. Finding 2
      DOM ready (aka DOM content loaded) and average session load time were the best indicators of bounce rate
  23. Up to 89.5% accuracy
  24. Finding 3
      When it came to getting high predictability, conversion data was tougher than bounce data
  25. 81% prediction accuracy was as high as we got
  26. Finding 4
      Pages with more scripts were less likely to convert
  27. Finding 5
      The number of DOM elements matters… a lot
  28. Finding 6
      Mobile-related measurements weren’t meaningful predictors of conversions
  29. Finding 7
      Some conventional metrics were not as important as we thought
  30. Feature Importance (bounce) Start render 69 ~top 3
  31. Things to watch out for (other than dangling prepositions)
  32. Yep, checkout pages are SLOW
  33. Takeaways
  34. 1. YMMV
      2. Do try this at home
      3. Gather your RUM data (lots of it)
      4. Run the machine learning against it
      5. If you get unexpected results, keep digging
  35. Thanks! @patmeenan @tameverts
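
Slide 7 describes turning each string field into several yes/no (1/0) inputs. A minimal stdlib sketch of that idea follows; the `one_hot` helper, the field names, and the sample sessions are all hypothetical and are not taken from the beacon-ml repository.

```python
def one_hot(rows, column):
    """Replace a string column with one 1/0 input per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other field unchanged.
        out = {k: v for k, v in row.items() if k != column}
        # One new input per category: 1 if the row has that value, else 0.
        for cat in categories:
            out[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(out)
    return encoded, categories


# Hypothetical sessions for illustration only.
sessions = [
    {"manufacturer": "Apple", "dom_ready_ms": 1200},
    {"manufacturer": "Samsung", "dom_ready_ms": 2500},
    {"manufacturer": "Apple", "dom_ready_ms": 900},
]
encoded, categories = one_hot(sessions, "manufacturer")
```

Every distinct value becomes its own input, which is why a high-cardinality field such as a raw user-agent string can cause the "input explosion" the slide warns about.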
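
Slide 8's point about class imbalance can be illustrated with a small sketch: with a 3% conversion rate, always guessing "no" already scores 97%, so the data is subsampled to a 50/50 mix. The `balance` helper, the `converted` key, and the synthetic rows below are assumptions made for illustration.

```python
import random


def balance(rows, label_key, seed=0):
    """Subsample so positives and negatives appear in equal numbers."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    n = min(len(pos), len(neg))
    sample = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(sample)
    return sample


# Synthetic data: 30 conversions out of 1000 sessions (3%).
rows = [{"id": i, "converted": 1 if i < 30 else 0} for i in range(1000)]

# Accuracy of a "model" that always predicts no conversion.
baseline_accuracy = sum(r["converted"] == 0 for r in rows) / len(rows)

balanced = balance(rows, "converted")
```

On the balanced sample the same always-no guess scores only 50%, so accuracy above that reflects something the model actually learned.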
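
Slide 9 fits the scaler on the training data and reuses it on the validation data. The sketch below shows what that standardization amounts to for a single column, using only the standard library as a stand-in for scikit-learn's `StandardScaler`; the helper names and the sample timings are hypothetical.

```python
import statistics


def fit_scaler(column):
    """Learn mean and population standard deviation from training data."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # Avoid dividing by zero for a constant column.
    return mean, (std if std > 0 else 1.0)


def transform(column, mean, std):
    """Apply previously learned statistics to any data set."""
    return [(v - mean) / std for v in column]


# Hypothetical load times in milliseconds.
train = [1000.0, 2000.0, 3000.0]
val = [2000.0, 4000.0]

mean, std = fit_scaler(train)               # statistics come from training data only
train_scaled = transform(train, mean, std)
val_scaled = transform(val, mean, std)      # validation reuses the training statistics
```

Reusing the training statistics keeps validation inputs on the same scale the model saw during training, which is why the slide calls `transform` rather than `fit_transform` on `x_val`.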
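
The 80/20 split on slide 10 might be sketched as follows. The `train_val_split` helper and the fixed seed are assumptions; the deck does not say how its split was drawn.

```python
import random


def train_val_split(rows, val_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


rows = list(range(100))          # stand-in for 100 session records
train, val = train_val_split(rows)
```

Because the validation rows never take part in training, accuracy measured on them is a fairer estimate than accuracy on the training rows.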
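
Slide 11 removes inputs that are highly correlated with the output, such as session length with bouncing. One possible way to do that is a Pearson-correlation filter, sketched below. The 0.9 threshold, the helper names, and the sample data are assumptions; the deck does not state how the correlated features were identified.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(vx * vy)


def drop_label_correlated(features, labels, threshold=0.9):
    """Keep only features whose |correlation| with the label is below threshold."""
    return {
        name: col
        for name, col in features.items()
        if abs(pearson(col, labels)) < threshold
    }


# Hypothetical data: session length nearly determines the bounce label.
features = {
    "session_length": [1, 1, 5, 6, 1, 7],
    "dom_ready_ms": [900, 3100, 1500, 2800, 1200, 2000],
}
bounced = [1, 1, 0, 0, 1, 0]
kept = drop_label_correlated(features, bounced)
```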
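
The "brute force" procedure on slide 16 is a greedy forward selection: score every single feature, keep the best, then score every pair that includes it, and so on. A minimal sketch follows. To stay self-contained it scores feature sets with a toy nearest-centroid classifier instead of the deck's deep learning model, and the feature names and data are hypothetical.

```python
def forward_select(feature_names, score, rounds):
    """Greedily add the feature that most improves score(chosen + [name])."""
    chosen, history = [], []
    remaining = list(feature_names)
    for _ in range(min(rounds, len(remaining))):
        best = max(remaining, key=lambda name: score(chosen + [name]))
        chosen.append(best)
        remaining.remove(best)
        history.append((best, score(chosen)))
    return history


def centroid_accuracy(columns, labels, names):
    """Toy scorer: accuracy of a nearest-centroid classifier on `names`."""
    rows = list(zip(*(columns[n] for n in names)))

    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(v) / len(members) for v in zip(*members)]

    def dist(r, c):
        return sum((a - b) ** 2 for a, b in zip(r, c))

    c0, c1 = centroid(0), centroid(1)
    hits = sum((1 if dist(r, c1) < dist(r, c0) else 0) == y
               for r, y in zip(rows, labels))
    return hits / len(labels)


# Hypothetical columns; only dom_ready separates the two classes cleanly.
columns = {
    "scripts": [1, 2, 9, 3, 8, 9],
    "dom_ready": [1, 2, 3, 7, 8, 9],
    "noise": [5, 1, 5, 1, 5, 1],
}
labels = [0, 0, 0, 1, 1, 1]

history = forward_select(
    list(columns),
    lambda names: centroid_accuracy(columns, labels, names),
    rounds=2,
)
```

Each round costs one model fit per remaining feature, so the deck's 93 inputs mean 93 fits in the first round and 92 in the second.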
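
Slide 17 visualizes a trained model by sweeping one timing input from 100ms to 20 seconds in 100ms steps, scaling it the same way as the training set, and asking the model for a probability. The sketch below follows those steps with a hypothetical `toy_model` in place of a real `model.predict_proba`; the mean and standard deviation are made-up training statistics.

```python
import math


def sweep(predict_proba, base_input, index, mean, std,
          start_ms=100, stop_ms=20000, step_ms=100):
    """Vary one input over a range of raw values and record the prediction."""
    points = []
    for value in range(start_ms, stop_ms + 1, step_ms):
        row = list(base_input)               # other inputs stay fixed (already scaled)
        row[index] = (value - mean) / std    # apply the training-set scaling
        points.append((value, predict_proba(row)))
    return points


def toy_model(row):
    """Hypothetical stand-in for a trained model's predicted probability."""
    return 1.0 / (1.0 + math.exp(-row[0]))


curve = sweep(toy_model, base_input=[0.0, 0.0], index=0,
              mean=3000.0, std=2000.0)
```

Plotting the resulting `(value, probability)` pairs gives a curve of predicted probability against the swept metric.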
