Using machine learning to determine drivers of bounce and conversion

Recently, Google partnered with SOASTA to train a machine-learning model on a large sample of real-world performance, conversion, and bounce data. In this talk at Velocity 2016 Santa Clara, Pat Meenan and I offered an overview of the resulting model—able to predict the impact of performance work and other site metrics on conversion and bounce rates.

Using machine learning to determine drivers of bounce and conversion

  1. Using machine learning to determine drivers of bounce and conversion Velocity 2016 Santa Clara
  2. Pat Meenan @patmeenan Tammy Everts @tameverts
  3. What we did (and why we did it)
  4. Get the code: https://github.com/WPO-Foundation/beacon-ml
  5. Deep learning weights
  6. Random forest Lots of random decision trees
  7. Vectorizing the data • Everything needs to be numeric • Strings converted to several inputs as yes/no (1/0) • e.g. Device manufacturer • “Apple” would be a discrete input • Watch out for input explosion (UA String) (a code sketch follows the slide list)
  8. Balancing the data • 3% conversion rate • 97% accurate by always guessing no • Subsample the data for a 50/50 mix (a code sketch follows the slide list)
  9. Validation data • Train on 80% of the data • Validate on 20% to prevent overfitting (a code sketch follows the slide list)
  10. Smoothing the data ML works best on normally distributed data scaler = StandardScaler() x_train = scaler.fit_transform(x_train) x_val = scaler.transform(x_val)
  11. Input/output relationships • SSL highly correlated with conversions • Long sessions highly correlated with not bouncing • Remove correlated features from training
  12. Training deep learning model = Sequential() model.add(...) model.compile(optimizer='adagrad', loss='binary_crossentropy', metrics=["accuracy"]) model.fit(x_train, y_train, nb_epoch=EPOCH_COUNT, batch_size=32, validation_data=(x_val, y_val), verbose=2, shuffle=True) (a fuller sketch follows the slide list)
  13. Training random forest clf = RandomForestClassifier(n_estimators=FOREST_SIZE, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=12, random_state=None, verbose=2, warm_start=False, class_weight=None) clf.fit(x_train, y_train)
  14. Feature importances clf.feature_importances_ (a ranking sketch follows the slide list)
  15. What we learned
  16. What’s in our beacon? • Top-level – domain, timestamp, SSL • Session – start time, length (in pages), total load time • User agent – browser, OS, mobile ISP • Geo – country, city, organization, ISP, network speed • Bandwidth • Timers – base, custom, user-defined • Custom metrics • HTTP headers • Etc.
  17. Conversion rate
  18. Conversion rate
  19. Bounce rate
  20. Bounce rate
  21. Finding 1: Number of scripts was a predictor… but not in the way we expected
  22. Number of scripts per page (median)
  23. Finding 2: When entire sessions were more complex, they converted less
  24. Finding 3: Sessions that converted had 38% fewer images than sessions that didn’t
  25. Number of images per page (median)
  26. Finding 4: DOM ready was the greatest indicator of bounce rate
  27. DOM ready (median)
  28. Finding 5: Full load time was the second greatest indicator of bounce rate
  29. timers_loaded (median)
  30. Finding 6: Mobile-related measurements weren’t meaningful predictors of conversions
  31. Conversions
  32. Finding 7: Some conventional metrics were (almost) meaningless, too
  33. Feature importance rank (out of 93 features): DNS lookup 79, start render 69
  34. Takeaways
  35. 1. YMMV 2. Do this with your own data 3. Gather your RUM data 4. Run the machine learning against it
  36. Thanks!
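
A minimal sketch of the one-hot encoding described in slide 7, using pandas; the column names and values are illustrative stand-ins rather than the actual beacon fields:

    import pandas as pd

    # Illustrative beacon rows; the real beacon carries far more fields.
    df = pd.DataFrame({
        "device_manufacturer": ["Apple", "Samsung", "Apple"],
        "session_length": [3, 7, 1],
    })

    # Each string value becomes its own 1/0 column, e.g. device_manufacturer_Apple.
    encoded = pd.get_dummies(df, columns=["device_manufacturer"])
    print(encoded.columns.tolist())

    # A free-form field like the full UA string would explode into thousands of
    # columns, which is the "input explosion" the slide warns about; such fields
    # are better bucketed or dropped before encoding.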
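
For the 50/50 balancing on slide 8, a sketch of downsampling the majority class with NumPy; x and y are assumed to be the vectorized feature matrix and the 0/1 conversion labels:

    import numpy as np

    def balance_50_50(x, y, seed=0):
        """Downsample the majority class so converted and non-converted rows are 50/50."""
        rng = np.random.default_rng(seed)
        pos = np.flatnonzero(y == 1)
        neg = np.flatnonzero(y == 0)
        n = min(len(pos), len(neg))
        keep = np.concatenate([rng.choice(pos, n, replace=False),
                               rng.choice(neg, n, replace=False)])
        rng.shuffle(keep)
        return x[keep], y[keep]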
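
The 80/20 split on slide 9 maps directly onto scikit-learn's train_test_split; x and y are again the assumed feature matrix and labels:

    from sklearn.model_selection import train_test_split

    # Hold back 20% of the rows purely for validation, so overfitting shows up
    # as a gap between training and validation accuracy.
    x_train, x_val, y_train, y_val = train_test_split(
        x, y, test_size=0.2, random_state=42, stratify=y)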
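
Slide 12 elides the layer stack behind model.add(...); below is a minimal sketch of a binary classifier that is compatible with the compile/fit calls shown there. The hidden-layer width and dropout rate are illustrative rather than the architecture actually used (the real code is in the beacon-ml repo), EPOCH_COUNT is the constant referenced on the slide, and nb_epoch is the Keras 1.x spelling the slide uses:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    EPOCH_COUNT = 20  # illustrative value; the slide leaves this to the repo's config

    model = Sequential()
    # One hidden layer plus dropout, with a single sigmoid output for the 0/1 label.
    model.add(Dense(64, activation='relu', input_dim=x_train.shape[1]))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='adagrad',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train,
              nb_epoch=EPOCH_COUNT,
              batch_size=32,
              validation_data=(x_val, y_val),
              verbose=2,
              shuffle=True)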
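
Slide 14's clf.feature_importances_ is just an array of scores; ranking it against the column names is what produces tables like the one on slide 33. feature_names is an assumed list holding the columns in the same order as the training matrix:

    import numpy as np

    # Sort inputs by how much the forest relied on them, most important first.
    order = np.argsort(clf.feature_importances_)[::-1]
    for rank, idx in enumerate(order, start=1):
        print(rank, feature_names[idx], round(float(clf.feature_importances_[idx]), 4))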

Editor's Notes

  • mPulse is built above the boomerang JavaScript library that collects web performance data from a user’s web browser and sends that back to the mPulse servers on a beacon. The simple definition of a beacon is that it is an HTTP(S) request with a ton of data included either as HTTP headers or as part of the Request’s Query String.
  • Sessions that converted contained 48% more scripts (including third-party scripts, such as ads, analytics beacons, and social buttons) than sessions that didn’t.
    Why? One likely answer is that checkout pages are likely to be more scripted than other pages in the conversion funnel.
    Takeaway: Just because shoppers are converting on pages with lots of scripts doesn’t mean those pages are delivering the best possible user experience. More scripts -- especially third-party scripts, which are hosted externally -- can wreak havoc on page loads. Site owners should be aware of the performance impact of all their scripts.
  • While the previous finding tells us that more scripts correlate with increased conversions, sessions whose pages were more complex overall (more images and other elements) converted less.

    Why? The culprit might be the cumulative performance impact of all those page elements. The more elements on a page, the greater the page’s weight (total number of kilobytes) and complexity.

    Takeaway: A typical web page today contains a hundred or so assets hosted on dozens of different servers. Many of these page assets are unoptimized, unmeasured, unmonitored — and therefore unpredictable. This unpredictability makes page loads volatile. Site owners can tackle this problem by setting performance budgets for their pages and culling unnecessary page elements. They should also audit and monitor all the third-party scripts on their sites.
  • When we talk about images, we’re referring to every single graphic element on a page -- from favicons to logos to product images. On a retail site, those images can quickly add up. On a typical retail page, images can easily comprise up to two thirds (in other words, hundreds of kilobytes) of a page’s total weight. The result: cumulatively slow page loads throughout a session.
  • “DOM ready” refers to the amount of time it takes for the page’s HTML to be received and parsed by the browser. Actual page elements, such as images, haven’t appeared yet. (It’s kind of like getting ready to cook. Your cookbook is open, your recipe is in front of you, and your ingredients are on standby.)

    Our research found that bounced sessions had DOM ready times that were 55% slower than non-bounced sessions. We also found that the bounce rate was higher when the first page in a user session was slow.

    Takeaway: External blocking scripts (such as third-party ads, analytics, and social widgets) and styles (such as externally hosted CSS and fonts) have the greatest impact on DOM ready times. Site owners should measure the impact that these external elements have on their pages and conduct ongoing monitoring to ensure that scripts and styles are available and fast. Whenever possible, scripts should be served asynchronously (in parallel with the rest of the page) or in a non-blocking fashion.
  • Bounced sessions had median full page load times that were 53% slower than non-bounced sessions.

    Within the performance community, there has been a growing tendency to regard load time as a meaningless metric.
    With such a strong correlation between it and bounce rate, dismissing load time may be premature.
  • Shoppers who used low-bandwidth or mobile connections didn’t convert significantly less than shoppers on faster connections. This is interesting because it confirms that we’ve entered a “mobile everywhere” phase.

    Takeaway: Internet users don’t behave especially differently depending on what device they’re using. Site owners need to ensure they’re delivering consistent user experiences across device types.
  • DNS lookup is when the browser resolves the domain of the object it is requesting. Think of this as asking the “phone book” of the internet to find someone’s phone number using their first and last name.

    Start render tells you when content begins to display in the user’s browser. But it’s important to note that start render time doesn’t indicate whether that initial content is useful or important, or simply ads and widgets.

    This research found that neither of these metrics correlated meaningfully with conversions. This finding is especially interesting as it pertains to start render time. Up until now, many user experience proponents who participate in the web performance community have placed some value on start render time. This makes sense, because -- on paper, anyway -- start render would seem to reflect the user’s perception of when a page begins to load. But this research suggests that start render isn’t an accurate measure of the user experience -- at least as it pertains to triggering more conversions.

    Takeaway: There’s an interesting observation to be made here about how performance measurement is driven by what we’re able to measure versus what we need to measure. Performance measurement tools can gather massive amounts of data about a wide swath of metrics, but are all those metrics meaningful? To what extent do we, as people who care about measuring the user experience, let the tail wag the dog?