Machine Learning RUM - Velocity 2016

Presentation from Velocity 2016 on using Machine Learning to determine the metrics that drive bounce and conversions


  1. Using machine learning to determine drivers of bounce and conversion (Velocity 2016, Santa Clara)
  2. Pat Meenan (@patmeenan) and Tammy Everts (@tameverts)
  3. What we did
  4. Get the code: https://github.com/WPO-Foundation/beacon-ml
  5. Deep Learning: weights
  6. Random Forest: lots of random decision trees
  7. Vectorizing the data • Everything needs to be numeric • Strings are converted to several yes/no (1/0) inputs • e.g. Device Manufacturer: “Apple” becomes its own discrete input • Watch out for input explosion (UA string) • (one-hot encoding sketch below)
  8. Balancing the data • 3% conversion rate • A model that always guesses “no” is 97% accurate • Subsample the data for a 50/50 mix • (resampling sketch below)
  9. Validation data • Train on 80% of the data • Validate on the remaining 20% to guard against overfitting • (train/validation split sketch below)
  10. Smoothing the data • Many models train better on standardized, roughly normally distributed inputs • scaler = StandardScaler(); x_train = scaler.fit_transform(x_train); x_val = scaler.transform(x_val) • (scaling sketch below)
  11. Input/output relationships • SSL is highly correlated with conversions • Long sessions are highly correlated with not bouncing • Remove these trivially correlated features before training • (correlation sketch below)
  12. Training Deep Learning: model = Sequential() model.add(...) model.compile(optimizer='adagrad', loss='binary_crossentropy', metrics=["accuracy"]) model.fit(x_train, y_train, nb_epoch=EPOCH_COUNT, batch_size=32, validation_data=(x_val, y_val), verbose=2, shuffle=True) • (expanded Keras sketch below)
  13. Training Random Forest: clf = RandomForestClassifier(n_estimators=FOREST_SIZE, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=12, random_state=None, verbose=2, warm_start=False, class_weight=None) clf.fit(x_train, y_train) • (random-forest sketch below)
  14. Feature Importances: clf.feature_importances_ • (the random-forest sketch below pairs these with feature names)
  15. What we learned
  16. Takeaways
  17. Thanks!
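
Code sketches for the steps above. These are illustrative only, written against current pandas/scikit-learn/Keras, and are not the exact beacon-ml implementation; get the real code from the repo linked on slide 4.

One-hot encoding sketch (slide 7). A minimal example of turning a string field into discrete 1/0 inputs with pandas; the column names here ("device_manufacturer", "dom_ready_ms", "converted") are made up for illustration, not the beacon-ml schema.

```python
import pandas as pd

# Toy beacon data with one string column and one numeric column.
df = pd.DataFrame({
    "device_manufacturer": ["Apple", "Samsung", "Apple", "Huawei"],
    "dom_ready_ms": [1200, 3400, 900, 2800],
    "converted": [1, 0, 1, 0],
})

# Expand the string column into one 1/0 column per distinct value.
# High-cardinality fields (e.g. the raw UA string) would explode into
# thousands of columns, which is the "input explosion" the slide warns about.
vectorized = pd.get_dummies(df, columns=["device_manufacturer"], dtype=int)
print(vectorized.columns.tolist())
# ['dom_ready_ms', 'converted', 'device_manufacturer_Apple',
#  'device_manufacturer_Huawei', 'device_manufacturer_Samsung']
```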
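
Resampling sketch (slide 8). One way to get the 50/50 mix: downsample the majority ("did not convert") class to match the number of positives. The x/y names assume NumPy arrays produced by the vectorizing step.

```python
import numpy as np

def balance(x, y, seed=42):
    """Downsample the majority class so positives and negatives are 50/50."""
    rng = np.random.default_rng(seed)
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    # Keep every positive row and an equal-sized random sample of negatives.
    neg_sample = rng.choice(neg, size=len(pos), replace=False)
    keep = rng.permutation(np.concatenate([pos, neg_sample]))
    return x[keep], y[keep]

# x_balanced, y_balanced = balance(x, y)
```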
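
Train/validation split sketch (slide 9). The 80/20 split with scikit-learn; stratify is an extra here (not on the slide) that keeps the class mix identical in both splits.

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the balanced data for validation.
x_train, x_val, y_train, y_val = train_test_split(
    x_balanced, y_balanced, test_size=0.2, random_state=42, stratify=y_balanced)
```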
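
Scaling sketch (slide 10). The StandardScaler snippet from the slide with its import, fitting on the training split only and reusing the same transform for validation so no validation statistics leak into training.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                 # zero mean, unit variance per feature
x_train = scaler.fit_transform(x_train)   # learn the scaling from training data only
x_val = scaler.transform(x_val)           # apply the identical scaling to validation data
```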
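
Correlation sketch (slide 11). A rough way to flag features that trivially encode the label (the way session length encodes "did not bounce"), assuming df is the fully numeric, vectorized frame and "converted" is the label column; the 0.5 cutoff is arbitrary.

```python
# Absolute correlation of every feature with the label.
corr_with_label = df.corr()["converted"].drop("converted").abs()

# Features this strongly tied to the outcome are giveaways, not drivers;
# drop them before training, as the slide recommends.
leaky = corr_with_label[corr_with_label > 0.5].index.tolist()
df = df.drop(columns=leaky)
```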
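
Expanded Keras sketch (slide 12). The slide elides the layers with model.add(...), so the layer sizes here (64/32 units) are placeholders and EPOCH_COUNT gets an arbitrary value; the slide's nb_epoch is the Keras 1.x spelling of today's epochs argument.

```python
from keras.models import Sequential
from keras.layers import Dense, Input

EPOCH_COUNT = 20                     # placeholder; the slides don't give the value
num_features = x_train.shape[1]

model = Sequential()
model.add(Input(shape=(num_features,)))
model.add(Dense(64, activation='relu'))      # hidden layer sizes are guesses
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))    # single bounce/convert probability

model.compile(optimizer='adagrad',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=EPOCH_COUNT,
          batch_size=32,
          validation_data=(x_val, y_val),
          verbose=2,
          shuffle=True)
```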
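
Random-forest sketch (slides 13-14). The slide's RandomForestClassifier call mostly spells out scikit-learn defaults (and max_features='auto' has since been removed from scikit-learn), so this keeps only the non-default arguments; FOREST_SIZE gets an arbitrary value and feature_names is assumed to be the column list from the vectorizing step.

```python
from sklearn.ensemble import RandomForestClassifier

FOREST_SIZE = 100                    # placeholder; the slides don't give the value

clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                             n_jobs=12,       # build trees in parallel
                             verbose=2)
clf.fit(x_train, y_train)

# Slide 14: rank the inputs by how much each one contributes to the model.
ranked = sorted(zip(feature_names, clf.feature_importances_),
                key=lambda item: item[1], reverse=True)
for name, importance in ranked[:20]:
    print(f"{importance:.4f}  {name}")
```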
