
Machine Learning in action


Advanced users segmentation using Google Analytics, Google Tag Manager and R.

Published in: Technology


  1. Michał Bryś, Data Scientist @ Allegro | Complexity Garage @ Kraków, 05.02.2016 | Measure Camp @ London, 05.03.2016 | Machine Learning in action: Advanced users segmentation using Google Analytics, Google Tag Manager and R.
  2. Michał Bryś, Data Scientist @ Allegro. Also specializes in Google Analytics and Google Tag Manager. michalbrys.com | about.me/michal.brys
  3. Case study. Website with online courses at 3 levels: Beginner, Intermediate, Advanced.
  4. Case study. Problem: I want to categorize users into groups of interest. Note: users are not registered, and no PII is stored in Google Analytics.
  5. Before
  6. After (figure labels: Beginner, Intermediate, Advanced)
  7. What will I use? Google Tag Manager + dataLayer, Google Analytics, RStudio.
  8. Prepare data collection (1): Google Tag Manager
     ● dataLayer - Content Grouping in Google Analytics
     ● dataLayer - browser fingerprint to identify users
  9. Prepare data collection (1). Export from your CMS.
         <script>
           dataLayer = [{
             'level': 'advanced'
           }];
         </script>
  10. Prepare data collection (1)
          <script>
            dataLayer = [{
              'level': 'advanced',
              'fingerprint': '123456'
            }];
          </script>
      https://goo.gl/1X84fY
  11. Prepare data collection (2): Google Analytics
      ● Content Grouping
        ○ Training level
      ● Custom Dimension
        ○ Fingerprint
  12. Prepare data collection (2): Google Analytics
  13. Prepare data collection (2)
  14. Prepare data collection (3). Send data from dataLayer to Google Analytics: create a new variable in GTM of type dataLayer variable.
  15. Prepare data collection (3). Send data from dataLayer to Google Analytics.
  16. Data ready in Google Analytics: pageviews in content groups.
  17. Why R?
      + the fastest-growing statistical analysis language
      + free
      + many libraries/packages
      + a lot of educational materials
      + strong community support
      + available on different platforms
      - not efficient for terabytes of data
      - some external libraries contain errors
      - no out-of-the-box solution with a graphical interface
  18. Export data to R
      ● Set up connection to Google Analytics API: https://goo.gl/2CGLsc
      ● Download and install RStudio: https://goo.gl/6xJwSy
  19. Google Developers Console
  20. RGoogleAnalytics: Set up
          install.packages("RGoogleAnalytics")
          library(RGoogleAnalytics)

          client.id <- "xxxxxxxxxxxx.apps.googleusercontent.com"
          client.secret <- "zzzzzzzzzzzz"
          token <- Auth(client.id, client.secret)

          # Save the token object for future sessions
          save(token, file = "./token_file")
  21. Authorize access to Google Analytics
  22. Hello world!
          # Get the Sessions by month in 2014
          query.list <- Init(start.date = "2014-01-01",
                             end.date = "2014-12-31",
                             dimensions = "ga:month",
                             metrics = "ga:sessions",
                             table.id = "ga:000000")

          # Create the Query Builder object
          ga.query <- QueryBuilder(query.list)

          # Download data from GA and save it in a data frame
          ga.data <- GetReportData(ga.query, token)
      Dimensions & metrics explorer: https://goo.gl/QvTbKn
  23. Hello world
          > head(ga.data)
            month sessions
          1    01      906
          2    02     1643
          3    03     1755
          4    04      963
          5    05      407
          6    06      490
  24. Users segmentation. Unsupervised learning: k-Means. "k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean" (https://en.wikipedia.org/wiki/K-means_clustering)
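The "nearest mean" property in the definition above can be checked directly. The following is a minimal sketch on hypothetical toy data (two well-separated groups generated with `rnorm()`; none of these values come from the talk), using only base R. It runs `kmeans()` and then recomputes, for every observation, which centroid is closest.

```r
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability

# Hypothetical toy data: two well-separated groups in two dimensions
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)

fit <- kmeans(x, centers = 2, nstart = 10)

# Squared Euclidean distance from every observation to every centroid
dists <- sapply(seq_len(nrow(fit$centers)), function(j) {
  rowSums(sweep(x, 2, fit$centers[j, ])^2)
})

# Index of the nearest centroid for each observation
nearest <- max.col(-dists, ties.method = "first")

# Compare with the assignment returned by kmeans()
all(nearest == fit$cluster)
```

Once the algorithm has converged, the recomputed nearest centroid should agree with `fit$cluster` for every observation.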
  25. Goal: segment your users into groups of interest.
  26. Users segmentation
          # Get the Unique Pageviews by User in Content Group
          query.list <- Init(start.date = "2014-01-01",
                             end.date = "2014-12-31",
                             dimensions = "ga:dimension1,ga:contentGroup1",
                             metrics = "ga:contentGroupUniqueViews1",
                             table.id = "ga:000000")

          # Create the Query Builder object
          ga.query <- QueryBuilder(query.list)

          # Extract the data and store it in a data frame
          ga.data <- GetReportData(ga.query, token)
  27. Input data: browser fingerprint and sum of pageviews in each content group. Aggregated with tidyr::spread().
          > head(ga.data)
                 beginner_pv intermediate_pv advanced_pv
          191352           0               2          42
          990977           0               4          32
          770561           0               4          48
          898022           0               5          21
          277510           0               6          31
          644227           0               6          44
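The reshaping step mentioned above turns one row per (fingerprint, content group) pair into one row per fingerprint. The sketch below is an illustration under assumptions: the long-format column names (`fingerprint`, `level`, `pageviews`) are hypothetical, the sample values are taken from the first rows of the table above, and base R `xtabs()` is used as a dependency-free stand-in for the `tidyr::spread()` call the slide refers to.

```r
# Hypothetical long-format rows, shaped like a per-user, per-content-group result
ga.long <- data.frame(
  fingerprint = c("191352", "191352", "990977", "990977", "770561"),
  level       = c("intermediate", "advanced", "intermediate", "advanced", "advanced"),
  pageviews   = c(2, 42, 4, 32, 48),
  stringsAsFactors = FALSE
)

# Declare all three levels so a group with no pageviews still gets a column
ga.long$level <- factor(ga.long$level,
                        levels = c("beginner", "intermediate", "advanced"))

# Sum pageviews per fingerprint and level; missing combinations become 0
wide <- xtabs(pageviews ~ fingerprint + level, data = ga.long)

# Convert the contingency table to a data frame with fingerprints as row names
ga.wide <- as.data.frame.matrix(wide)
names(ga.wide) <- paste0(names(ga.wide), "_pv")

ga.wide
```

With tidyr installed, a call along the lines of `tidyr::spread(ga.long, level, pageviews, fill = 0)` would produce a similar wide table, with the fingerprint kept as a regular column instead of row names.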
  28. k-means clustering
          # K-Means Cluster Analysis
          fit <- kmeans(ga.data, 3)  # 3 clusters
          ...
          # Append cluster assignment
          clustered_users <- data.frame(ga.data, fit$cluster)
          head(clustered_users)
  29. Added labels for users. Color indicates the group of interest.
  30. Clustering results
          > clustered_users
                 beginner_pv intermediate_pv advanced_pv fit.cluster
          266876           9              45           4           1
          965265           9              51           7           1
          ...
          981924          19              10           8           2
          732529          19              16           1           2
          ...
          377795           2               7          38           3
          918083           2               8          28           3
  31. Clustering results - centroids. Check the centroids to see which cluster corresponds to which level. The comments mark the content group with the most pageviews in each cluster.
          > fit$centers
            beginner_pv intermediate_pv advanced_pv
          1    7.011765        38.42353    5.023529  # intermediate
          2   25.530435        10.06087    4.713043  # beginner
          3    3.628571         5.90000   32.657143  # advanced
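Cluster numbers from `kmeans()` are arbitrary and can change between runs, so the manual reading of the centroids above can also be written as a small rule. This is a sketch under one assumption that is not from the talk: each cluster is labelled with the content group that has the highest mean pageviews in its centroid. The centroid values are copied from the slide; `cluster_labels` is a hypothetical name.

```r
# Centroids copied from the slide above
centers <- matrix(
  c( 7.011765, 38.42353,  5.023529,
    25.530435, 10.06087,  4.713043,
     3.628571,  5.90000, 32.657143),
  nrow = 3, byrow = TRUE,
  dimnames = list(as.character(1:3),
                  c("beginner_pv", "intermediate_pv", "advanced_pv"))
)

# For each cluster (row), pick the column with the highest mean pageviews
dominant <- colnames(centers)[max.col(centers, ties.method = "first")]

# Strip the "_pv" suffix and index the labels by cluster number
cluster_labels <- sub("_pv$", "", dominant)
names(cluster_labels) <- rownames(centers)

cluster_labels
```

The labels can then be attached to users with something like `cluster_labels[as.character(clustered_users$fit.cluster)]`. The rule only works when every centroid has one clearly dominant content group; if two clusters share the same dominant group they would receive the same label and need a manual look.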
  32. Visualize results
          # 3D chart
          install.packages("plotly")
          library(plotly)

          clustered_users <- data.frame(ga.data, fit$cluster)

          # Pass the data frame itself; the original slide referenced an undefined `result`
          plot_ly(clustered_users,
                  x = ~beginner_pv,
                  y = ~intermediate_pv,
                  z = ~advanced_pv,
                  type = "scatter3d",
                  mode = "markers",
                  color = ~factor(fit.cluster))
  33. Visualize results
  34. What’s next?
      ● Upload as a custom dimension to Google Analytics
      ● Upload as a label to CRM
      ● Prepare a well-targeted e-mail campaign
      ● ...
  35. Summary
      ● Prepare data collection
        ○ Google Tag Manager
        ○ dataLayer
        ○ browser fingerprinting
      ● Data processing and aggregation
        ○ Google Analytics
          ■ Custom Dimensions
          ■ Content Grouping
      ● Advanced data analysis
        ○ R + Google Analytics API
        ○ Unsupervised learning: k-means algorithm
  36. Q&A. Michał Bryś | about.me/michal.brys | github.com/michalbrys/R | E-book (early release): Google Analytics + R
