Machine Learning in action
Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016
Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 05.03.2016
Michal Brys
Data Scientist @ Allegro
Measure Camp | London, 5 March 2016
Machine Learning
in action
Advanced user segmentation using
Google Analytics, Google Tag Manager and R
Michal Brys
Data Scientist @ Allegro
Also specialized in:
+ Google Analytics
+ Google Tag Manager
michalbrys.com
about.me/michal.brys
Case study
Website with online courses on 3 levels:
Beginner, Intermediate, Advanced
Case study
Problem:
I want to categorize users into groups of interest.
Note:
● Users are not registered.
● No PII is stored in Google Analytics.
Before
After
(Figure: the same users, each labeled Beginner, Intermediate or Advanced)
What will I use?
Google Tag Manager + dataLayer
Google Analytics
RStudio
Prepare data collection (1)
Google Tag Manager
● dataLayer - Content Grouping in Google Analytics
● dataLayer - browser fingerprint to identify users
Prepare data collection (1)
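<script>
  // Declare the dataLayer above the GTM container snippet
  dataLayer = [{
    'level': 'advanced'  // training level of the current page
  }];
</script>
The level value is exported from your CMS.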
Prepare data collection (1)
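<script>
  // Declare the dataLayer above the GTM container snippet
  dataLayer = [{
    'level': 'advanced',     // training level, exported from your CMS
    'fingerprint': '123456'  // browser fingerprint identifying an anonymous user
  }];
</script>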
https://goo.gl/1X84fY
Prepare data collection (2)
Google Analytics
● Content Grouping
○ Training level
● Custom Dimension
○ Fingerprint
Prepare data collection (3)
Send data from dataLayer to Google Analytics:
Create a new variable in GTM
of type Data Layer Variable
Data ready in Google Analytics
Pageviews in
content groups
Why R?
+ one of the fastest-growing languages
for statistical analysis,
+ free,
+ many libraries/packages,
+ plenty of educational materials,
+ large, supportive community,
+ cross-platform
- not efficient for terabytes (*TB*) of data
- some third-party packages are buggy
- no out-of-the-box graphical interface
Export data to R
Set up a connection to the Google Analytics API:
https://goo.gl/2CGLsc
Download and install RStudio:
https://goo.gl/6xJwSy
Google Developers Console
RGoogleAnalytics: Set up
install.packages("RGoogleAnalytics")
library(RGoogleAnalytics)
# OAuth client credentials from the Google Developers Console
client.id <- "xxxxxxxxxxxx.apps.googleusercontent.com"
client.secret <- "zzzzzzzzzzzz"
# Start the OAuth flow to authorize access to Google Analytics
token <- Auth(client.id, client.secret)
# Save the token object for future sessions
save(token, file = "./token_file")
Authorize access to Google Analytics
Hello world!
library(RGoogleAnalytics)
load("./token_file")   # reuse the token saved during setup
ValidateToken(token)   # refresh the token if it has expired
# Get sessions by month in 2014
query.list <- Init(start.date = "2014-01-01",
                   end.date = "2014-12-31",
                   dimensions = "ga:month",
                   metrics = "ga:sessions",
                   table.id = "ga:000000")  # "ga:" + your GA view (profile) ID
# Create the Query Builder object
ga.query <- QueryBuilder(query.list)
# Download data from GA and store it in a data frame
ga.data <- GetReportData(ga.query, token)
Dimensions & Metrics Explorer:
https://goo.gl/QvTbKn
Users segmentation
Unsupervised learning: k-means
k-means clustering aims to partition n observations into k clusters
in which each observation belongs to the cluster with the nearest mean.
https://en.wikipedia.org/wiki/K-means_clustering
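To make the "nearest mean" idea concrete, here is a minimal from-scratch sketch of Lloyd's algorithm, the usual way k-means is computed: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is for illustration only (in practice `stats::kmeans()` does this); the function name `lloyd_kmeans` and the toy points are hypothetical.

```r
# Minimal Lloyd's-algorithm sketch of k-means (illustration only).
lloyd_kmeans <- function(x, k, max_iter = 100, seed = 1) {
  x <- as.matrix(x)
  set.seed(seed)  # note: changes the global RNG state
  # Initialise centroids with k randomly chosen observations
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance to every centroid (n x k)
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # index of nearest centroid
    if (all(new_cluster == cluster)) break  # converged: assignments unchanged
    cluster <- new_cluster
    # Update step: each centroid becomes the mean of its assigned points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

# Hypothetical toy data: two well-separated groups of 2-D points
toy <- rbind(c(0, 0), c(0, 1), c(1, 0),
             c(10, 10), c(10, 11), c(11, 10))
res <- lloyd_kmeans(toy, k = 2)
res$cluster
res$centers
```

Each iteration can only lower (or keep) the total within-cluster squared distance, which is why the loop stops once assignments no longer change.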
Goal
Segment your users into groups of interest.
Users segmentation
# Get unique pageviews per user (fingerprint) in each content group
query.list <- Init(start.date = "2014-01-01",
                   end.date = "2014-12-31",
                   # dimension1 = browser fingerprint, contentGroup1 = training level
                   dimensions = "ga:dimension1,ga:contentGroup1",
                   metrics = "ga:contentGroupUniqueViews1",
                   table.id = "ga:000000")
# Create the Query Builder object
ga.query <- QueryBuilder(query.list)
# Extract the data and store it in a data frame
ga.data <- GetReportData(ga.query, token)
Input data
Browser fingerprint (row names) and the sum of pageviews
in each content group.
> head(ga.data)
beginner_pv intermediate_pv advanced_pv
191352 0 2 42
990977 0 4 32
770561 0 4 48
898022 0 5 21
277510 0 6 31
644227 0 6 44
Reshaped from long to wide format with
tidyr::spread()
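The GA query returns one row per (fingerprint, content group) pair, while the table above has one row per fingerprint. A minimal sketch of that reshaping, assuming the downloaded columns are named `dimension1`, `contentGroup1` and `contentGroupUniqueViews1` (an assumption about the `GetReportData()` output). It swaps `tidyr::spread()` for base R `tapply()` so it runs without extra packages; the rows reuse values from the `head(ga.data)` output above.

```r
# Hypothetical long-format rows: one row per (fingerprint, content group)
ga.long <- data.frame(
  dimension1 = c("191352", "191352", "990977", "990977", "770561", "770561"),
  contentGroup1 = c("intermediate", "advanced", "intermediate",
                    "advanced", "intermediate", "advanced"),
  contentGroupUniqueViews1 = c(2, 42, 4, 32, 4, 48),
  stringsAsFactors = FALSE
)

# tidyr equivalent: tidyr::spread(ga.long, contentGroup1,
#                                 contentGroupUniqueViews1, fill = 0)
levels_order <- c("beginner", "intermediate", "advanced")
# Sum views per fingerprint x level; combinations with no rows become 0.
# Using a factor keeps the "beginner" column even though it has no rows here.
wide <- tapply(ga.long$contentGroupUniqueViews1,
               list(ga.long$dimension1,
                    factor(ga.long$contentGroup1, levels = levels_order)),
               sum, default = 0)
colnames(wide) <- paste0(colnames(wide), "_pv")
ga.data <- as.data.frame(wide)  # fingerprints become row names
ga.data
```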
Users labeled by cluster
(Figure: color shows each user's group of interest.)
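The labels come from running k-means on the per-user table. A sketch of that step, assuming k = 3 (one cluster per level) and using `stats::kmeans()` with `nstart = 25` random starts to reduce the risk of a poor local optimum. Since the real GA export is not available here, the users below are hypothetical Poisson-distributed pageview counts.

```r
set.seed(2016)
# Hypothetical synthetic users: three groups with different viewing habits
make_group <- function(n, lambda) {
  data.frame(beginner_pv     = rpois(n, lambda[1]),
             intermediate_pv = rpois(n, lambda[2]),
             advanced_pv     = rpois(n, lambda[3]))
}
ga.data <- rbind(make_group(50, c(25, 10, 5)),
                 make_group(50, c(7, 38, 5)),
                 make_group(50, c(4, 6, 33)))

# k-means with k = 3 and several random starts; keeps the best solution
fit <- kmeans(ga.data, centers = 3, nstart = 25)
table(fit$cluster)  # users per cluster

# Attach each user's cluster number (column is named fit.cluster)
clustered_users <- data.frame(ga.data, fit$cluster)
head(clustered_users)
```

Cluster numbers are arbitrary: run to run, cluster 1 may correspond to a different level, so the centroids have to be inspected to name them.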
Clustering results: centroids
Inspect the centroids to see which cluster corresponds to which level.
Each value is the mean number of pageviews in that content group for the users in the cluster.
> fit$centers
beginner_pv intermediate_pv advanced_pv
1 7.011765 38.42353 5.023529 #intermediate
2 25.530435 10.06087 4.713043 #beginner
3 3.628571 5.90000 32.657143 #advanced
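Reading the centroids can also be done in code: label each cluster with the content group that has the highest mean pageviews. A small sketch, with the centroid values copied from the output above; `cluster_label` is a hypothetical name, and the rule assumes each cluster is dominated by a single level.

```r
# Centroids copied from the fit$centers output above
centers <- matrix(c( 7.011765, 38.42353,  5.023529,
                    25.530435, 10.06087,  4.713043,
                     3.628571,  5.90000, 32.657143),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(1:3, c("beginner_pv", "intermediate_pv", "advanced_pv")))

# For each cluster, pick the content group with the highest mean pageviews
dominant <- colnames(centers)[max.col(centers, ties.method = "first")]
cluster_label <- setNames(sub("_pv$", "", dominant), rownames(centers))
cluster_label
# expected: 1 -> "intermediate", 2 -> "beginner", 3 -> "advanced"
```

With a fitted model, `cluster_label[fit$cluster]` then gives each user's level.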
Visualize results
# 3D chart
install.packages("plotly")
library(plotly)
# Attach each user's cluster number (column name: fit.cluster)
clustered_users <- data.frame(ga.data, fit$cluster)
plot_ly(clustered_users,
        x = ~beginner_pv, y = ~intermediate_pv, z = ~advanced_pv,
        type = "scatter3d", mode = "markers",
        color = ~factor(fit.cluster))
What’s next?
Upload the labels as a custom dimension to Google Analytics
Upload them as labels to your CRM
Prepare well-targeted e-mail campaigns
...
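For the first item, one option is to write a CSV that maps each fingerprint to its level and upload it through Google Analytics Data Import. A sketch under clearly stated assumptions: the fingerprint sits in custom dimension 1 (as in the query above), the level goes into a hypothetical custom dimension 2, and the `ga:dimensionN` header format must be checked against the schema of your own data set. The rows are hypothetical.

```r
# Hypothetical clustering output: fingerprint -> level label
user_levels <- data.frame(fingerprint = c("191352", "990977", "770561"),
                          level       = c("advanced", "advanced", "advanced"),
                          stringsAsFactors = FALSE)

# Assumed Data Import schema: key = ga:dimension1 (fingerprint),
# imported value = ga:dimension2 (hypothetical "level" dimension)
names(user_levels) <- c("ga:dimension1", "ga:dimension2")

out_file <- file.path(tempdir(), "user_levels.csv")
write.csv(user_levels, out_file, row.names = FALSE, quote = FALSE)
cat(readLines(out_file), sep = "\n")
```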
Summary
● Prepare data collection
○ Google Tag Manager
○ dataLayer
○ browser fingerprinting
● Data processing and aggregation
○ Google Analytics
■ Custom Dimensions
■ Content Grouping
● Advanced data analysis
○ R + Google Analytics API
○ Unsupervised learning: k-means algorithm