Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016
Michał Bryś, Data Scientist @ Allegro, Measure Camp @ London, 05.03.2016
Michal Brys
Data Scientist @ Allegro
Measure Camp | London, 5th March 2016
Machine Learning in action
Advanced user segmentation using Google Analytics, Google Tag Manager and R.
Michal Brys
Data Scientist @ Allegro
Specialized also in:
+ Google Analytics
+ Google Tag Manager
michalbrys.com
about.me/michal.brys
Case study
Website with online courses on 3 levels:
Beginner Intermediate Advanced
Problem:
I want to categorize users into interest groups.
Note:
● Users are not registered.
● No PII is stored in Google Analytics.
Before
After
[Figure: users labeled Beginner, Advanced, Intermediate, Beginner]
What will I use?
Google Tag Manager + dataLayer
Google Analytics
RStudio
Prepare data collection (1)
Google Tag Manager
● dataLayer - Content Grouping in Google Analytics
● dataLayer - browser fingerprint to identify users
<script>
dataLayer = [{
'level': 'advanced'
}];
</script>
Export the level value from your CMS.
<script>
dataLayer = [{
'level': 'advanced',
'fingerprint' : '123456'
}];
</script>
https://goo.gl/1X84fY
Prepare data collection (2)
Google Analytics
● Content Grouping
○ Training level
● Custom Dimension
○ Fingerprint
Prepare data collection (3)
Send data from dataLayer to Google Analytics:
Create a new variable in GTM: Data Layer Variable
Data ready in Google Analytics
Pageviews in content groups
Why R?
+ the fastest-growing language for statistical analysis,
+ free,
+ many libraries/packages,
+ a lot of educational materials,
+ strong community support,
+ runs on many platforms
- not efficient for terabytes (TB) of data
- external libraries may contain bugs
- no out-of-the-box graphical interface
Export data to R
Set up connection to Google Analytics API:
https://goo.gl/2CGLsc
Download and install RStudio:
https://goo.gl/6xJwSy
Google Developers Console
RGoogleAnalytics: Set up
# Install the package once, then load it
install.packages("RGoogleAnalytics")
library(RGoogleAnalytics)
# OAuth credentials created in the Google Developers Console
client.id <- "xxxxxxxxxxxx.apps.googleusercontent.com"
client.secret <- "zzzzzzzzzzzz"
token <- Auth(client.id, client.secret)
# Save the token object for future sessions
save(token, file = "./token_file")
Authorize access to Google Analytics
Hello world!
# Get the sessions by month in 2014
query.list <- Init(start.date = "2014-01-01",
                   end.date = "2014-12-31",
                   dimensions = "ga:month",
                   metrics = "ga:sessions",
                   table.id = "ga:000000")
# Create the Query Builder object
ga.query <- QueryBuilder(query.list)
# Download data from GA and save it in a data frame
ga.data <- GetReportData(ga.query, token)
Dimensions & Metrics Explorer:
https://goo.gl/QvTbKn
> head(ga.data)
month sessions
1 01 906
2 02 1643
3 03 1755
4 04 963
5 05 407
6 06 490
Users segmentation
Unsupervised learning: k-Means
"k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean"
https://en.wikipedia.org/wiki/K-means_clustering
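The "nearest mean" rule in the definition above can be sketched as one assignment step followed by one update step. This is a minimal base R illustration on made-up points with hypothetical starting means, not code from the talk; R's kmeans() repeats these two steps until the assignments stop changing.

```r
# Six made-up observations in two dimensions (hypothetical data).
obs <- rbind(c(1, 1), c(1, 2), c(2, 1),
             c(8, 8), c(9, 8), c(8, 9))

# Two hypothetical starting means (k = 2).
means <- rbind(c(0, 0), c(10, 10))

# Squared Euclidean distance from every observation to every mean:
# rows are observations, columns are means.
dist2 <- sapply(seq_len(nrow(means)), function(j) {
  rowSums((obs - matrix(means[j, ], nrow(obs), ncol(obs), byrow = TRUE))^2)
})

# Assignment step: each observation joins the cluster with the nearest mean.
assignment <- max.col(-dist2, ties.method = "first")

# Update step: each mean moves to the average of its assigned observations.
new_means <- rbind(colMeans(obs[assignment == 1, , drop = FALSE]),
                   colMeans(obs[assignment == 2, , drop = FALSE]))

print(assignment)
print(new_means)
```

With these points the first three observations fall into cluster 1 and the last three into cluster 2, and each mean moves to the middle of its own group.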
Goal
Segment your users into groups of interest.
# Get the unique pageviews per user (fingerprint) in each content group.
# ga:dimension1 is assumed to be the Fingerprint custom dimension and
# content group 1 the training level.
query.list <- Init(start.date = "2014-01-01",
                   end.date = "2014-12-31",
                   dimensions = "ga:dimension1,ga:contentGroup1",
                   metrics = "ga:contentGroupUniqueViews1",
                   table.id = "ga:000000")
# Create the Query Builder object
ga.query <- QueryBuilder(query.list)
# Extract the data and store it in a data frame
ga.data <- GetReportData(ga.query, token)
Input data
Row names are browser fingerprints; columns hold the sum of pageviews in each content group.
> head(ga.data)
beginner_pv intermediate_pv advanced_pv
191352 0 2 42
990977 0 4 32
770561 0 4 48
898022 0 5 21
277510 0 6 31
644227 0 6 44
Aggregated with tidyr::spread()
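The API returns one row per fingerprint and content group, so the table above needs a long-to-wide reshape. The slides use tidyr::spread() for this; the sketch below swaps in base R's xtabs() as a dependency-free stand-in. The column names `fingerprint`, `level` and `views`, and the sample rows, are hypothetical.

```r
# Hypothetical long-format rows shaped like the API result:
# one row per (fingerprint, content group) pair.
long <- data.frame(
  fingerprint = c("191352", "191352", "990977", "990977", "990977"),
  level       = c("intermediate", "advanced",
                  "beginner", "intermediate", "advanced"),
  views       = c(2, 42, 1, 4, 32),
  stringsAsFactors = FALSE
)

# Fix the column order so the wide table always reads beginner -> advanced.
long$level <- factor(long$level,
                     levels = c("beginner", "intermediate", "advanced"))

# Cross-tabulate: rows are fingerprints, columns are levels, cells are summed
# views. Combinations that never occurred become 0 rather than NA.
wide <- as.data.frame.matrix(xtabs(views ~ fingerprint + level, data = long))
names(wide) <- paste0(names(wide), "_pv")

print(wide)
```

Filling missing combinations with 0 matters here because kmeans() does not accept NA values.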
k-means clustering
# K-Means Cluster Analysis
set.seed(42)               # make the random initialization reproducible
fit <- kmeans(ga.data, 3)  # 3 clusters, one per course level
...
# Append cluster assignment
clustered_users <- data.frame(ga.data, fit$cluster)
head(clustered_users)
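Since the Google Analytics export is not reproducible here, the clustering step can be tried on simulated data. This is a rough sketch only: the Poisson rates are made up to resemble the three course levels, and `set.seed()` and `nstart` are additions that do not appear in the slides.

```r
set.seed(1)  # fix the random numbers so the run is repeatable

# Simulate n users whose pageviews per level follow made-up Poisson rates.
simulate_group <- function(n, rates) {
  data.frame(beginner_pv     = rpois(n, rates[1]),
             intermediate_pv = rpois(n, rates[2]),
             advanced_pv     = rpois(n, rates[3]))
}

# Three hypothetical groups of 30 users, each reading mostly one level.
users <- rbind(simulate_group(30, c(25, 10, 5)),
               simulate_group(30, c(7, 38, 5)),
               simulate_group(30, c(4, 6, 33)))

# nstart = 25 runs 25 random initializations and keeps the best fit,
# which makes the result less sensitive to an unlucky start.
fit <- kmeans(users, centers = 3, nstart = 25)

# Append the cluster assignment to each user.
clustered_users <- data.frame(users, cluster = fit$cluster)

print(table(clustered_users$cluster))
print(round(fit$centers, 1))
```

The cluster numbers themselves are arbitrary and can change between runs, so the centroids still have to be inspected to see which cluster corresponds to which level.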
Labels added to users
Color indicates the group of interest.
Clustering results
> clustered_users
beginner_pv intermediate_pv advanced_pv fit.cluster
266876 9 45 4 1
965265 9 51 7 1
...
981924 19 10 8 2
732529 19 16 1 2
...
377795 2 7 38 3
918083 2 8 28 3
Clustering results - centroids
Check the centroids to see which cluster corresponds to which level.
The largest mean pageview count in each row identifies the content group.
> fit$centers
beginner_pv intermediate_pv advanced_pv
1 7.011765 38.42353 5.023529 #intermediate
2 25.530435 10.06087 4.713043 #beginner
3 3.628571 5.90000 32.657143 #advanced
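Reading the labels off the centroid table can also be done in code. The sketch below re-enters the centroid values shown above as a matrix and picks, for each cluster, the column with the largest mean; the variable names are illustrative and not from the talk.

```r
# Centroids copied from the fit$centers output above.
centers <- matrix(
  c( 7.011765, 38.42353,  5.023529,
    25.530435, 10.06087,  4.713043,
     3.628571,  5.90000, 32.657143),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"),
                  c("beginner_pv", "intermediate_pv", "advanced_pv"))
)

# For each cluster (row), find the content group with the highest mean.
dominant <- colnames(centers)[max.col(centers, ties.method = "first")]

# Strip the "_pv" suffix to get a readable label per cluster number.
cluster_labels <- sub("_pv$", "", dominant)
names(cluster_labels) <- rownames(centers)

print(cluster_labels)
```

This reproduces the annotations in the table: cluster 1 is intermediate, cluster 2 is beginner, and cluster 3 is advanced.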
Visualize results
# 3D chart
install.packages("plotly")
library(plotly)
clustered_users <- data.frame(ga.data, fit$cluster)
plot_ly(clustered_users,
        x = ~beginner_pv, y = ~intermediate_pv, z = ~advanced_pv,
        type = "scatter3d", mode = "markers",
        color = ~factor(fit.cluster))
What’s next?
Upload as custom dimension to Google Analytics
Upload as label to CRM
Prepare a well-targeted e-mail campaign
...
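As a rough sketch of the first follow-up, the cluster labels could be written to a CSV keyed by the fingerprint dimension, ready for a custom-dimension upload or a CRM import. Everything here is an assumption for illustration: the `ga:dimension2` slot, the file name, and the three sample rows (fingerprints and clusters taken from the clustering results shown earlier).

```r
# Hypothetical clustering result: fingerprint plus cluster number.
clustered_users <- data.frame(
  fingerprint = c("266876", "981924", "377795"),
  cluster     = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Cluster-to-level mapping read off the centroids.
levels_by_cluster <- c("1" = "intermediate", "2" = "beginner", "3" = "advanced")

# Key column is the fingerprint dimension; ga:dimension2 is an assumed
# free custom-dimension slot for the label.
upload <- data.frame(
  "ga:dimension1" = clustered_users$fingerprint,
  "ga:dimension2" = unname(levels_by_cluster[as.character(clustered_users$cluster)]),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write to a temporary file so the sketch leaves nothing behind.
out_file <- file.path(tempdir(), "user_levels.csv")
write.csv(upload, out_file, row.names = FALSE, quote = FALSE)

print(readLines(out_file))
```

The same data frame, with different column names, would serve as the label file for a CRM.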
Summary
● Prepare data collection
○ Google Tag Manager
○ dataLayer
○ browser fingerprinting
● Data processing and aggregation
○ Google Analytics
■ Custom Dimensions
■ Content Grouping
● Advanced data analysis
○ R + Google Analytics API
○ Unsupervised learning: k-means algorithm
Q&A
Michal Brys
about.me/michal.brys
github.com/michalbrys/R
E-book (early release): Google Analytics + R

 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Machine Learning in action

  • 1. Machine Learning in action: Advanced user segmentation using Google Analytics, Google Tag Manager and R. Michal Brys, Data Scientist @ Allegro. Measure Camp | London, 5th March 2016. (Slide footer: Michał Bryś, Data Scientist @ Allegro, Complexity Garage @ Kraków, 05.02.2016; Measure Camp @ London, 05.03.2016.)
  • 2. Michal Brys, Data Scientist @ Allegro. Specialized also in: + Google Analytics + Google Tag Manager. michalbrys.com, about.me/michal.brys
  • 3. Case study. Website with online courses on 3 levels: Beginner, Intermediate, Advanced
  • 4. Case study. Problem: I want to categorize users into interest groups. Note: ● Users are not registered. ● No PII is stored in Google Analytics.
  • 5. Before
  • 6. After: Beginner, Advanced, Intermediate, Beginner
  • 7. What will I use? Google Tag Manager + dataLayer, Google Analytics, RStudio
  • 8. Prepare data collection (1): Google Tag Manager ● dataLayer - Content Grouping in Google Analytics ● dataLayer - browser fingerprint to identify users
  • 9. Prepare data collection (1)
      <script>
        dataLayer = [{
          'level': 'advanced'
        }];
      </script>
      Export from your CMS.
  • 10. Prepare data collection (1)
      <script>
        dataLayer = [{
          'level': 'advanced',
          'fingerprint': '123456'
        }];
      </script>
      https://goo.gl/1X84fY
  • 11. Prepare data collection (2): Google Analytics ● Content Grouping ○ Training level ● Custom Dimension ○ Fingerprint
  • 12. Prepare data collection (2): Google Analytics
  • 13. Prepare data collection (2)
  • 14. Prepare data collection (3): Send data from dataLayer to Google Analytics. Create a new variable in GTM: a dataLayer variable.
  • 15. Prepare data collection (3): Send data from dataLayer to Google Analytics
  • 16. Data ready in Google Analytics: pageviews in content groups
  • 17. Why R?
      + the fastest-growing statistical analysis language
      + free
      + many libraries/packages
      + a lot of educational materials
      + big community support
      + available on different platforms
      - not efficient for *TB* of data
      - external libraries may contain errors
      - not an out-of-the-box solution with a graphical interface
  • 18. Export data to R. Set up a connection to the Google Analytics API: https://goo.gl/2CGLsc Download and install RStudio: https://goo.gl/6xJwSy
  • 19. Google Developers Console
  • 20. RGoogleAnalytics: Set up
      install.packages("RGoogleAnalytics")
      library(RGoogleAnalytics)
      # OAuth client credentials created in the Google Developers Console
      client.id <- "xxxxxxxxxxxx.apps.googleusercontent.com"
      client.secret <- "zzzzzzzzzzzz"
      token <- Auth(client.id, client.secret)
      # Save the token object for future sessions
      save(token, file = "./token_file")
  • 21. Authorize access to Google Analytics
  • 22. Hello world!
      # Get the Sessions by month in 2014
      query.list <- Init(start.date = "2014-01-01",
                         end.date = "2014-12-31",
                         dimensions = "ga:month",
                         metrics = "ga:sessions",
                         table.id = "ga:000000")
      # Create the Query Builder object
      ga.query <- QueryBuilder(query.list)
      # Download data from GA and save it in a data frame
      ga.data <- GetReportData(ga.query, token)
      Dimensions & metrics explorer: https://goo.gl/QvTbKn
  • 23. Hello world
      > head(ga.data)
        month sessions
      1    01      906
      2    02     1643
      3    03     1755
      4    04      963
      5    05      407
      6    06      490
  • 24. User segmentation. Unsupervised learning: k-means. "k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean" (https://en.wikipedia.org/wiki/K-means_clustering)
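The "nearest mean" rule quoted on slide 24 can be sketched as a single k-means iteration in base R. This is only an illustration: the pageview numbers and the starting centroids below are made up, not taken from the talk.

```r
# Toy data: pageviews per user in three content groups (made-up numbers).
users <- matrix(c(20,  3,  1,
                   2, 30,  4,
                   1,  5, 40,
                  18,  6,  2),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("beginner_pv", "intermediate_pv", "advanced_pv")))

# Hypothetical current cluster means (centroids), one row per cluster.
centers <- matrix(c(19,  4,  2,
                     2, 28,  5,
                     1,  6, 38),
                  ncol = 3, byrow = TRUE)

# Squared Euclidean distance from every user (rows) to every centroid (columns).
sq_dist <- sapply(seq_len(nrow(centers)), function(k) {
  rowSums((users - matrix(centers[k, ], nrow(users), ncol(users), byrow = TRUE))^2)
})

# Assignment step: each user joins the cluster with the nearest mean.
assignment <- apply(sq_dist, 1, which.min)

# Update step: each centroid becomes the mean of the users assigned to it.
new_centers <- t(sapply(seq_len(nrow(centers)), function(k) {
  colMeans(users[assignment == k, , drop = FALSE])
}))
```

k-means repeats these two steps until the assignments stop changing; the built-in kmeans() used later in the deck does this internally.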
  • 25. Goal: Segment your users into groups of interest.
  • 26. User segmentation
      # Get the Unique Pageviews by User in Content Group
      query.list <- Init(start.date = "2014-01-01",
                         end.date = "2014-12-31",
                         dimensions = "ga:dimension1,ga:contentGroup1",
                         metrics = "ga:contentGroupUniqueViews1",
                         table.id = "ga:000000")
      # Create the Query Builder object
      ga.query <- QueryBuilder(query.list)
      # Extract the data and store it in a data frame
      ga.data <- GetReportData(ga.query, token)
  • 27. Input data: browser fingerprint (row name) and sum of pageviews in each content group.
      > head(ga.data)
             beginner_pv intermediate_pv advanced_pv
      191352           0               2          42
      990977           0               4          32
      770561           0               4          48
      898022           0               5          21
      277510           0               6          31
      644227           0               6          44
      Aggregated with tidyr::spread()
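The reshaping step behind slide 27 is not shown in the deck. A minimal sketch of going from one row per (fingerprint, content group) to one row per fingerprint follows; it uses base R's xtabs() as a stand-in for tidyr::spread() so that it runs without extra packages. The column names `fingerprint`, `level` and `pageviews` and all values are assumptions for illustration.

```r
# Long-format rows, roughly as a query by fingerprint and content group
# would return them (made-up values, hypothetical column names).
long <- data.frame(
  fingerprint = c("191352", "191352", "990977", "990977", "990977"),
  level       = c("intermediate", "advanced", "beginner", "intermediate", "advanced"),
  pageviews   = c(2, 42, 1, 4, 32),
  stringsAsFactors = FALSE
)

# Fix the column order so every level gets a column even if it is rare.
long$level <- factor(long$level, levels = c("beginner", "intermediate", "advanced"))

# Sum pageviews per fingerprint and level; missing combinations become 0.
wide <- as.data.frame.matrix(xtabs(pageviews ~ fingerprint + level, data = long))
names(wide) <- paste0(names(wide), "_pv")

wide

# With tidyr installed, a similar table (fingerprint as a column rather than
# as row names) would come from:
# tidyr::spread(long, level, pageviews, fill = 0)
```

Keeping the fingerprint in the row names, as on the slide, means the table passed to kmeans() contains only the three numeric pageview columns.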
  • 28. k-means clustering
      # K-Means Cluster Analysis
      fit <- kmeans(ga.data, 3) # 3 clusters
      ...
      # Append cluster assignment
      clustered_users <- data.frame(ga.data, fit$cluster)
      head(clustered_users)
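kmeans() starts from randomly chosen centroids, so the call on slide 28 can number the clusters differently (and occasionally split them differently) from run to run. A hedged sketch of a repeatable variant follows; the seed, the `nstart` value and the synthetic `toy` data are assumptions, not part of the talk.

```r
set.seed(42)  # arbitrary seed, only to make the run repeatable

# Synthetic stand-in for ga.data: 30 users around each of three profiles.
make_group <- function(n, means) {
  sapply(means, function(m) rpois(n, m))
}
toy <- rbind(make_group(30, c(25, 10,  5)),
             make_group(30, c( 7, 38,  5)),
             make_group(30, c( 4,  6, 33)))
colnames(toy) <- c("beginner_pv", "intermediate_pv", "advanced_pv")

# nstart = 25 tries 25 random initialisations and keeps the best result.
fit <- kmeans(toy, centers = 3, nstart = 25)

table(fit$cluster)  # cluster sizes
fit$centers         # one row of mean pageviews per cluster
```

Several random starts reduce the chance of ending in a poor local optimum, at the cost of a longer run.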
  • 29. Added label for users. Color indicates the group of interest.
  • 30. Clustering results
      > clustered_users
             beginner_pv intermediate_pv advanced_pv fit.cluster
      266876           9              45           4           1
      965265           9              51           7           1
      ...
      981924          19              10           8           2
      732529          19              16           1           2
      ...
      377795           2               7          38           3
      918083           2               8          28           3
  • 31. Clustering results - centroids. Check the centroids to see which cluster corresponds to which level. Marked: number of pageviews in each content group.
      > fit$centers
        beginner_pv intermediate_pv advanced_pv
      1    7.011765        38.42353    5.023529  # intermediate
      2   25.530435        10.06087    4.713043  # beginner
      3    3.628571         5.90000   32.657143  # advanced
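The manual reading of the centroids on slide 31 can also be done in code. The sketch below copies the centroid values from the slide and names each cluster after the content group with the highest mean pageviews; the names `dominant` and `cluster_label` are introduced here for illustration.

```r
# Centroids copied from slide 31.
centers <- matrix(c( 7.011765, 38.42353,  5.023529,
                    25.530435, 10.06087,  4.713043,
                     3.628571,  5.90000, 32.657143),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(as.character(1:3),
                                  c("beginner_pv", "intermediate_pv", "advanced_pv")))

# For each cluster (row), find the content group with the highest mean pageviews.
dominant <- colnames(centers)[apply(centers, 1, which.max)]

# Drop the "_pv" suffix and key the labels by cluster number.
cluster_label <- sub("_pv$", "", dominant)
names(cluster_label) <- rownames(centers)

cluster_label

# With a fitted model, the label could be attached to every user like this:
# clustered_users$level <- cluster_label[as.character(fit$cluster)]
```

This rule is only as good as the clustering itself: if two clusters peak in the same content group they receive the same label, which is a sign that k or the input features need another look.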
  • 32. Visualize results
      # 3D chart
      install.packages("plotly")
      library(plotly)
      # One row per user: pageviews per content group plus the cluster number
      clustered_users <- data.frame(ga.data, fit$cluster)
      # One axis per content group, one color per cluster
      plot_ly(clustered_users,
              x = clustered_users$beginner_pv,
              y = clustered_users$intermediate_pv,
              z = clustered_users$advanced_pv,
              type = "scatter3d", mode = "markers",
              color = factor(clustered_users$fit.cluster))
  • 33. Visualize results
  • 34. What's next? Upload the segment as a custom dimension to Google Analytics. Upload it as a label to your CRM. Prepare a well-targeted e-mail campaign. ...
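For the first follow-up on slide 34, one possible route is a CSV file that maps each fingerprint to its segment, ready for a Google Analytics data import. The sketch below rests on assumptions: the fingerprint is in custom dimension 1 (as in the query on slide 26), a second custom dimension with index 2 has been created for the segment, and the import expects `ga:`-prefixed column headers. The sample rows and the file name are made up.

```r
# Made-up cluster assignments keyed by browser fingerprint (row names).
clustered_users <- data.frame(
  fit.cluster = c(1, 2, 3),
  row.names   = c("266876", "981924", "377795")
)

# Cluster-to-level mapping as read from the centroids on slide 31.
cluster_label <- c("1" = "intermediate", "2" = "beginner", "3" = "advanced")

# Assumed headers: dimension 1 = fingerprint, hypothetical dimension 2 = segment.
upload <- data.frame(
  "ga:dimension1" = rownames(clustered_users),
  "ga:dimension2" = unname(cluster_label[as.character(clustered_users$fit.cluster)]),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write to a temporary file; in practice this would be the file to upload.
out_file <- file.path(tempdir(), "user_segments.csv")
write.csv(upload, out_file, row.names = FALSE, quote = FALSE)

readLines(out_file)
```

The same two-column table could serve as the label file for a CRM, with the column names changed to whatever that system expects.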
  • 35. Summary
      ● Prepare data collection
          ○ Google Tag Manager
          ○ dataLayer
          ○ browser fingerprinting
      ● Data processing and aggregation
          ○ Google Analytics
              ■ Custom Dimensions
              ■ Content Grouping
      ● Advanced data analysis
          ○ R + Google Analytics API
          ○ Unsupervised learning - k-means algorithm