How to collect Google Analytics events to your own data warehouse and do it on a budget
Alex Levashov
Web Analytics Wednesday presentation
06 Nov 2019
Brief Intro
WWW.OWNYOURBUSINESSDATA.NET
• eCommerce consultant, running my own small consultancy, Magenable, specializing in Magento
• I deal with many eCommerce-related things, from strategy to implementation to support, so not only web analytics
• Started OwnYourBusinessData.net a couple of months ago
OwnYourBusinessData
• Own data warehouse over vendor lock-in
• Central data warehouse over silos
• Open, transferable data formats over proprietary vendor formats
• Decoupled warehouse, ETL and business analysis tools over a monolith
• Open source over proprietary
The data generated by a business should be owned by that business, for its own benefit and the benefit of its customers.
WHY BOTHER TO COLLECT A COPY OF GA DATA?
MOTIVATION
Why in general?
1. Being paranoid and a control freak
2. Centralization
3. Avoiding sampling in GA reports
4. Avoiding GA API limits
Why this way?
1. Affordability
2. Low maintenance
3. Learn something new
INSPIRATION AND CREDITS
Existing Snowplow GA Plugin
Google Analytics plugin for Snowplow
Approach in general
Blog post at Bostata.com “Client-side instrumentation for under $1 per month. No servers necessary.”
DISCLAIMERS, NOTES
• I am just starting to use Snowplow
• Alternative approaches exist and may work better in other cases
• Links to a blog post that describes the process in more detail and to the Git repository are provided, so there is no need to write everything down
TECHNOLOGIES USED
Approach
Snowplow architecture
Technologies we used
AWS CloudFront, AWS Lambda (Python), AWS S3, AWS Athena
PROCESS
Approach
JS tracker → CloudFront → Lambda function → Athena
• JS tracker: calls the tracking pixel
• CloudFront: produces logs
• Lambda function: processes the logs, enriches the data and puts it into S3
• Athena: takes the S3 data and creates SQL tables
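The Lambda step starts from CloudFront's access logs, which arrive gzip-compressed and tab-separated, with `#Version` and `#Fields` header lines. Below is a minimal sketch of parsing them into records and flattening those into tab-separated text, assuming the standard (not real-time) CloudFront log format. Taking column names from the `#Fields` line avoids hard-coding column positions. The helper names are hypothetical, and writing the result to S3 (for example with boto3's `put_object`) is left out so the example runs offline.

```python
import gzip
import io
from typing import Dict, Iterable, Iterator, List


def read_gzipped_log(data: bytes) -> List[str]:
    """Decompress a CloudFront log file (delivered as .gz) into text lines."""
    return gzip.decompress(data).decode("utf-8").splitlines()


def parse_cloudfront_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request, keyed by the names in the '#Fields:' line."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip '#Version:' and blank lines
        yield dict(zip(fields, line.split("\t")))


def to_tsv(records: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Flatten records into tab-separated text, one event per line.

    Columns missing from a record are written as empty strings.
    """
    out = io.StringIO()
    for record in records:
        out.write("\t".join(record.get(col, "") for col in columns) + "\n")
    return out.getvalue()
```

For example, `to_tsv(parse_cloudfront_log(read_gzipped_log(body)), ["date", "time", "c-ip", "cs-uri-query"])` would keep only the date, time, viewer IP and query string of each request, which is roughly the shape of text file Athena can then read.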
Why this way?
Benefits
• Easy to implement
• Serverless, with low resource usage and costs (under $1/month)
• Reliable, low maintenance
• Easy access to data (SQL)
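Whether the bill really stays under $1/month depends on traffic and on how much data Athena scans. A back-of-the-envelope sketch, assuming costs come mainly from CloudFront requests, S3 storage and Athena scans, that Lambda stays within its free tier, and that every query scans the whole month of data (no partitioning). The function name is hypothetical and all prices are placeholders to replace with current AWS pricing for your region.

```python
def estimate_monthly_cost_usd(
    hits_per_month: int,
    bytes_per_hit: int,
    queries_per_month: int,
    price_per_10k_requests: float,
    price_per_gb_month: float,
    price_per_tb_scanned: float,
) -> float:
    """Rough monthly cost of the pipeline; all prices are caller-supplied."""
    data_gb = hits_per_month * bytes_per_hit / 1e9
    request_cost = hits_per_month / 10_000 * price_per_10k_requests
    storage_cost = data_gb * price_per_gb_month
    # Without partitioning, each query scans all stored data.
    scan_cost = queries_per_month * (data_gb / 1_000) * price_per_tb_scanned
    return request_cost + storage_cost + scan_cost


# Placeholder prices for illustration only; check the AWS pricing pages.
estimate = estimate_monthly_cost_usd(
    hits_per_month=100_000, bytes_per_hit=500, queries_per_month=30,
    price_per_10k_requests=0.01, price_per_gb_month=0.025, price_per_tb_scanned=5.0,
)
print(f"~${estimate:.2f} per month")
```

With these placeholder inputs the request charge dominates; the scan term only grows noticeably when the stored data or the number of queries gets large.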
What do you need to start?
1. Google Analytics account
2. Google Tag Manager account
3. AWS account
4. Terraform (optional, but saves you time)
Step 1. Deploy AWS infrastructure
1. Manually
2. Or use the Terraform script:
https://github.com/ownyourbusinessdata/snowplow-google-analytics-enrich-lambda
At the end of the process you’ll get:
• CloudFront distribution
• 3 S3 buckets: for logs, for the tracking pixel and for Athena query results
• Tracking pixel stored in one of those S3 buckets
• Python Lambda function that does the data processing and enrichment
• Athena table (empty for now)
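Of the pieces above, the Lambda function first needs to know which new log file to process. A minimal sketch of its entry point, assuming the function is triggered by S3 `ObjectCreated` notifications on the logs bucket (the Terraform script's actual wiring may differ). `lambda_handler` follows the usual Python Lambda naming convention; `new_log_objects` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def new_log_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded in the notification (a space becomes '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((s3["bucket"]["name"], key))
    return objects


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Entry point Lambda calls for each S3 notification batch."""
    objects = new_log_objects(event)
    for bucket, key in objects:
        # A real function would download the log here, e.g. with
        # boto3.client("s3").get_object(Bucket=bucket, Key=key),
        # then parse, enrich and upload the result to the output bucket.
        print(f"processing s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Keeping the event parsing in a plain function makes it testable without AWS; only the commented download/upload step needs real credentials.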
Step 2. Deploy JS tracker
With Google Tag Manager
1. Create a User-Defined Variable (Custom JavaScript type) and insert your tracker code into it.
2. Make another variable of type Variable Configuration and add your Custom JavaScript variable to it as a field.
3. Use that configuration variable to modify the tag configuration.
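Once the tag is modified, the tracker forwards a copy of each GA hit to the tracking pixel. GA hits use Measurement Protocol parameter names such as `t` (hit type), `cid` (client ID) and `dp` (document path). A minimal sketch of turning such a payload into readable columns, assuming it reaches the pipeline as a plain query string. The labels in `MP_PARAMS` are illustrative names chosen here, not the schema the Lambda function actually produces. CloudFront may also log query strings with an extra layer of URL encoding, so check against a real log line.

```python
from typing import Dict
from urllib.parse import parse_qs

# A few Measurement Protocol (v1) parameters mapped to illustrative column names.
MP_PARAMS = {
    "v": "protocol_version",
    "tid": "tracking_id",
    "cid": "client_id",
    "t": "hit_type",
    "dl": "document_location",
    "dp": "document_path",
    "dt": "document_title",
    "ec": "event_category",
    "ea": "event_action",
    "el": "event_label",
}


def decode_ga_hit(query: str) -> Dict[str, str]:
    """Decode a GA hit query string into a dict with readable keys.

    Parameters not listed in MP_PARAMS keep their original short names;
    only the first value of a repeated parameter is kept.
    """
    parsed = parse_qs(query, keep_blank_values=True)
    return {MP_PARAMS.get(name, name): values[0] for name, values in parsed.items()}
```

Unmapped parameters (custom dimensions like `cd1`, for instance) are kept rather than dropped, so nothing in the hit is lost before it lands in S3.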
Wait
The data updates every 5-15 minutes
A few words about enrichment
AWS Lambda (Python)
The part we had to develop:
• Processing turns the logs into text files
• Enrichment adds geo data (using a MaxMind DB database)
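Geo enrichment boils down to looking up the visitor's IP address (the `c-ip` field in CloudFront logs) in a MaxMind database. A minimal sketch, assuming GeoLite2-City style records; the lookup function is passed in so the example runs without a database file. With the `maxminddb` package you could pass `maxminddb.open_database("GeoLite2-City.mmdb").get`, which returns `None` for unknown addresses and raises `ValueError` for malformed ones. The `geo_*` column names are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Maps an IP address string to a GeoLite2-City style record, or None.
GeoLookup = Callable[[str], Optional[Dict[str, Any]]]


def geo_enrich(event: Dict[str, str], lookup: GeoLookup, ip_field: str = "c-ip") -> Dict[str, str]:
    """Return a copy of `event` with country, city and coordinates added."""
    enriched = dict(event)
    try:
        record = lookup(event.get(ip_field, "")) or {}
    except ValueError:  # malformed or missing IP address
        record = {}
    location = record.get("location", {})
    enriched["geo_country"] = record.get("country", {}).get("iso_code", "")
    enriched["geo_city"] = record.get("city", {}).get("names", {}).get("en", "")
    enriched["geo_latitude"] = str(location.get("latitude", ""))
    enriched["geo_longitude"] = str(location.get("longitude", ""))
    return enriched
```

Failed lookups produce empty geo columns instead of an exception, so one bad log line does not stop a whole file from being processed.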
Let’s check what we get
Access from R demo
The AWR.Athena package comes in handy
```r
# Sample R connection to an Athena database with Snowplow events
# collected via the Google Analytics plugin

# Required package: AWR.Athena (install it once with the line below)
# install.packages("AWR.Athena")
library(AWR.Athena)
library(DBI)
library(tidyverse)
library(lubridate)

# You need an AWS API user with proper access to S3 and Athena.
# Set the AWS access key and secret via the AWS CLI: run "aws configure"
# from the command line.
# S3OutputLocation should be taken from your Athena settings.

# Connect to Athena
con <- dbConnect(AWR.Athena::Athena(), region = 'us-west-2',
                 S3OutputLocation = 's3://aws-athena-query-results-518190832416-us-west-2/',
                 Schema = 'default')

# Get the list of available tables
dbListTables(con)

# Query a specific table (all records; any SQL statement supported by Athena works)
df <- as_tibble(dbGetQuery(con, "SELECT * FROM eventsga"))
```
Let’s check what we get
AWS S3 and Athena live demo
References
• Collect Google Analytics events in your own cheap AWS warehouse with Snowplow (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/collect-google-analytics-events-in-your-own-cheap-aws-warehouse-with-snowplow
• Snowplow data enrichment with Lambda (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/enrich-snowplow-data-with-aws-lambda-function/
• Connect R to Athena (OwnYourBusinessData)
https://www.ownyourbusinessdata.net/connecting-r-to-athena-to-analyse-snowplow-events/
• Own Your Business Data Git
https://github.com/ownyourbusinessdata/
• Client-side instrumentation for under $1 per month. No servers necessary (Bostata)
https://bostata.com/client-side-instrumentation-for-under-one-dollar/
Q&A TIME
Contacts
Web: OwnYourBusinessData.Net
Twitter: https://twitter.com/own_data
LinkedIn: https://www.linkedin.com/groups/12283165/
OwnYourBusinessData
Web: https://levashov.biz/
Twitter: https://twitter.com/levashovbiz
LinkedIn: https://www.linkedin.com/in/alevashov/
Alex Levashov
Looking for people interested in joining the course

 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

How to collect Google Analytics events to your own data warehouse and do it on budget

  • 1. How to collect Google Analytics events to your own data warehouse and do it on budget. Alex Levashov, Web Analytics Wednesday presentation, 06 Nov 2019
  • 2. Brief Intro WWW.OWNYOURBUSINESSDATA.NET • eCommerce consultant; I run my own small consultancy, Magenable, specializing in Magento • I deal with many eCommerce-related things, from strategy to implementation to support, so not only web analytics • Started OwnYourBusinessData.net a couple of months ago
  • 3. OwnYourBusinessData • Own data warehouse over vendor lock-in • Central data warehouse over silos • Open, transferable data format over vendor-proprietary formats • De-coupled warehouse, ETL and business analysis tool over monolith • Open source over proprietary. The data generated by a business should be owned by that business, for its own and its customers' benefit.
  • 4. WHY BOTHER TO COLLECT A COPY OF GA DATA? MOTIVATION Why in general? 1. Being paranoid and a control freak 2. Centralization 3. Sampling 4. API limits Why this way? 1. Affordability 2. Low maintenance 3. Learn something new
  • 5. INSPIRATION AND CREDITS Existing Snowplow GA plugin: Google Analytics plugin for Snowplow. Approach in general: blog post at Bostata.com, “Client-side instrumentation for under $1 per month. No servers necessary.”
  • 6. DISCLAIMERS, NOTES • I am just starting to use Snowplow • Alternative approaches exist and may work better in other cases • Links to a blog post that describes the process in more detail and to the Git repository will be provided, so there is no need to write everything down
  • 7. TECHNOLOGIES USED Approach: Snowplow architecture. Technologies we used: AWS CloudFront, AWS Lambda, Python, AWS S3, AWS Athena
  • 8. PROCESS Approach: JS tracker (calls the tracking pixel) → CloudFront (produces logs) → Lambda function (processes logs, enriches data, puts the results into S3) → Athena (takes the S3 data, creates SQL tables)
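The Lambda step in this flow begins by reading CloudFront access logs. Below is a minimal Python sketch. It assumes the CloudFront standard (legacy) log format: tab-separated lines preceded by `#Version` and `#Fields:` header lines. The sketch does not hard-code column positions. Instead, it reads the column names from the `#Fields:` header and decodes the `cs-uri-query` column into tracker parameters. The function name `parse_cloudfront_log` is hypothetical and not taken from the project's repository. The sketch decodes only one level of percent-encoding; real logs may need more careful decoding.

```python
from urllib.parse import parse_qsl


def parse_cloudfront_log(text):
    """Parse CloudFront standard log text into a list of dicts (sketch).

    Assumes a '#Fields:' header line listing the column names, followed by
    tab-separated data lines such as: date, time, c-ip, cs-uri-stem, cs-uri-query.
    """
    fields = []
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#Fields:"):
            # Column names follow the '#Fields:' prefix, separated by spaces
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # skip '#Version' and any other comment lines
        row = dict(zip(fields, line.split("\t")))
        # CloudFront writes '-' when a field has no value
        query = row.get("cs-uri-query", "-")
        # Decode the tracking-pixel query string into a parameter dict
        row["params"] = dict(parse_qsl(query)) if query != "-" else {}
        rows.append(row)
    return rows
```

Taking the column names from the header keeps the parser working if the logged field set changes. The trade-off is that it depends on the header being present in every file.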
  • 9. Why this way? Benefits • Easy to implement • Serverless, low resource usage and costs (under $1/month) • Reliable, low maintenance • Easy access to data (SQL)
  • 10. What do you need to start? 1. Google Analytics account 2. Google Tag Manager account 3. AWS account 4. Terraform (optional, but saves time)
  • 11. Step 1. Deploy AWS infrastructure 1. Manually 2. Or use the Terraform script: https://github.com/ownyourbusinessdata/snowplow-google-analytics-enrich-lambda At the end of the process you’ll get: • a CloudFront distribution • 3 S3 buckets for logs, the tracking pixel and Athena query results • the tracking pixel in one S3 bucket • a Python Lambda function that does data processing and enrichment • an Athena table (empty for now)
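The slide does not say how the Lambda function gets invoked. This sketch assumes one common wiring: an S3 event notification on the logs bucket. CloudFront writes a gzip-compressed log file, S3 invokes the function, and the function writes its output to another bucket. Only that plumbing is shown. `OUTPUT_BUCKET`, the `.tsv` output key and the pass-through `process` step are hypothetical placeholders, not the repository's actual names. The S3 calls are the standard boto3 client methods `get_object` and `put_object`.

```python
import gzip
from urllib.parse import unquote_plus

# Hypothetical name of the bucket that receives processed files
OUTPUT_BUCKET = "my-processed-events-bucket"


def process(log_text):
    """Placeholder transform: keep only the non-comment log lines."""
    lines = [line for line in log_text.splitlines() if line and not line.startswith("#")]
    return "\n".join(lines) + "\n" if lines else ""


def handle_s3_event(event, s3):
    """Process every log object referenced in an S3 event notification.

    `s3` is a boto3 S3 client or any object exposing the same
    get_object/put_object methods (handy for testing).
    """
    written = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # CloudFront standard log files are delivered gzip-compressed
        text = gzip.decompress(body).decode("utf-8")
        name = key.rsplit("/", 1)[-1]
        if name.endswith(".gz"):
            name = name[: -len(".gz")]
        out_key = name + ".tsv"
        s3.put_object(Bucket=OUTPUT_BUCKET, Key=out_key, Body=process(text).encode("utf-8"))
        written.append(out_key)
    return written


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; it is imported here so
    # the functions above can be exercised without it
    import boto3

    return handle_s3_event(event, boto3.client("s3"))
```

Separating `handle_s3_event` from `lambda_handler` lets the logic be tested with a stub client instead of real buckets.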
  • 12. Step 2. Deploy the JS tracker with Google Tag Manager. Create a User-Defined Variable (Custom JavaScript type) into which you insert your tracker code
  • 13. Step 2. Deploy the JS tracker with Google Tag Manager. Make another variable of type Variable Configuration and add your Custom JavaScript variable to it as a field
  • 14. Step 2. Deploy the JS tracker with Google Tag Manager. Use that configuration variable to modify the tag configuration
  • 15. Step 2. Deploy the JS tracker with Google Tag Manager. Use that configuration variable to modify the tag configuration
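The tracker installed through GTM copies each Google Analytics hit. This sketch assumes the copied hit is a URL-encoded payload that uses Universal Analytics Measurement Protocol parameters. Those parameter names are terse, so a small mapping makes them readable downstream. The mapping covers only a handful of well-known parameters (`v`, `tid`, `cid`, `t`, `dl`, `dp`, `dt`, `ec`, `ea`, `el`, `ev`). The readable names and the `decode_ga_hit` helper are illustrative, not part of the plugin.

```python
from urllib.parse import parse_qsl

# A few Universal Analytics Measurement Protocol parameters (not exhaustive)
MP_PARAMS = {
    "v": "protocol_version",
    "tid": "tracking_id",
    "cid": "client_id",
    "t": "hit_type",
    "dl": "document_location",
    "dp": "document_path",
    "dt": "document_title",
    "ec": "event_category",
    "ea": "event_action",
    "el": "event_label",
    "ev": "event_value",
}


def decode_ga_hit(payload):
    """Turn a URL-encoded GA hit payload into a dict with readable keys.

    Parameters missing from MP_PARAMS keep their original short names.
    """
    return {MP_PARAMS.get(key, key): value for key, value in parse_qsl(payload)}
```

Unknown parameters pass through unchanged, so nothing in the hit gets lost when the mapping is incomplete.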
  • 16. Wait: the data updates every 5-15 minutes
  • 17. A few words about enrichment: AWS Lambda (Python), the part we had to develop • Processing turns logs into text files • Enrichment adds geo data (using MaxMindDB)
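This is a minimal sketch of the geo-enrichment idea. It assumes a GeoLite2-City style record layout (`country.iso_code`, `city.names.en`, `location.latitude`/`longitude`). The lookup is passed in as a function. In production it could be `reader.get` from the `maxminddb` package, with `reader = maxminddb.open_database("GeoLite2-City.mmdb")`; for testing, a stub works. The output names follow Snowplow's `geo_*` event fields, but `enrich_with_geo` itself is hypothetical.

```python
def enrich_with_geo(event, lookup):
    """Return a copy of `event` with geo fields derived from its client IP.

    `lookup` takes an IP string and returns a MaxMind-style record dict,
    or None when the IP is not in the database (as maxminddb's Reader.get does).
    """
    enriched = dict(event)  # do not mutate the caller's dict
    try:
        record = lookup(event.get("c-ip", "")) or {}
    except ValueError:
        # maxminddb raises ValueError for strings that are not valid IPs, e.g. "-"
        record = {}
    location = record.get("location", {})
    enriched["geo_country"] = record.get("country", {}).get("iso_code")
    enriched["geo_city"] = record.get("city", {}).get("names", {}).get("en")
    enriched["geo_latitude"] = location.get("latitude")
    enriched["geo_longitude"] = location.get("longitude")
    return enriched
```

Passing the lookup in keeps the sketch testable without the `.mmdb` file. In a Lambda function, opening the reader once at module level rather than inside the handler avoids reloading the database on every invocation.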
  • 18. Let’s check what we get: access from R (demo). The AWR.Athena package comes in handy:

    # Sample R connector to an Athena DB holding Snowplow events collected via the Google Analytics plugin
    # Required package to install: AWR.Athena
    # install.packages("AWR.Athena")
    library(AWR.Athena)
    library(DBI)
    library(tidyverse)
    library(lubridate)

    # You need an AWS API user with proper access to S3 and Athena.
    # The AWS Access Key and Secret should be set via the AWS CLI: run "aws configure" from the command line.
    # S3OutputLocation should be taken from your Athena settings.
    con <- dbConnect(AWR.Athena::Athena(),
                     region = 'us-west-2',
                     S3OutputLocation = 's3://aws-athena-query-results-518190832416-us-west-2/',
                     Schema = 'default')

    # Get the list of available tables
    dbListTables(con)

    # Query a specific table (all records; the SQL statement can be anything Athena supports)
    df <- as_tibble(dbGetQuery(con, "SELECT * FROM eventsga"))
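The same query can also be run from Python. This sketch uses boto3's Athena client calls `start_query_execution`, `get_query_execution` and `get_query_results`. As in the R example, the database name and output location come from your own Athena settings. The polling loop and the `run_athena_query` helper are assumptions, not the project's code.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` on Athena and return the result rows as lists of strings.

    `client` is a boto3 Athena client (boto3.client("athena")). Only the first
    page of results is read; use NextToken or a paginator for larger results.
    """
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    # Poll until the query leaves the QUEUED/RUNNING states
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {status['State']}")
    result = client.get_query_results(QueryExecutionId=query_id)
    # Each row is {"Data": [{"VarCharValue": ...}, ...]}; for SELECT queries
    # the first row holds the column names
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in result["ResultSet"]["Rows"]]
```

For example, `run_athena_query(boto3.client("athena"), "SELECT * FROM eventsga LIMIT 10", "default", "s3://your-athena-results-bucket/")` returns a header row followed by data rows. The bucket name there is a placeholder.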
  • 19. Let’s check what we get: AWS S3 and Athena live demo
  • 20. References • Collect Google Analytics events in your own cheap AWS warehouse with Snowplow (OwnYourBusinessData): https://www.ownyourbusinessdata.net/collect-google-analytics-events-in-your-own-cheap-aws-warehouse-with-snowplow • Snowplow data enrichment with Lambda (OwnYourBusinessData): https://www.ownyourbusinessdata.net/enrich-snowplow-data-with-aws-lambda-function/ • Connect R to Athena (OwnYourBusinessData): https://www.ownyourbusinessdata.net/connecting-r-to-athena-to-analyse-snowplow-events/ • Own Your Business Data Git (OwnYourBusinessData): https://github.com/ownyourbusinessdata/ • Client-side instrumentation for under $1 per month. No servers necessary (Bostata): https://bostata.com/client-side-instrumentation-for-under-one-dollar/
  • 22. Contacts OwnYourBusinessData: Web: OwnYourBusinessData.Net Twitter: https://twitter.com/own_data LinkedIn: https://www.linkedin.com/groups/12283165/ Alex Levashov: Web: https://levashov.biz/ Twitter: https://twitter.com/levashovbiz LinkedIn: https://www.linkedin.com/in/alevashov/ Looking for people interested in joining the course