How can big data and machine learning become a powerful, easy-to-use tool for ecommerce analytics? By using BigQuery, SQL, and ML.
Here at MicroHub we use Google Cloud to build cost-effective, high-performance cloud solutions.
2. BigQuery ML
ML models by using SQL
BigQuery ML enables users to create and execute machine learning models in BigQuery by using SQL queries.
The goal is to democratize machine learning by enabling SQL practitioners to build models using their existing tools and to increase development speed by eliminating the need for data movement.
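As a rough illustration of what "ML models by using SQL" looks like, the sketch below assembles a BigQuery ML `CREATE MODEL` statement as a string. The model name, label column, and training table are hypothetical placeholders that do not come from the deck, and the statement is only printed here, not sent to BigQuery.

```python
def build_create_model_sql(model_name, label_column, training_query):
    """Return a BigQuery ML CREATE MODEL statement for a logistic regression classifier."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='logistic_reg', input_label_cols=['{label_column}']) AS\n"
        f"{training_query.strip()}"
    )


statement = build_create_model_sql(
    model_name="ecommerce.classification_model",  # hypothetical dataset.model name
    label_column="will_buy_on_return_visit",      # hypothetical label column
    training_query="""
        SELECT bounces, time_on_site, will_buy_on_return_visit
        FROM `ecommerce.training_sessions`
    """,                                          # hypothetical training table
)
print(statement)
```

The whole model definition is a single SQL statement over data that already lives in BigQuery, which is where the "no data movement" benefit comes from.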
3. Goal
Website visitors
As users interact with the website, their sessions generate over 200 metrics, which are stored for future analysis.
Log sessions data
The data analyst team exports the analytics logs for an ecommerce website into BigQuery, then creates a new table of all the raw ecommerce visitor session data for you to explore.
Prediction Model
Create a model using your dataset.
Train your model, adjusting various weights in the model so that the model's predictions match the true values.
Evaluate the performance of the classifier against the actual data.
Use your model to predict an outcome.
Build a model to predict whether a website visitor will make a transaction
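To make the "train, evaluate, predict" steps concrete, here is a toy sketch of a logistic regression classifier trained with plain gradient descent. BigQuery ML handles this internally; the code below is not its implementation. The six training rows, the learning rate, and the epoch count are invented for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Return the predicted probability of a purchase for one feature row."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(rows, labels, lr=0.1, epochs=2000):
    """Adjust weights step by step so predictions move toward the true labels."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y  # prediction minus true value
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Invented toy data: [bounces, time on site in minutes] -> made a transaction?
rows = [[1, 0.0], [1, 0.1], [0, 0.5], [0, 4.0], [0, 6.0], [0, 8.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train(rows, labels)

# Evaluate against the actual labels, then predict for an unseen visitor.
accuracy = sum(
    (predict(weights, bias, x) > 0.5) == bool(y) for x, y in zip(rows, labels)
) / len(rows)
print("training accuracy:", accuracy)
print("P(purchase) for a 7-minute session:", predict(weights, bias, [0, 7.0]))
```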
4. Q&A with the dataset
Out of 250+ possible metrics, build unlimited, flexible, and dynamic key-metric analytics backend schemas for your ecommerce site with BigQuery.
5. #1
Out of all the visitors to our website, what percentage made a purchase?
2.69%
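A conversion rate like this is the number of distinct purchasers divided by the number of distinct visitors. The sketch below runs that query against an in-memory SQLite table so it can be tried locally. The table, its columns, and its five rows are made up and are far simpler than the real session export.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (visitor_id TEXT, transactions INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("a", None), ("a", 1), ("b", None), ("c", None), ("d", None)],  # toy rows
)

query = """
SELECT
  COUNT(DISTINCT visitor_id) AS total_visitors,
  COUNT(DISTINCT CASE WHEN transactions >= 1 THEN visitor_id END) AS purchasers
FROM sessions
"""
total_visitors, purchasers = conn.execute(query).fetchone()

conversion_rate = 100.0 * purchasers / total_visitors
print(f"{conversion_rate:.2f}% of visitors made a purchase")
```

Counting distinct visitors rather than sessions keeps a visitor with several sessions from being counted more than once.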
6. #2
What are the top 5 selling
products?
[Table: Top 5 products, with columns Product name and Product category; the surviving entries are Nest products]
7. #3
How many visitors bought on
subsequent visits to the website?
Later we’ll analyze this further to increase conversion rates and reduce the outflow to competitor sites by better understanding new visitors.
11,873, or 1.6%
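One way to count these visitors is to flag anyone who made a purchase in a session after their first. The sketch below assumes each session record carries a visitor ID, a visit number, and a transaction count. That record layout and the sample rows are assumptions for illustration, not the deck's actual schema.

```python
from collections import defaultdict

# Toy rows: (visitor_id, visit_number, transactions)
sessions = [
    ("a", 1, 0), ("a", 2, 1),  # browsed first, bought on the return visit
    ("b", 1, 1),               # bought on the first visit only
    ("c", 1, 0), ("c", 2, 0),  # returned but never bought
    ("d", 1, 0),               # never returned
]


def bought_on_return_visit(sessions):
    """Map each visitor to True if any session after the first had a purchase."""
    result = defaultdict(bool)
    for visitor_id, visit_number, transactions in sessions:
        is_return_purchase = visit_number > 1 and transactions > 0
        result[visitor_id] = result[visitor_id] or is_return_purchase
    return dict(result)


flags = bought_on_return_visit(sessions)
returning_buyers = sum(flags.values())
share = 100.0 * returning_buyers / len(flags)
print(f"{returning_buyers} visitor(s), or {share:.1f}%")
```

The same per-visitor flag can later serve as the label for a classification model.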
8. #4
How far did the visitor get in the checkout process on their first visit?
Checkout_options
11. #7
Which two fields are known after
a visitor's first session?
bounces, time_on_site
12. #tuning the model
Your team starts testing whether these two fields are good inputs for your classification model.
What are the risks of only using the above two fields?
Training a model on just these two fields is a start; you will see whether they're good enough to produce an accurate model.
totals.bounces totals.timeOnSite
13. #1
Which fields are the model
features? What is the label?
Features: bounces, time_on_site. Label: whether the visitor returns and makes a purchase.
14. Looking at the initial data results, do you think time_on_site and bounces will be good indicators of whether the user will return and purchase?
Low ROC AUC (area under the Receiver Operating Characteristic curve): 0.73
#tuning the model
Increase the ROC AUC
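A single ROC number such as 0.73 is the area under the ROC curve (AUC). It can be read as the probability that the model scores a randomly chosen purchaser higher than a randomly chosen non-purchaser, so 0.5 is no better than chance and 1.0 is a perfect ranking. The sketch below computes it directly from that definition. The labels and scores are invented and unrelated to the deck's results.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = returned and purchased, scores = model probabilities
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration but not for a full session table.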
15. #feature engineering
To better understand the relationship between a visitor’s first session and the likelihood that they will purchase on a subsequent visit, let’s search for and try more features.
Goal
Improve predictive power of the model.
hits.eCommerceAction, trafficSource, device.deviceCategory, geoNetwork.country
16. The key new feature added to the training dataset query is the maximum checkout progress each visitor reached in their session.
Higher ROC AUC (area under the Receiver Operating Characteristic curve) of 0.92
#feature engineering
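The "maximum checkout progress" feature can be sketched as a per-visitor maximum over the ecommerce action recorded on each hit. The code below assumes each hit carries an integer step code where a larger number means the visitor got further toward purchase. That encoding and the sample hits are assumptions for illustration.

```python
# Toy rows: (visitor_id, action_step); larger step = further along (assumed encoding)
hits = [
    ("a", 1), ("a", 2), ("a", 5),
    ("b", 1),
    ("c", 2), ("c", 3),
]


def max_checkout_progress(hits):
    """Return the furthest step each visitor reached across all of their hits."""
    progress = {}
    for visitor_id, action_step in hits:
        progress[visitor_id] = max(progress.get(visitor_id, 0), action_step)
    return progress


features = max_checkout_progress(hits)
print(features)
```

Collapsing many hits into one number per visitor gives the model a direct signal of purchase intent that bounces and time on site do not capture.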
17. Which new visitors will come back and make a purchase?
Higher ROC AUC (area under the Receiver Operating Characteristic curve) of 0.92
#feature engineering
ROC sweet spot