BigQuery & SQL for SEOs
@areej_abuali linkedin.com/in/areejabuali/
HELLO!
I’m here to talk to you about how (5 months ago) I started using (well, was forced to start using) BigQuery & SQL in my day-to-day job.
Agency-side
Client-side
35,000,000
Indexed Pages
55,000,000
Monthly Visits
Don’t get me started on Excel...
And using GA wasn’t going to cut it!
Note: This GA interface screenshot doesn’t show Zoopla data, and I don’t plan on showing any Zoopla data throughout the slides...
...I just got my job 5 months ago
and would like to keep it!
So this talk is about how we can all
adopt BigQuery and start doing really
cool stuff with it...
“It’s about learning to feel comfortable with the uncomfortable.”
My 24/7 state of mind!
And that it’s okay to feel this way!
Let’s set the scene!
Google Cloud Platform
What is BigQuery?
“BigQuery is an enterprise data warehouse that
stores and queries massive datasets by
enabling super-fast SQL queries using the
processing power of Google’s infrastructure.”
TOO MUCH
GIBBERISH!
So what is it then?
It’s a thing that will help you analyse
massive datasets quickly and easily
via SQL!
Why is it useful?
▸ It’s cloud-based (super scalable)
▸ Unlimited access to historical data
▸ It’s pay as you go ($5 per TB of data your queries process)
▸ Simple interface and setup
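To put the pay-as-you-go point into numbers, here is a rough estimate in Python. It is a minimal sketch that assumes the $5-per-TB on-demand rate quoted above and treats 1 TB as 2**40 bytes. Google has changed BigQuery pricing since this talk, so check the current pricing page. `estimate_query_cost` is a hypothetical helper, and the sketch ignores free-tier allowances and minimum billing amounts.

```python
# Rough on-demand query cost estimate (a sketch, not Google's billing logic).
# Assumptions: the $5-per-TB rate quoted in the talk and 1 TB = 2**40 bytes;
# free-tier allowances and minimum billed bytes are ignored.

BYTES_PER_TB = 2 ** 40   # assumed unit; use 10 ** 12 for decimal terabytes
PRICE_PER_TB_USD = 5.0   # rate quoted in the talk; check current pricing


def estimate_query_cost(bytes_processed: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the USD cost of a query that processes `bytes_processed` bytes."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # 250 GiB scanned is 250/1024 of a TB, so roughly $1.22 at $5 per TB.
    print(f"${estimate_query_cost(250 * 2 ** 30):.2f}")  # $1.22
```

Before you run a query, the BigQuery editor shows roughly how much data it will process. That figure is what you would pass in as `bytes_processed`.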
And as for SQL...
▸ It’s a language used for extracting and
analysing data stored in databases
▸ It’s way faster than Excel, because the data you’re analysing stays in the database instead of being loaded into your spreadsheet
▸ Your code is reusable
It’s pseudo-codish!
SELECT *
FROM example_table
WHERE example_column = "value"
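If you want to try this shape of query before opening BigQuery, Python's built-in `sqlite3` module can run it against a throwaway in-memory table. Treat it as a sketch rather than BigQuery itself, since SQLite's dialect differs in places. `example_table`, `example_column` and the rows are hypothetical placeholders that mirror the slide. The filter value is passed as a `?` parameter instead of being pasted into the query string.

```python
import sqlite3

# Hypothetical toy table standing in for `example_table` from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (id INTEGER, example_column TEXT)")
conn.executemany(
    "INSERT INTO example_table VALUES (?, ?)",
    [(1, "value"), (2, "other"), (3, "value")],
)

# SELECT every column (*) FROM one table, keeping only rows WHERE the filter matches.
# ORDER BY id is only there to make the printed order predictable.
rows = conn.execute(
    "SELECT * FROM example_table WHERE example_column = ? ORDER BY id",
    ("value",),
).fetchall()
print(rows)  # [(1, 'value'), (3, 'value')]
```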
What did I need?
Data query
(repeatedly)
Advanced
filtering
Sort large
datasets
Let’s get started!
console.cloud.google.com/bigquery
Query Editor
Your datasets are
stored here
You can see your
Job History & Query
History here
https://cloud.google.com/bigquery/docs/loading-data
BigQuery Cookbook: support.google.com/analytics/answer/4419694
GA Sample Dataset
https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801
https://support.google.com/analytics/answer/7586738
Because if I use
Zoopla data...
SQL Query
SELECT, FROM, WHERE, ORDER BY, LIMIT
SQL Query - Select
What columns do you
want to pull?
338 columns in total
▸ SELECT *
▸ SELECT date, visitNumber
▸ SELECT visitNumber as Number
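The same toy setup can show the difference between picking specific columns and renaming one with an alias. This assumes a hypothetical flat `sessions` table in SQLite holding a few GA-style columns. The real GA export has nested fields such as `totals.visits`, which this toy table does not model.

```python
import sqlite3

# Hypothetical flat stand-in for a few GA sessions columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (date TEXT, visitNumber INTEGER, channelGrouping TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("20170801", 1, "Organic Search"), ("20170801", 3, "Direct")],
)

# Pull only the columns you need; cursor.description holds the output column names.
selected = [col[0] for col in conn.execute("SELECT date, visitNumber FROM sessions").description]

# Rename a column in the output with an alias.
aliased = [col[0] for col in conn.execute("SELECT visitNumber AS Number FROM sessions").description]

print(selected)  # ['date', 'visitNumber']
print(aliased)   # ['Number']
```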
SELECT
date as Date,
channelGrouping as Channel,
totals.visits as Visits,
totals.transactionRevenue as Revenue
SQL Query - From
Which data source do you
want to pull from?
PROJECT ID DATASET TABLE
bigquery-public-data.google_analytics_sample.ga_sessions_20170801
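Every table reference is just project ID, dataset and table joined by dots, so a tiny helper can assemble it for you. `table_ref` is a hypothetical convenience function, not part of any Google client library. It wraps the whole reference in backticks, which standard SQL needs here because the project ID contains hyphens.

```python
def table_ref(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted `project.dataset.table` reference (hypothetical helper)."""
    for part in (project, dataset, table):
        # Reject empty parts and stray backticks that would break the quoting.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"`{project}.{dataset}.{table}`"


ref = table_ref("bigquery-public-data", "google_analytics_sample", "ga_sessions_20170801")
print(f"SELECT date FROM {ref} LIMIT 10")
```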
SELECT
date as Date,
channelGrouping as Channel,
totals.visits as Visits,
totals.transactionRevenue as Revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
SQL Query - Where
What filters do you want
to apply?
▸ WHERE channelGrouping = 'Organic Search'
▸ WHERE channelGrouping IN ('Organic Search', 'Direct')
▸ WHERE channelGrouping = 'Organic Search' AND date = '20170801'
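Here are the three filters above, run against a hypothetical four-row table in SQLite and counting how many rows each one keeps. Note the straight single quotes: curly quotes pasted from slides or word processors will cause a syntax error. The `count_rows` helper is a sketch that builds the query from hard-coded, trusted clauses only.

```python
import sqlite3

# Hypothetical toy sessions table; the real GA export has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (date TEXT, channelGrouping TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("20170801", "Organic Search"),
        ("20170801", "Direct"),
        ("20170801", "Referral"),
        ("20170802", "Organic Search"),
    ],
)


def count_rows(where_clause: str) -> int:
    """Count rows that pass a WHERE clause (hard-coded, trusted clauses only)."""
    sql = f"SELECT COUNT(*) FROM sessions WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]


equals = count_rows("channelGrouping = 'Organic Search'")
in_list = count_rows("channelGrouping IN ('Organic Search', 'Direct')")
both = count_rows("channelGrouping = 'Organic Search' AND date = '20170801'")
print(equals, in_list, both)  # 2 3 1
```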
SELECT
date as Date,
channelGrouping as Channel,
totals.visits as Visits,
totals.transactionRevenue as Revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE channelGrouping = 'Organic Search'
SQL Query - Order By
How do you want to sort
your data?
SELECT
date as Date,
channelGrouping as Channel,
totals.visits as Visits,
totals.transactionRevenue as Revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE channelGrouping = 'Organic Search'
ORDER BY totals.visits desc
SQL Query - Limit
How many rows do you
want to return?
SELECT
date as Date,
channelGrouping as Channel,
totals.visits as Visits,
totals.transactionRevenue as Revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE channelGrouping = 'Organic Search'
ORDER BY Revenue desc
LIMIT 100
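To see ORDER BY and LIMIT working together, this SQLite sketch sorts a hypothetical set of organic sessions by revenue, highest first, and keeps the top two. SQLite places NULLs last when sorting DESC, which matches BigQuery's default, so sessions without revenue drop to the bottom.

```python
import sqlite3

# Hypothetical toy rows; revenue is NULL (None) for a session without a transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (date TEXT, channelGrouping TEXT, visits INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("20170801", "Organic Search", 1, 300),
        ("20170801", "Organic Search", 1, None),
        ("20170801", "Organic Search", 1, 900),
        ("20170801", "Organic Search", 1, 500),
    ],
)

# Sort by revenue, highest first, and keep only the top 2 rows.
top = conn.execute(
    """
    SELECT date AS Date, channelGrouping AS Channel, visits AS Visits, revenue AS Revenue
    FROM sessions
    WHERE channelGrouping = 'Organic Search'
    ORDER BY Revenue DESC
    LIMIT 2
    """
).fetchall()
print(top)
```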
#standardSQL
SELECT
date as Date,
channelGrouping as Channel,
totals.visits as Visits,
totals.transactionRevenue as Revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE channelGrouping = 'Organic Search'
ORDER BY Revenue desc
LIMIT 100
Standard vs Legacy: the #standardSQL prefix tells BigQuery to run the query as standard SQL rather than legacy SQL
Your typical process...
▸ Open GA
▸ Filter data in GA
▸ Export GA data
▸ Open Excel
▸ Clean data
▸ Filter data
▸ Sort data
Cry because everything
breaks and you get the
spinning wheel of death
2.1 seconds!
But what if I want to
sum up some of my
values?
SQL Query - Select
SELECT
date as Date,
channelGrouping as Channel,
sum(totals.visits) as Visits,
sum(totals.transactionRevenue) as Revenue
SQL Query - Group By
SELECT
date as Date,
channelGrouping as Channel,
sum(totals.visits) as Visits,
sum(totals.transactionRevenue) as Revenue
GROUP BY Date, Channel
Non-aggregated columns must be in GROUP BY
SELECT
date as Date,
channelGrouping as Channel,
sum(totals.visits) as Visits,
sum(totals.transactionRevenue) as Revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE channelGrouping = 'Organic Search'
GROUP BY Date, Channel
ORDER BY Revenue desc
LIMIT 100
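Here is the same aggregation on hypothetical toy rows in SQLite. It produces one output row per date and channel, with SUM() adding up the rows in each group. SUM() skips NULLs, so a session with no revenue adds nothing to the total instead of breaking it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (date TEXT, channelGrouping TEXT, visits INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("20170801", "Organic Search", 1, 300),
        ("20170801", "Organic Search", 1, None),  # no transaction in this session
        ("20170801", "Direct", 1, 200),
        ("20170802", "Organic Search", 1, 100),
    ],
)

# One output row per (date, channel) pair; the non-aggregated columns are in GROUP BY.
totals = conn.execute(
    """
    SELECT date AS Date, channelGrouping AS Channel,
           SUM(visits) AS Visits, SUM(revenue) AS Revenue
    FROM sessions
    WHERE channelGrouping = 'Organic Search'
    GROUP BY date, channelGrouping
    ORDER BY Revenue DESC
    """
).fetchall()
print(totals)
```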
Real Life Challenge
Challenge
What if all of our data wasn’t living
in the same table?
Sessions Data
Transaction Data
Excel
▸ VLOOKUP
▸ INDEX MATCH (for VLOOKUP haters)
Join
Takes all the rows from your first
table and joins matching rows from a second
table (using a common column, such as a session ID)
Left Join
FROM `Table 1` a
LEFT JOIN `Table 2` b
ON (a.metric = b.metric)
FROM `project-1234.analytics.ga_sessions` a
LEFT JOIN
`project-1234.analytics.ga_transactions` b
ON (a.ga_session_id = b.ga_session_id)
SELECT
a.channelGrouping as Channel,
sum(a.totals.visits) as Visits,
sum(b.totals.transactionRevenue) as Revenue
FROM `project-1234.analytics.ga_sessions` a
LEFT JOIN `project-1234.analytics.ga_transactions` b
ON (a.ga_session_id = b.ga_session_id)
WHERE a.channelGrouping = 'Organic Search'
GROUP BY Channel
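Here is the LEFT JOIN on two hypothetical toy tables in SQLite, keyed on `ga_session_id` as in the slide. It shows why LEFT JOIN is useful here: every session is kept, and sessions without a matching transaction simply get NULL revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the ga_sessions and ga_transactions tables on the slide.
conn.execute("CREATE TABLE ga_sessions (ga_session_id TEXT, channelGrouping TEXT, visits INTEGER)")
conn.execute("CREATE TABLE ga_transactions (ga_session_id TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO ga_sessions VALUES (?, ?, ?)",
    [("s1", "Organic Search", 1), ("s2", "Organic Search", 1), ("s3", "Direct", 1)],
)
conn.executemany(
    "INSERT INTO ga_transactions VALUES (?, ?)",
    [("s1", 250), ("s3", 80)],
)

# LEFT JOIN keeps every session from table a; s2 has no match, so its revenue
# is NULL and SUM() ignores it.
result = conn.execute(
    """
    SELECT a.channelGrouping AS Channel,
           SUM(a.visits) AS Visits,
           SUM(b.revenue) AS Revenue
    FROM ga_sessions a
    LEFT JOIN ga_transactions b
      ON (a.ga_session_id = b.ga_session_id)
    WHERE a.channelGrouping = 'Organic Search'
    GROUP BY a.channelGrouping
    """
).fetchall()
print(result)  # [('Organic Search', 2, 250)]
```

One thing to watch with your own data: if a session can match several transaction rows, that session's row is repeated once per match, so SUM(a.visits) will be inflated. Aggregating transactions per session before joining avoids that.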
SQL Query
SELECT, FROM, JOIN, WHERE, GROUP BY, ORDER BY
Covered
▸ SELECT
▸ FROM
▸ WHERE
▸ ORDER BY
▸ GROUP BY
▸ JOIN
▸ LIMIT
Not Covered
▸ HAVING
▸ WINDOW
▸ UNION
▸ WITH
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
Supported Functions
▸ Aggregate
▸ Arithmetic
▸ Comparison
▸ Date & Time
▸ Logical Operators
▸ Regular Expressions
▸ String
So, What’s Next?
In this talk, we’ve simply
scratched the surface
by analysing Analytics data...
There’s so much more to do!
▸ Crawl Data
▸ Link Data
▸ Log Files
And there are so many
smart(er) people and resources
to help you do that!
Attend more advanced talks!
Hayden Roche (@HaydenRoche3) - Technical SEO at Scale
▸ BristolSEO (Jan 28th): https://www.meetup.com/bristol-seo/
▸ ReadingSEO (Feb 13th): https://www.meetup.com/SEO-Meetup-Reading/
Read everything Dom writes! (@dom_woodman)
▸ How to Use BigQuery for Large-Scale SEO: https://moz.com/blog/how-to-bigquery-large-scale-seo
▸ Guide to Log Analysis with Big Query: https://www.distilled.net/log-file-analysis/
Beautiful Dom Slide!
▸ How long does it take for a page to be discovered after
being published?
▸ Which pages have requests from Googlebot?
▸ What are the top non-canonical pages being crawled?
▸ What are the most crawled parameters?
▸ Which directories have the most 404 error codes?
▸ Which pages are crawled with and without parameters?
https://www.slideshare.net/DominicWoodman/a-guide-to-log-analysis-with-big-query
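As a taste of the log-file questions above, here is a SQLite sketch of “Which directories have the most 404 error codes?” It assumes a hypothetical `logs` table of already-parsed requests. `first_directory` is a hypothetical Python function registered with `create_function` so the SQL can call it. In BigQuery you could use a built-in such as REGEXP_EXTRACT instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, already-parsed log lines: one row per request.
conn.execute("CREATE TABLE logs (path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("/blog/old-post", 404),
        ("/blog/missing", 404),
        ("/for-sale/123", 404),
        ("/for-sale/456", 200),
        ("/guides/a", 404),
    ],
)


def first_directory(path):
    """Return the first path segment, e.g. '/blog/old-post' -> '/blog/'."""
    segment = path.strip("/").split("/")[0]
    return f"/{segment}/" if segment else "/"


# Make the Python function callable from SQL as first_directory(path).
conn.create_function("first_directory", 1, first_directory)

# Count 404 responses per top-level directory, most errors first.
not_found_by_dir = conn.execute(
    """
    SELECT first_directory(path) AS directory, COUNT(*) AS errors
    FROM logs
    WHERE status = 404
    GROUP BY first_directory(path)
    ORDER BY errors DESC, directory
    """
).fetchall()
print(not_found_by_dir)  # [('/blog/', 2), ('/for-sale/', 1), ('/guides/', 1)]
```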
Learn more SQL!
https://www.codecademy.com/catalog/language/sql
More great resources/courses
▸ Coursera - From Data to Insights with Google Cloud
▸ QwikLabs - BigQuery for Marketing Analysts
▸ Coding is for Losers - Learning BigQuery SQL
▸ OnCrawl - Why SEOs Should Ditch Excel & Learn SQL
▸ Book - Google BigQuery: The Definitive Guide
▸ Google - BigQuery Documentation
▸ Google - BigQuery Cookbook
A few final points...
▸ Try out every random SQL query you come across (and create a library of saved queries)
▸ Mash up different datasets together (it helps answer tons of questions)
▸ Share cool things you learn with the rest of us (and don’t worry about that one idiot on Twitter who labels it as ‘old news’)
▸ It’s okay to feel overwhelmed learning something new (maybe in 5 months you’ll be giving a talk about it too!)
THANKS!
Any Questions?
▸ @areej_abuali
▸ linkedin.com/in/areejabuali

[LondonSEO 2020] BigQuery & SQL for SEOs