In this talk I will walk through several tools and resources that help you handle large datasets, from log files to Google Analytics. These techniques will help you find more valuable insights and avoid the annoyance of crashing Excel spreadsheets.
4. What we’ll walk through
● Brief intro to SQL and BigQuery
● Merging large data sets and the power you’ll gain
● Projects that YOU can do with “minimal-ish” analytical effort
@haydenroche3
#BristolSEO
8. Joining Data Sources w/ SQL
http://www.sql-join.com/sql-join-types
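To make the join types concrete, here is a minimal sketch that contrasts INNER JOIN with LEFT JOIN. It assumes an in-memory SQLite database as a local stand-in for BigQuery; the table names (crawl, analytics) and all rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crawl (url TEXT);
    CREATE TABLE analytics (landing_page TEXT, sessions INTEGER);
    INSERT INTO crawl VALUES ('/'), ('/supreme'), ('/orphan-page');
    INSERT INTO analytics VALUES ('/', 800), ('/supreme', 29), ('/not-crawled', 5);
""")

# INNER JOIN keeps only URLs that appear in both tables.
inner_rows = conn.execute("""
    SELECT c.url, a.sessions
    FROM crawl c
    INNER JOIN analytics a ON c.url = a.landing_page
    ORDER BY c.url
""").fetchall()

# LEFT JOIN keeps every crawled URL; unmatched rows get NULL (None in Python).
left_rows = conn.execute("""
    SELECT c.url, a.sessions
    FROM crawl c
    LEFT JOIN analytics a ON c.url = a.landing_page
    ORDER BY c.url
""").fetchall()

print(inner_rows)
print(left_rows)
```

In the LEFT JOIN result, a crawled URL with no analytics row still shows up, which is what makes it useful for spotting pages that get no traffic.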
9. FREE SQL Resources
1. https://www.w3schools.com/sql/default.asp
2. https://www.udemy.com/topic/sql/free/
3. https://www.codecademy.com/learn/learn-sql
4. Your colleagues and technical SEO community
10. BigQuery and SQL for SEOs
linkedin.com/in/areejabuali/
@areej_abuali
Find Areej’s slides here!
12. “Impossible” Technical SEO
“My website has 10K → 100K → 1M → 10M pages. Doing analysis takes a decade without a data expert.”
13. Seemingly Impossible Statistic
“Two stars in the universe have the same chance of colliding as 4 bumblebees randomly flying around the United States.”
14. Not So “Impossible” Technical SEO
“My website has 10K → 100K → 1M → 10M pages. Doing analysis takes a decade without a data expert.”
15. Case Study: StockX
“StockX is the world’s first stock market for things – a live ‘bid/ask’ marketplace.”
16. Starting with a Crawl
Find all the URLs!
Over 100K URLs found in a starter crawl
30. Adding in Google Analytics
Option 1: Export Data From UI
Pros: Easy, Quick
Cons: Manual
Option 2: BigQuery + GA Data Integration
Pros: Automated & granular
Cons: API setup knowledge*
36. Joining SF & GA
SELECT
  sf.URL,
  sum(ga.Sessions) AS Sessions,
  sum(ga.Revenue) AS Revenue,
  -- Divide the aggregated totals; SAFE_DIVIDE returns NULL when Sessions is 0
  round(SAFE_DIVIDE(sum(ga.Revenue), sum(ga.Sessions)), 2) AS RPS
FROM
  `techseo.site_audit.screaming_frog_crawl` sf
LEFT JOIN
  `techseo.site_audit.google_analytics` ga ON (sf.URL = ga.Landing_Page)
GROUP BY
  sf.URL

sf.URL                          ga.Sessions  ga.Revenue  ga.RPS
/                               800,000      £576,000    £0.72
/supreme                        29,000       £11,165     £0.39
/supreme/jackets                6,700        £11,759     £1.76
/supreme-duffle-bag-ss18-black  4,400        £19,360     £4.40
/nike/airmax/95                 18,800       £3,055      £0.16
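Revenue per session (RPS) for a grouped URL is best computed from the summed totals, not from individual rows. The sketch below illustrates this under stated assumptions: in-memory SQLite stands in for BigQuery, and the tables (crawl, analytics) and figures are invented. NULLIF plays the role of a zero-division guard here, since BigQuery raises an error on division by zero while SQLite quietly returns NULL.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; tables and figures are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crawl (url TEXT);
    CREATE TABLE analytics (landing_page TEXT, sessions INTEGER, revenue REAL);
    INSERT INTO crawl VALUES ('/supreme'), ('/no-traffic');
    -- Two export rows for the same landing page (for example, two date ranges).
    INSERT INTO analytics VALUES ('/supreme', 100, 50.0), ('/supreme', 300, 90.0);
""")

rows = conn.execute("""
    SELECT
        c.url,
        SUM(a.sessions) AS sessions,
        SUM(a.revenue) AS revenue,
        -- Ratio of the totals; NULLIF makes a zero denominator yield NULL.
        ROUND(SUM(a.revenue) / NULLIF(SUM(a.sessions), 0), 2) AS rps
    FROM crawl c
    LEFT JOIN analytics a ON c.url = a.landing_page
    GROUP BY c.url
    ORDER BY c.url
""").fetchall()

for row in rows:
    print(row)
```

The crawled URL with no analytics rows comes back with NULL metrics instead of disappearing, so it stays visible in the audit.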
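40. Joining SF, GA & GSC
SELECT
  sf.URL,
  ga.Sessions,
  ga.Revenue,
  gsc.Ranking_KWs,
  gsc.Avg_Pos
FROM `techseo.site_audit.screaming_frog_crawl` sf
-- Aggregate each source per URL before joining, so that several GSC rows
-- for one URL cannot multiply the GA totals
LEFT JOIN (
  SELECT Landing_Page, sum(Sessions) AS Sessions, sum(Revenue) AS Revenue
  FROM `techseo.site_audit.google_analytics`
  GROUP BY Landing_Page
) ga ON (sf.URL = ga.Landing_Page)
LEFT JOIN (
  SELECT Landing_Page, count(Query) AS Ranking_KWs, avg(Position) AS Avg_Pos
  FROM `techseo.site_audit.google_search_console`
  GROUP BY Landing_Page
) gsc ON (sf.URL = gsc.Landing_Page)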
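One thing to watch when joining two detail tables to the same URL: if Search Console has several query rows per page, each Analytics row can be repeated once per query, which inflates the sums. The sketch below shows the effect and one possible remedy (aggregating each source before the join). It assumes in-memory SQLite as a stand-in for BigQuery, with invented table names and rows.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; tables and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crawl (url TEXT);
    CREATE TABLE analytics (landing_page TEXT, sessions INTEGER);
    CREATE TABLE search_console (landing_page TEXT, query TEXT, position REAL);
    INSERT INTO crawl VALUES ('/supreme');
    INSERT INTO analytics VALUES ('/supreme', 100);
    INSERT INTO search_console VALUES
        ('/supreme', 'supreme', 1.0),
        ('/supreme', 'supreme hoodie', 3.0),
        ('/supreme', 'supreme bag', 5.0);
""")

# Joining the raw tables repeats the single analytics row for each query row.
naive = conn.execute("""
    SELECT c.url, SUM(ga.sessions), COUNT(gsc.query)
    FROM crawl c
    LEFT JOIN analytics ga ON c.url = ga.landing_page
    LEFT JOIN search_console gsc ON c.url = gsc.landing_page
    GROUP BY c.url
""").fetchone()

# Aggregating each source to one row per URL first keeps the totals intact.
safe = conn.execute("""
    SELECT c.url, ga.sessions, gsc.ranking_kws, gsc.avg_pos
    FROM crawl c
    LEFT JOIN (
        SELECT landing_page, SUM(sessions) AS sessions
        FROM analytics GROUP BY landing_page
    ) ga ON c.url = ga.landing_page
    LEFT JOIN (
        SELECT landing_page, COUNT(query) AS ranking_kws, AVG(position) AS avg_pos
        FROM search_console GROUP BY landing_page
    ) gsc ON c.url = gsc.landing_page
""").fetchone()

print(naive)
print(safe)
```

With three query rows, the naive version reports three times the real session count, while the pre-aggregated version reports the true figure.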
55. Site Consolidation
CASE
  WHEN URL LIKE '/adidas%' THEN 'Adidas'
  WHEN URL LIKE '/palace%' THEN 'Palace'
  WHEN URL LIKE '/supreme%' THEN 'Supreme'
  WHEN URL LIKE '/jordan%' THEN 'Air Jordan'
  ELSE ''
END AS Brand
The wildcard (%) matches any sequence of characters (including none) after '/adidas'
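As a rough illustration of how a CASE label can feed a grouping, the sketch below tags each URL with a brand and then totals pages and sessions per brand. It assumes in-memory SQLite as a stand-in for BigQuery; the pages table, the session counts, and the 'Other' fallback label are invented for the example. Note that LIKE is case-insensitive for ASCII in SQLite by default but case-sensitive in BigQuery, which makes no difference for these lower-case URLs.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; table and session counts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, sessions INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("/adidas-yeezy-boost-350-v2-black", 120),
        ("/adidas-yeezy-boost-350-v2-core-black-green", 80),
        ("/supreme-waist-bag-ss19-black", 50),
        ("/about", 10),
    ],
)

# Label each URL with CASE, then group by that label.
rows = conn.execute("""
    SELECT
        CASE
            WHEN url LIKE '/adidas%' THEN 'Adidas'
            WHEN url LIKE '/supreme%' THEN 'Supreme'
            ELSE 'Other'
        END AS brand,
        COUNT(*) AS pages,
        SUM(sessions) AS sessions
    FROM pages
    GROUP BY brand
    ORDER BY brand
""").fetchall()

for row in rows:
    print(row)
```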
60. Site Consolidation
Landing_Page                                 Brand       Type     Colour
/adidas-yeezy-boost-350-v2-black             Adidas      Shoes    Black
/adidas-yeezy-boost-350-v2-core-black-green  Adidas      Shoes    Black-Green
/palace-pro-tool-t-shirt-black               Palace      T-Shirt  Black
/palace-pro-tool-t-shirt-white               Palace      T-Shirt  White
/supreme-waist-bag-ss19-black                Supreme     Bag      Black
/supreme-waist-bag-ss19-light-blue           Supreme     Bag      Light-Blue
/jordan-4-retro-black-cement-2012            Air Jordan  Shoes    Black
/jordan-4-retro-white-cement-2012            Air Jordan  Shoes    White
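The slides do not show how the Type and Colour columns were derived, so the following is only one possible sketch using further CASE statements. It assumes in-memory SQLite as a stand-in for BigQuery, and the matching patterns are guesses that happen to fit the sample URLs; real product URLs would need a longer rule list.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; the LIKE patterns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (landing_page TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?)",
    [
        ("/adidas-yeezy-boost-350-v2-core-black-green",),
        ("/palace-pro-tool-t-shirt-white",),
        ("/supreme-waist-bag-ss19-light-blue",),
        ("/jordan-4-retro-black-cement-2012",),
    ],
)

rows = conn.execute("""
    SELECT
        landing_page,
        CASE
            WHEN landing_page LIKE '%t-shirt%' THEN 'T-Shirt'
            WHEN landing_page LIKE '%bag%' THEN 'Bag'
            WHEN landing_page LIKE '%yeezy%' OR landing_page LIKE '/jordan%' THEN 'Shoes'
            ELSE ''
        END AS type,
        CASE
            -- Most specific patterns first: CASE returns the first match.
            WHEN landing_page LIKE '%black-green%' THEN 'Black-Green'
            WHEN landing_page LIKE '%light-blue%' THEN 'Light-Blue'
            WHEN landing_page LIKE '%black%' THEN 'Black'
            WHEN landing_page LIKE '%white%' THEN 'White'
            ELSE ''
        END AS colour
    FROM pages
    ORDER BY landing_page
""").fetchall()

for row in rows:
    print(row)
```

The ordering of the WHEN branches matters: if '%black%' came before '%black-green%', the two-tone shoe would be labelled plain Black.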
72. What Have We Learned?
● Large CSVs can live in BigQuery, not your desktop
● The power of the LEFT JOIN
● CASE statements and data groupings
● How to find quick insights from combining data sources
73. Additional Resources
1. Uploading CSVs to BigQuery → https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv
2. BigQuery Pricing Scheme → https://cloud.google.com/bigquery/pricing
3. Set up BigQuery export → https://support.google.com/analytics/answer/3416092
4. Creating a storage bucket in GCS → https://cloud.google.com/storage/docs/creating-buckets