In this talk, we will explore the available options for building a Data Warehouse for a data-oriented business using Google Cloud Platform. We will start by discussing why a Data Warehouse may be needed, move on to the differences between "traditional" and Cloud Data Warehouses, and finally discuss the steps and options for building your own Data Warehouse.
"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
1.
2. Artem Nikulchenko
Chief Software Architect at Cloud Works (Teamwork Commerce)
Google Developers Expert
Cloud Champion Innovator
GDG Cloud Kharkiv Organizer
Certified Google Cloud Architect
3. What is a Data Warehouse?
A data warehouse is an enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources, such as point-of-sale transactions, marketing automation, customer relationship management, and more. A data warehouse is suited for ad hoc analysis as well as custom reporting. A data warehouse can store both current and historical data in one place and is designed to give a long-range view of data over time, making it a primary component of business intelligence.
Wikipedia
8. Do you need a Data Warehouse?
! Reports are running too slow
! Reports interfere with transactional workflows
! Data is dispersed across multiple DBs (and some of it is not even in a DB…)
! System accumulates a lot of historical data that is not needed for day-to-day workflows
9. None of the above? Are you sure you need a DW?
! You just need a reporting tool?
Data Studio (GCP), Looker (GCP), Tableau, etc.
! Your reports are a little slow?
Have you tried ROLAP?
! All your data is in PostgreSQL?
There is a surprise for you at the end of the talk!
10. Star-schema
The star schema is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts.
SALES (fact table): Product ID, Order ID, Customer ID, Emp ID, Total, Quantity, Discount
Product Dimension: Product ID, Product Name, Product Category, Unit Price
Customer Dimension: Customer ID, Customer Name, Address, City, Zip
Time Dimension: Order ID, Order Date, Year, Quarter, Month
Emp Dimension: Emp ID, Emp Name, Title, Department, Region
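To make the star-schema slide concrete, here is a minimal sketch of a typical report query: the SALES fact table is joined to two of its dimensions and aggregated. It uses an in-memory SQLite database as a stand-in for a warehouse; the snake_case table and column names and the sample rows are illustrative assumptions, and only two of the four dimensions are created to keep it short.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; names loosely follow the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT,
                          product_category TEXT, unit_price REAL);
CREATE TABLE time_dim (order_id INTEGER PRIMARY KEY, order_date TEXT,
                       year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE sales (product_id INTEGER, order_id INTEGER, customer_id INTEGER,
                    emp_id INTEGER, total REAL, quantity INTEGER, discount REAL);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Laptop", "Electronics", 1000.0),
    (2, "Desk", "Furniture", 250.0),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
    (10, "2022-01-15", 2022, 1, 1),
    (11, "2022-04-02", 2022, 2, 4),
    (12, "2022-04-20", 2022, 2, 4),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 10, 100, 7, 1000.0, 1, 0.0),
    (2, 11, 101, 7, 500.0, 2, 0.0),
    (1, 12, 100, 8, 900.0, 1, 100.0),
])

def revenue_by_category_and_quarter(conn):
    """Join the fact table to its dimensions and aggregate the measure."""
    return conn.execute("""
        SELECT p.product_category, t.year, t.quarter, SUM(s.total) AS revenue
        FROM sales AS s
        JOIN product_dim AS p ON p.product_id = s.product_id
        JOIN time_dim AS t ON t.order_id = s.order_id
        GROUP BY p.product_category, t.year, t.quarter
        ORDER BY p.product_category, t.year, t.quarter
    """).fetchall()

rows = revenue_by_category_and_quarter(conn)
```

The fact table holds only keys and measures, so every report is the same shape: join to the dimensions you want to slice by, then group and aggregate.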
11. Traditional Data Warehouse
Extract-Transform-Load (ETL)
! Extract data from sources
! Transform in intermediate tool
! Load into Data Warehouse DB
Diagram: Data Sources (flat files, JSON files, cloud sources) -> Extract -> Transform -> Load -> Data Warehouse
12. Traditional Data Warehouse
What are the issues?
! High upfront cost
! High maintenance cost
! Complex ETL process
! Proprietary query language
! No automated scaling
13. Cloud Data Warehouse
What is the difference?
! No upfront costs (pay-per-usage)
! Fully managed service
! Automatic scaling (due to storage and compute separation)
! ELT instead of ETL (done in SQL)
! Support of a standard SQL dialect
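The "ELT instead of ETL" point can be sketched as follows: raw rows are loaded untouched into a staging table, and the cleanup happens afterwards in SQL inside the database. SQLite is used here only as a local stand-in for a cloud warehouse; the table names (`raw_orders`, `orders_clean`), the columns, and the sample data are hypothetical.

```python
import sqlite3

def load_raw(conn, raw_rows):
    """Load step: store source rows untouched (all text) in a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (order_id TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

def transform_in_sql(conn):
    """Transform step: type and clean the data with SQL, inside the database."""
    conn.execute("DROP TABLE IF EXISTS orders_clean")
    conn.execute("""
        CREATE TABLE orders_clean AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL) AS amount,
               UPPER(TRIM(country)) AS country
        FROM raw_orders
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
    """)

conn = sqlite3.connect(":memory:")
load_raw(conn, [("1", "19.90", " ua "), ("2", "", "us"), ("3", "5", "US")])
transform_in_sql(conn)
clean = conn.execute(
    "SELECT order_id, amount, country FROM orders_clean ORDER BY order_id"
).fetchall()
```

Because the raw data stays in the warehouse, the transformation can be changed and re-run later without extracting from the sources again.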
14. Google BigQuery
Petabyte scale multi-cloud DW
! Dremel: The Execution Engine
! Colossus: Distributed Storage
! Borg: Compute
! Jupiter: The Network
17. Google BigQuery
Petabyte scale multi-cloud DW
! Take all Wikipedia views in 2022
! Wonder what the most popular pages are
! Get the result within a minute
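A query along the lines of the Wikipedia example might look like the sketch below. The slide does not show the actual query, so everything here is an assumption: in particular the public table name `bigquery-public-data.wikipedia.pageviews_<year>` and its columns (`datehour`, `wiki`, `title`, `views`) should be checked against the dataset before use. Running it requires the `google-cloud-bigquery` package and GCP credentials, and it scans a large table, so it is billed accordingly.

```python
def build_top_pages_sql(year: int, limit: int = 10) -> str:
    """Build a BigQuery Standard SQL query for the most-viewed English pages."""
    year, limit = int(year), int(limit)  # only integers are interpolated
    return f"""
        SELECT title, SUM(views) AS total_views
        FROM `bigquery-public-data.wikipedia.pageviews_{year}`  -- assumed table name
        WHERE DATE(datehour) BETWEEN '{year}-01-01' AND '{year}-12-31'
          AND wiki = 'en'
        GROUP BY title
        ORDER BY total_views DESC
        LIMIT {limit}
    """

def run_top_pages(year: int):
    """Execute the query in BigQuery (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # imported lazily so the builder works offline

    client = bigquery.Client()
    return list(client.query(build_top_pages_sql(year)).result())

sql = build_top_pages_sql(2022)
```

Calling `run_top_pages(2022)` would submit the job; the builder alone can be inspected or pasted into the BigQuery console.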
20. Moving data into BigQuery
! BigQuery Data Transfer Service
Google Software as a Service (SaaS) apps:
! Campaign Manager
! Cloud Storage
! Google Ad Manager
! Google Ads
! Google Merchant Center (beta)
! Google Play
! Search Ads 360 (beta)
! YouTube Channel reports
! YouTube Content Owner reports
External cloud storage providers:
! Amazon S3
Data warehouses:
! Teradata
! Amazon Redshift
26. Moving data into BigQuery
Pipeline stage: Extract, Load
! BigQuery Data Transfer Service
! Federated query (for PostgreSQL and MySQL)
! Cloud Data Fusion
! Airbyte (open source ELT tool)
! Existing ETL tools
! BigQuery Omni
! Custom solution using BQ API
28. Things to think about
! Preparing your data
○ PK
○ Data-modification column
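One reason the PK and a data-modification column matter is incremental extraction: with a modification timestamp on every row, each batch only needs the rows changed since the previous sync. A minimal sketch, assuming a hypothetical `customers` source table with an `updated_at` column stored as ISO-8601 text (SQLite stands in for the transactional DB):

```python
import sqlite3

def extract_changed_rows(source, last_sync):
    """Return rows modified after the last successful sync (the watermark)."""
    return source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()

source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
source.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Alice", "2022-01-01T10:00:00"),
    (2, "Bob", "2022-01-02T09:30:00"),
    (3, "Carol", "2022-01-03T12:00:00"),
])

last_sync = "2022-01-01T23:59:59"
changed = extract_changed_rows(source, last_sync)
# The next watermark is the latest modification time seen in this batch.
new_watermark = changed[-1][2] if changed else last_sync
```

The PK then lets the warehouse side match each changed row to the version it already holds.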
30. Things to think about
! Preparing your data
! Batch vs Streaming Import
! Handling Data Modifications
○ Update instantly (not a good idea)
○ Batch update
○ Views (or Materialized Views)
○ …mixed
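The "Views" option for handling data modifications can be sketched like this: every change is appended as a new row instead of updating in place, and a view exposes only the latest version of each PK. This is an illustrative sketch in SQLite, not BigQuery-specific syntax; the `customers_log` table, `customers_current` view, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Append-only log: one row per version of a customer record.
CREATE TABLE customers_log (id INTEGER, name TEXT, updated_at TEXT);

-- The view keeps, for each id, only the row with the latest updated_at.
CREATE VIEW customers_current AS
SELECT l.id, l.name, l.updated_at
FROM customers_log AS l
JOIN (SELECT id, MAX(updated_at) AS max_updated
      FROM customers_log GROUP BY id) AS latest
  ON latest.id = l.id AND latest.max_updated = l.updated_at;
""")

def append_changes(conn, rows):
    """Import step: append new versions, never update existing rows."""
    conn.executemany("INSERT INTO customers_log VALUES (?, ?, ?)", rows)

append_changes(conn, [
    (1, "Alice", "2022-01-01T10:00:00"),
    (2, "Bob", "2022-01-02T09:30:00"),
])
append_changes(conn, [(1, "Alice Smith", "2022-01-05T08:00:00")])

current = conn.execute(
    "SELECT id, name FROM customers_current ORDER BY id"
).fetchall()
```

The trade-off is that the deduplication cost is paid at query time; a periodic batch update or a materialized view moves that cost to import time instead.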
31. Massaging data in BigQuery
Pipeline stage: Transform
! Scheduled query
! Cloud Tasks
! Composer (Airflow)
32. Using data in BigQuery
Pipeline stage: Use
! Data Studio
! Looker
! ML models (BQ ML or Vertex AI)
! …or any other tool you like
34. Google BigQuery
Tons of cool features:
! Embedded ML and predictive modeling
! Interactive data analysis with BI Engine
! Multicloud data analysis with BQ Omni
! Federated query and logical DW
35. Bonus: AlloyDB
A fully managed PostgreSQL-compatible database service for your most demanding enterprise database workloads.
! Fully compatible with PostgreSQL, providing flexibility and true portability for your workloads
! Superior performance, 4X faster than standard PostgreSQL for transactional workloads
! Fast, real-time insights, up to 100X faster analytical queries than standard PostgreSQL
https://cloud.google.com/alloydb