"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
The document discusses the concept of a data warehouse, its importance in business intelligence, and its role in handling structured and semi-structured data. It highlights the differences between traditional and cloud data warehouses, specifically mentioning Google BigQuery's features and advantages. Additionally, it touches on data loading methods, management tools, and the performance benefits of using AlloyDB for enterprise database workloads.
2.
Artem Nikulchenko
Chief Software Architect at Cloud Works (Teamwork Commerce)
Google Developers Expert
Cloud Champion Innovator
GDG Cloud Kharkiv Organizer
Certified Google Cloud Architect
3.
What is a Data Warehouse?
A data warehouse is an enterprise system used for the analysis and
reporting of structured and semi-structured data from multiple sources,
such as point-of-sale transactions, marketing automation, customer
relationship management, and more. A data warehouse is suited for ad hoc
analysis as well as custom reporting. A data warehouse can store both current
and historical data in one place and is designed to give a long-range view of
data over time, making it a primary component of business intelligence.
Wikipedia
Do you need a Data Warehouse?
! Reports are running too slow
! Reports interfere with transactional workflows
! Data is dispersed across multiple DBs (and some of it is not even in a DB…)
! System accumulates a lot of historical data that is not needed for day-to-day workflows
9.
None of the above? Are you sure you need a DW?
! You just need a reporting tool?
Data Studio (GCP), Looker (GCP), Tableau, etc.
! Your reports are a little slow?
Have you tried ROLAP?
! All your data is in PostgreSQL?
There is a surprise at the end of the talk for you!
10.
Star schema
The star schema is the simplest style of data
mart schema and is the approach most
widely used to develop data warehouses
and dimensional data marts.
Product Dimension: Product ID, Product Name, Product Category, Unit Price
Customer Dimension: Customer ID, Customer Name, Address, City, Zip
Time Dimension: Order ID, Order Date, Year, Quarter, Month
Emp Dimension: Emp ID, Emp Name, Title, Department, Region
SALES (fact table): Product ID, Order ID, Customer ID, Employer ID, Total, Quantity, Discount
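To make the star shape concrete, here is a minimal sketch that builds a reduced version of the schema and runs a typical reporting query against it. It is an illustration under assumptions: in-memory SQLite stands in for the warehouse, the Emp dimension and the Discount column are left out for brevity, and all sample rows are invented.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse. Table and column names
# follow the slide; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           product_category TEXT, unit_price REAL);
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);
CREATE TABLE time_dim     (order_id INTEGER PRIMARY KEY, order_date TEXT,
                           year INTEGER, quarter INTEGER, month INTEGER);
-- The fact table holds the measures plus one key per dimension.
CREATE TABLE sales (
    product_id  INTEGER REFERENCES product_dim(product_id),
    order_id    INTEGER REFERENCES time_dim(order_id),
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    quantity    INTEGER,
    total       REAL
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Laptop", "Electronics", 1000.0),
    (2, "Phone", "Electronics", 500.0),
    (3, "Desk", "Furniture", 200.0),
])
conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)",
                 [(1, "Alice", "Kharkiv"), (2, "Bob", "Kyiv")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
    (10, "2022-01-15", 2022, 1, 1),
    (11, "2022-04-02", 2022, 2, 4),
    (12, "2022-04-20", 2022, 2, 4),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 1, 1, 1000.0),
    (2, 11, 2, 2, 1000.0),
    (3, 12, 1, 1, 200.0),
])


def revenue_by_category_and_quarter(connection):
    """Join the fact table to two dimensions and aggregate the measure."""
    return connection.execute("""
        SELECT p.product_category, t.year, t.quarter, SUM(s.total) AS revenue
        FROM sales AS s
        JOIN product_dim AS p ON p.product_id = s.product_id
        JOIN time_dim    AS t ON t.order_id   = s.order_id
        GROUP BY p.product_category, t.year, t.quarter
        ORDER BY p.product_category, t.year, t.quarter
    """).fetchall()


report = revenue_by_category_and_quarter(conn)
```

Every report follows the same pattern: start from the fact table, join only the dimensions the question needs, and group by their attributes.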
11.
Traditional Data Warehouse
Extract-Transform-Load (ETL)
! Extract data from sources
! Transform in intermediate tool
! Load into Data Warehouse DB
Diagram: Data Sources (Flat Files, JSON Files, Cloud Sources) → Extract → Transform → Load → Data Warehouse
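The three ETL stages can be sketched in a few lines. This is only an illustration under assumptions: the two source payloads, the `orders` table, and the function names are hypothetical, and in-memory SQLite stands in for the warehouse database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source payloads standing in for a flat file and a JSON file.
FLAT_FILE = "order_id,amount,currency\n1,10.50,usd\n2,7.25,USD\n"
JSON_FILE = '[{"order_id": 3, "amount": "4.00", "currency": "eur"}]'


def extract():
    """Extract: read raw records from each source as dictionaries."""
    rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))
    rows.extend(json.loads(JSON_FILE))
    return rows


def transform(rows):
    """Transform: normalize types and codes outside the warehouse."""
    return [
        (int(r["order_id"]), round(float(r["amount"]), 2), r["currency"].upper())
        for r in rows
    ]


def load(connection, records):
    """Load: write only the cleaned records into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

Note that the warehouse never sees the raw data: anything the transform step drops or changes is gone unless the sources are re-read.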
12.
Traditional Data Warehouse
What are the issues?
! High upfront cost
! High maintenance cost
! Complex ETL process
! Proprietary query language
! No automated scaling
13.
Cloud Data Warehouse
What is the difference?
! No upfront costs (pay-per-usage)
! Fully managed service
! Automatic scaling (due to storage
and compute separation)
! ELT instead of ETL (done in SQL)
! Support of a standard SQL dialect
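The "ELT instead of ETL" point can be illustrated with a small sketch: raw rows are loaded first, unchanged, and the transformation runs afterwards as SQL inside the warehouse. The table names and sample rows are hypothetical, and in-memory SQLite is used here only as a stand-in for a cloud warehouse.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Load: raw rows go in untouched, as text, exactly as the source emitted them.
warehouse.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, currency TEXT)")
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "10.50", "usd"), ("2", "7.25", "USD"), ("3", "4.00", "eur")],
)

# Transform: done afterwards, inside the warehouse, in plain SQL.
warehouse.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount,
           UPPER(currency)           AS currency
    FROM raw_orders
""")

totals = warehouse.execute(
    "SELECT currency, SUM(amount) FROM orders GROUP BY currency ORDER BY currency"
).fetchall()
```

Because the raw table is kept, a transformation can be changed and re-run later without going back to the source systems.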
14.
Google BigQuery
Petabyte-scale multi-cloud DW
! Dremel: The Execution Engine
! Colossus: Distributed Storage
! Borg: Compute
! Jupiter: The Network
Moving data into BigQuery
! BigQuery Data Transfer Service
Google Software as a Service (SaaS) apps:
! Campaign Manager
! Cloud Storage
! Google Ad Manager
! Google Ads
! Google Merchant Center (beta)
! Google Play
! Search Ads 360 (beta)
! YouTube Channel reports
! YouTube Content Owner reports
External cloud storage providers:
! Amazon S3
Data warehouses:
! Teradata
! Amazon Redshift
26.
Moving data into BigQuery
! BigQuery Data Transfer Service
! Federated query (for PostgreSQL and MySQL)
! Cloud Data Fusion
! Airbyte (open source ELT tool)
! Existing ETL tools
! BigQuery Omni
! Custom solution using BQ API
Extract
Load
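For the custom-solution option, a minimal sketch of a loader built on the Python client library might look like the following. It assumes the `google-cloud-bigquery` package, whose `Client.insert_rows_json()` method sends rows through the streaming insert API and returns a list of per-row errors. The helper names, batch size, and table ID are hypothetical.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def stream_rows(client, table_id, rows, batch_size=500):
    """Send rows to BigQuery in batches through the streaming insert API.

    `client` is expected to be a google.cloud.bigquery.Client. Its
    insert_rows_json() method returns a list of per-row errors, which is
    empty when every row in the batch was accepted.
    """
    sent = 0
    for batch in chunked(rows, batch_size):
        errors = client.insert_rows_json(table_id, batch)
        if errors:
            raise RuntimeError(f"BigQuery rejected rows: {errors}")
        sent += len(batch)
    return sent


# Real usage (needs the google-cloud-bigquery package and credentials):
#   from google.cloud import bigquery
#   stream_rows(bigquery.Client(), "my-project.my_dataset.orders", rows)
# The table ID above is a placeholder.
```

Passing the client in as a parameter keeps the function easy to test with a stub. For large batch imports, the same client library also offers load jobs (for example `load_table_from_uri`), which are usually a better fit than streaming inserts.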
Things to think about
! Preparing your data
○ PK
○ Data-modification column
! Batch vs Streaming Import
! Handling Data Modifications
○ Update instantly (not a good idea)
○ Batch update
○ Views (or Materialized Views)
○ …mixed
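One way these points fit together, sketched under assumptions: the source table has a primary key and a data-modification column, each batch appends only rows changed since the last watermark, and a view exposes the latest version of every key. Two in-memory SQLite databases stand in for the transactional DB and for BigQuery; the table, view, and function names are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")     # stands in for the transactional DB
warehouse = sqlite3.connect(":memory:")  # stands in for BigQuery

source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, modified_at TEXT)")
warehouse.execute("CREATE TABLE orders_log (id INTEGER, status TEXT, modified_at TEXT)")
# The view exposes only the newest version of each primary key.
warehouse.execute("""
    CREATE VIEW orders_current AS
    SELECT l.id, l.status, l.modified_at
    FROM orders_log AS l
    JOIN (SELECT id, MAX(modified_at) AS latest FROM orders_log GROUP BY id) AS m
      ON m.id = l.id AND m.latest = l.modified_at
""")


def sync_batch(watermark):
    """Append rows modified after `watermark`; return the new watermark."""
    changed = source.execute(
        "SELECT id, status, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    warehouse.executemany("INSERT INTO orders_log VALUES (?, ?, ?)", changed)
    return changed[-1][2] if changed else watermark


source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "new", "2022-05-01T10:00:00"),
    (2, "new", "2022-05-01T11:00:00"),
])
watermark = sync_batch("")

# Order 1 is updated in the source; only that row is picked up by the next batch.
source.execute(
    "UPDATE orders SET status = 'shipped', modified_at = '2022-05-02T09:00:00' WHERE id = 1"
)
watermark = sync_batch(watermark)

log_rows = warehouse.execute("SELECT COUNT(*) FROM orders_log").fetchone()[0]
current = warehouse.execute("SELECT id, status FROM orders_current ORDER BY id").fetchall()
```

The warehouse table is append-only, so no row is ever updated in place; the history stays available in `orders_log`, and reports read from the view.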
31.
Massaging data in BigQuery
! Scheduled query
! Cloud Tasks
! Composer (Airflow)
Transform
32.
Using data in BigQuery
! Data Studio
! Looker
! ML models (BQ ML or Vertex AI)
! …or any other tool you like
Use
Google BigQuery
Tons of cool features:
! Embedded ML and predictive modeling
! Interactive data analysis with BI Engine
! Multicloud data analysis with BQ Omni
! Federated query and logical DW
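As a rough illustration of the federated query feature, BigQuery SQL provides the `EXTERNAL_QUERY` function, which runs a statement in a connected Cloud SQL database and returns the result as a table. The helper below only builds the SQL text. The function name, connection ID, and source table are hypothetical, and the input checks are deliberately conservative.

```python
def federated_query(connection_id, external_sql):
    """Wrap a source-database query in BigQuery's EXTERNAL_QUERY function."""
    # Refuse inputs that would break the quoted SQL literals built below.
    if "'" in connection_id:
        raise ValueError("connection_id must not contain quotes")
    if "'''" in external_sql or external_sql.endswith("'") or "\\" in external_sql:
        raise ValueError("external_sql cannot be embedded safely")
    return (
        "SELECT * FROM EXTERNAL_QUERY("
        f"'{connection_id}', '''{external_sql}''')"
    )


# Placeholder connection ID and source table.
sql = federated_query("us.my_cloudsql_connection", "SELECT id, status FROM orders")
```

The resulting statement can be joined with native BigQuery tables, which is what allows a logical DW to query operational data without copying it first.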
35.
Bonus: AlloyDB
! Fully compatible with PostgreSQL,
providing flexibility and true portability for
your workloads
! Superior performance, 4X faster than
standard PostgreSQL for transactional
workloads
! Fast, real-time insights, up to 100X
faster analytical queries than standard
PostgreSQL
A fully managed PostgreSQL-compatible
database service for your most demanding
enterprise database workloads.
https://cloud.google.com/alloydb