Artem Nikulchenko
Chief Software Architect at Cloud Works (Teamwork Commerce)
Google Developers Expert
Cloud Champion Innovator
GDG Cloud Kharkiv Organizer
Certified Google Cloud Architect
What is a Data Warehouse?
A data warehouse is an enterprise system used for the analysis and
reporting of structured and semi-structured data from multiple sources,
such as point-of-sale transactions, marketing automation, customer
relationship management, and more. A data warehouse is suited for ad hoc
analysis as well as custom reporting. A data warehouse can store both current
and historical data in one place and is designed to give a long-range view of
data over time, making it a primary component of business intelligence.
Wikipedia
Do you need a Data Warehouse?
! Reports are running too slow
! Reports interfere with transactional workflows
! Data is dispersed across multiple DB (and some not even in DB…)
! System accumulates a lot of historical data that is not needed for day-to-day workflow
None of the above - are you sure you need a DW?
! You just need a reporting tool?
DataStudio (GCP), Looker (GCP), Tableau etc.
! Your reports are a little slow?
Have you tried ROLAP?
! All your data in PostgreSQL?
There is a surprise for you at the end of the talk!
Star schema
The star schema is the simplest style of data mart schema and is the approach most
widely used to develop data warehouses and dimensional data marts.
SALES (fact table): Product ID, Order ID, Customer ID, Emp ID, Total, Quantity, Discount
Product Dimension: Product ID, Product Name, Product Category, Unit Price
Customer Dimension: Customer ID, Customer Name, Address, City, Zip
Time Dimension: Order ID, Order Date, Year, Quarter, Month
Emp Dimension: Emp ID, Emp Name, Title, Department, Region
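A minimal sketch of how a star schema is typically queried: the fact table is joined to a dimension and aggregated by one of the dimension's attributes. SQLite (from the Python standard library) stands in for the warehouse here, and the table names, column names, and sample rows are hypothetical, loosely modeled on the SALES and Product Dimension tables from the slide.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    product_category TEXT,
    unit_price REAL
);
-- Fact table: one row per sale, pointing at the dimension by key.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product_dim (product_id),
    quantity INTEGER,
    total REAL
);
INSERT INTO product_dim VALUES
    (1, 'Laptop', 'Electronics', 1000.0),
    (2, 'Mouse', 'Electronics', 20.0),
    (3, 'Desk', 'Furniture', 300.0);
INSERT INTO sales VALUES
    (1, 1, 1000.0),
    (2, 2, 40.0),
    (3, 1, 300.0),
    (2, 1, 20.0);
""")


def revenue_by_category(connection):
    """Join the fact table to the product dimension and sum revenue per category."""
    query = """
        SELECT p.product_category, SUM(s.total) AS revenue
        FROM sales AS s
        JOIN product_dim AS p ON p.product_id = s.product_id
        GROUP BY p.product_category
        ORDER BY revenue DESC
    """
    return connection.execute(query).fetchall()


print(revenue_by_category(conn))
```

The same join-then-aggregate shape applies to the other dimensions (customer, time, employee); only the join key and the grouping column change.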
Traditional Data Warehouse
Extract-Transform-Load (ETL)
! Extract data from sources
! Transform in intermediate tool
! Load into Data Warehouse DB
Diagram: Data Sources, Flat Files, JSON Files, Cloud Sources → Extract → Transform → Load → Data Warehouse
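The three ETL steps can be sketched roughly as follows. This is an illustration only: the inline CSV and JSON strings stand in for a flat file and a JSON file, SQLite stands in for the Data Warehouse DB, and all names (orders, amount_cents, the three functions) are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Stand-ins for a flat file and a JSON file (hypothetical sample data).
FLAT_FILE = "order_id,amount\n1,10.50\n2,4.00\n"
JSON_FILE = '[{"order_id": 3, "amount": "7.25"}]'


def extract():
    """Extract: read raw records from every source."""
    rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))
    rows += json.loads(JSON_FILE)
    return rows


def transform(rows):
    """Transform: done outside the warehouse, in the intermediate tool (here, Python)."""
    return [(int(r["order_id"]), round(float(r["amount"]) * 100)) for r in rows]


def load(connection, rows):
    """Load: write only the already-transformed rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount_cents INTEGER)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Note that the warehouse never sees the raw data in this flow; any change to the transformation means re-running the pipeline from the sources.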
Traditional Data Warehouse
What are the issues?
! High upfront cost
! High maintenance cost
! Complex ETL process
! Proprietary query language
! No automated scaling
Cloud Data Warehouse
What is the difference?
! No upfront costs (pay-per-usage)
! Fully managed service
! Automatic scaling (due to storage
and compute separation)
! ELT instead of ETL (done in SQL)
! Support of a standard SQL dialect
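A rough sketch of the ELT idea, under the assumption that raw data is loaded as-is and then cleaned up by a SQL statement running inside the warehouse. SQLite is only a stand-in for a cloud data warehouse, and the raw_orders and orders_clean tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw records go in untouched, everything as text (hypothetical sample data).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "10.50", "paid"), ("2", "4.00", "cancelled"), ("3", "7.25", "paid")],
)

# Transform: typing and filtering happen in SQL, inside the warehouse itself.
conn.execute("""
    CREATE TABLE orders_clean AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE status = 'paid'
""")

print(conn.execute("SELECT * FROM orders_clean ORDER BY order_id").fetchall())
```

Because the raw table stays in the warehouse, the transformation can be changed and re-run without going back to the sources.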
Google BigQuery
Petabyte scale multi-cloud DW
! Dremel: The Execution Engine
! Colossus: Distributed Storage
! Borg: Compute
! Jupiter: The Network
Google BigQuery
Petabyte scale multi-cloud DW
! Take all Wikipedia views in 2022
! Wonder which pages are the most popular
! Get the result within a minute
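A hedged sketch of what such a query could look like from Python. It assumes the google-cloud-bigquery client library, configured credentials, and a public table named bigquery-public-data.wikipedia.pageviews_2022 with datehour, wiki, title, and views columns; the table and column names are assumptions, not taken from the slides, and should be checked before running. The helper function names are hypothetical. A full-year scan processes a large amount of data, so it will incur query costs.

```python
def build_top_pages_sql(year, limit=10):
    """Build a query for the most viewed English Wikipedia pages in a given year.

    The table and column names are assumptions about a BigQuery public dataset.
    """
    return f"""
        SELECT title, SUM(views) AS total_views
        FROM `bigquery-public-data.wikipedia.pageviews_{year}`
        WHERE datehour >= TIMESTAMP('{year}-01-01')
          AND datehour < TIMESTAMP('{year + 1}-01-01')
          AND wiki = 'en'
        GROUP BY title
        ORDER BY total_views DESC
        LIMIT {limit}
    """


def run_query(sql):
    """Run the query; needs google-cloud-bigquery installed and GCP credentials."""
    from google.cloud import bigquery  # imported lazily so the sketch loads without it

    client = bigquery.Client()
    return [(row["title"], row["total_views"]) for row in client.query(sql).result()]


sql = build_top_pages_sql(2022)
print(sql)
# To execute against BigQuery: print(run_query(sql))
```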
ELT process: Extract → Load → Transform → Use
Moving data into BigQuery (Extract, Load)
! BigQuery Data Transfer Service
Google Software as a Service (SaaS) apps:
! Campaign Manager
! Cloud Storage
! Google Ad Manager
! Google Ads
! Google Merchant Center (beta)
! Google Play
! Search Ads 360 (beta)
! YouTube Channel reports
! YouTube Content Owner reports
External cloud storage providers:
! Amazon S3
Data warehouses:
! Teradata
! Amazon Redshift
Moving data into BigQuery
! BigQuery Data Transfer Service
! Federated query (for PostgreSQL and MySQL)
! Cloud Data Fusion
! Airbyte (open source ELT tool)
! Existing ETL tools
! BigQuery Omni
! Custom solution using BQ API
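For the custom route, a minimal sketch of the two import styles the BigQuery Python client offers: a batch load job and a streaming insert. It assumes the google-cloud-bigquery library and configured credentials; the helper names and any table ID passed in (for example "my-project.my_dataset.orders") are hypothetical, and schema handling is left to the client's defaults.

```python
from datetime import date, datetime


def to_json_rows(records):
    """Convert dicts into JSON-serializable rows (dates become ISO 8601 strings)."""
    rows = []
    for record in records:
        row = {}
        for key, value in record.items():
            row[key] = value.isoformat() if isinstance(value, (date, datetime)) else value
        rows.append(row)
    return rows


def batch_load(table_id, records):
    """Batch import: one load job for the whole set of rows."""
    from google.cloud import bigquery  # lazy import; requires google-cloud-bigquery

    client = bigquery.Client()
    job = client.load_table_from_json(to_json_rows(records), table_id)
    job.result()  # wait for the load job to finish


def stream_rows(table_id, records):
    """Streaming import: rows become queryable shortly after the call returns."""
    from google.cloud import bigquery  # lazy import; requires google-cloud-bigquery

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, to_json_rows(records))
    if errors:
        raise RuntimeError(f"Streaming insert failed: {errors}")


print(to_json_rows([{"order_id": 1, "order_date": date(2022, 1, 2)}]))
```

Batch loads suit periodic bulk imports, while streaming fits data that has to be visible quickly; pricing and quotas differ between the two, so check the current BigQuery documentation before choosing.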
Things to think about
! Preparing your data
○ PK
○ Data-modification column
! Batch vs Streaming Import
! Handling Data Modifications
○ Update instantly (not a good idea)
○ Batch update
○ Views (or Materialized Views)
○ …mixed
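One way to read the "Views" option, sketched under assumptions: every change is appended as a new row, and a view uses the PK together with the data-modification column to expose only the latest version of each record. SQLite stands in for BigQuery here, and the customer_changes table, customers_current view, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Append-only table: each modification of a source record arrives as a new row.
CREATE TABLE customer_changes (
    customer_id INTEGER,   -- PK in the source system
    name TEXT,
    modified_at TEXT       -- data-modification column (ISO 8601 timestamp)
);
INSERT INTO customer_changes VALUES
    (1, 'Alice', '2022-01-01T10:00:00'),
    (2, 'Bob', '2022-01-01T11:00:00'),
    (1, 'Alice Smith', '2022-02-01T09:00:00');

-- View: keep only the most recent version of every customer.
CREATE VIEW customers_current AS
SELECT c.customer_id, c.name, c.modified_at
FROM customer_changes AS c
JOIN (
    SELECT customer_id, MAX(modified_at) AS modified_at
    FROM customer_changes
    GROUP BY customer_id
) AS latest
  ON latest.customer_id = c.customer_id
 AND latest.modified_at = c.modified_at;
""")

rows = conn.execute(
    "SELECT customer_id, name FROM customers_current ORDER BY customer_id"
).fetchall()
print(rows)
```

The same data-modification column can also drive incremental imports: each run only needs the rows modified since the previous run.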
Massaging data in BigQuery (Transform)
! Scheduled query
! Cloud Tasks
! Cloud Composer (Airflow)
Using data in BigQuery (Use)
! Data Studio
! Looker
! ML models (BQ ML or Vertex AI)
! …or any other tool you like
Teamwork Example
Google BigQuery
Tons of cool features:
! Embedded ML and predictive modeling
! Interactive data analysis with BI Engine
! Multicloud data analysis with BQ Omni
! Federated query and logical DW
Bonus: AlloyDB
! Fully compatible with PostgreSQL,
providing flexibility and true portability for
your workloads
! Superior performance, 4X faster than
standard PostgreSQL for transactional
workloads
! Fast, real-time insights, up to 100X
faster analytical queries than standard
PostgreSQL
A fully managed PostgreSQL-compatible
database service for your most demanding
enterprise database workloads.
https://cloud.google.com/alloydb
Join in
PayPal: nikulchenko@gmail.com
Revolut: https://revolut.me/artemwvzv
Card: 5375 4141 2884 6630
Tazyky: a group that supplies vehicles for the Armed Forces of Ukraine (ZSU)
More than 170 "tazyky" have already been handed over. Full throttle ahead!
Telegram: https://t.me/rooh_uk
Thank You!
Artem Nikulchenko
https://www.linkedin.com/in/artem-nikulchenko/
https://medium.com/@an_14796

"Building Data Warehouse with Google Cloud Platform", Artem Nikulchenko
