BigQuery best practices and recommendations to reduce costs
with BI Engine, Slots, Materialized Views
Devfest Nantes, October 2022
Márton Kodok
Google Developer Expert at REEA.net
● Among the top 3 Romanians on Stack Overflow (201k reputation)
● Google Developer Expert on Cloud technologies (2016→)
● Champion of Google Cloud Innovators program (2021→)
● Crafting Web/Mobile backends at REEA.net
Articles: martonkodok.medium.com
Twitter: @martonkodok
Slideshare: martonkodok
StackOverflow: pentium10
GitHub: pentium10
Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok
About me
1. Looking at a BigQuery billing report
2. What is BI Engine?
3. Obtaining per job billing stats
4. Enable and use BI Engine reservations
5. Using Cloud Workflows to orchestrate the right capacity
6. Lower bills and faster queries on Data Studio, BigQuery
7. Conclusions, articles
Agenda
Looking at a BigQuery billing report
Reduce BigQuery bills with BI Engine capacity orchestration
Article: https://medium.com/p/9e2634c84a82 @martonkodok
Cloud Workflows automating the BI Engine capacity size
Part #2: What is BI Engine?
“BI Engine is a fast, in-memory analysis service that integrates out of the box with BigQuery, Data Studio, Looker, Tableau, and Power BI.”
What is BI Engine?
BI Engine architecture
1. It's a cache plugin for BigQuery: a managed, distributed, in-memory execution engine.
2. BI Engine reservations manage the memory allocation at the project billing level.
3. It caches only the columns and partitions that are queried or scanned. It does not cache the whole table.
4. Any BI solution or custom application that works with the BigQuery API, such as REST or JDBC and ODBC drivers, can use BI Engine without any changes.
What does out-of-the-box mean?
Free for all: 1 TB free each month
On-demand: queries $5/TB, storage $20/TB
Flat-rate reservation slots: average $4 per hour; best is $1700 for 100 slots (yearly plan)
BI Engine: $0.0416 per GB/hour ($30.36 per GB/month)
BigQuery ML excluded from this table.
Cost components in BigQuery and BI Engine
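The prices above can be turned into a quick break-even estimate: how many TB of on-demand scanning a reservation has to avoid each month to pay for itself. This is only a sketch. The two rates come from the slide, while `HOURS_PER_MONTH` (730) and the function names are assumptions made for illustration.

```python
# Rates taken from the pricing slide.
BI_ENGINE_USD_PER_GB_HOUR = 0.0416
ON_DEMAND_USD_PER_TB = 5.0
# Assumption: an average month of about 730 hours.
HOURS_PER_MONTH = 730


def bi_engine_monthly_cost(capacity_gb: float) -> float:
    """USD per month for keeping a reservation of this size running all month."""
    return capacity_gb * BI_ENGINE_USD_PER_GB_HOUR * HOURS_PER_MONTH


def break_even_tb(capacity_gb: float) -> float:
    """TB of on-demand scans the reservation must avoid per month to pay for itself."""
    return bi_engine_monthly_cost(capacity_gb) / ON_DEMAND_USD_PER_TB


for gb in (1, 5, 10):
    print(f"{gb} GB: ${bi_engine_monthly_cost(gb):.2f}/month, "
          f"break-even at {break_even_tb(gb):.2f} TB of avoided scans")
```

Under these assumptions, each GB of capacity has to save roughly 6 TB of on-demand scanning per month before it lowers the combined bill.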
Part #3: Orchestrating the capacity size
“The aim is to dynamically adjust the size of the BI Engine to get the lowest combined cost of BigQuery and BI Engine.”
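One way to read "lowest combined cost" is as a small minimization over candidate capacities. The sketch below assumes you have already measured the on-demand query cost per hour at a few reservation sizes. The measurements and function names are hypothetical, and only the BI Engine rate comes from the pricing slide.

```python
BI_ENGINE_USD_PER_GB_HOUR = 0.0416  # from the pricing slide


def combined_hourly_cost(capacity_gb: float, on_demand_usd_per_hour: float) -> float:
    """Reservation cost plus the on-demand query cost still paid at that capacity."""
    return capacity_gb * BI_ENGINE_USD_PER_GB_HOUR + on_demand_usd_per_hour


def cheapest_capacity(observations: dict) -> float:
    """observations maps capacity in GB to the measured on-demand USD per hour."""
    return min(observations,
               key=lambda gb: combined_hourly_cost(gb, observations[gb]))


# Hypothetical measurements, for illustration only.
observed = {0: 2.00, 5: 1.10, 10: 0.40, 20: 0.35}
best = cheapest_capacity(observed)
print(best, round(combined_hourly_cost(best, observed[best]), 3))
```

In this made-up data the 20 GB reservation saves slightly more on queries than the 10 GB one, but the extra reservation cost outweighs it, so the smaller size wins.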
1. Obtain the cost of your on-demand BigQuery usage
2. Set the BI Engine capacity in steps
3. Have a real-time sense of the savings to drive capacity automation up/down
4. Monitor the applied settings for optimal savings
Prerequisite:
Access to INFORMATION_SCHEMA or Audit Logs exported to BigQuery (better for historical data)
Biggest challenges
The query to get the recent costs for each job
1. The query uses a flat rate of 5 USD per TB to calculate the cost
2. At this point no optimization is in place, as the two columns (billed and processed bytes) are the same
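The per-job arithmetic behind such a query can be sketched in a few lines. This is not the query from the article. It is a minimal illustration that assumes each job row carries `total_bytes_processed` and `total_bytes_billed` (named after the INFORMATION_SCHEMA job columns) and that 1 TB is counted as 2^40 bytes.

```python
BYTES_PER_TB = 2 ** 40  # assumption: 1 TB counted as 2^40 bytes
USD_PER_TB = 5.0        # flat on-demand rate used on the slide


def bytes_to_usd(num_bytes: int) -> float:
    """Convert a byte count to USD at the flat on-demand rate."""
    return num_bytes / BYTES_PER_TB * USD_PER_TB


def job_costs(job: dict) -> dict:
    """Return the processed and billed cost for one job row."""
    return {
        "job_id": job["job_id"],
        "processed_usd": bytes_to_usd(job["total_bytes_processed"]),
        "billed_usd": bytes_to_usd(job["total_bytes_billed"]),
    }


# Hypothetical job row: without BI Engine both columns carry the same value.
row = {"job_id": "job_a",
       "total_bytes_processed": 2 ** 40,
       "total_bytes_billed": 2 ** 40}
print(job_costs(row))
```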
BigQuery savings based on billed vs processed bytes
BigQuery savings when BI Engine is properly sized
1. A BI Engine capacity resize needs 5 minutes to propagate
2. Savings show up as billed bytes that are lower than processed bytes
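To judge whether a given size is worth it, the gap between processed and billed bytes can be netted against the reservation cost for the same time window. The sketch below is one possible way to do that. The function name, the 2^40 bytes per TB convention, and the sample numbers are assumptions, while the two rates come from the pricing slide.

```python
BYTES_PER_TB = 2 ** 40               # assumption: 1 TB counted as 2^40 bytes
ON_DEMAND_USD_PER_TB = 5.0           # from the pricing slide
BI_ENGINE_USD_PER_GB_HOUR = 0.0416   # from the pricing slide


def savings_summary(jobs: list, capacity_gb: float, hours: float) -> dict:
    """Compare query savings with the reservation cost over the same window."""
    processed = sum(j["total_bytes_processed"] for j in jobs)
    billed = sum(j["total_bytes_billed"] for j in jobs)
    gross = (processed - billed) / BYTES_PER_TB * ON_DEMAND_USD_PER_TB
    reservation = capacity_gb * BI_ENGINE_USD_PER_GB_HOUR * hours
    return {
        "gross_savings_usd": gross,
        "reservation_usd": reservation,
        "net_usd": gross - reservation,
    }


# Hypothetical one-hour window: 2 TB processed, only 1 TB billed.
jobs = [{"total_bytes_processed": 2 * 2 ** 40, "total_bytes_billed": 2 ** 40}]
print(savings_summary(jobs, capacity_gb=10, hours=1))
```

A positive `net_usd` suggests the reservation is paying for itself in that window. Because a resize takes a few minutes to propagate, the window should start after the new size is active.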
Part #4: Optimize BI Engine effectiveness
1. BI Engine capacity might be too small
2. BigQuery queries are too complex
BI Engine turned on, but ineffective
“Not all BigQuery queries are accelerated.”
1. Detailed statistics on BI Engine are available through the job statistics API
2. Use the bq command-line tool to fetch job statistics
Acceleration statistics
1. bq show --format=prettyjson -j job_id
"statistics": {
  "creationTime": "1602175128902",
  "endTime": "1602175130700",
  "query": {
    "biEngineStatistics": {
      "biEngineMode": "DISABLED",
      "biEngineReasons": [
        {
          "code": "UNSUPPORTED_SQL_TEXT",
          "message": "Detected unsupported join type"
        }
      ]
    },
Acceleration statistics
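The JSON printed by `bq show` can be inspected programmatically. The sketch below parses a hand-closed version of the excerpt above (the slide cuts the output off, so the closing braces in `SAMPLE` are an assumption) and pulls out the acceleration mode and the reasons. The function name is made up for illustration.

```python
import json

# The slide's excerpt, closed by hand so that it parses as a standalone document.
SAMPLE = """
{
  "statistics": {
    "creationTime": "1602175128902",
    "endTime": "1602175130700",
    "query": {
      "biEngineStatistics": {
        "biEngineMode": "DISABLED",
        "biEngineReasons": [
          {"code": "UNSUPPORTED_SQL_TEXT",
           "message": "Detected unsupported join type"}
        ]
      }
    }
  }
}
"""


def bi_engine_summary(job_json: str):
    """Return (mode, [(code, message), ...]) or None when no BI Engine stats exist."""
    job = json.loads(job_json)
    stats = job.get("statistics", {}).get("query", {}).get("biEngineStatistics")
    if stats is None:
        return None
    reasons = [(r.get("code"), r.get("message"))
               for r in stats.get("biEngineReasons", [])]
    return stats.get("biEngineMode"), reasons


print(bi_engine_summary(SAMPLE))
```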
Use INFORMATION_SCHEMA to get acceleration statistics
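Once job rows are fetched from INFORMATION_SCHEMA, a tally per acceleration mode and per reason code shows where to look first. This is a sketch over plain dictionaries. The field names `bi_engine_statistics`, `bi_engine_mode` and `bi_engine_reasons` are assumed to match the jobs view, and the sample rows are hypothetical.

```python
from collections import Counter


def tally_acceleration(rows: list):
    """Count jobs per BI Engine mode and occurrences of each reason code."""
    modes = Counter()
    reasons = Counter()
    for row in rows:
        stats = row.get("bi_engine_statistics")
        if not stats:
            # No statistics recorded for this job.
            modes["NOT_REPORTED"] += 1
            continue
        modes[stats.get("bi_engine_mode", "UNKNOWN")] += 1
        for reason in stats.get("bi_engine_reasons") or []:
            reasons[reason.get("code", "UNKNOWN")] += 1
    return modes, reasons


# Hypothetical rows, for illustration only.
rows = [
    {"job_id": "a", "bi_engine_statistics": {"bi_engine_mode": "FULL"}},
    {"job_id": "b", "bi_engine_statistics": {
        "bi_engine_mode": "DISABLED",
        "bi_engine_reasons": [{"code": "UNSUPPORTED_SQL_TEXT"}]}},
    {"job_id": "c", "bi_engine_statistics": None},
]
modes, reasons = tally_acceleration(rows)
print(dict(modes), dict(reasons))
```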
1. Investigate queries that have BI Engine acceleration reported as disabled or partial
2. Rewrite queries to perform better under the BI Engine optimizer
3. Use materialized views to join and flatten data to optimize their structure for BI Engine
4. Create short-lived (5m, 15m, 1h) temporary tables to improve caching efficiency
5. Increase the size of the BI Engine reservation until it is used effectively
6. Use Cloud Workflows and business logic to automate the size based on workload during the day
To have effective BI Engine acceleration
Leverage temporary, dedicated business scope tables
Dedicated table for scope
Use a scheduler to recreate it every 5m/15m/1h
Leverage clustering
Use Materialized Views to get latest rows from append-only tables
Trick to get the latest row using Materialized Views
A 2nd view gets rid of the arrays
Part #5: Orchestrating the capacity size
Cloud Workflows automating the BI Engine capacity size
1. Read the output of the billed vs processed bytes effectiveness query.
2. Based on the benefit margin, map the step of the increase, e.g. 5 GB step, 1 GB step, 0.5 GB step.
3. Do the math of the evaluation: how far you can stretch the BI Engine capacity and still see benefits.
4. Map capacity over the day: more capacity during office hours, lower capacity during the night.
5. Leverage BigQuery ML to write a time-series forecast based on historical data to drive the best BI Engine capacity for the “hour slot”.
6. Stop increasing the capacity when the extra capacity costs more than the savings it brings.
Cloud Workflow automation logic
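The decision logic in this list can be sketched outside of Cloud Workflows to make the rules concrete. Everything below is an illustration, not the workflow from the article. The margin thresholds, the office-hours window, and the two capacity ceilings are hypothetical values, and only the BI Engine rate and the 5 GB / 1 GB / 0.5 GB steps come from the slides.

```python
BI_ENGINE_USD_PER_GB_HOUR = 0.0416  # from the pricing slide

# Hypothetical schedule: more headroom during office hours, less at night.
OFFICE_HOURS = range(8, 18)
OFFICE_CAP_GB = 50.0
NIGHT_CAP_GB = 5.0


def step_for_margin(margin_usd_per_hour: float) -> float:
    """Map the benefit margin to a step size (thresholds are assumptions)."""
    if margin_usd_per_hour > 1.0:
        return 5.0
    if margin_usd_per_hour > 0.2:
        return 1.0
    if margin_usd_per_hour > 0.0:
        return 0.5
    return 0.0


def capacity_ceiling(hour: int) -> float:
    """Upper bound for the reservation at a given hour of the day."""
    return OFFICE_CAP_GB if hour in OFFICE_HOURS else NIGHT_CAP_GB


def next_capacity(current_gb: float, savings_usd_per_hour: float, hour: int) -> float:
    """Propose the next reservation size from the measured hourly savings."""
    margin = savings_usd_per_hour - current_gb * BI_ENGINE_USD_PER_GB_HOUR
    step = step_for_margin(margin)
    # Stop growing when the extra capacity would cost more than the margin left.
    if step * BI_ENGINE_USD_PER_GB_HOUR > margin:
        step = 0.0
    return min(current_gb + step, capacity_ceiling(hour))


print(next_capacity(10, 2.0, hour=10))
print(next_capacity(10, 2.0, hour=23))
```

The sketch only handles growth and the time-of-day ceiling. A forecast from BigQuery ML, as mentioned in step 5, could replace the fixed ceilings with a predicted capacity per hour slot.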
Data Studio aspects
1. Accelerated by BigQuery Engine icon
2. Faster dashboards
Cloud Monitoring
1. Create a chart plotting bigquerybiengine.googleapis.com/reservation/used_bytes over bigquerybiengine.googleapis.com/reservation/total_bytes
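The ratio plotted in that chart is the reservation's utilization. As a minimal sketch, assuming the two metric values have already been read as plain numbers (the function name and sample values are hypothetical):

```python
def reservation_utilization(used_bytes: float, total_bytes: float) -> float:
    """Fraction of the BI Engine reservation in use (used_bytes / total_bytes)."""
    if total_bytes <= 0:
        # No reservation configured, so there is nothing to utilize.
        return 0.0
    return used_bytes / total_bytes


# Hypothetical sample: 6 GB used out of a 10 GB reservation.
GB = 1024 ** 3
print(f"{reservation_utilization(6 * GB, 10 * GB):.0%}")
```

A ratio that stays close to 1 hints that the reservation may be too small, while a consistently low ratio hints that capacity could be reduced.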
Article on medium.com
https://medium.com/p/9e2634c84a82
1. An easy, out-of-the-box way to optimize BigQuery costs is turning on BI Engine, which does not need code changes.
2. Leverage INFORMATION_SCHEMA stats to see underperforming queries, and try to optimize them.
3. Automate the right capacity size by using Cloud Workflows.
4. Save precious development time, lower bills, faster queries.
Conclusions
Thank you. Q&A.
Slides available on:
slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity
to deliver projects.
Twitter: @martonkodok

More Related Content

What's hot

Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
Databricks
 

What's hot (20)

Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptx
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
Modern Data Flow
Modern Data FlowModern Data Flow
Modern Data Flow
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Druid+superset
Druid+supersetDruid+superset
Druid+superset
 
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha JarettRetail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
 
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Use case and integration of ClickHouse with Apache Superset & Dremio
Use case and integration of ClickHouse with Apache Superset & DremioUse case and integration of ClickHouse with Apache Superset & Dremio
Use case and integration of ClickHouse with Apache Superset & Dremio
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
Frame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine LearningFrame - Feature Management for Productive Machine Learning
Frame - Feature Management for Productive Machine Learning
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
 

Similar to BigQuery best practices and recommendations to reduce costs with BI Engine, Slots, Materialized Views

AwReporting Update
AwReporting UpdateAwReporting Update
AwReporting Update
marcwan
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQuery
Márton Kodok
 

Similar to BigQuery best practices and recommendations to reduce costs with BI Engine, Slots, Materialized Views (20)

Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuery
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryVoxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
 
Applying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsApplying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analytics
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
Implementing google big query automation using google analytics data
Implementing google big query automation using google analytics dataImplementing google big query automation using google analytics data
Implementing google big query automation using google analytics data
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
BigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery MLBigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery ML
 
AwReporting Update
AwReporting UpdateAwReporting Update
AwReporting Update
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Google Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery WebinarGoogle Developer Group - Cloud Singapore BigQuery Webinar
Google Developer Group - Cloud Singapore BigQuery Webinar
 
Database performance improvement, a six sigma project (4 block) by nirav shah
Database performance improvement, a six sigma project (4 block) by nirav shah Database performance improvement, a six sigma project (4 block) by nirav shah
Database performance improvement, a six sigma project (4 block) by nirav shah
 
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in dayCherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
 
BigQuery for Beginners
BigQuery for BeginnersBigQuery for Beginners
BigQuery for Beginners
 
Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQuery
 
Discover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementDiscover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statement
 
Streamlining Workflows: Unleashing Automation with Azure and Power Automate
Streamlining Workflows: Unleashing Automation with Azure and Power AutomateStreamlining Workflows: Unleashing Automation with Azure and Power Automate
Streamlining Workflows: Unleashing Automation with Azure and Power Automate
 

More from Márton Kodok

More from Márton Kodok (20)

Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
Cloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerizationCloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerization
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Cloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automationCloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automation
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
 
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer ExpertigVibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
 
Google Cloud Platform Solutions for DevOps Engineers
Google Cloud Platform Solutions  for DevOps EngineersGoogle Cloud Platform Solutions  for DevOps Engineers
Google Cloud Platform Solutions for DevOps Engineers
 
GDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformGDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud Platform
 
Next18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to youNext18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to you
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
 
GCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatásokGCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatások
 
GDG Heraklion - Architecting for the Google Cloud Platform
GDG Heraklion - Architecting for the Google Cloud PlatformGDG Heraklion - Architecting for the Google Cloud Platform
GDG Heraklion - Architecting for the Google Cloud Platform
 
Efikot - Smart City, okos város - a jövőnk kulcsa
Efikot - Smart City, okos város - a jövőnk kulcsaEfikot - Smart City, okos város - a jövőnk kulcsa
Efikot - Smart City, okos város - a jövőnk kulcsa
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companies
 

Recently uploaded

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Recently uploaded (20)

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
 
WSO2CON 2024 - Lessons from the Field: Legacy Platforms – It's Time to Let Go...
WSO2CON 2024 - Lessons from the Field: Legacy Platforms – It's Time to Let Go...WSO2CON 2024 - Lessons from the Field: Legacy Platforms – It's Time to Let Go...
WSO2CON 2024 - Lessons from the Field: Legacy Platforms – It's Time to Let Go...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
 
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
WSO2Con2024 - Simplified Integration: Unveiling the Latest Features in WSO2 L...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
 
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdfAzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 

BigQuery best practices and recommendations to reduce costs with BI Engine, Slots, Materialized Views

  • 1. BigQuerybestpractices and recommendations toreducecosts with BI Engine, Slots, Materialized Views Devfest Nantes, October 2022 Márton Kodok Google Developer Expert at REEA.net
  • 2. ● Among the Top 3 romanians on Stackoverflow 201k reputation ● Google Developer Expert on Cloud technologies (2016→) ● Champion of Google Cloud Innovators program (2021→) ● Crafting Web/Mobile backends at REEA.net Articles: martonkodok.medium.com Twitter: @martonkodok Slideshare: martonkodok StackOverflow: pentium10 GitHub: pentium10 Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok About me
  • 3. 1. Looking at a BigQuery billing report 2. What is BI Engine? 3. Obtaining per job billing stats 4. Enable and use BI Engine reservations 5. Using Cloud Workflows to orchestrate the right capacity 6. Lower bills and faster queries on Data Studio, BigQuery 7. Conclusions, articles Agenda Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok
  • 4. Looking at a BigQuery billing report @martonkodok
  • 5. Reduce BigQuery bills with BI Engine capacity orchestration Article: https://medium.com/p/9e2634c84a82 @martonkodok
  • 6. Cloud Workflows automating the BI Engine capacity size @martonkodok
  • 8. “ BIEngine is a fast, in-memory analysis service that integrates out of the box with BigQuery, DataStudio, Looker,Tableau,PowerBI Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok What is BiEngine?
  • 9. Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok BIEngine architecture
  • 10. 1. Its a cache plugin to BigQuery- a manageddistributed in-memoryexecutionengine 2. BI Engine reservations manage the memoryallocationattheprojectbillinglevel. 3. cachesonlycolumnsandpartitionsthatarequeriedorscanned. It does not cache the whole table. 4. Any BI solution or custom application that works with the BigQuery API such as REST or JDBC and ODBC drivers canuseBIEnginewithoutanychanges. What does out-of-the-box means? Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok
  • 11. Free-for-all 1TB free each month On-demand queries $5/TB, storage: $20/TB Flat rate reservation slots average $4 per hour, best is $1700 for 100 slots (Yrl plan) BigQuery ML excluded from this table. Cost components in BigQuery and BI Engine @martonkodok BI Engine $0.0416 per GB/hour ($30.36 per GB/month)
  • 13. “The aim is to dynamically adjust the size of the BIEngine to get the lowest combined cost of BigQuery and BI Engine. Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok
  • 14. 1. Obtain the cost of your on-demand BigQuery usage 2. Set the BI Engine capacity in steps 3. Have a real-time sense of the savings todrive capacity automation up/down 4. Monitor the applied settings for optimal savings Prerequisite: Access to INFORMATION_SCHEMA or Auditlogs exported to BigQuery (historically better) Biggest challenges Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok
  • 15. The query to get the recent costs for each job Article: https://medium.com/p/9e2634c84a82 @martonkodok
  • 16. The query to get the recent costs for each job Article: https://medium.com/p/9e2634c84a82 @martonkodok 1. The query uses a flat rate of 5 USD to calculate the cost 2. At this point no optimization is in place, as the two columns are the same
  • 17. BigQuery savings based on billed vs processed bytes. Article: https://medium.com/p/9e2634c84a82
  • 18. BigQuery savings when BI Engine is properly sized
      1. A BI Engine capacity resize needs about 5 minutes to propagate.
      2. Savings show up as billed bytes that are lower than processed bytes.
  • 20. BI Engine turned on, but ineffective
      1. The BI Engine capacity might be too small.
      2. The BigQuery queries are too complex.
  • 21. “Not all BigQuery queries are accelerated.”
  • 22. Acceleration statistics
      1. Detailed statistics on BI Engine are available through the job statistics API.
      2. Use the bq command-line tool to fetch job statistics.
  • 23. Acceleration statistics
      1. bq show --format=prettyjson -j job_id
         "statistics": {
           "creationTime": "1602175128902",
           "endTime": "1602175130700",
           "query": {
             "biEngineStatistics": {
               "biEngineMode": "DISABLED",
               "biEngineReasons": [
                 {
                   "code": "UNSUPPORTED_SQL_TEXT",
                   "message": "Detected unsupported join type"
                 }
               ]
             },
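The JSON printed by `bq show` can be reduced to the two fields that matter here. The following is a minimal sketch: the helper name `bi_engine_summary` and the `"UNKNOWN"` fallback are hypothetical, and the sample document only mirrors the fragment shown on the slide rather than a complete job resource.

```python
import json
from typing import List, Tuple


def bi_engine_summary(job: dict) -> Tuple[str, List[str]]:
    """Return (mode, reasons) from a job resource as printed by bq show."""
    stats = job.get("statistics", {}).get("query", {}).get("biEngineStatistics", {})
    # "UNKNOWN" is a placeholder of this sketch for jobs without BI Engine statistics.
    mode = stats.get("biEngineMode", "UNKNOWN")
    reasons = [
        f'{reason.get("code")}: {reason.get("message")}'
        for reason in stats.get("biEngineReasons", [])
    ]
    return mode, reasons


# Sample input mirroring the fragment on the slide.
SAMPLE = """
{
  "statistics": {
    "creationTime": "1602175128902",
    "endTime": "1602175130700",
    "query": {
      "biEngineStatistics": {
        "biEngineMode": "DISABLED",
        "biEngineReasons": [
          {"code": "UNSUPPORTED_SQL_TEXT", "message": "Detected unsupported join type"}
        ]
      }
    }
  }
}
"""

if __name__ == "__main__":
    mode, reasons = bi_engine_summary(json.loads(SAMPLE))
    print(mode)
    for reason in reasons:
        print(" ", reason)
```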
  • 24. Use INFORMATION_SCHEMA to get acceleration statistics
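The query on this slide is not reproduced in the transcript. As a rough sketch of what such a query could look like (this is not the author's query), the helper below builds SQL against the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view and its `bi_engine_statistics` column. The function name, the default region, and the look-back window are assumptions.

```python
def acceleration_stats_query(region: str = "us", days: int = 7) -> str:
    """Build a query listing recent query jobs with their BI Engine statistics."""
    # Guard the region qualifier, since it is interpolated into the SQL text.
    if not region.replace("-", "").isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    return f"""
SELECT
  job_id,
  creation_time,
  total_bytes_processed,
  total_bytes_billed,
  bi_engine_statistics.bi_engine_mode AS bi_engine_mode,
  bi_engine_statistics.bi_engine_reasons AS bi_engine_reasons
FROM `region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(days)} DAY)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC
""".strip()


if __name__ == "__main__":
    print(acceleration_stats_query("eu", days=3))
```

The resulting text can be pasted into the BigQuery console or passed to any client; rows whose mode is not fully accelerated are the ones worth investigating.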
  • 25. To have effective BI Engine acceleration
      1. Investigate queries that have BI Engine acceleration reported as disabled or partial.
      2. Rewrite queries to perform better under the BI Engine optimizer.
      3. Use materialized views to join and flatten data, optimizing its structure for BI Engine.
      4. Create short-lived (5m, 15m, 1h) temporary tables to improve caching efficiency.
      5. Increase the size of the BI Engine reservation until it is used effectively.
      6. Use Cloud Workflows and business logic to automate the size based on the workload during the day.
  • 26. Leverage temporary, dedicated business scope tables
      - Dedicated table for each scope
      - Use a scheduler to recreate it every 5m/15m/1h
      - Leverage clustering
  • 27. Use materialized views to get the latest rows from append-only tables
      - Trick to get the latest row using a materialized view
      - A second view to get rid of the arrays
  • 29. Cloud Workflows automating the BI Engine capacity size
  • 30. Cloud Workflows automating the BI Engine capacity size. Article: https://medium.com/p/9e2634c84a82
  • 31. BigQuery savings when BI Engine is properly sized
      1. A BI Engine capacity resize needs about 5 minutes to propagate.
      2. Savings show up as billed bytes that are lower than processed bytes.
  • 32. Cloud Workflow automation logic
      1. Read the output of the billed vs processed bytes effectiveness query.
      2. Based on the benefit margin, map the size of the increase step, e.g. 5 GB, 1 GB, or 0.5 GB.
      3. Do the math on how far the BI Engine capacity can be stretched while the increase still pays off.
      4. Map capacity to the time of day: more capacity during office hours, less during the night.
      5. Leverage BigQuery ML to build a time-series forecast from historical data that drives the best BI Engine capacity for each “hour slot”.
      6. Stop increasing the capacity when the extra capacity costs more than the savings it brings.
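Steps 2, 3, and 6 above can be sketched as a single decision function. This is only an illustration of the idea, not the workflow from the article: the margin thresholds, the 1 GB step down, the `max_gb` cap, and the function name are all assumptions, and the savings figure is expected to come from the billed vs processed bytes query.

```python
BI_ENGINE_USD_PER_GB_HOUR = 0.0416  # price quoted earlier in the deck


def next_capacity_gb(current_gb: float, hourly_savings_usd: float,
                     max_gb: float = 100.0) -> float:
    """Pick the next BI Engine capacity from the observed hourly savings."""
    hourly_cost = current_gb * BI_ENGINE_USD_PER_GB_HOUR
    net = hourly_savings_usd - hourly_cost
    if net <= 0:
        # The reservation costs more than it saves: step down, never below zero.
        return max(current_gb - 1.0, 0.0)
    # Share of the savings that is left after paying for the reservation.
    margin = net / hourly_savings_usd
    if margin > 0.75:      # hypothetical threshold: large benefit, big step
        step = 5.0
    elif margin > 0.40:    # hypothetical threshold: moderate benefit
        step = 1.0
    elif margin > 0.10:    # hypothetical threshold: small benefit, small step
        step = 0.5
    else:
        step = 0.0         # benefit nearly used up: stop increasing
    return min(current_gb + step, max_gb)


if __name__ == "__main__":
    for savings in (10.0, 0.6, 0.45, 0.3):
        print(savings, "->", next_capacity_gb(10.0, savings))
```

A scheduled workflow could call such a function after each measurement window, waiting for the resize to propagate before measuring again. A starting capacity has to be seeded separately, because with zero capacity there are no savings to observe.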
  • 33. Reduce BigQuery bills with BI Engine capacity orchestration Article: https://medium.com/p/9e2634c84a82 @martonkodok
  • 34. Data Studio aspects
      1. The “Accelerated by BigQuery BI Engine” icon
      2. Faster dashboards
  • 35. Cloud Monitoring
      1. Create a chart plotting bigquerybiengine.googleapis.com/reservation/used_bytes
      2. over bigquerybiengine.googleapis.com/reservation/total_bytes.
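The ratio behind that chart is simple to compute once the two metric values are available. A minimal sketch, assuming the values have already been read from Cloud Monitoring; the function name and the 90% alert threshold are hypothetical.

```python
def reservation_utilization(used_bytes: int, total_bytes: int) -> float:
    """Fraction of the BI Engine reservation currently in use (0.0 if none is reserved)."""
    if total_bytes <= 0:
        return 0.0
    return used_bytes / total_bytes


if __name__ == "__main__":
    GIB = 2 ** 30
    ratio = reservation_utilization(9 * GIB, 10 * GIB)
    # Hypothetical threshold: sustained utilization near 1.0 suggests the capacity is too small.
    print(f"utilization: {ratio:.0%}", "(consider resizing)" if ratio >= 0.9 else "")
```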
  • 37. Conclusions
      1. An easy, out-of-the-box way to optimize BigQuery costs
      2. by turning on BI Engine, which does not need code changes.
      3. Leverage INFORMATION_SCHEMA stats to find underperforming queries and try to optimize them.
      4. Automate the right capacity size by using Cloud Workflows.
      5. Save precious development time, lower bills, faster queries.
  • 38. Thank you. Q&A. Slides available on: slideshare.net/martonkodok Reea.net - Integrated web solutions driven by creativity to deliver projects. Twitter: @martonkodok