Best practices and recommendations for tuning BI Engine for your existing BigQuery workloads, for cheaper and faster queries. Learn how we at REEA orchestrate BI Engine reservations on a 5 TB dataset, which is considered small for BigQuery but still yields big cost savings and accelerated queries. Many presentations focus on big enterprises; here we showcase how our own queries perform better at lower cost. We address the top considerations for when to turn on BI Engine, how to use cloud orchestration to make this an automatic process, and how to handle BigQuery and Data Studio query complexity, in order to save precious development time, lower bills, and get faster queries.
2. ● Among the Top 3 Romanians on Stack Overflow, 201k reputation
● Google Developer Expert on Cloud technologies (2016→)
● Champion of Google Cloud Innovators program (2021→)
● Crafting Web/Mobile backends at REEA.net
Articles: martonkodok.medium.com
Twitter: @martonkodok
Slideshare: martonkodok
StackOverflow: pentium10
GitHub: pentium10
Reduce BigQuery bills with BI Engine capacity orchestration @martonkodok
About me
3. 1. Looking at a BigQuery billing report
2. What is BI Engine?
3. Obtaining per job billing stats
4. Enable and use BI Engine reservations
5. Using Cloud Workflows to orchestrate the right capacity
6. Lower bills and faster queries on Data Studio, BigQuery
7. Conclusions, articles
Agenda
4. Looking at a BigQuery billing report
5. Reduce BigQuery bills with BI Engine capacity orchestration
Article: https://medium.com/p/9e2634c84a82 @martonkodok
8. “BI Engine is a fast, in-memory analysis service that integrates out of the box with BigQuery, Data Studio, Looker, Tableau, and Power BI.”
What is BI Engine?
9. BI Engine architecture
10. 1. It's a cache plugin to BigQuery: a managed, distributed, in-memory execution engine.
2. BI Engine reservations manage the memory allocation at the project billing level.
3. It caches only the columns and partitions that are queried or scanned. It does not cache the whole table.
4. Any BI solution or custom application that works with the BigQuery API, such as REST or JDBC and ODBC drivers, can use BI Engine without any changes.
What does out-of-the-box mean?
11. Free-for-all: 1 TB free each month
On-demand: queries $5/TB, storage $20/TB
Flat-rate reservation slots: average $4 per hour; best is $1,700 for 100 slots (yearly plan)
BI Engine: $0.0416 per GB/hour ($30.36 per GB/month)
BigQuery ML is excluded from this table.
Cost components in BigQuery and BI Engine
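As a worked example of the rates above, the sketch below converts the hourly BI Engine price into a monthly figure and estimates how many on-demand terabytes one reserved gigabyte has to save per month to pay for itself. The 730 hours-per-month figure and the function names are assumptions made for illustration, not part of the talk.

```python
# Rates quoted on the slide; everything else here is illustrative.
ON_DEMAND_USD_PER_TB = 5.0
BI_ENGINE_USD_PER_GB_HOUR = 0.0416
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def bi_engine_monthly_cost(capacity_gb: float) -> float:
    """Monthly cost in USD of keeping `capacity_gb` reserved all month."""
    return capacity_gb * BI_ENGINE_USD_PER_GB_HOUR * HOURS_PER_MONTH


def break_even_tb(capacity_gb: float) -> float:
    """On-demand TB that must be avoided per month to cover the reservation."""
    return bi_engine_monthly_cost(capacity_gb) / ON_DEMAND_USD_PER_TB


if __name__ == "__main__":
    print(f"1 GB for a month: ${bi_engine_monthly_cost(1):.2f}")
    print(f"Break-even: {break_even_tb(1):.2f} TB of avoided scans per GB")
```

Under these assumptions, each reserved gigabyte needs to remove roughly six terabytes of billed scans per month before it starts saving money.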
13. “The aim is to dynamically adjust the size of the BI Engine reservation to get the lowest combined cost of BigQuery and BI Engine.”
14. 1. Obtain the cost of your on-demand BigQuery usage
2. Set the BI Engine capacity in steps
3. Have a real-time sense of the savings to drive capacity automation up/down
4. Monitor the applied settings for optimal savings
Prerequisite:
Access to INFORMATION_SCHEMA, or audit logs exported to BigQuery (better for historical data)
Biggest challenges
15. The query to get the recent costs for each job
16. The query to get the recent costs for each job
1. The query uses a flat rate of 5 USD per TB to calculate the cost
2. At this point no optimization is in place, as the two columns (billed and processed cost) are the same
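The per-job calculation described here can be sketched in a few lines. This is a minimal illustration, not the query from the article: the function name is hypothetical, the inputs mirror the `total_bytes_processed` and `total_bytes_billed` columns of the INFORMATION_SCHEMA jobs views, and counting 1 TB as 2^40 bytes is an assumption.

```python
TB = 1024 ** 4      # assumption: 1 TB counted as 2**40 bytes
USD_PER_TB = 5.0    # flat on-demand rate used on the slide


def job_costs(total_bytes_processed: int, total_bytes_billed: int) -> dict:
    """Return the processed vs billed cost in USD for a single job."""
    processed = total_bytes_processed / TB * USD_PER_TB
    billed = total_bytes_billed / TB * USD_PER_TB
    return {
        "processed_usd": round(processed, 4),
        "billed_usd": round(billed, 4),
        "saved_usd": round(processed - billed, 4),
    }


if __name__ == "__main__":
    # Without BI Engine both columns match, so nothing is saved.
    print(job_costs(TB, TB))
```

When the two values are equal, `saved_usd` is zero, which is the "no optimization in place" situation noted above.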
17. BigQuery savings based on billed vs processed bytes
18. BigQuery savings when BI Engine is properly sized
1. A BI Engine capacity resize needs 5 minutes to propagate
2. Savings show up as billed bytes that are lower than processed bytes
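To judge whether a given size is actually "properly sized", the avoided on-demand cost has to be set against the price of the reservation itself. The sketch below is one hedged way to compute that net figure for a time window; the function name and its parameters are hypothetical, and the rates are the ones quoted earlier in the deck.

```python
def net_saving_usd(jobs, capacity_gb, window_hours,
                   usd_per_tb=5.0, usd_per_gb_hour=0.0416):
    """Net saving for a window: avoided on-demand cost minus reservation cost.

    `jobs` is an iterable of (bytes_processed, bytes_billed) pairs.
    Assumes 1 TB = 2**40 bytes.
    """
    tb = 1024 ** 4
    avoided = sum(p - b for p, b in jobs) / tb * usd_per_tb
    reservation = capacity_gb * usd_per_gb_hour * window_hours
    return avoided - reservation


if __name__ == "__main__":
    one_tb = 1024 ** 4
    # One fully accelerated 1 TB query during an hour with 10 GB reserved.
    print(round(net_saving_usd([(one_tb, 0)], capacity_gb=10, window_hours=1), 4))
```

A negative result means the reservation cost more than it saved in that window, which is a signal to scale down or to look at why queries are not accelerated.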
20. 1. BI Engine capacity might be too small
2. BigQuery queries are too complex
BI Engine turned on, but ineffective
21. “Not all BigQuery queries are accelerated.”
22. 1. Detailed statistics on BI Engine are available through the job statistics API
2. bq command-line tool to fetch job statistics
Acceleration statistics
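As an illustration of reading these statistics, the sketch below pulls the BI Engine mode and reasons out of a job resource, such as the JSON printed by `bq show --format=prettyjson -j <job_id>`. The helper name is hypothetical and the sample payload is made up; only the field names (`biEngineStatistics`, `biEngineMode`, `biEngineReasons`, `code`, `message`) follow the BigQuery job resource.

```python
import json


def summarize_bi_engine(job: dict) -> dict:
    """Extract the BI Engine mode and reasons from a job resource."""
    stats = (
        job.get("statistics", {})
        .get("query", {})
        .get("biEngineStatistics", {})
    )
    reasons = [
        f"{r.get('code', '')}: {r.get('message', '')}"
        for r in stats.get("biEngineReasons", [])
    ]
    return {"mode": stats.get("biEngineMode", "UNKNOWN"), "reasons": reasons}


# Made-up sample shaped like the JSON printed by:
#   bq show --format=prettyjson -j <job_id>
SAMPLE_JOB = json.loads("""
{
  "statistics": {
    "query": {
      "biEngineStatistics": {
        "biEngineMode": "DISABLED",
        "biEngineReasons": [
          {"code": "INPUT_TOO_LARGE", "message": "illustrative message text"}
        ]
      }
    }
  }
}
""")

if __name__ == "__main__":
    print(summarize_bi_engine(SAMPLE_JOB))
```

Jobs that report a mode other than full acceleration carry one or more reasons, which are the starting point for deciding between resizing the reservation and rewriting the query.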
24. Use INFORMATION_SCHEMA to get acceleration statistics
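The slide's query is shown as an image, so what follows is a hedged stand-in rather than the author's query: a small helper that builds a SQL statement grouping recent query jobs by their BI Engine mode. The helper name, the `region-us` default, and the 24-hour lookback are assumptions; the view and column names (`JOBS_BY_PROJECT`, `bi_engine_statistics.bi_engine_mode`, `total_bytes_processed`, `total_bytes_billed`) are the ones BigQuery exposes.

```python
def acceleration_stats_sql(region: str = "region-us", hours: int = 24) -> str:
    """Build a query that counts recent jobs per BI Engine mode."""
    return f"""
SELECT
  bi_engine_statistics.bi_engine_mode AS engine_mode,
  COUNT(*) AS jobs,
  SUM(total_bytes_processed) AS bytes_processed,
  SUM(total_bytes_billed) AS bytes_billed
FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(hours)} HOUR)
  AND job_type = 'QUERY'
GROUP BY engine_mode
ORDER BY jobs DESC
""".strip()


if __name__ == "__main__":
    print(acceleration_stats_sql())
```

Running the generated statement in the BigQuery console gives a quick split of jobs by acceleration mode, with the byte totals needed for the cost comparison.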
25. 1. Investigate queries that have BI Engine acceleration reported as disabled or partial
2. Rewrite queries to perform better under the BI Engine optimizer
3. Use materialized views to join and flatten data to optimize their structure for BI Engine
4. Create short-lived (5m, 15m, 1h) temporary tables to improve caching efficiency
5. Increase the size of the BI Engine reservation until it is used effectively
6. Use Cloud Workflows and business logic to automate the size based on workload during the day
To have effective BI Engine acceleration
26. Leverage temporary, dedicated business scope tables
Dedicated table for scope
Use a scheduler to recreate it every 5m/15m/1h
Leverage clustering
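A scheduled statement along the following lines is one way to recreate such a scoped table. This is a sketch under stated assumptions: the helper name and all table, column, and filter values are hypothetical, and the expiration option is an added safety net that the slide does not mention.

```python
def scoped_table_ddl(target: str, source: str, cluster_by: str,
                     where: str, ttl_minutes: int) -> str:
    """Build a CREATE OR REPLACE TABLE statement for a small, clustered table."""
    return (
        f"CREATE OR REPLACE TABLE `{target}`\n"
        f"CLUSTER BY {cluster_by}\n"
        # Assumption: let the table expire if the scheduler stops refreshing it.
        f"OPTIONS (expiration_timestamp = "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {int(ttl_minutes)} MINUTE))\n"
        f"AS SELECT * FROM `{source}` WHERE {where}"
    )


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    print(scoped_table_ddl(
        target="my_project.reporting.orders_last_day",
        source="my_project.raw.orders",
        cluster_by="customer_id",
        where="created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)",
        ttl_minutes=60,
    ))
```

Keeping the table narrow in scope means fewer columns and partitions compete for BI Engine memory, and choosing an expiration longer than the refresh interval avoids gaps between runs.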
27. Use Materialized Views to get latest rows from append-only tables
Trick to get the latest row using materialized views
A second view gets rid of the arrays
30. Cloud Workflows automating the BI Engine capacity size
31. BigQuery savings when BI Engine is properly sized
1. A BI Engine capacity resize needs 5 minutes to propagate
2. Savings show up as billed bytes that are lower than processed bytes
32. 1. Read the output of the billed vs processed bytes effectiveness query.
2. Based on the benefit margin, map the size of the increase step, e.g. 5 GB, 1 GB, or 0.5 GB.
3. Evaluate mathematically how far you can stretch the BI Engine capacity while still gaining benefits.
4. Map capacity to the time of day: more capacity during office hours, less during the night.
5. Leverage BigQuery ML to write a time-series forecast based on historical data to drive the best BI Engine capacity for each “hour slot”.
6. Stop increasing the capacity when the extra capacity costs more than the savings it brings.
Cloud Workflows automation logic
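The decision logic listed on this slide can be sketched as a small pure function that a workflow step could call. Everything specific here is an assumption for illustration: the thresholds, the office-hours range, the day and night caps, and the function names are hypothetical, and the forecasting step with BigQuery ML is left out.

```python
def pick_step_gb(saving_ratio: float) -> float:
    """Map the benefit margin to a step size (5 GB, 1 GB, 0.5 GB).

    Assumption: the less that is accelerated so far, the bigger the step.
    """
    if saving_ratio < 0.25:
        return 5.0
    if saving_ratio < 0.60:
        return 1.0
    return 0.5


def next_capacity_gb(current_gb, processed_usd, billed_usd, hour,
                     usd_per_gb_hour=0.0416, office_hours=range(8, 18),
                     day_cap_gb=50.0, night_cap_gb=10.0):
    """Suggest the next reservation size from the last hour's cost figures."""
    cap = day_cap_gb if hour in office_hours else night_cap_gb
    if current_gb > cap:
        return cap                 # shrink to the cap for this time of day
    if processed_usd <= 0:
        return current_gb          # no workload observed in the window
    saving_ratio = (processed_usd - billed_usd) / processed_usd
    step = pick_step_gb(saving_ratio)
    # Stop when the extra capacity would cost more than what is still billed.
    if step * usd_per_gb_hour >= billed_usd:
        return current_gb
    return min(current_gb + step, cap)


if __name__ == "__main__":
    # Little is accelerated yet during office hours: take a large step up.
    print(next_capacity_gb(10, processed_usd=20.0, billed_usd=18.0, hour=10))
```

Keeping the rule as a pure function makes it easy to test against historical billing data before wiring it to the call that actually resizes the reservation.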
34. Data Studio aspects
1. "Accelerated by BigQuery BI Engine" icon
2. Faster dashboards
37. 1. BI Engine is an easy, out-of-the-box way to optimize BigQuery costs.
2. Turning on BI Engine does not need code changes.
3. Leverage INFORMATION_SCHEMA stats to see underperforming queries, and try to optimize them.
4. Automate the right capacity size by using Cloud Workflows.
5. Save precious development time, lower bills, faster queries.
Conclusions
38. Thank you. Q&A.
Slides available on:
slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity to deliver projects.
Twitter: @martonkodok