SlideShare a Scribd company logo
1 of 24
Download to read offline
Google BigQuery
About myself
Matthias Feys
work @Datatonic:
- big data (with Google Cloud)
- machine learning
- data visualizations (Tableau/Spotfire)
Google Qualified Cloud Developer
contact:
- @FsMatt
- matthias@datatonic.com
About Datatonic
Datatonic is a team of data science experts that help corporations unleash
the power of data. They use Google Cloud Platform, data visualisation
technologies (like Tableau or Spotfire) and machine learning to build
breakthrough solutions. Either as expert advisors, or built as a fully
managed solution with support end-to-end (A3S).
Some references:
@teamdatatonic
● What is BigQuery?
● How does it scale/work?
● How does it compare to:
- NoSQL datastores
- MapReduce
● Demo
● Pricing Model
● Best Practices
This talk
What is BigQuery?
“BigQuery is a fully-managed
and cloud-based interactive
query service for
massive datasets.”
It’s the externalization
of Dremel, one of
Google’s core
technologies
What is BigQuery? (2)
BigQuery Service is available via:
● Web UI (bigquery.cloud.google.com)
● console (gcloud)
● API (+ client libraries)
● external tools (Tableau, Excel, …)
● ODBC connector
How Does it Scale?
Dremel Architecture
Data Model/Storage:
- Columnar Storage
- Nested/Repeated Fields
- No Index!
-> Single Full Table Scan (from disk)
Query Execution:
- Tree Architecture
- Using tens of thousands
machines over fast Google
network (+1Petabit/s)
Columnar Storage
● Traffic minimization:
○ only read selected
columns
● Higher Compression Ratio:
○ Similar values in the
same column
○ From 1:3 → 1:10
Tree Architecture
- root server:
->receives query + reads table metadata
->rewrites the query(s)
->sends queries to the next level
<-returns final query results
- intermediate servers:
->(similar steps)
<-parallel partial aggregation
- leaf servers:
->actually scan (parts) of the table
<-send data to intermediate servers
NoSQL Datastore vs. BigQuery?
NoSQL Datastore
● Index based
(expected queries)
● Read-write
BigQuery
● Non-index based
(ad hoc queries)
● Read-only (append-
only)
MapReduce vs. BigQuery?
MapReduce
● High latency
● Flexible (complex)
batch processing
● Unstructured data
BigQuery
● Low latency
● SQL-like queries
● Structured data
Demo’s
Demo’s
Pricing Model
Category Price Note
Storage Cost $0.020 per GB, per
month
Query Cost $5 per TB 1st TB per month
is free
Best Practices
Denormalize / Pre-Join Where Possible
● Best performance
● Only pay for the columns you need
● Nested/repeated fields!
Relational Database Design Denormalized Nested/Repeated (JSON)
Table Sharding
● You pay for what you read
→ Read less, pay less
● Table wildcards allow for easy reading over multiple tables
https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions
Optimize for Query vs. Storage Costs
Common Queries?
- Materialized views
(save intermediate results in tables)
with pre-aggregated data:
→ faster + cheaper queries
- Store data in multiple tables:
- table for daily data
- table for weekly data
- table for monthly data
Narrow the Table Scans
You only pay for the columns you read
Don’t use “SELECT *” !!!
Table Decorators
Only way to avoid doing full table scans!
Allows undeleting tables
options:
● snapshot decorator + range decorator
● relative value + absolute values
https://cloud.google.com/bigquery/table-decorators
Query optimizations
Query Plan
https://cloud.google.com/bigquery/query-plan-explanation
Big Data Reference Architecture
Questions?
You can reach me at:
- mail: matthias@datatonic.com
- Twitter: @FsMatt

More Related Content

What's hot

BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptxVikRam S
 
bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptxHarissh16
 
Big Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media AnalyticsBig Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media Analyticshafeeznazri
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsAndreas Raible
 
Discover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementDiscover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementMárton Kodok
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementationSimon Su
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...javier ramirez
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Amazon Web Services
 
Embedding Data & Analytics With Looker
Embedding Data & Analytics With LookerEmbedding Data & Analytics With Looker
Embedding Data & Analytics With LookerLooker
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introductionleanderlee2
 

What's hot (20)

BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptx
 
bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptx
 
Bigquery 101
Bigquery 101Bigquery 101
Bigquery 101
 
Big Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media AnalyticsBig Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media Analytics
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & Benefits
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 
Discover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementDiscover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statement
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementation
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Google Big Table
Google Big TableGoogle Big Table
Google Big Table
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
 
Embedding Data & Analytics With Looker
Embedding Data & Analytics With LookerEmbedding Data & Analytics With Looker
Embedding Data & Analytics With Looker
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introduction
 
Migrating Oracle to PostgreSQL
Migrating Oracle to PostgreSQLMigrating Oracle to PostgreSQL
Migrating Oracle to PostgreSQL
 

Similar to Google BigQuery

Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQueryDharmesh Vaya
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBMarco Segato
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryMárton Kodok
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauMongoDB
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemstaimur hafeez
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyGuillaume Lefranc
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform DATAVERSITY
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Ido Green
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 

Similar to Google BigQuery (20)

Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDB
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
 
Mongodb
MongodbMongodb
Mongodb
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystems
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform 
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 

Recently uploaded

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Google BigQuery

  • 2. About myself Matthias Feys work @Datatonic: - big data (with Google Cloud) - machine learning - data visualizations (Tableau/Spotfire) Google Qualified Cloud Developer contact: - @FsMatt - matthias@datatonic.com
  • 3. About Datatonic Datatonic is a team of data science experts that help corporations unleash the power of data. They use Google Cloud Platform, data visualisation technologies (like Tableau or Spotfire) and machine learning to build breakthrough solutions. Either as expert advisors, or built as a fully managed solution with support end-to-end (A3S). Some references: @teamdatatonic
  • 4. ● What is BigQuery? ● How does it scale/work? ● How does it compare to: - NoSQL datastores - MapReduce ● Demo ● Pricing Model ● Best Practices This talk
  • 5. What is BigQuery? “BigQuery is a fully-managed and cloud-based interactive query service for massive datasets.” It’s the externalization of Dremel, one of Google’s core technologies
  • 6. What is BigQuery? (2) BigQuery Service is available via: ● Web UI (bigquery.cloud.google.com) ● console (gcloud) ● API (+ client libraries) ● external tools (Tableau, Excel, …) ● ODBC connector
  • 7. How Does it Scale?
  • 8. Dremel Architecture Data Model/Storage: - Columnar Storage - Nested/Repeated Fields - No Index! -> Single Full Table Scan (from disk) Query Execution: - Tree Architecture - Using tens of thousands machines over fast Google network (+1Petabit/s)
  • 9. Columnar Storage ● Traffic minimization: ○ only read selected columns ● Higher Compression Ratio: ○ Similar values in the same column ○ From 1:3 → 1:10
  • 10. Tree Architecture - root server: ->receives query + reads table metadata ->rewrites the query(s) ->sends queries to the next level <-returns final query results - intermediate servers: ->(similar steps) <-parallel partial aggregation - leaf servers: ->actually scan (parts) of the table <-send data to intermediate servers
  • 11. NoSQL Datastore vs. BigQuery? NoSQL Datastore ● Index based (expected queries) ● Read-write BigQuery ● Non-index based (ad hoc queries) ● Read-only (append- only)
  • 12. MapReduce vs. BigQuery? MapReduce ● High latency ● Flexible (complex) batch processing ● Unstructured data BigQuery ● Low latency ● SQL-like queries ● Structured data
  • 15. Pricing Model Category Price Note Storage Cost $0.020 per GB, per month Query Cost $5 per TB 1st TB per month is free
  • 17. Denormalize / Pre-Join Where Possible ● Best performance ● Only pay for the columns you need ● Nested/repeated fields! Relational Database Design Denormalized Nested/Repeated (JSON)
  • 18. Table Sharding ● You pay for what you read → Read less, pay less ● Table wildcards allow for easy reading over multiple tables https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions
  • 19. Optimize for Query vs. Storage Costs Common Queries? - Materialized views (save intermediate results in tables) with pre-aggregated data: → faster + cheaper queries - Store data in multiple tables: - table for daily data - table for weekly data - table for monthly data
  • 20. Narrow the Table Scans You only pay for the columns you read Don’t use “SELECT *” !!!
  • 21. Table Decorators Only way to avoid doing full table scans! Allows undeleting tables options: ● snapshot decorator + range decorator ● relative value + absolute values https://cloud.google.com/bigquery/table-decorators
  • 23. Big Data Reference Architecture
  • 24. Questions? You can reach me at: - mail: matthias@datatonic.com - Twitter: @FsMatt