SlideShare a Scribd company logo
Storytelling: BigQuery - past, present and future
Mobile Campaign Analytics | Retargeting | Unbiased Attribution
Nir Rubinstein - Chief Architect. nir@appsflyer.com
State of the union – 2012
Tech stack:
• Small, semi-isolated python services
• Communication via Redis' pub/sub
• Main DB is CouchDB
• Not a whole lot going on...
State of the union – 2012, cont'd
The problem:
• A really big raw(!) report is being served by CouchDB
• Data in CouchDB is being generated via a View
• The entire DB is on hold while a view is generated
• Pissed off client
What do we do???
The Quick Solution
Google Bigquery
• Hosted solution – we don't have to manage it
• Based on Google's BigTable whitepaper
• Columnar storage DB
• Really easy to start working with...
The Modeling Problem
Is there one?
• Data in Bigquery is divided into:
●
Projects
●
Datasets
●
Tables
How to split the data?
Is performance constant?
How am I charged?
The Modeling Problem
The naive approach:
One Project
One Dataset
Multiple Tables
Hundred of thousands of rows a month
The Modeling Problem
The Modeling Problem
What are the limitations?
• Cannot change the schema of a table once created
• Cannot update table data
• Cannot delete table data
Essentially – forward writing only!
The Modeling Problem
How we tackled these “issues”?
• One global “Project” that drives our Raw Reports
• New Datasets are created every 30 days -
business and cost limitations
• Tables in the datasets are versioned!
Current State of the Union
The Cost Problem
Is there a problem?
• Storage is very cheap, querying is expansive
• Querying is billed by the amount of data scanned.
Bigquery is a columnar DB which means that every
column that participates in the query is read from
beginning to end
• Once tables start storing a lot of data, even simple queries
with very few columns will be expensive
Cost Optimizations
• We're processing 8B daily events. Out of those, a few
hundred millions are written into Bigquery – only
meaningful data.
• We've created a unified schema to prevent table version
issues
• Tables can be query optimized via Table Decorators to
limit the time range of queried data
• Tables will be named with dates in them in order to
support Table Wildcard queries in order to reduce the cost
The “Cost” of Cost Optimizations
• Performance (querying multiple tables)
• Over engineering (inserting to and maintaining multiple
tables)
• Storage is cheap, but querying is costly. Since querying
does a full column scan, there's a debate whether we
should store the entire data or parts of it.
The Future
What are we waiting for?
Custom partitioning functions!!!
Thank You!
(We're hiring)

More Related Content

What's hot

Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Databricks
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for Hadoop
Michelle Ufford
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
Albert Wong
 
Automate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAutomate your data flows with Apache NIFI
Automate your data flows with Apache NIFI
Adam Doyle
 
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
iguazio
 
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
iguazio
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Databricks
 
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Databricks
 
[Codemash] Caching Made "Bootiful"!
[Codemash] Caching Made "Bootiful"![Codemash] Caching Made "Bootiful"!
[Codemash] Caching Made "Bootiful"!
Viktor Gamov
 
Bridging the Completeness of Big Data on Databricks
Bridging the Completeness of Big Data on DatabricksBridging the Completeness of Big Data on Databricks
Bridging the Completeness of Big Data on Databricks
Databricks
 
IPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution OverviewIPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution Overview
pzybrick
 
Microsoft Machine Learning Smackdown
Microsoft Machine Learning SmackdownMicrosoft Machine Learning Smackdown
Microsoft Machine Learning Smackdown
Lynn Langit
 
Building Data Quality Audit Framework using Delta Lake at Cerner
Building Data Quality Audit Framework using Delta Lake at CernerBuilding Data Quality Audit Framework using Delta Lake at Cerner
Building Data Quality Audit Framework using Delta Lake at Cerner
Databricks
 
Feature store Overview St. Louis Big Data IDEA Meetup aug 2020
Feature store Overview   St. Louis Big Data IDEA Meetup aug 2020Feature store Overview   St. Louis Big Data IDEA Meetup aug 2020
Feature store Overview St. Louis Big Data IDEA Meetup aug 2020
Adam Doyle
 
Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflow
Databricks
 
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichLambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Databricks
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Jeff Magnusson
 
Data-Driven @ Netflix
Data-Driven @ NetflixData-Driven @ Netflix
Data-Driven @ Netflix
Michelle Ufford
 
Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyft
kbajda
 
Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...
Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...
Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...
Databricks
 

What's hot (20)

Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Data Warehousing Patterns for Hadoop
Data Warehousing Patterns for HadoopData Warehousing Patterns for Hadoop
Data Warehousing Patterns for Hadoop
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
 
Automate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAutomate your data flows with Apache NIFI
Automate your data flows with Apache NIFI
 
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
Serverless and AI: Orit Nissan-Messing, Iguazio, Serverless NYC 2018
 
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
The Serverless Native Mindset: Ben Kehoe, iRobot, Serverless NYC 2018
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
 
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
Scale and Optimize Data Engineering Pipelines with Software Engineering Best ...
 
[Codemash] Caching Made "Bootiful"!
[Codemash] Caching Made "Bootiful"![Codemash] Caching Made "Bootiful"!
[Codemash] Caching Made "Bootiful"!
 
Bridging the Completeness of Big Data on Databricks
Bridging the Completeness of Big Data on DatabricksBridging the Completeness of Big Data on Databricks
Bridging the Completeness of Big Data on Databricks
 
IPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution OverviewIPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution Overview
 
Microsoft Machine Learning Smackdown
Microsoft Machine Learning SmackdownMicrosoft Machine Learning Smackdown
Microsoft Machine Learning Smackdown
 
Building Data Quality Audit Framework using Delta Lake at Cerner
Building Data Quality Audit Framework using Delta Lake at CernerBuilding Data Quality Audit Framework using Delta Lake at Cerner
Building Data Quality Audit Framework using Delta Lake at Cerner
 
Feature store Overview St. Louis Big Data IDEA Meetup aug 2020
Feature store Overview   St. Louis Big Data IDEA Meetup aug 2020Feature store Overview   St. Louis Big Data IDEA Meetup aug 2020
Feature store Overview St. Louis Big Data IDEA Meetup aug 2020
 
Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflow
 
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichLambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
 
Data-Driven @ Netflix
Data-Driven @ NetflixData-Driven @ Netflix
Data-Driven @ Netflix
 
Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyft
 
Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...
Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...
Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platfor...
 

Similar to BigQuery at AppsFlyer - past, present and future

bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
VIJAYAPRABAP
 
bigdata.pdf
bigdata.pdfbigdata.pdf
bigdata.pdf
AnjaliKumari301316
 
Introduction to GCP BigQuery and DataPrep
Introduction to GCP BigQuery and DataPrepIntroduction to GCP BigQuery and DataPrep
Introduction to GCP BigQuery and DataPrep
Paweł Mitruś
 
Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)
Rasel Rana
 
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
Remy Rosenbaum
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
Dharmesh Vaya
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
Venkata Reddy Konasani
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
Stratebi
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
Tyler Wishnoff
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional Database
Russel Chowdhury
 
Big Data
Big DataBig Data
Big Data
Mahesh Bmn
 
bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptx
Harissh16
 
A peek into the future
A peek into the futureA peek into the future
A peek into the future
Prateek Chauhan
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Vladi Vexler
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Dmitry Anoshin
 
bigdata (1)
bigdata (1)bigdata (1)
bigdata (1)
DIVYA G
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
Zohar Elkayam
 
Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
Matthias Feys
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Rittman Analytics
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Looker
 

Similar to BigQuery at AppsFlyer - past, present and future (20)

bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
 
bigdata.pdf
bigdata.pdfbigdata.pdf
bigdata.pdf
 
Introduction to GCP BigQuery and DataPrep
Introduction to GCP BigQuery and DataPrepIntroduction to GCP BigQuery and DataPrep
Introduction to GCP BigQuery and DataPrep
 
Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)
 
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fa...
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional Database
 
Big Data
Big DataBig Data
Big Data
 
bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptx
 
A peek into the future
A peek into the futureA peek into the future
A peek into the future
 
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015 Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
bigdata (1)
bigdata (1)bigdata (1)
bigdata (1)
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 

Recently uploaded

UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Undress Baby
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
Yara Milbes
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
Gerardo Pardo-Castellote
 

Recently uploaded (20)

UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
 

BigQuery at AppsFlyer - past, present and future

  • 1. Storytelling: BigQuery - past, present and future Mobile Campaign Analytics | Retargeting | Unbiased Attribution Nir Rubinstein - Chief Architect. nir@appsflyer.com
  • 2. State of the union – 2012 Tech stack: • Small, semi-isolated python services • Communication via Redis' pub/sub • Main DB is CouchDB • Not a whole lot going on...
  • 3. State of the union – 2012, cont'd The problem: • A really big raw(!) report is being served by CouchDB • Data in CouchDB is being generated via a View • The entire DB is on hold while a view is generated • Pissed off client What do we do???
  • 4. The Quick Solution Google Bigquery • Hosted solution – we don't have to manage it • Based on Google's BigTable whitepaper • Columnar storage DB • Really easy to start working with...
  • 5. The Modeling Problem Is there one? • Data in Bigquery is divided into: ● Projects ● Datasets ● Tables How to split the data? Is performance constant? How am I charged?
  • 6. The Modeling Problem The naive approach: One Project One Dataset Multiple Tables Hundred of thousands of rows a month
  • 8. The Modeling Problem What are the limitations? • Cannot change the schema of a table once created • Cannot update table data • Cannot delete table data Essentially – forward writing only!
  • 9. The Modeling Problem How we tackled these “issues”? • One global “Project” that drives our Raw Reports • New Datasets are created every 30 days - business and cost limitations • Tables in the datasets are versioned!
  • 10. Current State of the Union
  • 11. The Cost Problem Is there a problem? • Storage is very cheap, querying is expansive • Querying is billed by the amount of data scanned. Bigquery is a columnar DB which means that every column that participates in the query is read from beginning to end • Once tables start storing a lot of data, even simple queries with very few columns will be expensive
  • 12. Cost Optimizations • We're processing 8B daily events. Out of those, a few hundred millions are written into Bigquery – only meaningful data. • We've created a unified schema to prevent table version issues • Tables can be query optimized via Table Decorators to limit the time range of queried data • Tables will be named with dates in them in order to support Table Wildcard queries in order to reduce the cost
  • 13. The “Cost” of Cost Optimizations • Performance (querying multiple tables) • Over engineering (inserting to and maintaining multiple tables) • Storage is cheap, but querying is costly. Since querying does a full column scan, there's a debate whether we should store the entire data or parts of it.
  • 14. The Future What are we waiting for? Custom partitioning functions!!!

Editor's Notes

  1. <number>
  2. <number>
  3. <number>
  4. <number>
  5. <number>
  6. <number>
  7. <number>
  8. <number>
  9. <number>
  10. <number>
  11. <number>
  12. <number>
  13. <number>
  14. <number>