3. cloud.developerdays.pl@DeveloperDaysPL
About me
• In BI / Analytics space since 2005
• In Clouds since 2014
• Google Professional Data Architect
• Google Professional Data Engineer
• MCSD: Azure Solutions Architect
• MCSE: Cloud Platform and Infrastructure
• MCSE: Data Management and Analytics
• AWS Certified Solutions Architect - Associate
Gartner MQ for Cloud IaaS, 2018
“Google has been most differentiated on the forward edge of IT, with deep investments in analytics and ML, and many customers who choose Google for strategic adoption have applications that are anchored by BigQuery”
What is BigQuery?
• Fully managed, No-Ops analytics data warehouse
• Highly parallel / distributed processing model
• Only pay for actual storage and compute used
• Virtually unlimited storage and compute resources
• Runs on Google infrastructure (US, EU, Asia)
• Multi-tenant architecture
Storage subsystem: Colossus
• Successor to GFS (GFS: 2003, Colossus: 2010)
• GFS was built for batch workloads; Colossus is built for real-time workloads
• Powers most of Google's internal services (Gmail, YouTube, GCS)
• Reliable and fault-tolerant (Reed-Solomon encoding)
• Supports fast table scans (no indexes in BigQuery!)
https://cloud.google.com/files/storage_architecture_and_challenges.pdf
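To make the Reed-Solomon bullet concrete, here is a minimal sketch comparing the storage overhead of plain replication with that of an erasure code. The RS(6,3) layout and the 3x replication baseline are illustrative assumptions, not figures from the slides, and the `Scheme` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme: k data chunks plus m redundant chunks.

    Hypothetical helper for illustration; the concrete layouts used
    below are assumptions, not Colossus settings.
    """
    name: str
    data_chunks: int
    redundant_chunks: int

    @property
    def overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return (self.data_chunks + self.redundant_chunks) / self.data_chunks

    @property
    def tolerated_losses(self) -> int:
        # Reed-Solomon is an MDS code: any m chunks may be lost.
        # Replication with r copies is the special case k=1, m=r-1.
        return self.redundant_chunks


replication = Scheme("3x replication", data_chunks=1, redundant_chunks=2)
rs = Scheme("RS(6,3)", data_chunks=6, redundant_chunks=3)

for scheme in (replication, rs):
    print(f"{scheme.name}: overhead {scheme.overhead:.1f}x, "
          f"survives {scheme.tolerated_losses} lost chunks")
```

Under these assumptions the erasure code survives more lost chunks while storing half as many raw bytes, which is the usual argument for erasure coding in large storage systems.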
Storage format: Capacitor
• Replaced the earlier ColumnIO format (the basis for Parquet and ORC)
• Columnar storage format for Colossus
• On import, data in any input format is re-encoded into Capacitor
• Able to operate directly on compressed data
• Maintains rich metadata on datasets – used by Dremel
https://cloud.google.com/files/storage_architecture_and_challenges.pdf
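The idea of operating directly on compressed columnar data can be sketched with two textbook encodings, dictionary encoding and run-length encoding. This is only an illustration of the principle: Capacitor's actual on-disk layout is not described in the slides, and all function names here are hypothetical.

```python
from itertools import groupby


def encode_column(values):
    """Dictionary-encode a column, then run-length-encode the codes."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    runs = [
        (code, sum(1 for _ in group))
        for code, group in groupby(code_of[v] for v in values)
    ]
    return dictionary, runs


def count_equal(dictionary, runs, target):
    """Count rows equal to `target` without decoding the column.

    The predicate is evaluated once against the dictionary; after that
    only run lengths are summed, so no row is ever materialized.
    """
    if target not in dictionary:
        return 0
    code = dictionary.index(target)
    return sum(length for run_code, length in runs if run_code == code)


def decode_column(dictionary, runs):
    """Expand the encoded column back to plain values (for checking)."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


country = ["PL", "PL", "PL", "DE", "DE", "PL", "US", "US"]
dictionary, runs = encode_column(country)
print(dictionary)                           # ['DE', 'PL', 'US']
print(runs)                                 # [(1, 3), (0, 2), (1, 1), (2, 2)]
print(count_equal(dictionary, runs, "PL"))  # 4
```

The filter touches four runs instead of eight rows; on long, repetitive columns the gap grows accordingly.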
Execution engine: Dremel
• Scalable, interactive ad-hoc query system for analysis of read-only nested data
• Dynamic processing trees, in-memory shuffle component
• Dynamic query execution, aided by metadata
• No configs/knobs exposed to end user
• Idle capacity runs batch ingest jobs (Poseidon) at no charge
https://ai.google/research/pubs/pub36632
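The processing-tree idea can be sketched as a tree of partial aggregations: leaves scan their own shard, intermediate nodes merge partial results, and the root returns the final answer. This is a simplified, static version of the serving tree described in the Dremel paper; the function names, the fan-out of 2, and the sample rows are assumptions for illustration, and the dynamic trees and in-memory shuffle mentioned above are not modeled.

```python
from collections import Counter


def leaf_scan(shard):
    """Leaf node: partial COUNT(*) ... GROUP BY country over one shard."""
    return Counter(row["country"] for row in shard)


def merge(partials):
    """Intermediate or root node: combine partial aggregates from children."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_tree(shards, fanout=2):
    """Aggregate bottom-up through a tree with the given fan-out."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(shard) for shard in shards]
    while len(level) > 1:
        # Each parent merges up to `fanout` children from the level below.
        level = [
            merge(level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0] if level else Counter()


shards = [
    [{"country": "PL"}, {"country": "DE"}],
    [{"country": "PL"}],
    [{"country": "US"}, {"country": "PL"}],
]
result = run_tree(shards)
print(dict(result))
```

Because counts are associative, each level only passes small partial results upward, so no single node ever has to see all the rows.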
BigQuery use cases
• Analyzing online marketing
  • Google Analytics 360
  • Firebase
  • AdWords
  • DoubleClick
• Hadoop-like processing
  • BigQuery + Dataflow
• Data lake for ML
• Archive for cold data (long-term storage pricing)