Sponsors
Silver Sponsors
Strategic Sponsor
Media Partners
Google BigQuery
Wlodek Bielski
cloud.developerdays.pl
@DeveloperDaysPL
About me
• In the BI/analytics space since 2005
• In the cloud since 2014
• Google Professional Cloud Architect
• Google Professional Data Engineer
• MCSD: Azure Solutions Architect
• MCSE: Cloud Platform and Infrastructure
• MCSE: Data Management and Analytics
• AWS Certified Solutions Architect – Associate
Agenda
• GCP overview
• BigQuery overview
• BigQuery internals
• BigQuery use cases
• Session on ML with GCP: 15:00-16:00
Google Cloud Platform
Brief overview
Gartner Magic Quadrant for Cloud IaaS, 2018
“Google has been most differentiated
on the forward edge of IT, with deep
investments in analytics and ML, and many
customers who choose Google for strategic
adoption have applications
that are anchored by BigQuery”
Google Cloud Platform
Open-source innovations
BigQuery
Brief overview
What is BigQuery?
• Fully managed, No-Ops analytics data warehouse
• Highly parallel / distributed processing model
• Pay only for the storage and compute you actually use
• Virtually unlimited storage and compute resources
• Runs on Google infrastructure (US, EU, Asia)
• Multi-tenant architecture
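The pay-per-use point is easiest to see with on-demand query billing, which charges for the bytes processed in the columns a query actually references. Below is a minimal arithmetic sketch under stated assumptions: the table, its per-column sizes and the `price_per_tib` value are hypothetical placeholders (check the current BigQuery pricing page for real rates), and details such as per-query minimums and cached results are ignored.

```python
from typing import Dict, Iterable

TIB = 2 ** 40  # bytes in one tebibyte


def on_demand_query_cost(column_bytes: Dict[str, int],
                         referenced: Iterable[str],
                         price_per_tib: float) -> float:
    """Rough on-demand cost estimate: only referenced columns are scanned.

    price_per_tib is a placeholder, not a real BigQuery rate; minimum
    billing amounts and cached results are deliberately ignored.
    """
    scanned_bytes = sum(column_bytes[name] for name in set(referenced))
    return scanned_bytes / TIB * price_per_tib


# Hypothetical table with per-column storage sizes in bytes.
sizes = {"order_id": TIB // 2, "country": TIB // 4, "payload": 3 * TIB}

# SELECT country ... touches a single column.
cost_narrow = on_demand_query_cost(sizes, ["country"], price_per_tib=1.0)
# SELECT * ... touches every column.
cost_wide = on_demand_query_cost(sizes, sizes.keys(), price_per_tib=1.0)

print(cost_narrow, cost_wide)  # 0.25 3.75
```

With a placeholder price of 1.0 per TiB, the single-column query costs 0.25 units and `SELECT *` costs 3.75, which is why selecting only the needed columns matters.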
Nested and repeated schema
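Nested and repeated fields let one row carry a whole hierarchy (for example an order with its line items) instead of spreading it across joined tables. The sketch below models such a row with plain Python dicts and lists; the field names are hypothetical, and `unnest_items` only mimics what `UNNEST` does in BigQuery standard SQL. It is not BigQuery's implementation.

```python
from typing import Any, Dict, List

# Hypothetical order row: "customer" is a nested RECORD (STRUCT) and
# "items" is a repeated RECORD (ARRAY<STRUCT<...>>) kept inside the same row.
order: Dict[str, Any] = {
    "order_id": 1001,
    "customer": {"name": "Alice", "country": "PL"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 10.0},
        {"sku": "B-7", "qty": 1, "price": 25.5},
    ],
}


def unnest_items(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Emit one flat row per element of the repeated "items" field,
    similar in spirit to FROM orders, UNNEST(items) in standard SQL."""
    return [
        {"order_id": row["order_id"],
         "customer_name": row["customer"]["name"],
         **item}
        for item in row["items"]
    ]


flat_rows = unnest_items(order)
for flat in flat_rows:
    print(flat)

# Aggregating over the repeated field needs no join with a separate items table.
order_total = sum(i["qty"] * i["price"] for i in order["items"])
print(order_total)  # 45.5
```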
BigQuery timeline
Demo
GCP console and BigQuery
Query example
BigQuery internals
BigQuery under the hood
Storage subsystem: Colossus
• Successor to GFS (GFS: 2003, Colossus: 2010)
• GFS was designed for batch processing, Colossus for real-time workloads
• Powers most of Google's internal services (Gmail, YouTube, GCS)
• Reliable and fault-tolerant (Reed-Solomon erasure coding)
• Supports fast table scans (no indexes in BigQuery!)
https://cloud.google.com/files/storage_architecture_and_challenges.pdf
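Erasure coding such as Reed-Solomon lets data survive lost chunks by storing extra parity data instead of full replicas. Full Reed-Solomon arithmetic is too long for a slide-sized example, so the sketch below swaps in a much simpler erasure code, a single XOR parity chunk (as in RAID 5), which tolerates the loss of any one chunk. Chunk counts and helper names are illustrative assumptions, not Colossus internals.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing chunk (marked None) from the survivors."""
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can repair only one loss")
    survivors = [chunk for chunk in stripe if chunk is not None]
    repaired = list(survivors)
    if missing:
        # XOR of all other chunks (data + parity) equals the lost chunk.
        repaired.insert(missing[0], reduce(xor_bytes, survivors))
    return repaired


original = b"colossus stores chunks"
stripe = encode(original, k=4)
damaged: List[Optional[bytes]] = list(stripe)
damaged[2] = None  # simulate a failed disk holding chunk 2
restored = recover(damaged)
# Trim to the original length so the zero padding is dropped.
recovered = b"".join(restored[:4])[:len(original)]
print(recovered == original)  # True
```

Real Reed-Solomon codes generalize this to several parity chunks, so a stripe can survive multiple simultaneous failures at a fraction of the cost of keeping full copies.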
Storage format: Capacitor
• Replaced the earlier ColumnIO format (the basis for Parquet and ORC)
• Columnar storage format for Colossus
• On import, data in every input format is encoded into Capacitor
• Able to operate directly on compressed data
• Maintains rich metadata about the data, which Dremel uses
https://cloud.google.com/files/storage_architecture_and_challenges.pdf
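Capacitor's exact encodings are not described on these slides, so the sketch below uses plain run-length encoding (RLE) as a stand-in to show two of the listed ideas: data is stored column by column, and some aggregates can be computed on the encoded runs without expanding them back into rows. The table and function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Run = Tuple[str, int]  # (value, number of consecutive repeats)


def rle_encode(column: List[str]) -> List[Run]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def count_by_value(runs: List[Run]) -> Dict[str, int]:
    """COUNT(*) ... GROUP BY value, computed directly on the runs."""
    counts: Dict[str, int] = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts


# Hypothetical rows, pivoted into one list per column (columnar layout).
rows = [
    {"country": "PL", "amount": 10},
    {"country": "PL", "amount": 5},
    {"country": "PL", "amount": 7},
    {"country": "DE", "amount": 3},
    {"country": "DE", "amount": 8},
    {"country": "US", "amount": 1},
]
country_col = [r["country"] for r in rows]
amount_col = [r["amount"] for r in rows]

runs = rle_encode(country_col)
print(runs)                  # [('PL', 3), ('DE', 2), ('US', 1)]
print(count_by_value(runs))  # {'PL': 3, 'DE': 2, 'US': 1}

# A query on amount alone reads only amount_col, never the country column.
total_amount = sum(amount_col)
print(total_amount)          # 34
```

RLE pays off when equal values sit next to each other, which is one reason columnar formats may reorder rows before encoding.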
Networking: Jupiter
• Google's petabit-scale datacenter network
• In-house Google design
• Enables separation of storage and compute
Execution engine: Dremel
• Scalable, interactive ad-hoc query system for analysis of read-only nested data
• Dynamic processing trees, in-memory shuffle component
• Dynamic query execution, aided by metadata
• No configuration knobs exposed to the end user
• When idle, runs batch ingestion jobs (free of charge): Poseidon
https://ai.google/research/pubs/pub36632
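The multi-level serving tree from the Dremel paper can be sketched as a tree of servers: leaves scan their own slice of the data and compute partial aggregates, and each level of intermediate servers merges its children's results until the root holds the answer. The sketch below runs that for a `COUNT(*) ... GROUP BY` in a single process; the shard contents, the `fan_in` parameter and all function names are assumptions for illustration, and real Dremel/BigQuery scheduling, shuffle and fault handling are not modeled.

```python
from collections import Counter
from typing import List


def leaf(shard: List[str]) -> Counter:
    """Leaf server: partial COUNT(*) GROUP BY over its own slice of the table."""
    return Counter(shard)


def merge(partials: List[Counter]) -> Counter:
    """Intermediate or root server: add up the partial counts of its children."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts
    return merged


def groups_of(items: List[Counter], size: int) -> List[List[Counter]]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def execute(shards: List[List[str]], fan_in: int = 2) -> Counter:
    """Build the tree bottom-up: leaves first, then merge fan_in results per level."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not shards:
        return Counter()
    level = [leaf(shard) for shard in shards]  # would run in parallel in practice
    while len(level) > 1:
        level = [merge(group) for group in groups_of(level, fan_in)]
    return level[0]


shards = [["PL", "DE"], ["PL"], ["US", "PL"], ["DE"], ["US"]]
result = execute(shards, fan_in=2)
print(sorted(result.items()))  # [('DE', 2), ('PL', 3), ('US', 2)]
```

Because each merge only adds counts, the result is the same for any `fan_in`; a larger fan-in simply gives a shallower tree.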
Compute: Borg
• Container-oriented cluster-management system
• Precursor to Kubernetes
• E.g. Borg allocs roughly correspond to Kubernetes pods
https://ai.google/research/pubs/pub43438
https://kubernetes.io/blog/2015/04/borg-predecessor-to-kubernetes/
Overall BigQuery architecture
Business cases
BigQuery use cases
• Online marketing analytics
    • Google Analytics 360
    • Firebase
    • AdWords
    • DoubleClick
• Hadoop-like processing
    • BigQuery + Dataflow
• Data lake for ML
• Archive for cold data (long-term storage pricing)
Typical e-commerce integration
Demo
BigQuery with Cloud Datalab
Thank you!
cloud.developerdays.pl
@DeveloperDaysPL

Cloud Developer Days - BigQuery