BIG DATA AND MACHINE LEARNING
ON GOOGLE CLOUD PLATFORM
Wlodek Bielski | CEO Pure Company
AGENDA
 Google Cloud Platform at a glance
 BigQuery: Cloud DWH and Big Data
 Managed Hadoop, Beam, Airflow – overview of GCP PaaS
 Machine Learning: from ML API to TensorFlow
PURE COMPANY
DATA
AI
CLOUD
GCP AT A GLANCE
GARTNER MQ FOR CLOUD IAAS, 2018
"Google has been most differentiated on the forward edge of IT, with deep investments in analytics and ML, and many customers who choose Google for strategic adoption have applications that are anchored by BigQuery"
GOOGLE CLOUD PLATFORM
OPEN-SOURCE INNOVATIONS
BIGQUERY: NO-OPS CLOUD DWH
 Near real-time analysis of massive datasets
 Standard SQL syntax (ANSI SQL 2011)
 No-ops for performance and scaling (global black-box)
 Separated storage and compute, linked with petabit network
 Pay-as-you-go: only for queries and storage used
 Automatic discount for long-term storage
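The pay-as-you-go point can be made concrete with a small sketch. It assumes the `google-cloud-bigquery` client library and default credentials for the dry run. The helper names and the price argument are hypothetical placeholders, and billing by binary terabytes is an assumption to check against the current price list.

```python
def estimate_query_cost(bytes_processed, price_per_tib):
    """Return the on-demand cost of a query that scans `bytes_processed` bytes.

    `price_per_tib` is a caller-supplied placeholder, not a quoted price.
    """
    tib = bytes_processed / 2**40  # assumption: billing unit is 2**40 bytes
    return tib * price_per_tib


def dry_run_bytes(sql):
    """Ask BigQuery how many bytes `sql` would scan, without running it."""
    # Imported lazily so the arithmetic above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up application default credentials
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


if __name__ == "__main__":
    # Pure arithmetic example: half a TiB scanned at a hypothetical rate of 4.0.
    print(estimate_query_cost(2**39, price_per_tib=4.0))
```

A dry run is free and returns immediately, so it is a cheap way to check a query before paying for it.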
BIGQUERY ARCHITECTURE
BIGQUERY: DEMO
COMPLETE BI FLOW ON GCP
BIGQUERY ML
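As a rough illustration of what "ML in pure SQL" looks like, the sketch below builds a BigQuery ML training statement and a prediction query as strings. The helper names and the dataset, table, and column names are hypothetical. The statements are only assembled here, not executed.

```python
def create_model_sql(model, label_col, source_table, model_type="logistic_reg"):
    """Build a CREATE MODEL statement that trains on every column of a table."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def predict_sql(model, source_table):
    """Build a query that scores every row of a table with a trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{source_table}`)"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(create_model_sql("demo.churn_model", "churned", "demo.customers_train"))
    print(predict_sql("demo.churn_model", "demo.customers_new"))
```

Both strings can be pasted into the BigQuery console or submitted through any BigQuery client.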
DATA PROCESSING ON GCP
Cloud Composer – managed Airflow
Dataproc – Hadoop + Spark
Dataflow – Apache Beam
Matillion – ETL/ELT, mainly for BigQuery
DATAPROC VS DATAFLOW
Cloud Dataproc
 Migrating existing Hadoop workloads
 Iterative processing and Notebooks
 ML with Spark ML
Cloud Dataflow
 Better for greenfields
 Batch + streaming in one tool
 Based on Apache Beam
 Beam pipelines are portable across runners, e.g. Spark, Flink
 Preprocessing for CloudML
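To give a feel for the Beam model behind Dataflow, here is a minimal batch sketch, assuming the `apache-beam` package is installed. The input format (`user,amount` lines), the paths, and the function names are hypothetical. Without extra options it runs on the local direct runner. Running on Dataflow or another runner is a matter of pipeline options, not of changing the transforms.

```python
def parse_event(line):
    """Split a 'user,amount' text line into a (user, amount) pair."""
    user, amount = line.split(",")
    return user.strip(), float(amount)


def format_total(pair):
    """Render a (user, total) pair back into a CSV line."""
    user, total = pair
    return f"{user},{total}"


def run(input_path, output_path):
    """Sum amounts per user; reads and writes plain text files."""
    # Imported lazily so the helpers above can be unit-tested without Beam.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Parse" >> beam.Map(parse_event)
            | "SumPerUser" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(format_total)
            | "Write" >> beam.io.WriteToText(output_path)
        )
```

Keeping the per-element logic in plain functions makes it testable on its own and reusable if the source later becomes a stream.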
SAMPLE MATILLION FLOW
GOOGLE ML OFFERING
ML APIs AutoML CloudML
TensorFlow
DIY
Data Science expertise required
VISION API
 Pretrained models via API
 Label detection
 Face detection
 NO Face recognition
 Logo detection
 REST API
 Cloud Storage integration
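The REST call behind the features listed above can be sketched as follows. Only the request body is built here. The bucket path and the helper name are hypothetical, and sending the request additionally needs an API key or OAuth token, which is left out.

```python
import json

# Public REST endpoint for synchronous image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

FEATURE_TYPES = ("LABEL_DETECTION", "FACE_DETECTION", "LOGO_DETECTION")


def build_annotate_request(gcs_uri, max_results=5):
    """Build the JSON body asking for labels, faces and logos in one image.

    `gcs_uri` points at an object in Cloud Storage, e.g. gs://bucket/file.jpg.
    """
    features = [{"type": t, "maxResults": max_results} for t in FEATURE_TYPES]
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": gcs_uri}},
                "features": features,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical bucket and object name.
    body = build_annotate_request("gs://my-bucket/photo.jpg")
    print("POST", VISION_ENDPOINT)
    print(json.dumps(body, indent=2))
```

Face detection returns bounding boxes and attributes for each face, not identities, which matches the "no face recognition" point.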
AUTOML
VISION
 Custom ML models without coding / ML skills
 Human labeling available (2-20 labels, up to 5 working days)
 Powered by Google research in AutoML and Transfer Learning
CLOUD ML ENGINE
 Complete operationalization service in Cloud
 GCP console, command line gcloud ml-engine, REST API
 TensorFlow, scikit-learn and XGBoost support
 Both Python 2 and Python 3 are supported in runtime version 1.4+
 Training with GPUs and TPUs (beta)
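A training job submitted through the command line could look roughly like the sketch below, which only assembles the `gcloud ml-engine` invocation. The job name, package layout, bucket, and region are hypothetical, and the default versions and scale tier are illustrative choices, not recommendations.

```python
import shlex


def training_command(job_name, package_path, module_name, job_dir, region,
                     runtime_version="1.4", python_version="3.5",
                     scale_tier="BASIC_GPU"):
    """Return the argument list for submitting a Cloud ML Engine training job."""
    return [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--package-path", package_path,    # local directory with the trainer code
        "--module-name", module_name,      # Python module to run as the entry point
        "--job-dir", job_dir,              # Cloud Storage path for job outputs
        "--region", region,
        "--runtime-version", runtime_version,
        "--python-version", python_version,
        "--scale-tier", scale_tier,        # BASIC_GPU requests a single GPU worker
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = training_command("census_001", "trainer/", "trainer.task",
                           "gs://my-bucket/census_001", "europe-west1")
    print(shlex.join(cmd))
    # To actually submit: subprocess.run(cmd, check=True)
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.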
TYPICAL ML PROCESS
EXAMPLE CLOUD ML FLOW
CLOUD COMPOSER (AIRFLOW) ORCHESTRATION
WRAP-UP
 BigQuery – Cloud DWH
 BigQuery ML – pure SQL, for devs / data analysts
 Set of APIs for developers (e.g. Vision API)
 AutoML for analysts
 CloudML – for data scientists
HTTPS://AI.GOOGLE/
wlodek@purecompany.pl
www.purecompany.pl
