End-to-End Machine Learning With Google Cloud
Tu Pham
CTO @ Eway JSC
Data Science and
Machine Learning Connect
-- Google Thailand --
Google IO Extended 2018
-- Ha Noi, Viet Nam --
About Me: Tu Pham
- Google Developer Expert on Cloud Platform
- CTO at Eway JSC
- Open source contributor, blogger, father
- 8 years of experience in big data and cloud computing
About Google Cloud Platform Viet Nam Community
Since 2016:
- Google IO Extended
- Google Next Extended
- Devfest Viet Nam
- Study Jam
- And many more events
Our biggest event in Viet Nam drew more than 1,600 attendees
About Eway
- AdFlex.asia: No. 1 CPA network in Viet Nam
- MasOffer.net: No. 1 CPS network in Viet Nam
- DYNO.vn: Big data services for fintech and online advertising
- iHR: Big data for recruiting
- eDoctor.io: Platform for healthcare
One challenge
In 2013:
- We had small data and a manual workflow
- And a lot of people doing everything by hand
We started migrating to Google Cloud in 2013
Our Common Task in 2013
Our Very First Data Flow in 2013
1. Remember where I saved the data
2. Use tools (SQL, Spark, Pandas, ...) to extract part of the data
3. Download the data, then send it to the data science team
4. Wait for the model, then apply it manually by deploying it as a service
5. If something goes wrong, go back to step 1
What I Learned
Redesign The Flow
Principles:
- KISS (Keep it simple, stupid)
- DRY (Don’t Repeat Yourself)
- Single Responsibility
- Low Cost
- Scalable
Become a Geek
Redesign The Flow
1. GC Compute Engine instances collect raw data
2. GC Compute Engine instances convert raw data to Apache Parquet files
3. GC Compute Engine instances upload the Parquet files to GC Cloud Storage
4. Explore the dataset using GC Datalab
5. Develop a machine learning model in TensorFlow / scikit-learn
6. Train the machine learning model at scale on GC Cloud ML Engine
7. Deploy the trained ML model as a Web API on GC Compute Engine instances
8. Expose the Web API via GC Load Balancing
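Steps 7 and 8 have no detail slide of their own, so here is a minimal sketch of what step 7 could look like, using only the Python standard library. The JSON contract is an assumption: the `/predict` path, the `features` field, and the hard-coded `WEIGHTS` and `BIAS` are hypothetical stand-ins for a model trained in step 6, not something the slides specify.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Return the linear score for one feature vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features" % len(WEIGHTS))
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_predict(body):
    """Map a raw JSON request body to (HTTP status, JSON response bytes)."""
    try:
        features = json.loads(body)["features"]
        score = predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "bad request"}).encode()
    return 200, json.dumps({"score": score}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the (assumed) /predict path is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Build (but do not start) the server; call .serve_forever() to run it."""
    return HTTPServer(("0.0.0.0", port), PredictHandler)
```

Keeping `handle_predict` free of any HTTP details follows the single-responsibility principle from the slides: the same function can be checked offline, while each Compute Engine instance would run `make_server().serve_forever()` behind the load balancer from step 8.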
End-to-end ML with Google Cloud
Step 1: GC Compute Engine Instances Collect Raw Data
- Technology: Cloud Load Balancing, Compute Engine
- Why Cloud Load Balancing:
  - TCP/UDP Load Balancing
  - Seamless Autoscaling
  - Scalable
- Why Compute Engine:
  - High-Performance
  - Scalable
  - Low Cost
  - Fast Networking
  - Custom Machine Types
Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files
- Technology: Compute Engine, Parquet file format
- Why Parquet:
  - Self-describing
  - Columnar storage format
  - Language-independent
  - High query performance
  - Spark SQL is much faster with Parquet
  - High compression (up to 70%), so less disk I/O
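A minimal sketch of the conversion in step 2, assuming the raw data arrives as newline-delimited JSON (the slides do not say what the raw format is). The helper names `rows_to_columns` and `write_parquet` and the file name `events.parquet` are hypothetical. The writer uses `pyarrow`, which is one common choice and not necessarily what the author used.

```python
import json


def rows_to_columns(lines):
    """Turn newline-delimited JSON records into a dict of column lists.

    Missing fields become None so every column has the same length.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    names = sorted({key for record in records for key in record})
    return {name: [record.get(name) for record in records] for name in names}


def write_parquet(columns, path):
    """Write the columns to a Snappy-compressed Parquet file."""
    # Imported lazily so the parsing helper above works without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), path, compression="snappy")


raw = ['{"user": "a", "clicks": 3}', '{"user": "b"}']
columns = rows_to_columns(raw)
# write_parquet(columns, "events.parquet")  # requires: pip install pyarrow
```

Pivoting rows into columns before writing mirrors Parquet's columnar layout: values of the same field sit together, which is what makes the compression and column-only reads listed above possible.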
Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage
- Technology: Compute Engine, Parquet file format, Cloud Storage
- Why Cloud Storage:
  - Four storage classes
  - Easy to integrate
  - Object Lifecycle Management
  - Fast Networking
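A minimal sketch of the upload in step 3, assuming the `google-cloud-storage` Python client and a date-partitioned object layout. The `dt=` naming scheme, the helper names, and the bucket `my-bucket` are hypothetical; the slides do not describe how objects are named.

```python
from datetime import date


def object_name(prefix, day, filename):
    """Build a date-partitioned object name, e.g. raw/dt=2018-06-01/part-0.parquet."""
    return "%s/dt=%s/%s" % (prefix.strip("/"), day.isoformat(), filename)


def upload_parquet(bucket_name, local_path, name):
    """Upload one local file to Cloud Storage under the given object name."""
    # Imported lazily; requires the google-cloud-storage package and
    # application default credentials on the instance.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(name)
    blob.upload_from_filename(local_path)


name = object_name("raw/", date(2018, 6, 1), "part-0.parquet")
# upload_parquet("my-bucket", "part-0.parquet", name)  # hypothetical bucket
```

A predictable, date-based prefix makes it easier to attach Object Lifecycle Management rules later, for example moving older partitions to a cheaper storage class.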
Step 4: Explore Dataset Using GC Datalab
- Technology: Cloud Datalab
- Why Datalab:
  - Integrated with: Cloud BigQuery, Cloud Machine Learning Engine, Cloud Storage, and Stackdriver Monitoring
  - IPython Support & Notebook Format
  - Interactive Data Visualization
  - Multi-Language Support: Python, SQL, and JavaScript (for BigQuery user-defined functions)
Step 5: Develop Machine Learning Models In TensorFlow / scikit-learn
- Technology: TensorFlow / scikit-learn
- Why TensorFlow:
  - Deep learning neural networks
  - Huge community: contributors, researchers, developers
  - Trains models fast
  - Allows developers to iterate quickly
  - Runs on large-scale servers
  - Multi-language support
Step 6: Train Machine Learning Models At Scale On GC Machine Learning Engine
- Technology: Cloud Machine Learning Engine
- Why Machine Learning Engine:
  - Automatic Resource Provisioning
  - Server-Side Preprocessing
  - HyperTune
  - Portable Models
  - Integrated: Cloud Dataflow, Cloud Storage
  - Multiple Frameworks: scikit-learn, XGBoost, Keras, TensorFlow
Tips to be 1% better every day
- Plan your system principles
- Single responsibility for everything
- Design the system architecture, data flow, data model, and data structures first
- Separate realtime and batch flows
- Use separate storage strategies for different data types
- Cut network, instance, and storage costs with metric monitoring and an alert system
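The last tip can be sketched as a simple threshold check over daily costs. This is only an illustration of the idea: the budget figures, the 10% tolerance, and the `over_budget` helper are hypothetical, and a real setup would read costs from billing or monitoring data and send alerts somewhere.

```python
# Hypothetical daily budgets in USD; real values depend on your own billing data.
BUDGETS = {"network": 50.0, "instances": 200.0, "storage": 30.0}


def over_budget(costs, budgets=BUDGETS, tolerance=0.1):
    """Return (category, spent, budget) for costs above budget by more than `tolerance`."""
    alerts = []
    for category, budget in sorted(budgets.items()):
        spent = costs.get(category, 0.0)
        if spent > budget * (1 + tolerance):
            alerts.append((category, spent, budget))
    return alerts


today = {"network": 48.0, "instances": 260.0, "storage": 31.0}
alerts = over_budget(today)
```

Tracking network, instance, and storage costs as separate categories matches the tip: each one has its own budget, so an alert points directly at the resource that is drifting.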
Join The Flight
● Eway:
○ Senior / Experienced Java Backend
○ Senior / Experienced PHP FullStack
○ Senior / Experienced Data Scientist
○ System Admin
○ Product Operation Executive
○ Business Analyst
Join The Flight
● Eway: hr@eway.vn
● Me: tupp@eway.vn
