SlideShare a Scribd company logo
With AutomatedML,
Is Everyonean ML Engineer?
Dan Sullivan- DevFest Northeast 2020 – October 31,2020
Bio
• Principle Engineer, PEAK6 Technologies
• Author
• Instructor
• Udemy
• Google Cloud
• LinkedIn Learning
• Data Science
• Machine Learning
• Databases & Data Modeling
Overview
• Machine Learning Workflow
• Formulating an ML Problem
• Building ML Models in GCP
• Data Engineering
• Monitoring and Evaluating Fairness
1. Formulating an ML Problem
0. Machine Learning Workflow
• Formulate problem
• Identity data sources
• Prepare data
• Train, evaluate, and tune model
• Deploy model
• Use model in production
• Monitor and Evaluate Fairness
https://thenounproject.com/term/workflow/2409348/
Define the Problem to
Be Solved
• Informal description
• What is the value of
solving the problem?
• How can the problem
be solved
• Regression
• Classification
https://static.thenounproject.com/png/230138-200.png
IdentifyData Sources
• Amount of data available
• Quality
• Rate of generation
• Requirements to access
• Limitations on the use of data
https://commons.wikimedia.org/wiki/File:Data_types_-_en.svg
2. BuildingML Modelsin GCP
Services for BuildingModels
• Cloud AutoML
• AI Platform Training
• Kubeflow
• Dataproc with Spark ML
• BigQuery ML
CloudAutoML
• Designed for model builders with limited
ML experience
• GUI for training, evaluating, and tuning
• Services for sight, language, and
structured data
• AutoML Tables uses structured data to
build regression and classification
models
AI Platform Training
• Trains and runs models built in
• Tensorflow
• Scikit Learn
• XGBoost
• Hosted frameworks but can run custom containers
• Service provisions compute resources needed for a job and
then executes the job
Kubeflow
• Kubeflow is machine learning toolkit for
Kubernetes
• Packages models like applications
• Compose, deploy, and manage ML
workflows
Dataproc and Spark ML
• Dataproc is managed Spark and
Hadoop service
• Spark ML is machine learning
library
• ML Algorithms
• Feature engineering
• Pipelines
• Persistence
• Utilities
BigQueryML
• BigQuery is serverless analytical database
• BigQuery ML brings machine learning
functions to SQL
• Key advantages are:
• Ability to train and run models in
BigQuery
• Use SQL, not Python or ML frameworks
3. Data Engineering
CloudComposer
• Managed Apache Airflow Service
• Executes workflows defined in directed
acyclic graphs (DAGs)
• Accessed through console or command
line (gcloud composer environments)
Apache AirflowDAG
• Workflow is a collection of tasks with
dependencies.
• DAGs stored in Cloud Storage
• Supports custom plugins for
operators, hooks, and interfaces
• Python dependencies (packages)
DAGs are Python Programs
Source: https://cloud.google.com/composer/docs/how-to/using/writing-dags
CloudComposer Environments
• Deployed in environments, which are collections of
GCP service components based on Kubernetes Engine
• Uses a combination of tenant and customer project
resources
Architecture
Source: https://cloud.google.com/composer/docs/concepts/overview
CloudData Fusion
• Managed service based on open source CDAP
data analytics platform
• Code-free ETL/ELT development tool
• Over 160 connectors and transformations
• Drag-and-drop ETL/ELT construction
ExecutionEnvironment
• Cloud Data Fusion deployed as an instance
• Two editions
• Basic – visual designer, transformations, SDK, etc.
• Enterprise – Basic plus streaming pipelines,
integration metadata repository, high availability,
triggers, schedules, etc.
Visual Interface
Isource: https://cloud.google.com/data-fusion/docs/quickstart
4. Monitoringand Evaluating Fairness
Objective
• Understand performance of model
• Metrics to monitor:
• Traffic patterns
• Error rates
• Latency
• Resource utilization
• Configure alerts in Cloud Monitoring
MonitoringAI Platforms
• AI Platform exports metrics to
Cloud Monitoring
• Metrics
• Error count
• Latencies
• Accelerator utilization
• Memory utilization
• CPU utilization
• Network
• Prediction counts
MonitoringML ModelBest Practices
• Monitor for data skew
• Watch for changes in dependencies
• Models are refreshed as needed
• Assess model prediction quality
• Test for unfairness
Fairness
• Anti-classification
• Protected attributes not used in model
• Example: gender
• Classification parity
• Measures of predictive performance are equal across
groups
• Calibration
• Outcomes are independent of protected attributes
Fairness Resources
• Google’s Machine Learning Fairness
• https://developers.google.com/machine-
learning/fairness-overview
• AI Fairness 360 https://github.com/IBM/AIF360
• FairML https://github.com/adebayoj/fairml
• What-If Tool https://pair-code.github.io/what-if-tool/
QuickSummary
• Machine learning workflows
are multi-step
• Automated ML addresses
some, but not all steps
• Lots of data engineering and
monitoring still required

More Related Content

What's hot

Google Cloud Platform Updates
Google Cloud Platform UpdatesGoogle Cloud Platform Updates
Google Cloud Platform Updates
Romin Irani
 
Grokking: Data Engineering Course
Grokking: Data Engineering CourseGrokking: Data Engineering Course
Grokking: Data Engineering Course
Grokking VN
 
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
European Collaboration Summit
 
Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...
Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...
Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...
Databricks
 
MLflow with R
MLflow with RMLflow with R
MLflow with R
Databricks
 
VulcanJS Austin Presentation
VulcanJS Austin PresentationVulcanJS Austin Presentation
VulcanJS Austin Presentation
Justin Reynolds
 
Airflow techtonic template
Airflow   techtonic templateAirflow   techtonic template
Airflow techtonic template
Sampath Kumar
 
Boost your APIs with GraphQL
Boost your APIs with GraphQLBoost your APIs with GraphQL
Boost your APIs with GraphQL
Jean-Francois James
 
Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark
Herman Wu
 
From desktop to the cloud with forge
From desktop to the cloud with forgeFrom desktop to the cloud with forge
From desktop to the cloud with forge
fpm2015
 
An intro to GraphQL
An intro to GraphQLAn intro to GraphQL
An intro to GraphQL
valuebound
 
PnP Monthly Community Call - December 2017
PnP Monthly Community Call - December 2017PnP Monthly Community Call - December 2017
PnP Monthly Community Call - December 2017
SharePoint Patterns and Practices
 
Graphql
GraphqlGraphql
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
Databricks
 
CONDG April 23 2020 - Baskar Rao - GraphQL
CONDG April 23 2020 - Baskar Rao - GraphQLCONDG April 23 2020 - Baskar Rao - GraphQL
CONDG April 23 2020 - Baskar Rao - GraphQL
Matthew Groves
 
Google app engine
Google app engineGoogle app engine
Google app engine
esmaillhasanzadeh1
 
GraphQL Advanced
GraphQL AdvancedGraphQL Advanced
GraphQL Advanced
LeanIX GmbH
 
SharePoint Framework, React and Office UI SPS Paris 2016 - d01
SharePoint Framework, React and Office UI SPS Paris 2016 - d01SharePoint Framework, React and Office UI SPS Paris 2016 - d01
SharePoint Framework, React and Office UI SPS Paris 2016 - d01
Sonja Madsen
 
Using Azure Functions for Integration
Using Azure Functions for IntegrationUsing Azure Functions for Integration
Using Azure Functions for Integration
BizTalk360
 

What's hot (20)

Google Cloud Platform Updates
Google Cloud Platform UpdatesGoogle Cloud Platform Updates
Google Cloud Platform Updates
 
Grokking: Data Engineering Course
Grokking: Data Engineering CourseGrokking: Data Engineering Course
Grokking: Data Engineering Course
 
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
 
Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...
Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...
Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...
 
MLflow with R
MLflow with RMLflow with R
MLflow with R
 
VulcanJS Austin Presentation
VulcanJS Austin PresentationVulcanJS Austin Presentation
VulcanJS Austin Presentation
 
Airflow techtonic template
Airflow   techtonic templateAirflow   techtonic template
Airflow techtonic template
 
Boost your APIs with GraphQL
Boost your APIs with GraphQLBoost your APIs with GraphQL
Boost your APIs with GraphQL
 
Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark
 
tapan_resume_v1
tapan_resume_v1tapan_resume_v1
tapan_resume_v1
 
From desktop to the cloud with forge
From desktop to the cloud with forgeFrom desktop to the cloud with forge
From desktop to the cloud with forge
 
An intro to GraphQL
An intro to GraphQLAn intro to GraphQL
An intro to GraphQL
 
PnP Monthly Community Call - December 2017
PnP Monthly Community Call - December 2017PnP Monthly Community Call - December 2017
PnP Monthly Community Call - December 2017
 
Graphql
GraphqlGraphql
Graphql
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
CONDG April 23 2020 - Baskar Rao - GraphQL
CONDG April 23 2020 - Baskar Rao - GraphQLCONDG April 23 2020 - Baskar Rao - GraphQL
CONDG April 23 2020 - Baskar Rao - GraphQL
 
Google app engine
Google app engineGoogle app engine
Google app engine
 
GraphQL Advanced
GraphQL AdvancedGraphQL Advanced
GraphQL Advanced
 
SharePoint Framework, React and Office UI SPS Paris 2016 - d01
SharePoint Framework, React and Office UI SPS Paris 2016 - d01SharePoint Framework, React and Office UI SPS Paris 2016 - d01
SharePoint Framework, React and Office UI SPS Paris 2016 - d01
 
Using Azure Functions for Integration
Using Azure Functions for IntegrationUsing Azure Functions for Integration
Using Azure Functions for Integration
 

Similar to With Automated ML, is Everyone an ML Engineer?

A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning
Jesus Rodriguez
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development Workflow
Databricks
 
Getting started with GCP ( Google Cloud Platform)
Getting started with GCP ( Google  Cloud Platform)Getting started with GCP ( Google  Cloud Platform)
Getting started with GCP ( Google Cloud Platform)
bigdata trunk
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Sotrender
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
Valdas Maksimavičius
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
GoDataDriven
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure Databricks
Sarah Dutkiewicz
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
DataScienceConferenc1
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Lucas Jellema
 
Migrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth StartupMigrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth Startup
Databricks
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup
 
Forge - DevCon 2016: From Desktop to the Cloud with Forge
Forge - DevCon 2016: From Desktop to the Cloud with ForgeForge - DevCon 2016: From Desktop to the Cloud with Forge
Forge - DevCon 2016: From Desktop to the Cloud with Forge
Autodesk
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Debraj GuhaThakurta
 
Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQuery
Wlodek Bielski
 
MIGRATION - PAIN OR GAIN?
MIGRATION - PAIN OR GAIN?MIGRATION - PAIN OR GAIN?
MIGRATION - PAIN OR GAIN?
DrupalCamp Kyiv
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
Databricks
 
Going Serverless - an Introduction to AWS Glue
Going Serverless - an Introduction to AWS GlueGoing Serverless - an Introduction to AWS Glue
Going Serverless - an Introduction to AWS Glue
Michael Rainey
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
David Talby
 
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with GraphsNeo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
Neo4j
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
GoDataDriven
 

Similar to With Automated ML, is Everyone an ML Engineer? (20)

A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development Workflow
 
Getting started with GCP ( Google Cloud Platform)
Getting started with GCP ( Google  Cloud Platform)Getting started with GCP ( Google  Cloud Platform)
Getting started with GCP ( Google Cloud Platform)
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure Databricks
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
 
Migrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth StartupMigrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth Startup
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
 
Forge - DevCon 2016: From Desktop to the Cloud with Forge
Forge - DevCon 2016: From Desktop to the Cloud with ForgeForge - DevCon 2016: From Desktop to the Cloud with Forge
Forge - DevCon 2016: From Desktop to the Cloud with Forge
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017
 
Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQuery
 
MIGRATION - PAIN OR GAIN?
MIGRATION - PAIN OR GAIN?MIGRATION - PAIN OR GAIN?
MIGRATION - PAIN OR GAIN?
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
 
Going Serverless - an Introduction to AWS Glue
Going Serverless - an Introduction to AWS GlueGoing Serverless - an Introduction to AWS Glue
Going Serverless - an Introduction to AWS Glue
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with GraphsNeo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
Neo4j GraphTalk Düsseldorf - Building intelligent solutions with Graphs
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 

More from Dan Sullivan, Ph.D.

How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQuery
Dan Sullivan, Ph.D.
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery ML
Dan Sullivan, Ph.D.
 
Google Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningGoogle Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine Learning
Dan Sullivan, Ph.D.
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured data
Dan Sullivan, Ph.D.
 
A first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupA first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetup
Dan Sullivan, Ph.D.
 
Text mining meets neural nets
Text mining meets neural netsText mining meets neural nets
Text mining meets neural nets
Dan Sullivan, Ph.D.
 
ACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False Dichotomy
Dan Sullivan, Ph.D.
 
Big data, bioscience and the cloud biocatalyst june 2015 sullivan
Big data, bioscience and the cloud   biocatalyst june 2015 sullivanBig data, bioscience and the cloud   biocatalyst june 2015 sullivan
Big data, bioscience and the cloud biocatalyst june 2015 sullivan
Dan Sullivan, Ph.D.
 
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Dan Sullivan, Ph.D.
 
Modeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key PatternsModeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key Patterns
Dan Sullivan, Ph.D.
 
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Dan Sullivan, Ph.D.
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious Diseases
Dan Sullivan, Ph.D.
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in Bioinformatics
Dan Sullivan, Ph.D.
 

More from Dan Sullivan, Ph.D. (13)

How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQuery
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery ML
 
Google Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningGoogle Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine Learning
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured data
 
A first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupA first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetup
 
Text mining meets neural nets
Text mining meets neural netsText mining meets neural nets
Text mining meets neural nets
 
ACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False Dichotomy
 
Big data, bioscience and the cloud biocatalyst june 2015 sullivan
Big data, bioscience and the cloud   biocatalyst june 2015 sullivanBig data, bioscience and the cloud   biocatalyst june 2015 sullivan
Big data, bioscience and the cloud biocatalyst june 2015 sullivan
 
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
 
Modeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key PatternsModeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key Patterns
 
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious Diseases
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in Bioinformatics
 

Recently uploaded

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 

Recently uploaded (20)

Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 

With Automated ML, is Everyone an ML Engineer?

  • 1. With AutomatedML, Is Everyonean ML Engineer? Dan Sullivan- DevFest Northeast 2020 – October 31,2020
  • 2. Bio • Principle Engineer, PEAK6 Technologies • Author • Instructor • Udemy • Google Cloud • LinkedIn Learning • Data Science • Machine Learning • Databases & Data Modeling
  • 3.
  • 4. Overview • Machine Learning Workflow • Formulating an ML Problem • Building ML Models in GCP • Data Engineering • Monitoring and Evaluating Fairness
  • 5. 1. Formulating an ML Problem
  • 6. 0. Machine Learning Workflow • Formulate problem • Identity data sources • Prepare data • Train, evaluate, and tune model • Deploy model • Use model in production • Monitor and Evaluate Fairness https://thenounproject.com/term/workflow/2409348/
  • 7. Define the Problem to Be Solved • Informal description • What is the value of solving the problem? • How can the problem be solved • Regression • Classification https://static.thenounproject.com/png/230138-200.png
  • 8. IdentifyData Sources • Amount of data available • Quality • Rate of generation • Requirements to access • Limitations on the use of data https://commons.wikimedia.org/wiki/File:Data_types_-_en.svg
  • 10. Services for BuildingModels • Cloud AutoML • AI Platform Training • Kubeflow • Dataproc with Spark ML • BigQuery ML
  • 11. CloudAutoML • Designed for model builders with limited ML experience • GUI for training, evaluating, and tuning • Services for sight, language, and structured data • AutoML Tables uses structured data to build regression and classification models
  • 12. AI Platform Training • Trains and runs models built in • Tensorflow • Scikit Learn • XGBoost • Hosted frameworks but can run custom containers • Service provisions compute resources needed for a job and then executes the job
  • 13. Kubeflow • Kubeflow is machine learning toolkit for Kubernetes • Packages models like applications • Compose, deploy, and manage ML workflows
  • 14. Dataproc and Spark ML • Dataproc is managed Spark and Hadoop service • Spark ML is machine learning library • ML Algorithms • Feature engineering • Pipelines • Persistence • Utilities
  • 15. BigQueryML • BigQuery is serverless analytical database • BigQuery ML brings machine learning functions to SQL • Key advantages are: • Ability to train and run models in BigQuery • Use SQL, not Python or ML frameworks
  • 17. CloudComposer • Managed Apache Airflow Service • Executes workflows defined in directed acyclic graphs (DAGs) • Accessed through console or command line (gcloud composer environments)
  • 18. Apache AirflowDAG • Workflow is a collection of tasks with dependencies. • DAGs stored in Cloud Storage • Supports custom plugins for operators, hooks, and interfaces • Python dependencies (packages)
  • 19. DAGs are Python Programs Source: https://cloud.google.com/composer/docs/how-to/using/writing-dags
  • 20. CloudComposer Environments • Deployed in environments, which are collections of GCP service components based on Kubernetes Engine • Uses a combination of tenant and customer project resources
  • 22. CloudData Fusion • Managed service based on open source CDAP data analytics platform • Code-free ETL/ELT development tool • Over 160 connectors and transformations • Drag-and-drop ETL/ELT construction
  • 23. ExecutionEnvironment • Cloud Data Fusion deployed as an instance • Two editions • Basic – visual designer, transformations, SDK, etc. • Enterprise – Basic plus streaming pipelines, integration metadata repository, high availability, triggers, schedules, etc.
  • 26. Objective • Understand performance of model • Metrics to monitor: • Traffic patterns • Error rates • Latency • Resource utilization • Configure alerts in Cloud Monitoring
  • 27. MonitoringAI Platforms • AI Platform exports metrics to Cloud Monitoring • Metrics • Error count • Latencies • Accelerator utilization • Memory utilization • CPU utilization • Network • Prediction counts
  • 28. MonitoringML ModelBest Practices • Monitor for data skew • Watch for changes in dependencies • Models are refreshed as needed • Assess model prediction quality • Test for unfairness
  • 29. Fairness • Anti-classification • Protected attributes not used in model • Example: gender • Classification parity • Measures of predictive performance are equal across groups • Calibration • Outcomes are independent of protected attributes
  • 30. Fairness Resources • Google’s Machine Learning Fairness • https://developers.google.com/machine- learning/fairness-overview • AI Fairness 360 https://github.com/IBM/AIF360 • FairML https://github.com/adebayoj/fairml • What-If Tool https://pair-code.github.io/what-if-tool/
  • 31. QuickSummary • Machine learning workflows are multi-step • Automated ML addresses some, but not all steps • Lots of data engineering and monitoring still required