© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Alexandra Johnson - Software Engineer, SigOpt
alexandra@sigopt.com
Twitter: @alexandraj777
Machine Learning Fundamentals
(Don't say "machine" or "learning")
A solution to a problem that improves with data
Data: emails, articles, images, list of homes
Problem: label an email as spam (classification), predict a
home's price (regression), and others
What Is Machine Learning?
● Problem: quickly identify whether an email is spam or not spam
● Data: a list of emails and a list of "labels" (spam or not spam)
● Goal: a function that will correctly label never-before-seen
emails as spam or not spam
Example: Classify Spam Emails
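A minimal sketch of how the problem, data, and goal above might look in Python. The sample emails, the `keyword_rule` baseline, and the `accuracy` helper are illustrative assumptions, not part of the talk; the point is only that the goal is a function judged on emails it has not seen before.

```python
# Data: emails paired with labels (1 = spam, 0 = not spam). Made-up examples.
emails = [
    "WIN a FREE prize now",
    "Lunch at noon tomorrow?",
    "Free money, click here",
]
labels = [1, 0, 1]

# Never-before-seen emails, held back to check the goal.
new_emails = ["Claim your free gift", "Meeting notes attached"]
new_labels = [1, 0]


def keyword_rule(email):
    """Hand-written baseline: flag any email containing the word 'free'."""
    return 1 if "free" in email.lower() else 0


def accuracy(label_fn, emails, labels):
    """Fraction of emails that label_fn labels correctly."""
    correct = sum(label_fn(e) == y for e, y in zip(emails, labels))
    return correct / len(emails)


print(accuracy(keyword_rule, new_emails, new_labels))
```

A machine learning model would replace `keyword_rule` with a function learned from `emails` and `labels` rather than written by hand.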
● Pick a model: XGBoost, random forest, MXNet CNN, etc.
● Transform your data to be readable by the model
● Feature engineering: explore your data to pick out
information you think is important
Build - Train - Tune - Deploy
● Model: random forest
● Features: percentage of misspelled words, number of
words from a blacklist, domain name of email sender
Build - Train - Tune - Deploy Example
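def extract_features(email):
    # One value per feature listed above; attribute names are illustrative.
    return [
        email.misspelled_words,    # percentage of misspelled words
        email.words_on_blacklist,  # number of words from a blacklist
        email.sender.domain,       # domain name of email sender
    ]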
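One way the three features could be computed from raw text, as a rough sketch. The `Email` tuple, the tiny `KNOWN_WORDS` dictionary, and the `BLACKLIST` set are hypothetical stand-ins; a real system would likely use a full spell-check dictionary and a maintained blacklist.

```python
import re
from collections import namedtuple

# Hypothetical word lists, kept tiny for illustration.
KNOWN_WORDS = {"win", "a", "free", "prize", "now", "see", "you", "at", "lunch"}
BLACKLIST = {"free", "prize", "winner"}

# Hypothetical email record: sender address and body text.
Email = namedtuple("Email", ["sender", "body"])


def extract_features(email):
    """Return [fraction misspelled, blacklist word count, sender domain]."""
    words = re.findall(r"[a-z']+", email.body.lower())
    misspelled = sum(1 for w in words if w not in KNOWN_WORDS)
    fraction_misspelled = misspelled / len(words) if words else 0.0
    blacklist_count = sum(1 for w in words if w in BLACKLIST)
    domain = email.sender.rsplit("@", 1)[-1].lower()
    return [fraction_misspelled, blacklist_count, domain]


print(extract_features(Email("promo@example.com", "Win a FREE prize nowww")))
```

The sender domain is still a string at this point; most models need it converted to numbers (for example with one-hot encoding) before training.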
● Expose the model to your data so it can better solve your
problem
● Think of a model as a class: its train method has already
been implemented
● Compute-intensive, best done on a server
Build - Train - Tune - Deploy
● Model: random forest
● Features: percentage of misspelled words, number of
words from a blacklist, domain name of email sender
Build - Train - Tune - Deploy Example
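# One row per email: [share of misspelled words, blacklist word count, sender domain]
email_features = [
    [0.1, 1, 'hotmail.com'],
    [0.7, 20, 'gmail.com'],
    [0.3, 92, 'yahoo.com'],
]
labels = [0, 1, 1]  # one label per email, assuming 1 = spam and 0 = not spam
model = RandomForest()  # illustrative class, not a specific library's API
model.train(email_features, labels)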
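To make "a model is a class whose train method is already implemented" concrete, here is a toy model with the same `train`/`predict` shape. It is a single decision stump (one threshold on one numeric feature), swapped in for illustration; it is not a random forest, which combines many decision trees. The class name and the sample rows are assumptions.

```python
class DecisionStump:
    """Toy stand-in for a real model: one threshold on one numeric feature."""

    def train(self, rows, labels):
        best = None  # (correct_count, feature_index, threshold)
        for i in range(len(rows[0])):
            for threshold in sorted({row[i] for row in rows}):
                predictions = [1 if row[i] >= threshold else 0 for row in rows]
                correct = sum(p == y for p, y in zip(predictions, labels))
                if best is None or correct > best[0]:
                    best = (correct, i, threshold)
        _, self.feature_index, self.threshold = best

    def predict(self, row):
        return 1 if row[self.feature_index] >= self.threshold else 0


# Numeric features only (share misspelled, blacklist count); the sender
# domain is left out here because it would need encoding first.
email_features = [[0.1, 1], [0.7, 20], [0.3, 92]]
labels = [0, 1, 1]

model = DecisionStump()
model.train(email_features, labels)
print(model.predict([0.5, 40]))
```

Training here just means searching for the feature and threshold that label the most training emails correctly; real models run a far more expensive search, which is why training is compute-intensive.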
Build - Train - Tune - Deploy
● Models have tunable knobs, aka "hyperparameters"
● Different hyperparameters = different performance
● Training data set for training, validation data set for
measuring performance
● Overfitting: your model is really good on your old data, but
really bad on never-before-seen data
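An extreme case of overfitting, sketched with a hypothetical `MemorizingModel` that only remembers its training rows. The data values are made up; the contrast between the two scores is what matters.

```python
class MemorizingModel:
    """Toy model that memorizes training rows and guesses 0 otherwise."""

    def train(self, rows, labels):
        self.memory = {tuple(row): label for row, label in zip(rows, labels)}

    def predict(self, row):
        return self.memory.get(tuple(row), 0)

    def score(self, rows, labels):
        correct = sum(self.predict(r) == y for r, y in zip(rows, labels))
        return correct / len(rows)


train_data = [[0.1, 1], [0.7, 20], [0.3, 92], [0.0, 0]]
train_labels = [0, 1, 1, 0]
validation_data = [[0.6, 35], [0.2, 2], [0.8, 50], [0.9, 12]]
validation_labels = [1, 0, 1, 1]

model = MemorizingModel()
model.train(train_data, train_labels)
train_score = model.score(train_data, train_labels)                  # old data
validation_score = model.score(validation_data, validation_labels)  # new data
print(train_score, validation_score)
```

The model is perfect on the data it has already seen and much worse on the validation set, which is why performance is measured on data held out from training.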
Build - Train - Tune - Deploy Example
def evaluate(num_leaves, max_depth):
    # Hold out validation data so the score reflects never-before-seen emails.
    train_data, train_labels, validation_data, validation_labels = split(
        email_features, labels
    )
    model = RandomForest(num_leaves=num_leaves, max_depth=max_depth)
    model.train(train_data, train_labels)
    validation_score = model.score(validation_data, validation_labels)
    return validation_score
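Two pieces the slide leaves out can be sketched as follows: a possible `split` helper, and a simple grid search that calls an evaluate function for every hyperparameter combination. Both are assumptions about how this could be done, not the speaker's implementation; `toy_evaluate` is a made-up scoring function standing in for the slide's `evaluate` so the example runs on its own.

```python
import itertools
import random


def split(data, labels, validation_fraction=0.25, seed=0):
    """Shuffle, then return train_data, train_labels, validation_data, validation_labels."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    n_validation = max(1, int(len(data) * validation_fraction))
    validation_idx, train_idx = indices[:n_validation], indices[n_validation:]
    return (
        [data[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [data[i] for i in validation_idx],
        [labels[i] for i in validation_idx],
    )


def grid_search(evaluate, grid):
    """Call evaluate(**params) for every combination; return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(num_leaves, max_depth):
    """Stand-in for evaluate(); peaks at num_leaves=8, max_depth=4."""
    return 1.0 - abs(num_leaves - 8) / 16 - abs(max_depth - 4) / 8


grid = {"num_leaves": [4, 8, 16], "max_depth": [2, 4, 6]}
best_hyperparameters, best_score = grid_search(toy_evaluate, grid)
print(best_hyperparameters, best_score)
```

Grid search is the simplest tuning strategy; the number of evaluations grows quickly with each added hyperparameter, and every evaluation is a full training run.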
Build - Train - Tune - Deploy
● We train our model to solve our problem on old data but
we really want to solve our problem on new data
● Create a REST endpoint for accessing the model
● A/B test different versions
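A rough sketch of a REST endpoint for the model, using only Python's standard `http.server` module. The `/predict` path, the JSON field name, and the placeholder `is_spam` rule are assumptions for illustration; a production deployment would typically sit behind a proper web framework or a managed hosting service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_spam(features):
    """Placeholder for model.predict; flags emails with many blacklist words."""
    return int(features["words_on_blacklist"] >= 10)


def handle_predict(body):
    """Turn a JSON request body (bytes) into (status code, response dict)."""
    try:
        features = json.loads(body)
        return 200, {"spam": is_spam(features)}
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with 'words_on_blacklist'"}


class SpamHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port=8000):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("localhost", port), SpamHandler)
```

Running `make_server().serve_forever()` would start the service locally; a client would then POST a JSON body such as `{"words_on_blacklist": 20}` to `/predict`.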
# Retrain on all labeled data with the best hyperparameters found during tuning.
model = RandomForest(**best_hyperparameters)
model.train(email_features, labels)

def is_spam(email):
    features = extract_features(email)
    return model.predict(features)
Build - Train - Tune - Deploy Example
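For the A/B testing step on the Deploy slide, one common approach is to assign each user to a model version deterministically, so the same user always gets the same version. The sketch below assumes two hypothetical model versions (`model_a`, `model_b`) and hashes a user ID to pick between them; the thresholds are made up.

```python
import hashlib


def assign_variant(user_id, variants=("A", "B")):
    """Deterministically map a user ID to one model variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def model_a(features):
    """Stand-in for the current model."""
    return int(features[1] >= 10)


def model_b(features):
    """Stand-in for a retrained candidate model."""
    return int(features[1] >= 5)


MODELS = {"A": model_a, "B": model_b}


def is_spam(user_id, features):
    """Route the request to the user's variant and report which one answered."""
    variant = assign_variant(user_id)
    return variant, MODELS[variant](features)


print(is_spam("user-123", [0.2, 7]))
```

Logging the returned variant alongside each prediction makes it possible to compare how the two versions perform on real traffic.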
Thanks! Questions?
alexandra@sigopt.com
Twitter: @alexandraj777

More Related Content

What's hot

Bayesian Global Optimization
Bayesian Global OptimizationBayesian Global Optimization
Bayesian Global Optimization
Amazon Web Services
 
Common Problems in Hyperparameter Optimization
Common Problems in Hyperparameter OptimizationCommon Problems in Hyperparameter Optimization
Common Problems in Hyperparameter Optimization
SigOpt
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)
Hayim Makabee
 
C3 w1
C3 w1C3 w1
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
Shubhmay Potdar
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
Ted Xiao
 
Week 4 advanced labeling, augmentation and data preprocessing
Week 4   advanced labeling, augmentation and data preprocessingWeek 4   advanced labeling, augmentation and data preprocessing
Week 4 advanced labeling, augmentation and data preprocessing
Ajay Taneja
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
QuantUniversity
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Sri Ambati
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019
Faisal Siddiqi
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
MLconf
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
safa cimenli
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
Gülden Bilgütay
 
C3 w2
C3 w2C3 w2
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
HJ van Veen
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
HJ van Veen
 
Mining model for hotel recommendations (Kaggle Challenge)
Mining model for hotel recommendations (Kaggle Challenge)Mining model for hotel recommendations (Kaggle Challenge)
Mining model for hotel recommendations (Kaggle Challenge)
Arjun Varma
 
Kaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning ChallengeKaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning Challenge
Bernard Ong
 
AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series
PolarSeven Pty Ltd
 

What's hot (20)

Bayesian Global Optimization
Bayesian Global OptimizationBayesian Global Optimization
Bayesian Global Optimization
 
Common Problems in Hyperparameter Optimization
Common Problems in Hyperparameter OptimizationCommon Problems in Hyperparameter Optimization
Common Problems in Hyperparameter Optimization
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)
 
C3 w1
C3 w1C3 w1
C3 w1
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Week 4 advanced labeling, augmentation and data preprocessing
Week 4   advanced labeling, augmentation and data preprocessingWeek 4   advanced labeling, augmentation and data preprocessing
Week 4 advanced labeling, augmentation and data preprocessing
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
C3 w2
C3 w2C3 w2
C3 w2
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Mining model for hotel recommendations (Kaggle Challenge)
Mining model for hotel recommendations (Kaggle Challenge)Mining model for hotel recommendations (Kaggle Challenge)
Mining model for hotel recommendations (Kaggle Challenge)
 
Kaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning ChallengeKaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning Challenge
 
AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series AWS Forcecast: DeepAR Predictor Time-series
AWS Forcecast: DeepAR Predictor Time-series
 

Similar to Machine Learning Fundamentals

Accelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMakerAccelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMaker
Amazon Web Services
 
Build Your Recommendation Engine on AWS Today!
Build Your Recommendation Engine on AWS Today!Build Your Recommendation Engine on AWS Today!
Build Your Recommendation Engine on AWS Today!
AWS Germany
 
Build, train and deploy your ML models with Amazon Sage Maker
Build, train and deploy your ML models with Amazon Sage MakerBuild, train and deploy your ML models with Amazon Sage Maker
Build, train and deploy your ML models with Amazon Sage Maker
AWS User Group Bengaluru
 
Building Applications with Apache MXNet
Building Applications with Apache MXNetBuilding Applications with Apache MXNet
Building Applications with Apache MXNet
Apache MXNet
 
Introduction to Sagemaker
Introduction to SagemakerIntroduction to Sagemaker
Introduction to Sagemaker
Amazon Web Services
 
ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...
ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...
ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...
Amazon Web Services
 
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon Web Services
 
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
Amazon Web Services
 
Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018
Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018
Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018
Yotam Yarden
 
Introducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech TalksIntroducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech Talks
Amazon Web Services
 
Building a Recommender System on AWS
Building a Recommender System on AWSBuilding a Recommender System on AWS
Building a Recommender System on AWS
Amazon Web Services
 
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Amazon Web Services
 
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Amazon Web Services
 
How Trupanion Became an AI-driven Company for Pets
How Trupanion Became an AI-driven Company for PetsHow Trupanion Became an AI-driven Company for Pets
How Trupanion Became an AI-driven Company for Pets
Amazon Web Services
 
A Gentle Intro to Deep Learning
A Gentle Intro to Deep LearningA Gentle Intro to Deep Learning
A Gentle Intro to Deep Learning
Gabe Hollombe
 
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Amazon Web Services
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon Web Services
 
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS SummitWork with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Amazon Web Services
 
Quickly and easily build, train, and deploy machine learning models at any scale
Quickly and easily build, train, and deploy machine learning models at any scaleQuickly and easily build, train, and deploy machine learning models at any scale
Quickly and easily build, train, and deploy machine learning models at any scale
AWS Germany
 
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
Amazon Web Services
 

Similar to Machine Learning Fundamentals (20)

Accelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMakerAccelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMaker
 
Build Your Recommendation Engine on AWS Today!
Build Your Recommendation Engine on AWS Today!Build Your Recommendation Engine on AWS Today!
Build Your Recommendation Engine on AWS Today!
 
Build, train and deploy your ML models with Amazon Sage Maker
Build, train and deploy your ML models with Amazon Sage MakerBuild, train and deploy your ML models with Amazon Sage Maker
Build, train and deploy your ML models with Amazon Sage Maker
 
Building Applications with Apache MXNet
Building Applications with Apache MXNetBuilding Applications with Apache MXNet
Building Applications with Apache MXNet
 
Introduction to Sagemaker
Introduction to SagemakerIntroduction to Sagemaker
Introduction to Sagemaker
 
ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...
ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...
ML Workflows with Amazon SageMaker and AWS Step Functions (API325) - AWS re:I...
 
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
 
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
 
Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018
Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018
Build Your Recommendation Engine on AWS Today - AWS Summit Berlin 2018
 
Introducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech TalksIntroducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech Talks
 
Building a Recommender System on AWS
Building a Recommender System on AWSBuilding a Recommender System on AWS
Building a Recommender System on AWS
 
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
 
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
 
How Trupanion Became an AI-driven Company for Pets
How Trupanion Became an AI-driven Company for PetsHow Trupanion Became an AI-driven Company for Pets
How Trupanion Became an AI-driven Company for Pets
 
A Gentle Intro to Deep Learning
A Gentle Intro to Deep LearningA Gentle Intro to Deep Learning
A Gentle Intro to Deep Learning
 
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS SummitWork with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
 
Quickly and easily build, train, and deploy machine learning models at any scale
Quickly and easily build, train, and deploy machine learning models at any scaleQuickly and easily build, train, and deploy machine learning models at any scale
Quickly and easily build, train, and deploy machine learning models at any scale
 
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
How Peak.AI Uses Amazon SageMaker for Product Personalization (GPSTEC316) - A...
 

More from SigOpt

Optimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementOptimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment Management
SigOpt
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
SigOpt
 
Efficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationEfficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric Optimization
SigOpt
 
Detecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningDetecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep Learning
SigOpt
 
Metric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseMetric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use Case
SigOpt
 
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategyTuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
SigOpt
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep Learning
SigOpt
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1
SigOpt
 
Tuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceTuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model Performance
SigOpt
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise Webinar
SigOpt
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019
SigOpt
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques Webinar
SigOpt
 
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt
 
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
SigOpt
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
SigOpt
 
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt
 
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt
 
SigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the Untunable
SigOpt
 
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt
 
Lessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleLessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scale
SigOpt
 

More from SigOpt (20)

Optimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementOptimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment Management
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
 
Efficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationEfficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric Optimization
 
Detecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningDetecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep Learning
 
Metric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseMetric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use Case
 
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategyTuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep Learning
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1
 
Tuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceTuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model Performance
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise Webinar
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques Webinar
 
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale
 
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
 
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
 
SigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the Untunable
 
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimization
 
Lessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleLessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scale
 

Machine Learning Fundamentals

  • 1. Machine Learning Fundamentals. Alexandra Johnson - Software Engineer, SigOpt. alexandra@sigopt.com. Twitter: @alexandraj777. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2. What Is Machine Learning?
  • 3. What Is Machine Learning? (Don't say "machine" or "learning")
  • 4. What Is Machine Learning? (Don't say "machine" or "learning") A solution to a problem that improves with data.
  • 5. What Is Machine Learning? (Don't say "machine" or "learning") A solution to a problem that improves with data. Data: emails, articles, images, list of homes.
  • 6. What Is Machine Learning? (Don't say "machine" or "learning") A solution to a problem that improves with data. Data: emails, articles, images, list of homes. Problem: label an email as spam (classification), predict a home's price (regression), and others.
  • 7. Example: Classify Spam Emails
       ● Problem: quickly identify whether an email is spam or not spam
       ● Data: a list of emails and a list of "labels" (spam or not spam)
       ● Goal: a function that will correctly label never-before-seen emails as spam or not spam
  • 8. Build - Train - Tune - Deploy
       ● Pick a model: xgboost, random forest, mxnet CNN, etc.
       ● Transform your data to be readable by the model
       ● Feature engineering: explore your data to pick out the information you think is important
  • 9. Build - Train - Tune - Deploy Example
       ● Model: random forest
       ● Features: percentage of misspelled words, number of words from a blacklist, domain name of email sender

           def extract_features(email):
               # One entry per feature listed above.
               return [
                   email.misspelled_words,
                   email.words_on_blacklist,
                   email.sender.domain,
               ]
  • 10. Build - Train - Tune - Deploy
       ● Expose the model to your data so it can better solve your problem
       ● Think of a model as a class whose train method has already been implemented
       ● Compute-intensive; best done on a server
  • 11. Build - Train - Tune - Deploy Example
       ● Model: random forest
       ● Features: percentage of misspelled words, number of words from a blacklist, domain name of email sender

           email_features = [
               [0.1, 1, 'hotmail.com'],
               [0.7, 20, 'gmail.com'],
               [0.3, 92, 'yahoo.com'],
           ]
           labels = [0, 1, 1]
           model = RandomForest()
           model.train(email_features, labels)
  • 12. Build - Train - Tune - Deploy
       ● Models have tunable knobs, aka "hyperparameters"
       ● Different hyperparameters = different performance
       ● Training data set for training; validation data set for measuring performance
       ● Overfitting: your model is really good on your old data, but really bad on never-before-seen data
  • 13. Build - Train - Tune - Deploy Example

           def evaluate(num_leaves, max_depth):
               train_data, train_labels, validation_data, validation_labels = split(email_features, labels)
               model = RandomForest(num_leaves=num_leaves, max_depth=max_depth)
               model.train(train_data, train_labels)
               validation_score = model.score(validation_data, validation_labels)
               return validation_score
  • 14. Build - Train - Tune - Deploy
       ● We train our model to solve our problem on old data, but we really want to solve our problem on new data
       ● Create a REST endpoint for accessing the model
       ● A/B test different versions
  • 15. Build - Train - Tune - Deploy Example

           # Unpack the tuned hyperparameters as keyword arguments.
           model = RandomForest(**best_hyperparameters)
           # Train on the extracted features, as on slide 11, not on raw emails.
           model.train(email_features, labels)

           def is_spam(email):
               email_features = extract_features(email)
               return model.predict(email_features)
  • 16. Thanks! Questions? alexandra@sigopt.com Twitter: @alexandraj777
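
The Build - Train - Tune - Deploy steps in the slides above can be sketched end to end in runnable form. This is a minimal illustration, not the presenter's implementation: scikit-learn's RandomForestClassifier stands in for the slides' generic RandomForest class (so train becomes fit, and num_leaves is mapped to max_leaf_nodes), the Email record, its field names, and the KNOWN_DOMAINS list are hypothetical, and all of the data is made up. The sender domain is one-hot encoded here because scikit-learn's forest expects numeric features.

```python
# Sketch only: scikit-learn's RandomForestClassifier stands in for the
# slides' generic RandomForest class. All data below is made up.
from collections import namedtuple

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical email record; the field names are assumptions.
Email = namedtuple("Email", ["misspelled_fraction", "blacklist_words", "sender_domain"])

KNOWN_DOMAINS = ["gmail.com", "hotmail.com", "yahoo.com"]  # assumed vocabulary


def extract_features(email):
    """Build: turn one email into a numeric feature vector."""
    # One-hot encode the domain, since the forest needs numeric inputs.
    domain_flags = [int(email.sender_domain == d) for d in KNOWN_DOMAINS]
    return [email.misspelled_fraction, email.blacklist_words] + domain_flags


emails = [
    Email(0.10, 1, "hotmail.com"), Email(0.05, 0, "gmail.com"),
    Email(0.02, 2, "yahoo.com"), Email(0.08, 1, "gmail.com"),
    Email(0.01, 0, "hotmail.com"), Email(0.04, 3, "yahoo.com"),
    Email(0.70, 20, "gmail.com"), Email(0.30, 92, "yahoo.com"),
    Email(0.55, 35, "hotmail.com"), Email(0.60, 48, "gmail.com"),
    Email(0.45, 27, "yahoo.com"), Email(0.80, 64, "hotmail.com"),
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # 1 = spam, 0 = not spam
email_features = [extract_features(e) for e in emails]

# Hold out a validation set so tuning is scored on data the model never saw.
train_X, val_X, train_y, val_y = train_test_split(
    email_features, labels, test_size=4, stratify=labels, random_state=0
)


def evaluate(max_leaf_nodes, max_depth):
    """Train + Tune: fit with one hyperparameter setting, score on validation."""
    model = RandomForestClassifier(
        n_estimators=25, max_leaf_nodes=max_leaf_nodes,
        max_depth=max_depth, random_state=0,
    )
    model.fit(train_X, train_y)
    return model.score(val_X, val_y)  # mean accuracy on the validation set


# Tune: try a small hand-picked grid and keep the best-scoring setting.
grid = [(2, 1), (4, 2), (8, 4)]
scores = {params: evaluate(*params) for params in grid}
best_leaves, best_depth = max(scores, key=scores.get)

# Deploy: retrain on all labelled data with the chosen hyperparameters.
final_model = RandomForestClassifier(
    n_estimators=25, max_leaf_nodes=best_leaves,
    max_depth=best_depth, random_state=0,
)
final_model.fit(email_features, labels)


def is_spam(email):
    """Label a never-before-seen email as spam (True) or not spam (False)."""
    return bool(final_model.predict([extract_features(email)])[0] == 1)


print(scores)
print(is_spam(Email(0.65, 40, "gmail.com")))
```

In a deployment like the one on slide 14, a function such as is_spam would sit behind the REST endpoint, and different tuned versions of final_model could be compared in an A/B test. With only twelve made-up emails the validation scores carry little meaning; the point is the shape of the workflow, where hyperparameters are chosen on held-out data and the final model is retrained before serving.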