AMAZON SAGEMAKER BUILT-IN ALGORITHMS FOR MACHINE LEARNING
SUPERVISED LEARNING
| Problem types | Data input format | Example problems and use cases | Built-in algorithms |
|---|---|---|---|
| Binary/multi-class classification | Tabular | Predict if an item belongs to a category: an email spam filter | Factorization Machines Algorithm, K-Nearest Neighbors (k-NN) Algorithm, Linear Learner Algorithm, XGBoost Algorithm |
| Regression | Tabular | Predict a numeric/continuous value: estimate the value of a house | Factorization Machines Algorithm, K-Nearest Neighbors (k-NN) Algorithm, Linear Learner Algorithm, XGBoost Algorithm |
| Time-series forecasting | Tabular | Based on historical data for a behavior, predict future behavior: predict sales of a new product based on previous sales data | DeepAR Forecasting Algorithm |
CLASSIFICATION AND REGRESSION PROBLEMS
● Linear Learner Algorithm—learns a linear function for regression or a linear
threshold function for classification.
● Factorization Machines Algorithm—an extension of a linear model that is designed
to economically capture interactions between features within high-dimensional
sparse datasets.
● XGBoost Algorithm—implementation of the gradient-boosted trees algorithm that
combines an ensemble of estimates from a set of simpler and weaker models.
● K-Nearest Neighbors (k-NN) Algorithm—a non-parametric method that uses the k
nearest labeled points to assign a label to a new data point for classification or a
predicted target value from the average of the k nearest points for regression.
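
The k-NN description above can be sketched in a few lines of plain Python. This is a brute-force illustration of the idea only, not the SageMaker implementation; the function names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are assumptions made up for this example.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (features, target) pairs closest to query (Euclidean distance)."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote over the labels of the k nearest points."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: average of the target values of the k nearest points."""
    targets = [target for _, target in k_nearest(train, query, k)]
    return sum(targets) / len(targets)


# Hypothetical toy data: two numeric features per email, with a label.
emails = [
    ((0.0, 0.1), "ham"), ((0.2, 0.0), "ham"), ((0.1, 0.3), "ham"),
    ((0.9, 1.0), "spam"), ((1.0, 0.8), "spam"), ((0.8, 0.9), "spam"),
]
# Hypothetical toy data: floor area as the single feature, price as the target.
houses = [((50.0,), 100.0), ((60.0,), 120.0), ((70.0,), 140.0), ((200.0,), 400.0)]

print(knn_classify(emails, (0.85, 0.95)))  # spam
print(knn_regress(houses, (62.0,)))        # 120.0
```

Sorting every training point per query is fine for a sketch but scales poorly; a production system would typically use an index structure instead.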
FEATURE ENGINEERING AND FORECASTING TIME-SERIES DATA
● Object2Vec Algorithm—a new highly customizable multi-purpose algorithm used for
feature engineering. It can learn low-dimensional dense embeddings of high-
dimensional objects to produce features that improve training efficiencies for
downstream models. While this is a supervised algorithm, as it requires labeled data
for training, there are many scenarios in which the relationship labels can be
obtained purely from natural clusterings in data, without any explicit human
annotation.
● DeepAR Forecasting Algorithm—a supervised learning algorithm for forecasting
scalar (one-dimensional) time series using recurrent neural networks (RNN).
UNSUPERVISED LEARNING
| Problem types | Data input format | Example problems and use cases | Built-in algorithms |
|---|---|---|---|
| Feature engineering: dimensionality reduction | Tabular | Drop those columns from a dataset that have a weak relation with the label/target variable: the color of a car when predicting its mileage | Principal Component Analysis (PCA) Algorithm |
| Anomaly detection | Tabular | Detect abnormal behavior in an application: spot when an IoT sensor is sending abnormal readings | Random Cut Forest (RCF) Algorithm |
| IP anomaly detection | Tabular | Protect your application from suspicious users: detect if an IP address accessing a service might be from a bad actor | IP Insights |
| Problem types | Data input format | Example problems and use cases | Built-in algorithms |
|---|---|---|---|
| Embeddings: convert high-dimensional objects into low-dimensional space | Tabular | Improve the data embeddings of the high-dimensional objects: identify duplicate support tickets or find the correct routing based on similarity of text in the tickets | Object2Vec Algorithm |
| Clustering or grouping | Tabular | Group similar objects/data together: find high-, medium-, and low-spending customers from their transaction histories | K-Means Algorithm |
| Topic modeling | Text | Organize a set of documents into topics (not known in advance): tag a document as belonging to a medical category based on the terms used in the document | Latent Dirichlet Allocation (LDA) Algorithm, Neural Topic Model (NTM) Algorithm |
CLUSTERING, DIMENSION REDUCTION, PATTERN RECOGNITION, ANOMALY DETECTION
● Principal Component Analysis (PCA) Algorithm—reduces the dimensionality
(number of features) within a dataset by projecting data points onto the first few
principal components. The objective is to retain as much information or variation as
possible. For mathematicians, principal components are eigenvectors of the data's
covariance matrix.
● K-Means Algorithm—finds discrete groupings within data, where members of a
group are as similar as possible to one another and as different as possible from
members of other groups.
● IP Insights—learns the usage patterns for IPv4 addresses. It is designed to capture
associations between IPv4 addresses and various entities, such as user IDs or
account numbers.
● Random Cut Forest (RCF) Algorithm—detects anomalous data points within a data
set that diverge from otherwise well-structured or patterned data.
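
The PCA bullet above says that principal components are eigenvectors of the data's covariance matrix. The sketch below illustrates that statement in plain Python on made-up 2-D points, using power iteration to find the first principal component. The helper names and the data are assumptions for illustration, and power iteration is a swapped-in textbook technique, not a description of how SageMaker computes PCA.

```python
import math


def covariance_matrix(rows):
    """Return (sample covariance matrix, mean-centered rows) for a list of points."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def first_principal_component(cov, iterations=200):
    """Power iteration: repeatedly apply cov to a vector and renormalize.

    Converges to the eigenvector with the largest eigenvalue, provided the
    start vector is not orthogonal to it (true for the toy data below).
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical points lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
cov, centered = covariance_matrix(points)
pc1 = first_principal_component(cov)

# Project each centered point onto pc1: two features become one.
projected = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centered]
print(pc1)        # unit vector roughly in the direction (1, 2)
print(projected)  # one coordinate per point
```

Keeping only the projection onto the first component reduces the two original features to one while retaining most of the variation in this data.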
TEXTUAL ANALYSIS
| Problem types | Data input format | Example problems and use cases | Built-in algorithms |
|---|---|---|---|
| Text classification | Text | Assign pre-defined categories to documents in a corpus: categorize books in a library into academic disciplines | BlazingText algorithm |
| Machine translation | Text | Convert text from one language to another: Spanish to English | Sequence-to-Sequence Algorithm |
| Text summarization | Text | Summarize a long text corpus: an abstract for a research paper | Sequence-to-Sequence Algorithm |
| Speech-to-text | Text | Convert audio files to text: transcribe call center conversations for further analysis | Sequence-to-Sequence Algorithm |
NLP, DOCUMENT CLASSIFICATION/SUMMARIZATION, TOPIC MODELING/CLASSIFICATION, AND LANGUAGE TRANSLATION
● BlazingText algorithm—a highly optimized implementation of the Word2vec and text
classification algorithms that scale to large datasets easily. It is useful for many
downstream natural language processing (NLP) tasks.
● Sequence-to-Sequence Algorithm—a supervised algorithm commonly used for
neural machine translation.
● Latent Dirichlet Allocation (LDA) Algorithm—an algorithm suitable for determining
topics in a set of documents. It is an unsupervised algorithm, which means that it
doesn't use example data with answers during training.
● Neural Topic Model (NTM) Algorithm—another unsupervised technique for
determining topics in a set of documents, using a neural network approach.
IMAGE PROCESSING
| Problem types | Data input format | Example problems and use cases | Built-in algorithms |
|---|---|---|---|
| Image and multi-label classification | Image | Label/tag an image based on the content of the image: alerts about adult content in an image | Image Classification Algorithm |
| Object detection and classification | Image | Detect people and objects in an image: police review a large photo gallery for a missing person | Object Detection Algorithm |
| Computer vision | Image | Tag every pixel of an image individually with a category: self-driving cars prepare to identify objects in their way | Semantic Segmentation Algorithm |
IMAGE CLASSIFICATION, OBJECT DETECTION, AND COMPUTER VISION
● Image Classification Algorithm—a supervised algorithm, meaning that it trains on
example data with answers (labels). Use this algorithm to classify images.
● Semantic Segmentation Algorithm—provides a fine-grained, pixel-level approach to
developing computer vision applications.
● Object Detection Algorithm—detects and classifies objects in images using a single
deep neural network. It is a supervised learning algorithm that takes images as input
and identifies all instances of objects within the image scene.
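
To make the difference between image-level and pixel-level labeling concrete, here is a minimal sketch of the last step of semantic segmentation: given one score per class for every pixel, each pixel is tagged with the highest-scoring class. In a real system the scores would come from a trained neural network; here they are hard-coded, and the class names, the `segment` function, and the numbers are all assumptions made up for this illustration.

```python
def segment(score_maps):
    """Tag every pixel with the index of its highest-scoring class.

    score_maps[c][row][col] is the score of class c at that pixel.
    Returns a mask with the same height and width as the image.
    """
    num_classes = len(score_maps)
    height = len(score_maps[0])
    width = len(score_maps[0][0])
    return [
        [max(range(num_classes), key=lambda c: score_maps[c][row][col])
         for col in range(width)]
        for row in range(height)
    ]


CLASSES = ["road", "car"]  # hypothetical categories

# Hypothetical per-pixel scores for a tiny 2 x 3 image.
scores = [
    [[0.9, 0.8, 0.2], [0.7, 0.4, 0.1]],  # scores for "road"
    [[0.1, 0.2, 0.8], [0.3, 0.6, 0.9]],  # scores for "car"
]

mask = segment(scores)
for row in mask:
    print([CLASSES[c] for c in row])
# ['road', 'road', 'car']
# ['road', 'car', 'car']
```

An image classifier would return a single label for the whole image, whereas the mask above carries one label per pixel.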
