Supervised Learning
Orozco Hsu
2022-05-16
About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist
Tutorial Content
• What is supervised learning
• Image Classification using Logistic Regression
• Build a supervised model and prediction
• Homework
Code
• Download code
• https://github.com/orozcohsu/ntunhs_2023_01
• Folder/file
• 20230516_02
Add-ons
What is supervised learning?
Supervised learning vs. Unsupervised learning
• Supervised learning: discover patterns in the data that relate the data attributes to a target (class label) attribute.
• These patterns are then used to predict the value of the target attribute in future data instances.
• Unsupervised learning: the data have no target attribute.
• We explore the data to find intrinsic structure in it.
• Classic supervised learning tasks
• Classification
• Regression
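The contrast above can be sketched in a few lines of plain Python. The numbers, the label names, and the two toy rules (midpoint between class means, widest gap between sorted values) are assumptions made up for illustration; they are not the methods used later in the tutorial.

```python
# Toy data (made up for illustration): one feature per instance.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # target attribute


def mean(xs):
    return sum(xs) / len(xs)


# Supervised: use the labels to learn a decision threshold
# (midpoint between the two class means).
low_mean = mean([v for v, y in zip(values, labels) if y == "low"])
high_mean = mean([v for v, y in zip(values, labels) if y == "high"])
threshold = (low_mean + high_mean) / 2


def predict(v):
    """Predict the target attribute of a new instance."""
    return "high" if v > threshold else "low"


# Unsupervised: no labels are used; look for structure instead, here the
# widest gap between sorted values, which splits the data into two groups.
ordered = sorted(values)
gaps = [(b - a, i) for i, (a, b) in enumerate(zip(ordered, ordered[1:]))]
_, split = max(gaps)
groups = (ordered[: split + 1], ordered[split + 1:])

print(predict(2.0), predict(4.0))
print(groups)
```

The supervised half can answer "which class is this new value?", while the unsupervised half only reports that the data fall into two clusters without naming them.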
Supervised learning
Ref: https://www.tibco.com/reference-center/what-is-supervised-learning
What is Classification in Supervised Learning?
• Classification is where an algorithm is trained to assign input data to one of a set of discrete classes.
• During training, the algorithm is given input data together with a class label. For example, the training data might consist of the most recent credit card bills of a set of customers, each labeled with whether that customer went on to make a purchase.
• When a new customer's credit balance is presented to the algorithm, it assigns the customer to either the "will purchase" or the "will not purchase" group.
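As a rough sketch of the credit card example, the code below fits a one-feature logistic regression by gradient descent using only the standard library. The bill amounts, the labels, the learning rate, and the iteration count are all made-up assumptions; a real project would more likely use a library implementation.

```python
import math

# Made-up training data: last credit card bill (in thousands) and whether
# the customer later made a purchase (1) or not (0).
bills = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
purchased = [0, 0, 0, 1, 1, 1]

# Standardize the feature so gradient descent behaves well.
mean = sum(bills) / len(bills)
std = math.sqrt(sum((x - mean) ** 2 for x in bills) / len(bills))


def scale(bill):
    return (bill - mean) / std


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Fit weight w and bias b by batch gradient descent on the log loss.
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(1000):
    grad_w = grad_b = 0.0
    for bill, y in zip(bills, purchased):
        error = sigmoid(w * scale(bill) + b) - y
        grad_w += error * scale(bill)
        grad_b += error
    w -= learning_rate * grad_w / len(bills)
    b -= learning_rate * grad_b / len(bills)


def classify(bill):
    """Assign a new customer to one of the two groups."""
    probability = sigmoid(w * scale(bill) + b)
    return "will purchase" if probability >= 0.5 else "will not purchase"


print(classify(1.2), "/", classify(3.8))
```

The training loop only ever sees labeled examples; `classify` is then applied to balances that were not in the training data.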
What is Regression in Supervised Learning?
• Regression is a supervised learning method where an algorithm is trained to predict an output from a continuous range of possible values. For example, real estate training data would record the location, area, and other relevant parameters of each property; the output is the property's price.
• In regression, the algorithm needs to identify a functional relationship between the input parameters and the output.
• Unlike in classification, the output value is not discrete; it is a continuous function of the inputs.
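A minimal sketch of the real estate example, assuming a single input (area) and a straight-line relationship. The areas and prices are invented, and the closed-form least-squares fit is just one simple way to find the functional relationship.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data: floor area and price (arbitrary units).
areas = [50, 70, 90, 110]
prices = [150, 190, 230, 270]

slope, intercept = fit_line(areas, prices)


def predict_price(area):
    """Continuous output: any area maps to a price on the fitted line."""
    return slope * area + intercept


print(slope, intercept, predict_price(100))
```

Unlike the classifier, `predict_price` can return any value on the line, not one of a fixed set of groups.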
Real-life Applications of Classification
• Binary classification (the type most companies use)
• Spam detection
• Churn prediction
• Conversion prediction
• Imbalanced classification
• Fraud detection: in the labeled data set used for training, only a small number of inputs are labeled as fraud.
• Medical diagnostics: in a large pool of samples, positive cases of a disease may be far fewer than negative ones.
• Multi-class classification
• Face classification: based on the training data, a model categorizes a photo and maps it to a specific person.
• Email classification: multi-class classification is used to sort emails into categories such as social, education, work, and family.
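One practical consequence of imbalanced classes is that plain accuracy can look excellent for a model that never finds the rare class. The sketch below uses made-up counts (10 fraud cases in 1000 transactions) and a deliberately useless baseline to show the gap between accuracy and recall.

```python
# Made-up labels: 1000 transactions, of which 10 are fraud (1)
# and 990 are legitimate (0).
actual = [1] * 10 + [0] * 990

# A useless baseline that never flags fraud.
predicted = [0] * len(actual)

true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
correct = sum(1 for a, p in zip(actual, predicted) if a == p)

accuracy = correct / len(actual)  # share of all predictions that are right
recall = true_pos / sum(actual)   # share of real fraud cases that were caught

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

For fraud detection or medical diagnostics it is therefore worth checking per-class measures such as recall alongside accuracy.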
Real-life Applications of Regression
• Linear regression
• It is used to predict values within a continuous range (e.g. sales or price forecasting). A related linear model, logistic regression, is used instead to classify inputs into categories (e.g. cat vs. dog).
• Polynomial regression
• It is used for more complex data sets where the relationship between inputs and output is curved, so a straight-line fit describes the labeled data poorly.
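To illustrate why a straight line can fit curved data poorly, the sketch below fits the same least-squares routine twice: once on the raw feature x and once on the squared feature x^2. The data (y = x^2 + 1) are made up, and using only the squared term is a simplifying assumption; a full degree-2 polynomial would also include the linear term.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def squared_error(xs, ys, slope, intercept):
    """Sum of squared residuals of the fitted line on (xs, ys)."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up curved data: y = x^2 + 1.
xs = [-2, -1, 0, 1, 2]
ys = [x * x + 1 for x in xs]

# Straight line on the raw feature x.
line = fit_line(xs, ys)
line_error = squared_error(xs, ys, *line)

# Same routine on the squared feature x^2 (a polynomial feature).
squared = [x * x for x in xs]
curve = fit_line(squared, ys)
curve_error = squared_error(squared, ys, *curve)

print("line:", line, "error:", line_error)
print("curve:", curve, "error:", curve_error)
```

Polynomial regression is still linear in its coefficients; what changes is the set of features the line is fitted to.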
Image Classification using Logistic Regression
• Unzip the Images.zip dataset
• Embedder:
• Inception v3: Google's Inception v3 model trained on ImageNet.
• SqueezeNet: Deep model for image recognition that achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters.
• VGG-16: 16-layer image recognition model trained on ImageNet.
• VGG-19: 19-layer image recognition model trained on ImageNet.
• Painters: A model trained to predict painters from artwork images.
• DeepLoc: A model trained to analyze yeast cell images.
• OpenFace: Face recognition model trained on the FaceScrub and CASIA-WebFace datasets.
• http://vintage.winklerbros.net/facescrub.html
ImageNet / Inception v3
• ImageNet is an image database. The images in the database are organized into a hierarchy, with each node of the hierarchy depicted by hundreds and thousands of images.
• The figures below refer to the ILSVRC 2012 subset of ImageNet, which is commonly used to train models such as Inception v3.
• Training set size: about 1.28 million images
• Validation set size: 50,000 images
• Number of classes: 1,000
Outlier detection
Data Preprocessing (outlier detection)
Use the iris.tab dataset
Outlier Detection
• Many applications require being able to decide whether a new observation belongs to the same distribution as existing observations (it is an inlier), or should be considered as different (it is an outlier). Often, this ability is used to clean real data sets.
Ref: https://scikit-learn.org/stable/modules/outlier_detection.html#outlier-detection
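The inlier/outlier decision can be sketched with a simple statistical rule. The example below uses the modified z-score (based on the median and the median absolute deviation) with a cutoff of 3.5; this is a swapped-in technique chosen for brevity, not one of the scikit-learn estimators from the reference and not necessarily what the tutorial's workflow uses. The measurements are made up, and the function assumes the median absolute deviation is not zero.

```python
from statistics import median


def split_outliers(values, cutoff=3.5):
    """Separate values into (inliers, outliers) using the modified z-score."""
    center = median(values)
    # Median absolute deviation; assumed non-zero for this sketch.
    mad = median(abs(v - center) for v in values)
    inliers, outliers = [], []
    for v in values:
        score = 0.6745 * abs(v - center) / mad
        (outliers if score > cutoff else inliers).append(v)
    return inliers, outliers


# Made-up measurements with one value far from the rest.
sepal_lengths = [5.1, 4.9, 5.0, 5.2, 4.8, 9.5]
inliers, outliers = split_outliers(sepal_lengths)

print("inliers:", inliers)
print("outliers:", outliers)
```

Cleaning a data set then amounts to keeping `inliers` and training the model on them only.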
Evaluation with outliers / inliers
[Figure: evaluation results with outliers vs. without outliers]
Tree model
Model and explanation
Use the iris.tab dataset
Model and explanation (Tree model)
What is the most significant variable?
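One common way a classification tree ranks variables is information gain: the variable whose split reduces class entropy the most is placed at the root. The sketch below assumes that criterion (the slides do not state which one the tree uses) and a tiny made-up table with two discretized, iris-like features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, feature, target="species"):
    """Entropy of the target minus the weighted entropy after the split."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


# Made-up, discretized rows (hypothetical feature names and values).
rows = [
    {"petal_length": "short", "sepal_width": "wide", "species": "setosa"},
    {"petal_length": "short", "sepal_width": "narrow", "species": "setosa"},
    {"petal_length": "short", "sepal_width": "wide", "species": "setosa"},
    {"petal_length": "long", "sepal_width": "wide", "species": "versicolor"},
    {"petal_length": "long", "sepal_width": "narrow", "species": "versicolor"},
    {"petal_length": "long", "sepal_width": "narrow", "species": "versicolor"},
]

gains = {f: information_gain(rows, f) for f in ("petal_length", "sepal_width")}
best = max(gains, key=gains.get)

print(gains)
print("most significant variable:", best)
```

In a tree viewer, the same idea shows up visually: the most significant variable is the one tested at the top node.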
Trees also support a numeric target
Use the housing.tab dataset
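With a numeric target, a tree typically chooses the split that makes the target values on each side as similar as possible. The sketch below assumes the usual criterion of minimizing the summed squared error around each side's mean; the feature name, the numbers, and the single-split "stump" are made up for illustration.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return the threshold on x that minimizes total SSE of y on both sides."""
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def cost(threshold):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        return sse(left) + sse(right)

    return min(candidates, key=cost)


# Made-up data: number of rooms and house price (arbitrary units).
rooms = [3, 4, 5, 7, 8, 9]
prices = [10, 11, 10, 30, 31, 29]

threshold = best_split(rooms, prices)
print("split at rooms <=", threshold)
```

Each leaf of such a tree predicts the mean target value of the training rows that fall into it.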
Tree model and prediction
Open model_build_on_preduction.ows
• Main concepts:
• Data exploration
• Feature Statistics
• Rank
• Data Preprocess
• Preprocess
• Data Split
• Data Sampler
• Model
• Tree/ Tree Viewer
• Save Model/ Load Model
• Test and Score
• Confusion Matrix
• Prediction
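The data split, evaluation, and confusion matrix steps listed above can be sketched in plain Python. Everything here is a stand-in: the dataset is made up, the 70/30 split and seed are arbitrary, and the "model" is a toy threshold rule rather than the tree built in the workflow file.

```python
import random
from collections import Counter


def split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_threshold(train):
    """Toy model: midpoint between the mean feature value of each class."""
    yes = [x for x, y in train if y == "yes"]
    no = [x for x, y in train if y == "no"]
    return (sum(yes) / len(yes) + sum(no) / len(no)) / 2


def predict(threshold, x):
    return "yes" if x >= threshold else "no"


# Made-up binary dataset: (feature value, class label).
data = [(x, "yes" if x >= 10 else "no") for x in range(20)]

train, test = split(data)
threshold = fit_threshold(train)

# Confusion matrix: counts of (actual, predicted) pairs on held-out rows.
matrix = Counter((y, predict(threshold, x)) for x, y in test)
accuracy = (matrix["yes", "yes"] + matrix["no", "no"]) / len(test)

print("threshold:", threshold)
print("confusion matrix:", dict(matrix))
print("accuracy:", accuracy)
```

The key point is that the model is fitted on `train` only and scored on `test`, so the confusion matrix reflects rows the model has not seen.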
Homework
• Apply the models listed on the next page to your own binary dataset, and identify the best-performing model as well as the most significant variables.
• In addition, explain the factors underlying the model's behavior. Include the following subsections:
• Introduction to your dataset
• Data exploration
• Model evaluation
• Conclusion
• Use PPT with text and illustrations to present your observations.
