SlideShare a Scribd company logo
1 of 20
Download to read offline
Introduction to
Machine Learning and
Features
Machine Learning Unit 2
Sudeshna Sarkar
Recognize
the type of
sleeve/
pattern in the
image of a
shirt
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 2
Predict the
averagerating of
a new
microwave
Predict the average rating of
a new microwave
Predict the number of
purchases of a webcam
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 3
Classify email
SPAM PROMOTION
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 4
Autonomous
Driving
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 5
Sensors:
Camera
Radar
Lidar
Model
Data
The Machine
Learning
Solution
๏‚– Collect many examples that specify the correct output
for a given input
๏‚– ML to get the mapping from input to output
Algorithmic solution Machine learning solution
Computer
Input Program
Output
Computer
Input Output
Program
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 6
Machine
Learning :
Definition
๏‚– Learning is the ability to evolve behaviours based on
data (experience).
๏‚– Machine Learning explores algorithms that can
๏‚– Learn from data such as build a model from data
๏‚– Use the model or experience for prediction, decision making or
solving some tasks
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 7
Components
of a learning
problem
๏‚– Task: The behaviour or task being improved.
๏‚– For example: classification, acting in an environment
๏‚– Data:The experiences that are being used to improve
performance in the task.
๏‚– Measure of improvement :
๏‚– For example: increasing accuracy in prediction, acquiring new,
improved speed and efficiency
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 8
Design a
Learner
1. Choose the training
experience
2. Choose the target
function (that is to be
learned)
3. Choose how to
represent the target
function
4. Choose a learning
algorithm to infer the
target function
Experiences
Data
Background
knowledge/
Bias
Problem/
Task
Answer/
Performance
Learner Reasoner
Models
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 9
Components
of a ML
application
Representation
โ€ข Features: Data specification
โ€ข Function class: Model form
Optimization
โ€ข ModelTraining
Evaluation
โ€ข Performance
measure
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 10
1A.
Representation
of Data
1. How is the data specified?
A. Features
๏‚– Feature vector of n
features
าง
๐‘ฅ = (๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐‘›)
B. Convert input to a vector
of basis functions
๐œ™0 าง
๐‘ฅ , ๐œ™1 าง
๐‘ฅ , โ€ฆ , ๐œ™๐‘ าง
๐‘ฅ
1. A microwave
Attributes:
๏‚– Volume: 17 l, 23 l, โ€ฆ
๏‚– Functions: Micro, Convection, ..
๏‚– Power level
๏‚– Accessories
๏‚– Type of dial
๏‚– Brand
๏‚– Warranty
๏‚– Price
1. Image of shirt
๏‚– Collar style
๏‚– Sleeve type
๏‚– Colour
๏‚– โ€ฆ
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 11
Features
Image classification
๏‚– Raw pixels
๏‚– Histograms
๏‚– GIST descriptors
Product Rating (Webcam)
๏‚– Frame rate
๏‚– Resolution
๏‚– Autofocus
๏‚– Microphone
๏‚– Lens
๏‚– Brand
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 12
Bank
Marketing
Dataset
Predict if the client will subscribe
(yes/no) a term deposit (variable y).
Input variables:
# bank client data:
1. age
2. type of job
3. marital status
4. education
5. has credit in default?
6. has housing loan?
7. has personal loan?
# related with the last contact of
the current campaign:
8. contact communication type
('cellular','telephone')
9. last contact date and duration
# other attributes:
12. No of contacts performed for this client
13. No of days after client last contacted
14. No of contacts performed before this
campaign and for this client
15. outcome of prev marketing campaign
# social and economic context attributes
16. employment variation rate - quarterly
indicator
17. consumer price index - monthly
indicator
18. consumer confidence index - monthly
indicator
19. euribor 3 month rate - daily indicator
20. number of employees - quarterly
indicator
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 13
FeatureChoice
๏‚– Input Data comprise features
๏‚– Structured features (numerical or categorical values)
๏‚– Unstructured (text, speech, image, video, etc)
๏‚– Use only relevant features
๏‚– Too many features?
๏‚– Select feature subset (reduction)
๏‚– Extract features:Transform features
Data Features Model
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 14
Feature
Engineering
Transforming raw data into features that better represent the
underlying problem
๏‚– Feature Selection
๏‚– Feature Extraction
๏‚– Missing feature values
๏‚– Feature value normalization
๏‚– Aggregate Feature values
๏‚– Feature Encoding
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 15
1B. Model
Representation
๏‚– The richer the representation, the more useful it is for subsequent
problem solving.
๏‚– The richer the representation, the more difficult it is to learn.
๐‘ฆ = ๐‘“( าง
๐‘ฅ)
๐‘ฆ = ๐‘” เดค
๐œ™( าง
๐‘ฅ)
๏‚– Linear function
๏‚– DecisionTree
๏‚– Graphical Model
๏‚– Neural Network
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 16
1B. Model
Representation
Hypothesis
space
๐‘ฆ = ๐‘“( าง
๐‘ฅ)
Linear
Function
Decision
Tree
Graphical
Model
Neural
Net
๐‘Œ = ๐‘ค0 + ๐‘ค1๐‘ฅ1 + โ‹ฏ + ๐‘ค๐‘๐‘ฅ๐‘ + ๐œ–
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 17
2. Evaluation
1. Accuracy =
# correctly classified
# all test examples
2. Logarithmic Loss:
๐ฟ๐‘– = โˆ’ log ๐‘ƒ(๐‘Œ = ๐‘ฆ๐‘–|๐‘‹ = ๐‘ฅ๐‘–)
๐ฟ = เท
๐‘=1
๐‘€
๐‘ฆ๐‘œ๐‘ log ๐‘๐‘œ๐‘
3. Mean Squared error
๐‘€๐‘†๐ธ =
1
๐‘š
เท ๐‘ฆ๐‘๐‘Ÿ๐‘’๐‘‘ โˆ’ ๐‘ฆ๐‘ก๐‘Ÿ๐‘ข๐‘’
2
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 18
3.
Optimization
๏‚– Define loss function
๏‚– Optimize loss function
๏‚– Stochastic Gradient Descent
(Convex functions)
๏‚– Combinatorial optimization
๏‚– E.g.:Greedy search
๏‚– Constrained optimization
๏‚– E.g.: Linear programming
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 19
Elements of
Optimization
1. Variables
2. Constraints
3. Objective Function
Simple Linear Regression
1. Variables: ๐‘ค0, ๐‘ค1, โ€ฆ , ๐‘ค๐‘›
2. Constraints: none
3. Objective Function:
Minimize
ฯƒ๐‘–=1
๐‘š
๐‘ฆ๐‘– ๐‘ค0 + ฯƒ๐‘—=1
๐‘›
๐‘ค๐‘—๐‘ฅ๐‘–๐‘—
2
๏‚– ๐‘š data points, ๐‘› features
๏‚– ๐‘ฅ๐‘–๐‘—: jth attribute of ith
instance
๏‚– ๐‘ฆ๐‘–: output of ith instance
Find coefficients ๐‘ค0, ๐‘ค1, โ€ฆ , ๐‘ค๐‘›
to best fit data
Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 20
๐‘ฆ๐‘–
๐‘ฅ๐‘–๐‘—

More Related Content

Similar to Features.pdf

4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatest4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatest
ashutosh kumar
ย 
Iwsm2014 measuring cosmic software size from functional execution traces of...
Iwsm2014   measuring cosmic software size from functional execution traces of...Iwsm2014   measuring cosmic software size from functional execution traces of...
Iwsm2014 measuring cosmic software size from functional execution traces of...
Nesma
ย 
myResumeupdatedFinal-1
myResumeupdatedFinal-1myResumeupdatedFinal-1
myResumeupdatedFinal-1
Sunny sachan
ย 

Similar to Features.pdf (20)

Ppts21
Ppts21Ppts21
Ppts21
ย 
A Real Time Advance Automated Attendance System using Face-Net Algorithm
A Real Time Advance Automated Attendance System using Face-Net AlgorithmA Real Time Advance Automated Attendance System using Face-Net Algorithm
A Real Time Advance Automated Attendance System using Face-Net Algorithm
ย 
IRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
IRJET- Implementation of Gender Detection with Notice Board using Raspberry PiIRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
IRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
ย 
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
ย 
Internshipppt.pptx
Internshipppt.pptxInternshipppt.pptx
Internshipppt.pptx
ย 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware Performance
ย 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
ย 
4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatest4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatest
ย 
Features Engineering Art using R
Features Engineering Art using RFeatures Engineering Art using R
Features Engineering Art using R
ย 
IRJET- Credit Card Authentication using Facial Recognition
IRJET-  	  Credit Card Authentication using Facial RecognitionIRJET-  	  Credit Card Authentication using Facial Recognition
IRJET- Credit Card Authentication using Facial Recognition
ย 
Review Paper on Attendance Capturing System using Face Recognition
Review Paper on Attendance Capturing System using Face RecognitionReview Paper on Attendance Capturing System using Face Recognition
Review Paper on Attendance Capturing System using Face Recognition
ย 
Iwsm2014 measuring cosmic software size from functional execution traces of...
Iwsm2014   measuring cosmic software size from functional execution traces of...Iwsm2014   measuring cosmic software size from functional execution traces of...
Iwsm2014 measuring cosmic software size from functional execution traces of...
ย 
myResumeupdatedFinal-1
myResumeupdatedFinal-1myResumeupdatedFinal-1
myResumeupdatedFinal-1
ย 
ppt-20.06.24.pptx ghyyuuuygrfggtyghffhhhh
ppt-20.06.24.pptx ghyyuuuygrfggtyghffhhhhppt-20.06.24.pptx ghyyuuuygrfggtyghffhhhh
ppt-20.06.24.pptx ghyyuuuygrfggtyghffhhhh
ย 
A Machine learning based framework for Verification and Validation of Massive...
A Machine learning based framework for Verification and Validation of Massive...A Machine learning based framework for Verification and Validation of Massive...
A Machine learning based framework for Verification and Validation of Massive...
ย 
IRJET - Automated Fraud Detection Framework in Examination Halls
 IRJET - Automated Fraud Detection Framework in Examination Halls IRJET - Automated Fraud Detection Framework in Examination Halls
IRJET - Automated Fraud Detection Framework in Examination Halls
ย 
Short Resume of Ashish Kumar Tiwari
Short Resume of Ashish Kumar TiwariShort Resume of Ashish Kumar Tiwari
Short Resume of Ashish Kumar Tiwari
ย 
Tackling Open Images Challenge (2019)
Tackling Open Images Challenge (2019)Tackling Open Images Challenge (2019)
Tackling Open Images Challenge (2019)
ย 
IRJET - Job Portal Analysis and Salary Prediction System
IRJET -  	  Job Portal Analysis and Salary Prediction SystemIRJET -  	  Job Portal Analysis and Salary Prediction System
IRJET - Job Portal Analysis and Salary Prediction System
ย 
IRJET- Class Attendance using Face Detection and Recognition with OPENCV
IRJET- Class Attendance using Face Detection and Recognition with OPENCVIRJET- Class Attendance using Face Detection and Recognition with OPENCV
IRJET- Class Attendance using Face Detection and Recognition with OPENCV
ย 

Recently uploaded

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
ย 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
ย 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
pragatimahajan3
ย 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
ย 

Recently uploaded (20)

Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
ย 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
ย 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
ย 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
ย 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
ย 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
ย 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ย 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
ย 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
ย 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
ย 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
ย 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
ย 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
ย 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
ย 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
ย 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
ย 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
ย 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
ย 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
ย 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
ย 

Features.pdf

  • 1. Introduction to Machine Learning and Features Machine Learning Unit 2 Sudeshna Sarkar
  • 2. Recognize the type of sleeve/ pattern in the image of a shirt Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 2
  • 3. Predict the averagerating of a new microwave Predict the average rating of a new microwave Predict the number of purchases of a webcam Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 3
  • 4. Classify email SPAM PROMOTION Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 4
  • 5. Autonomous Driving Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 5 Sensors: Camera Radar Lidar Model Data
  • 6. The Machine Learning Solution ๏‚– Collect many examples that specify the correct output for a given input ๏‚– ML to get the mapping from input to output Algorithmic solution Machine learning solution Computer Input Program Output Computer Input Output Program Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 6
  • 7. Machine Learning : Definition ๏‚– Learning is the ability to evolve behaviours based on data (experience). ๏‚– Machine Learning explores algorithms that can ๏‚– Learn from data such as build a model from data ๏‚– Use the model or experience for prediction, decision making or solving some tasks Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 7
  • 8. Components of a learning problem ๏‚– Task: The behaviour or task being improved. ๏‚– For example: classification, acting in an environment ๏‚– Data:The experiences that are being used to improve performance in the task. ๏‚– Measure of improvement : ๏‚– For example: increasing accuracy in prediction, acquiring new, improved speed and efficiency Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 8
  • 9. Design a Learner 1. Choose the training experience 2. Choose the target function (that is to be learned) 3. Choose how to represent the target function 4. Choose a learning algorithm to infer the target function Experiences Data Background knowledge/ Bias Problem/ Task Answer/ Performance Learner Reasoner Models Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 9
  • 10. Components of a ML application Representation โ€ข Features: Data specification โ€ข Function class: Model form Optimization โ€ข ModelTraining Evaluation โ€ข Performance measure Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 10
  • 11. 1A. Representation of Data 1. How is the data specified? A. Features ๏‚– Feature vector of n features าง ๐‘ฅ = (๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐‘›) B. Convert input to a vector of basis functions ๐œ™0 าง ๐‘ฅ , ๐œ™1 าง ๐‘ฅ , โ€ฆ , ๐œ™๐‘ าง ๐‘ฅ 1. A microwave Attributes: ๏‚– Volume: 17 l, 23 l, โ€ฆ ๏‚– Functions: Micro, Convection, .. ๏‚– Power level ๏‚– Accessories ๏‚– Type of dial ๏‚– Brand ๏‚– Warranty ๏‚– Price 1. Image of shirt ๏‚– Collar style ๏‚– Sleeve type ๏‚– Colour ๏‚– โ€ฆ Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 11
  • 12. Features Image classification ๏‚– Raw pixels ๏‚– Histograms ๏‚– GIST descriptors Product Rating (Webcam) ๏‚– Frame rate ๏‚– Resolution ๏‚– Autofocus ๏‚– Microphone ๏‚– Lens ๏‚– Brand Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 12
  • 13. Bank Marketing Dataset Predict if the client will subscribe (yes/no) a term deposit (variable y). Input variables: # bank client data: 1. age 2. type of job 3. marital status 4. education 5. has credit in default? 6. has housing loan? 7. has personal loan? # related with the last contact of the current campaign: 8. contact communication type ('cellular','telephone') 9. last contact date and duration # other attributes: 12. No of contacts performed for this client 13. No of days after client last contacted 14. No of contacts performed before this campaign and for this client 15. outcome of prev marketing campaign # social and economic context attributes 16. employment variation rate - quarterly indicator 17. consumer price index - monthly indicator 18. consumer confidence index - monthly indicator 19. euribor 3 month rate - daily indicator 20. number of employees - quarterly indicator http://archive.ics.uci.edu/ml/datasets/Bank+Marketing Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 13
  • 14. FeatureChoice ๏‚– Input Data comprise features ๏‚– Structured features (numerical or categorical values) ๏‚– Unstructured (text, speech, image, video, etc) ๏‚– Use only relevant features ๏‚– Too many features? ๏‚– Select feature subset (reduction) ๏‚– Extract features:Transform features Data Features Model Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 14
  • 15. Feature Engineering Transforming raw data into features that better represent the underlying problem ๏‚– Feature Selection ๏‚– Feature Extraction ๏‚– Missing feature values ๏‚– Feature value normalization ๏‚– Aggregate Feature values ๏‚– Feature Encoding Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 15
  • 16. 1B. Model Representation ๏‚– The richer the representation, the more useful it is for subsequent problem solving. ๏‚– The richer the representation, the more difficult it is to learn. ๐‘ฆ = ๐‘“( าง ๐‘ฅ) ๐‘ฆ = ๐‘” เดค ๐œ™( าง ๐‘ฅ) ๏‚– Linear function ๏‚– DecisionTree ๏‚– Graphical Model ๏‚– Neural Network Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 16
  • 17. 1B. Model Representation Hypothesis space ๐‘ฆ = ๐‘“( าง ๐‘ฅ) Linear Function Decision Tree Graphical Model Neural Net ๐‘Œ = ๐‘ค0 + ๐‘ค1๐‘ฅ1 + โ‹ฏ + ๐‘ค๐‘๐‘ฅ๐‘ + ๐œ– Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 17
  • 18. 2. Evaluation 1. Accuracy = # correctly classified # all test examples 2. Logarithmic Loss: ๐ฟ๐‘– = โˆ’ log ๐‘ƒ(๐‘Œ = ๐‘ฆ๐‘–|๐‘‹ = ๐‘ฅ๐‘–) ๐ฟ = เท ๐‘=1 ๐‘€ ๐‘ฆ๐‘œ๐‘ log ๐‘๐‘œ๐‘ 3. Mean Squared error ๐‘€๐‘†๐ธ = 1 ๐‘š เท ๐‘ฆ๐‘๐‘Ÿ๐‘’๐‘‘ โˆ’ ๐‘ฆ๐‘ก๐‘Ÿ๐‘ข๐‘’ 2 Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 18
  • 19. 3. Optimization ๏‚– Define loss function ๏‚– Optimize loss function ๏‚– Stochastic Gradient Descent (Convex functions) ๏‚– Combinatorial optimization ๏‚– E.g.:Greedy search ๏‚– Constrained optimization ๏‚– E.g.: Linear programming Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 19
  • 20. Elements of Optimization 1. Variables 2. Constraints 3. Objective Function Simple Linear Regression 1. Variables: ๐‘ค0, ๐‘ค1, โ€ฆ , ๐‘ค๐‘› 2. Constraints: none 3. Objective Function: Minimize ฯƒ๐‘–=1 ๐‘š ๐‘ฆ๐‘– ๐‘ค0 + ฯƒ๐‘—=1 ๐‘› ๐‘ค๐‘—๐‘ฅ๐‘–๐‘— 2 ๏‚– ๐‘š data points, ๐‘› features ๏‚– ๐‘ฅ๐‘–๐‘—: jth attribute of ith instance ๏‚– ๐‘ฆ๐‘–: output of ith instance Find coefficients ๐‘ค0, ๐‘ค1, โ€ฆ , ๐‘ค๐‘› to best fit data Machine Learning, Sudeshna Sarkar, COEAI, IIT Kharagpur 20 ๐‘ฆ๐‘– ๐‘ฅ๐‘–๐‘—