PRACTICAL ISSUES
in Machine Learning
Partha Sarathi Kar
IVSM 166777
CONTENTS
1. Importance of Good Features
2. Irrelevant and Redundant Features
3. Feature Pruning and Normalization
4. Evaluating Model Performance
5. Cross Validation
6. Hypothesis Testing and Statistical Significance
7. Debugging Learning Algorithms
8. Bias/Variance Trade-off
IMPORTANCE OF GOOD FEATURES
Feature:
• a feature is an individual measurable property of the phenomenon being observed
• features are the basis on which a model is built
Importance of features:
• choosing features poorly will result in an unreliable model
Figure: Machine learning workflow
FEATURE EXTRACTION EXAMPLE
object recognition from images
pixel representation
• a 100 x 100 pixel color image becomes a 30,000-dimensional vector (100 x 100 pixels x 3 RGB channels)
• each dimension holds the red, green, or blue intensity of one pixel
• e.g. feature(1.1) is ..
patch representation
• the unit of interest is a small rectangular block of pixels, rather than a single pixel
Figure: pixel representation
Figure: patch representation
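A minimal sketch of the two image representations, assuming the image is a plain nested list of (r, g, b) tuples (a hypothetical format chosen to keep the example dependency-free). The function names are illustrative, and summarizing each patch by its average color is only one possible choice.

```python
def pixel_features(image):
    """Flatten an H x W image of (r, g, b) tuples into one long feature list.

    Features are ordered row by row, pixel by pixel, channel by channel.
    """
    return [channel for row in image for pixel in row for channel in pixel]


def patch_features(image, patch_size):
    """Average color of each non-overlapping patch_size x patch_size block.

    Assumes the image height and width are multiples of patch_size.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            block = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            for ch in range(3):  # one averaged feature per color channel
                features.append(sum(p[ch] for p in block) / len(block))
    return features


image = [[(0, 0, 0)] * 100 for _ in range(100)]  # a black 100 x 100 image
print(len(pixel_features(image)))     # 30000 features: 100 * 100 * 3
print(len(patch_features(image, 10)))  # 300 features: 10 * 10 patches * 3
```

With 10 x 10 patches the vector shrinks from 30,000 to 300 dimensions, at the cost of losing detail inside each patch.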
FEATURE EXTRACTION EXAMPLE
object recognition from images
shape representation
• throw out all color and pixel information
• simply provide a bounding polygon
text categorization
• bag of words representation
Figure: pixel representation
Figure: shape representation
Figure: text categorization
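A minimal sketch of a bag of words representation using only the Python standard library. The tokenizer (lower-case letters and apostrophes) and the fixed vocabulary are simplifying assumptions for illustration; real systems usually tokenize more carefully.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count each lower-cased word; word order and punctuation are thrown away."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def to_vector(counts, vocabulary):
    """Turn word counts into a fixed-length feature vector over a chosen vocabulary."""
    # A Counter returns 0 for words it has never seen.
    return [counts[word] for word in vocabulary]


counts = bag_of_words("The course was good, the labs were good.")
print(to_vector(counts, ["good", "bad", "the"]))  # [2, 0, 2]
```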
IRRELEVANT AND REDUNDANT FEATURES
Irrelevant Feature:
an irrelevant feature is one that is completely uncorrelated with the prediction task
eg: the presence of the word “the” might be largely irrelevant for predicting whether a course review is positive or negative
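One rough way to spot a candidate irrelevant feature is to measure its correlation with the label. The sketch below computes a Pearson correlation on hypothetical toy data (six reviews, label 1 = positive). It is only a linear, one-feature-at-a-time check: a feature that looks uncorrelated on its own can still help in combination with others.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no information about the label
    return cov / (sx * sy)


# Hypothetical toy data: one entry per review.
labels = [1, 1, 1, 0, 0, 0]     # 1 = positive review
has_the = [1, 0, 1, 1, 0, 1]    # review contains "the"
has_great = [1, 1, 1, 0, 0, 0]  # review contains "great"

print(pearson(has_the, labels))    # ≈ 0.0: looks irrelevant
print(pearson(has_great, labels))  # ≈ 1.0: strongly related to the label
```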
IRRELEVANT AND REDUNDANT FEATURES
Redundant Feature:
two features are redundant if they are highly correlated
eg: having a bright red pixel in an image at position (20, 93) is probably highly redundant with having a bright red pixel at position (21, 93); both might be useful for identifying fire hydrants, but they will almost always occur together
Figure: fire hydrants
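A minimal sketch of removing redundant features: keep a feature only if its absolute correlation with every already-kept feature stays below a threshold. The greedy order, the threshold of 0.95, and the toy pixel columns are all assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Greedily keep features that are not highly correlated with a kept one.

    columns: list of feature columns, each a list with one value per example.
    Returns the indices of the kept features.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Hypothetical toy data over six images (1 = feature present).
red_20_93 = [1, 1, 0, 0, 1, 0]
red_21_93 = [1, 1, 0, 0, 1, 0]  # neighboring pixel: always agrees
blue_sky = [0, 1, 1, 0, 1, 1]

print(drop_redundant([red_20_93, red_21_93, blue_sky]))  # [0, 2]
```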
FEATURE PRUNING AND NORMALIZATION
Feature Pruning:
removing features that occur too rarely to be learned from reliably
eg: the word “good” appears in exactly one training document, which is positive. With just one training example, it is hard to tell whether the word is really correlated with the positive class or is just noise.
Pruning such rare features:
• reduces the size of decision trees
• reduces the complexity of the final classifier
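A minimal sketch of pruning by document frequency: count how many training documents each word appears in and keep only words seen in at least `min_docs` documents. The threshold value and the toy documents are hypothetical; in practice the threshold is tuned like any other hyperparameter.

```python
from collections import Counter


def prune_rare_words(documents, min_docs=2):
    """Return the words that appear in at least min_docs documents.

    documents: list of token lists. Each word is counted once per document,
    which gives its document frequency.
    """
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {word for word, df in doc_freq.items() if df >= min_docs}


docs = [["good", "course"],
        ["boring", "course"],
        ["great", "labs", "course"],
        ["great", "course"]]

print(sorted(prune_rare_words(docs)))  # ['course', 'great']: "good" is pruned
```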
FEATURE PRUNING AND NORMALIZATION
Normalization:
adjusting feature values (or whole examples) to a common scale, to make it easier for the learning algorithm to learn
eg: the height of the “A” has been reduced from 8 to 6 pixels, while the width has been reduced from 7 to 5 pixels
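For numeric feature vectors, two common normalizations are sketched below: per-feature centering and variance scaling, and per-example scaling to unit length. These are standard techniques chosen for illustration, not necessarily the author's; the resizing of the “A” above is a different, image-specific kind of normalization.

```python
import math


def standardize_columns(X):
    """Feature normalization: center each feature at 0 and scale to unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
            for j in range(d)]
    # A constant feature has zero spread; map it to 0 instead of dividing by 0.
    return [[(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(d)] for row in X]


def unit_length(x):
    """Example normalization: rescale one example vector to Euclidean length 1."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


# Two features on very different scales end up comparable.
print(standardize_columns([[1.0, 100.0], [3.0, 300.0]]))  # [[-1.0, -1.0], [1.0, 1.0]]
print(unit_length([3.0, 4.0]))                           # [0.6, 0.8]
```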
EVALUATING MODEL PERFORMANCE
Purpose:
find out whether a classifier is accurate enough for its task
eg: Medical Diagnosis, Spam Detection
There are two major types of binary classification problems:
1. “X versus Y.” For instance, positive versus negative sentiment.
2. “X versus not-X.” For instance, spam versus non-spam.
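For “X versus not-X” problems, plain accuracy can be misleading when X is rare, so precision, recall, and their harmonic mean F1 are commonly reported alongside it. A minimal sketch, where the function name and the `positive` parameter (the label of the X class) are illustrative assumptions:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for an "X versus not-X" task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted X, how many are X
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual X, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# 10 emails, 1 of them spam; a classifier that never predicts spam:
print(binary_metrics([1] + [0] * 9, [0] * 10))  # (0.9, 0.0, 0.0, 0.0)
```

The example shows the problem: 90% accuracy, yet the classifier finds no spam at all.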
CROSS VALIDATION
• used for evaluating and comparing learning algorithms
• estimates how a model will perform on future, unseen data
The data is divided into two segments: one used to learn or train a model, and the other used to validate the model. In k-fold cross validation this split is repeated k times, so that every example is used for validation exactly once.
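A minimal k-fold cross validation sketch in plain Python. The `train_fn` and `error_fn` hooks are hypothetical placeholders for whatever learner and error measure are being evaluated; the folds are contiguous, so ordered data should be shuffled first.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of (almost) equal size.

    Yields (train_indices, validation_indices) once per fold; every index
    is in exactly one validation fold.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_validation_error(X, y, k, train_fn, error_fn):
    """Average validation error over k folds.

    train_fn(X_train, y_train) -> model and error_fn(model, X_val, y_val) -> float
    are supplied by the caller.
    """
    errors = []
    for train, val in k_fold_indices(len(y), k):
        model = train_fn([X[i] for i in train], [y[i] for i in train])
        errors.append(error_fn(model, [X[i] for i in val], [y[i] for i in val]))
    return sum(errors) / len(errors)


for train, val in k_fold_indices(10, 3):
    print(val)  # [0, 1, 2, 3], then [4, 5, 6], then [7, 8, 9]
```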
HYPOTHESIS TESTING AND STATISTICAL SIGNIFICANCE
eg: in cross validation, one model has 7% error and another has 6.9% error over 1000 examples, a difference of a single example (70 versus 69 mistakes). Whether such a difference is real or just noise is decided in machine learning just as in statistics: with a hypothesis test.
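One common choice for comparing two models evaluated on the same folds is a paired t-test on their per-fold errors. The sketch below computes only the t statistic; the fold errors are hypothetical numbers. The result is approximate, because the test assumes roughly normal, independent differences, and cross validation folds share training data.

```python
import math


def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean per-fold difference in error (model A minus model B)."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 folds")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("per-fold differences have zero variance")
    return mean / math.sqrt(var / n)


# Hypothetical per-fold error rates of two models on the same 4 folds.
t = paired_t_statistic([0.10, 0.12, 0.11, 0.09], [0.08, 0.11, 0.10, 0.08])
print(t)  # ≈ 5.0
```

With 4 folds there are 3 degrees of freedom, and the two-sided 5% critical value is 3.182, so |t| ≈ 5.0 would count as significant at that level. With 10 folds (9 degrees of freedom) the critical value is 2.262.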
DEBUGGING LEARNING ALGORITHMS
• Learning algorithms are notoriously hard to debug
• when a model performs poorly, it is unclear whether there is a bug, the problem is too hard, or there is too much noise
• moreover, sometimes bugs lead to learning algorithms performing better
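One common sanity check for separating “bug” from “problem too hard”: add a cheating feature that equals the label. A correctly implemented learner should then reach zero training error; if it does not, suspect the code rather than the data. The sketch uses a plain perceptron as a stand-in learner, and all function names are hypothetical.

```python
import random


def train_perceptron(X, y, epochs=100):
    """Plain perceptron for labels in {+1, -1}; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * activation <= 0:  # misclassified or on the boundary
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:  # a clean pass: the training data is separated
            break
    return w, b


def training_error(w, b, X, y):
    wrong = sum(1 for x, label in zip(X, y)
                if label * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0)
    return wrong / len(y)


def cheating_feature_check(n=200, n_noise=5, seed=0):
    """Random noise features plus one feature equal to the label."""
    rng = random.Random(seed)
    y = [rng.choice([-1, 1]) for _ in range(n)]
    X = [[rng.uniform(-1, 1) for _ in range(n_noise)] + [label] for label in y]
    w, b = train_perceptron(X, y)
    return training_error(w, b, X, y)


print(cheating_feature_check())  # 0.0 for a correct implementation
```

Because the cheating feature makes the data linearly separable, the perceptron convergence theorem guarantees a clean pass, so any nonzero result here points to a bug.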
BIAS/VARIANCE TRADE-OFF
the trade-off between estimation error and approximation error
let f be the learned classifier, selected from a set F of “all possible classifiers using a fixed representation,” and let f* be the optimal classifier in F
• estimation error measures how far the actual learned classifier f is from the optimal classifier f*
• approximation error measures the quality of the model family F itself, i.e. how good even f* is
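Written as a formula, a standard way to express this split, with f* the best classifier in F, is:

```latex
\mathrm{error}(f)
  = \underbrace{\Big[\mathrm{error}(f) - \min_{f^{*} \in F} \mathrm{error}(f^{*})\Big]}_{\text{estimation error}}
  + \underbrace{\Big[\min_{f^{*} \in F} \mathrm{error}(f^{*})\Big]}_{\text{approximation error}}
```

Enlarging F can only lower (or keep) the approximation term, but with a fixed amount of training data it usually makes the estimation term larger, which is the trade-off.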
REFERENCES
• http://ciml.info/dl/v0_8/ciml-v0_8-all.pdf
• https://en.wikipedia.org/wiki/Feature_(machine_learning)
• https://stats.stackexchange.com
• https://www.quora.com
THANKS
