Machine Learning for Language Technology 2015
http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm
Machine Learning in Practice (2)
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2015
Acknowledgements
• Weka’s slides
• Feature engineering:
http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/
• Daumé III (2015): 53-64
• Witten et al. (2011): 147-156
Outline
• The importance of features
• Unbalanced data, multiclass classification,
theoretical models vs. real-world implementations
• Evaluation:
– How to assess what has been learned
– Holdout estimation
– Cross-validation
– Leave-one-out
– Bootstrap
The importance of good features
• Garbage in – garbage out
Bag of Words Representation
Parts Of Speech (PoS) representation
The Importance of Good Features
• Example: features in text classification (a small sketch follows this slide)
– BOW (bag of words), either counts or binary
– Phrases
– n-grams
– Chunks
– PoS tags
– PoS n-grams
– etc.
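To make these feature types concrete, here is a minimal sketch (mine, not part of the original slides) that builds count-based BOW, binary BOW, and word-bigram features for two invented toy sentences, using only the Python standard library. The resulting dictionaries could then be turned into the attribute-value vectors that tools such as Weka expect.

```python
# Toy illustration (not from the slides): count BOW, binary BOW and word bigrams.
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat on the log"]  # invented examples

def bow_counts(text):
    """Count-based bag of words: token -> frequency."""
    return Counter(text.split())

def bow_binary(text):
    """Binary bag of words: token -> 1 if the token occurs at all."""
    return {tok: 1 for tok in set(text.split())}

def word_ngrams(text, n=2):
    """Word n-grams (bigrams by default), another common text feature."""
    toks = text.split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

for d in docs:
    print(bow_counts(d))
    print(bow_binary(d))
    print(word_ngrams(d))
```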
ML success: Feature representation
(aka feature engineering)
• The success of ML algorithms depends on how you present the data:
you need great features that describe the structures inherent in
your data:
– Better features mean flexibility
– Better features mean simpler models
– Better features mean better results
• However: the results you achieve are a function of the model you
choose, the data you have available, and the features you prepared.
• That is, your results depend on many inter-dependent
properties.
Feature Representation is a
knowledge representation problem
• Transforming raw data into features that
better represent the underlying problem to
the predictive models, resulting in improved
model accuracy on unseen data
• Question: what is the best representation of
the sample data to learn a solution to your
problem?
Practical Steps
[...] (tasks before here…)
• Select Data: Collect it together
• Preprocess Data: Format it, clean it, sample it so you can
work with it.
• Transform Data: FEATURE REPRESENTATION happens here.
• Model Data: Create models, evaluate them and tune them.
[...] (tasks after here…)
Feature representation vs Feature
Selection
• Feature representation is different from
Attribute/Feature selection
Irrelevant and Redundant Features
• Not all features have equal importance.
• Those attributes that are irrelevant to the
problem need to be removed.
• Feature selection addresses this by automatically
selecting the subset of features that is most useful
to the problem.
Learning with unbalanced data
• Imbalanced data:
– The number of positive examples is dwarfed by
the number of negative examples (or vice versa)
• Solutions (a small sketch follows this slide):
– Assign an importance weight to the minority class
– Subsampling the majority class
– Oversampling the minority class
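As a hedged sketch (not from the slides), the snippet below shows two of these remedies with scikit-learn on an invented 950-vs-50 dataset: weighting the minority class in the loss, and oversampling it with replacement. Subsampling would work analogously by dropping majority-class rows instead.

```python
# Sketch of handling class imbalance; dataset sizes and classes are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# 950 negative examples vs. 50 positive examples
X = np.vstack([rng.normal(0, 1, (950, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 950 + [1] * 50)

# Remedy 1: give the minority class a larger weight in the loss function
clf_weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# Remedy 2: oversample the minority class by drawing its rows with replacement
pos = np.flatnonzero(y == 1)
extra = rng.choice(pos, size=900, replace=True)      # duplicated minority rows
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
clf_oversampled = LogisticRegression().fit(X_bal, y_bal)
```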
Multiclass Classification
• OVA (one versus all): train one binary classifier per class (see the sketch after this slide)
• AVA (all versus all): train one binary classifier per pair of classes
• Binary tree of classifiers
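The sketch below (mine, not the lecture's) implements OVA by hand on scikit-learn's iris toy data: one binary classifier per class, with the prediction going to the most confident one. AVA would instead train one classifier per pair of classes and let them vote.

```python
# One-versus-all (OVA) reduction of a multiclass problem to binary problems.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Train one binary problem per class: "class c" vs. "everything else"
models = {c: LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
          for c in classes}

# Predict with the class whose binary model is most confident
scores = np.column_stack([models[c].decision_function(X) for c in classes])
y_pred = classes[scores.argmax(axis=1)]
print("OVA training accuracy:", (y_pred == y).mean())
```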
Theoretical Models vs Real
Machine Learning Schemes
• For an algorithm to be useful in a wide range of real-world applications it must:
– Allow missing values
– Be robust in the presence of noise
– Be able to approximate arbitrary concept descriptions
– etc.
Evaluating what’s been learned
• How predictive is the model we learned?
• Error on the training data is not a good indicator of performance on future data
– Otherwise 1-NN would be the optimum classifier! (see the sketch after this slide)
• Simple solution that can be used if lots of (labeled) data is available:
– Split data into training and test set
• However: (labeled) data is usually limited
– More sophisticated techniques need to be used
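A small illustration (not from the slides) of the 1-NN remark, assuming scikit-learn and its breast-cancer toy dataset: the training (resubstitution) accuracy is essentially perfect because 1-NN memorizes the training set, while the accuracy on held-out data is lower.

```python
# Why training error is misleading: 1-NN looks "optimal" on its own training data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("training accuracy:", knn.score(X_tr, y_tr))   # ~1.0 (memorized)
print("test accuracy:    ", knn.score(X_te, y_te))   # noticeably lower
```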
Issues in evaluation
• Statistical reliability of estimated differences in performance (→ significance tests)
• Choice of performance measure:
– Number of correct classifications
– Accuracy of probability estimates
– Costs assigned to different types of errors
• Many practical applications involve costs
Training and testing: empirical error
• Natural performance measure for classification problems: error rate
– Success: instance's class is predicted correctly
– Error: instance's class is predicted incorrectly
– Error rate: proportion of errors made over the whole set of instances
• Empirical (resubstitution) error: error rate obtained from the training data
• Empirical (resubstitution) error is (hopelessly) optimistic!
Training and testing:
development/validation set & parameter
tuning
• It is important that the test data is not used in any way to create the classifier
• Some learning schemes operate in two stages:
– Stage 1: build the basic model
– Stage 2: optimize parameter settings
• The test data can't be used for parameter tuning!
• Proper procedure uses three sets: training data, validation data, and test data (see the sketch after this slide)
• Validation data is used to optimize parameters
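A hedged sketch (not from the slides) of the three-set procedure with scikit-learn: the validation set is used only to pick a parameter (here k for a k-NN classifier), and the test set is consulted once at the end.

```python
# Training / validation / test split with parameter tuning on the validation set.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Stage 1 + 2: build models and choose k on the validation data only
best_k, best_acc = None, -1.0
for k in (1, 3, 5, 7, 9):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# The untouched test set gives the final performance estimate
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("chosen k:", best_k, "| test accuracy:", final.score(X_test, y_test))
```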
Training and testing: test set
• Test set: independent instances that have played no part in the formation of the classifier
• Assumption: both training data and test data are representative samples of the underlying problem
• Test and training data may differ in nature
– Example: classifiers built using customer data from two different towns A and B
– To estimate performance of the classifier from town A in a completely new town, test it on data from B
Making the most of the data
• Once evaluation is complete, all the data can be used to build the final classifier
• Generally, the larger the training data the better the classifier (but returns diminish)
• The larger the test data the more accurate the error estimate
• Holdout procedure: method of splitting original data into training and test set
– Dilemma: ideally both training set and test set should be large!
Holdout estimation
• What to do if the amount of data is limited?
• The holdout method reserves a certain amount for testing and uses the remainder for training
– Usually: one third for testing, the rest for training
• Problem: the samples might not be representative
– Example: a class might be missing in the test data
• Advanced version uses stratification (see the sketch after this slide)
– Ensures that each class is represented with approximately equal proportions in both subsets
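A minimal sketch (mine, not from the slides) of a stratified one-third holdout with scikit-learn; the printout shows that the class proportions are approximately preserved in the test set.

```python
# Stratified holdout: one third for testing, class proportions kept similar.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0)

print("class proportions, full set:", np.bincount(y) / len(y))
print("class proportions, test set:", np.bincount(y_test) / len(y_test))
```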
Repeated holdout method
• The holdout estimate can be made more reliable by repeating the process with different subsamples
– In each iteration, a certain proportion is randomly selected for training (possibly with stratification)
– The error rates on the different iterations are averaged to yield an overall error rate
• This is called the repeated holdout method
• Still not optimum: the different test sets overlap
– Can we prevent overlapping?
Cross-validation
• Cross-validation avoids overlapping test sets
– First step: split data into k subsets of equal size
– Second step: use each subset in turn for testing, the remainder for training
– Called k-fold cross-validation (see the sketch after this slide)
• Often the subsets are stratified before the cross-validation is performed
• The error estimates are averaged to yield an overall error estimate
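A short sketch (not from the slides) of stratified 10-fold cross-validation with scikit-learn, averaging the per-fold error estimates; the decision-tree learner and the wine dataset are only illustrative choices.

```python
# Stratified k-fold cross-validation: each fold is used once for testing.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
errors = []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    errors.append(1 - clf.score(X[test_idx], y[test_idx]))

print("10-fold error estimate: %.3f (+/- %.3f)" % (np.mean(errors), np.std(errors)))
```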
More on cross-validation
• Standard method for evaluation: stratified ten-fold cross-validation
• Why ten?
– Extensive experiments have shown that this is the best choice to get an accurate estimate
– There is also some theoretical evidence for this
• Stratification reduces the estimate's variance
• Even better: repeated stratified cross-validation
– E.g. ten-fold cross-validation is repeated ten times and the results are averaged (reduces the variance)
Leave-One-Out cross-validation
• Leave-One-Out: a particular form of cross-validation:
– Set the number of folds to the number of training instances
– I.e., for n training instances, build the classifier n times
• Makes best use of the data
• Involves no random subsampling
• Very computationally expensive (see the sketch after this slide)
– (exception: NN)
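A sketch (not from the slides) of leave-one-out cross-validation with scikit-learn: with n training instances it fits n models, each tested on the single held-out instance.

```python
# Leave-one-out CV: k-fold CV with the number of folds equal to n.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
hits = []
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    hits.append(clf.score(X[test_idx], y[test_idx]))  # 0 or 1: single test instance

print("LOO accuracy over", len(hits), "models:", np.mean(hits))
```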
Leave-One-Out-CV and stratification
• Disadvantage of Leave-One-Out CV: stratification is not possible
– It guarantees a non-stratified sample because there is only one instance in the test set!
The bootstrap
• CV uses sampling without replacement
– The same instance, once selected, can not be selected again for a particular training/test set
• The bootstrap uses sampling with replacement to form the training set
– Sample a dataset of n instances n times with replacement to form a new dataset of n instances
– Use this data as the training set
– Use the instances from the original dataset that don't occur in the new training set for testing
The 0.632 bootstrap
• Also called the 0.632 bootstrap
– A particular instance has a probability of 1 − 1/n of not being picked in a single draw
– Thus its probability of ending up in the test data is (1 − 1/n)^n ≈ e^(−1) ≈ 0.368
– This means the training data will contain approximately 63.2% of the instances
Estimating error with the bootstrap
• The error estimate on the test data will be very pessimistic
– Trained on just ~63% of the instances
• Therefore, combine it with the resubstitution/empirical error:
– err = 0.632 × e_test + 0.368 × e_train
• The resubstitution/empirical error gets less weight than the error on the test data
• Repeat the process several times with different replacement samples; average the results (see the sketch after this slide)
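A hedged sketch (mine, not from the slides) of one round of the 0.632 bootstrap: sample n indices with replacement for training, test on the left-out ("out-of-bag") instances, and combine the two error rates with the 0.632/0.368 weights. In practice several rounds are run and the results averaged.

```python
# One round of the 0.632 bootstrap error estimate.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)
n = len(y)

boot = rng.integers(0, n, size=n)            # n training indices, with replacement
oob = np.setdiff1d(np.arange(n), boot)       # instances never picked -> test set

clf = DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot])
err_test = 1 - clf.score(X[oob], y[oob])     # pessimistic: trained on ~63% of data
err_resub = 1 - clf.score(X[boot], y[boot])  # optimistic: error on the training data

err_632 = 0.632 * err_test + 0.368 * err_resub
print("0.632 bootstrap estimate (one round): %.3f" % err_632)
```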
More on the bootstrap
• Probably the best way of estimating performance for very small datasets
• However, it has some problems…
The end