SlideShare a Scribd company logo
Practical Data Science
An Introduction to Supervised Machine Learning
and Pattern Classification: The Big Picture
Michigan State University
NextGen Bioinformatics Seminars - 2015
Sebastian Raschka
Feb. 11, 2015
A Little Bit About Myself ...
Developing software & methods for
- Protein ligand docking
- Large scale drug/inhibitor discovery
PhD candidate in Dr. L. Kuhn’s Lab:
and some other machine learning side-projects …
What is Machine Learning?
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
"Field of study that gives computers the
ability to learn without being explicitly
programmed.”
(Arthur Samuel, 1959)
By Phillip Taylor [CC BY 2.0]
http://commons.wikimedia.org/wiki/
File:American_book_company_1916._letter_envelope-2.JPG#filelinks
[public domain]
https://flic.kr/p/5BLW6G [CC BY 2.0]
Text Recognition
Spam Filtering
Biology
Examples of Machine Learning
Examples of Machine Learning
http://googleresearch.blogspot.com/2014/11/a-picture-is-worth-thousand-coherent.html
By Steve Jurvetson [CC BY 2.0]
Self-driving cars
Photo search
and many, many
more ...
Recommendation systems
http://commons.wikimedia.org/wiki/File:Netflix_logo.svg [public domain]
How many of you have used
machine learning before?
Our Agenda
• Concepts and the big picture
• Workflow
• Practical tips & good habits
Learning
• Labeled data
• Direct feedback
• Predict outcome/future
• Decision process
• Reward system
• Learn series of actions
• No labels
• No feedback
• “Find hidden structure”
Unsupervised
Supervised
Reinforcement
Unsupervised
learning
Supervised
learning
Clustering:
[DBSCAN on a toy dataset]
Classification:
[SVM on 2 classes of the Wine dataset]
Regression:
[Soccer Fantasy Score prediction]
Today’s topic
Supervised LearningUnsupervised Learning
Instances (samples, observations)
Features (attributes, dimensions) Classes (targets)
Nomenclature
sepal_length sepal_width petal_length petal_width class
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
… … … … … …
50 6.4 3.2 4.5 1.5 veriscolor
… … … … … …
150 5.9 3.0 5.1 1.8 virginica
https://archive.ics.uci.edu/ml/datasets/Iris
IRIS
Classification
x1
x2
class1
class2
1) Learn from training data
2) Map unseen (new) data
?
Feature Extraction
Feature Selection
Dimensionality Reduction
Feature Scaling
Raw Data Collection
Pre-Processing
Sampling
Test Dataset
Training Dataset
Learning Algorithm
Training
Post-Processing
Cross Validation
Final Classification/
Regression Model
New DataPre-Processing
Refinement
Prediction
Split
Supervised
Learning
Sebastian Raschka 2014
Missing Data
Performance Metrics
Model Selection
Hyperparameter
Optimization
This work is licensed under a Creative Commons Attribution 4.0 International License.
Final Model
Evaluation
Feature Extraction
Feature Selection
Dimensionality Reduction
Feature Scaling
Raw Data Collection
Pre-Processing
Sampling
Test Dataset
Training Dataset
Learning Algorithm
Training
Post-Processing
Cross Validation
Final Classification/
Regression Model
New DataPre-Processing
Refinement
Prediction
Split
Supervised
Learning
Sebastian Raschka 2014
Missing Data
Performance Metrics
Model Selection
Hyperparameter
Optimization
This work is licensed under a Creative Commons Attribution 4.0 International License.
Final Model
Evaluation
A Few Common Classifiers
Decision Tree
Perceptron Naive Bayes
Ensemble Methods: Random Forest, Bagging, AdaBoost
Support Vector Machine
K-Nearest Neighbor
Logistic Regression
Artificial Neural Network / Deep Learning
Discriminative Algorithms
Generative Algorithms
• Models a more general problem: how the data was generated.
• I.e., the distribution of the class; joint probability distribution p(x,y).
• Naive Bayes, Bayesian Belief Network classifier, Restricted
Boltzmann Machine …
• Map x → y directly.
• E.g., distinguish between people speaking different languages
without learning the languages.
• Logistic Regression, SVM, Neural Networks …
Examples of Discriminative Classifiers:
Perceptron
xi1
xi2
w1
w2
Σ yi
y = wTx = w0 + w1x1 + w2x2
1
F. Rosenblatt. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory, 1957.
x1
x2
y ∈ {-1,1}
w0
wj = weight
xi = training sample
yi = desired output
yi = actual output
t = iteration step
η = learning rate
θ = threshold (here 0)
update rule:
wj(t+1) = wj(t) + η(yi - yi)xi
1 if wTxi ≥ θ
-1 otherwise
^
^
^
^
yi
^
until
t+1 = max iter
or error = 0
Discriminative Classifiers:
Perceptron
F. Rosenblatt. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory, 1957.
- Binary classifier (one vs all, OVA)
- Convergence problems (set n iterations)
- Modification: stochastic gradient descent
- “Modern” perceptron: Support Vector Machine (maximize margin)
- Multilayer perceptron (MLP)
xi1
xi2
w1
w2
Σ yi
1
y ∈ {-1,1}
w0
^
x1
x2
Generative Classifiers:
Naive Bayes
Bayes Theorem: P(ωj | xi) =
P(xi | ωj) P(ωj)
P(xi)
Posterior probability =
Likelihood x Prior probability
Evidence
Iris example: P(“Setosa"| xi), xi = [4.5 cm, 7.4 cm]
Generative Classifiers:
Naive Bayes
Decision Rule:
Bayes Theorem: P(ωj | xi) =
P(xi | ωj) P(ωj)
P(xi)
pred. class label ωj argmax P(ωj | xi)
i = 1, …, m
e.g., j ∈ {Setosa, Versicolor, Virginica}
Class-conditional
probability
(here Gaussian kernel):
Generative Classifiers:
Naive Bayes
Prior probability:
Evidence:
(cancels out)
(class frequency)
P(ωj | xi) =
P(xi | ωj) P(ωj)
P(xi)
P(ωj) =
Nωj
Nc
P(xik |ωj) =
1
√ (2 π σωj
2)
exp
( )-
(xik - μωj)2
2σωj
2
P(xi |ωj) P(xik |ωj)
Generative Classifiers:
Naive Bayes
- Naive conditional independence assumption typically
violated
- Works well for small datasets
- Multinomial model still quite popular for text classification
(e.g., spam filter)
Non-Parametric Classifiers:
K-Nearest Neighbor
- Simple!
- Lazy learner
- Very susceptible to curse of dimensionality
k=3
e.g., k=1
Iris Example
Setosa Virginica Versicolor
k = 3
mahalanobis dist.
uniform weights
C = 3
depth = 2
Decision Tree
Entropy =
depth = 4
petal length <= 2.45?
petal length <= 4.75?
Setosa
Virginica Versicolor
Yes No
e.g., 2 (- 0.5 log2(0.5)) = 1
∑−pi logk pi
i
Information Gain =
entropy(parent) – [avg entropy(children)]
NoYes
depth = 2
"No Free Lunch" :(
Roughly speaking:
“No one model works best for all possible situations.”
Our model is a simplification of reality
Simplification is based on assumptions (model bias)
Assumptions fail in certain situations
D. H. Wolpert. The supervised learning no-free-lunch theorems. In Soft Computing and Industry, pages 25–42. Springer, 2002.
Which Algorithm?
• What is the size and dimensionality of my training set?
• Is the data linearly separable?
• How much do I care about computational efficiency?
- Model building vs. real-time prediction time
- Eager vs. lazy learning / on-line vs. batch learning
- prediction performance vs. speed
• Do I care about interpretability or should it "just work well?"
• ...
Feature Extraction
Feature Selection
Dimensionality Reduction
Feature Scaling
Raw Data Collection
Pre-Processing
Sampling
Test Dataset
Training Dataset
Learning Algorithm
Training
Post-Processing
Cross Validation
Final Classification/
Regression Model
New DataPre-Processing
Refinement
Prediction
Split
Supervised
Learning
Sebastian Raschka 2014
Missing Data
Performance Metrics
Model Selection
Hyperparameter
Optimization
This work is licensed under a Creative Commons Attribution 4.0 International License.
Final Model
Evaluation
Missing Values:
- Remove features (columns)
- Remove samples (rows)
- Imputation (mean, nearest neighbor, …)
Sampling:
- Random split into training and validation sets
- Typically 60/40, 70/30, 80/20
- Don’t use validation set until the very end! (overfitting)
Feature Scaling:
e.g., standardization:
- Faster convergence (gradient descent)
- Distances on same scale (k-NN with Euclidean distance)
- Mean centering for free
- Normal distributed data
- Numerical stability by avoiding small weights
z =
xik - μk
σk
(use same parameters for the test/new data!)
Categorical Variables
color size prize class
label
0 green M 10.1 class1
1 red L 13.5 class2
2 blue XL 15.3 class1
ordinalnominal
green → (1,0,0)
red → (0,1,0)
blue → (0,0,1)
class
label
color=blue color=green color=red prize size
0 0 0 1 0 10.1 1
1 1 0 0 1 13.5 2
2 0 1 0 0 15.3 3
M → 1
L → 2
XL → 3
Feature Extraction
Feature Selection
Dimensionality Reduction
Feature Scaling
Raw Data Collection
Pre-Processing
Sampling
Test Dataset
Training Dataset
Learning Algorithm
Training
Post-Processing
Cross Validation
Final Classification/
Regression Model
New DataPre-Processing
Refinement
Prediction
Split
Supervised
Learning
Sebastian Raschka 2014
Missing Data
Performance Metrics
Model Selection
Hyperparameter
Optimization
This work is licensed under a Creative Commons Attribution 4.0 International License.
Final Model
Evaluation
Generalization Error and Overfitting
How well does the model perform on unseen data?
Generalization Error and Overfitting
Error Metrics: Confusion Matrix
TP
[Linear SVM on sepal/petal lengths]
TN
FN
FP
here: “setosa” = “positive”
Error Metrics
TP
[Linear SVM on sepal/petal lengths]
TN
FN
FP
here: “setosa” = “positive”
TP + TN
FP +FN +TP +TN
Accuracy =
= 1 - Error
FP
N
TP
P
False Positive Rate =
TP
TP + FP
Precision =
True Positive Rate =
(Recall)
“micro” and “macro”
averaging for multi-class
Receiver Operating Characteristic
(ROC) Curves
Test set
Training dataset Test dataset
Complete dataset
Test set
Test set
Test set
1st iteration calc. error
calc. error
calc. error
calc. error
calculate
avg. error
k-fold cross-validation (k=4):
2nd iteration
3rd iteration
4th iteration
fold 1 fold 2 fold 3 fold 4
Model Selection
k-fold CV and ROC
Feature Selection
- Domain knowledge
- Variance threshold
- Exhaustive search
- Decision trees
- …
IMPORTANT!
(Noise, overfitting, curse of dimensionality, efficiency)
X = [x1, x2, x3, x4]start:
stop:
(if d = k)
X = [x1, x3, x4]
X = [x1, x3]
Simplest example:
Greedy Backward Selection
Dimensionality Reduction
• Transformation onto a new feature subspace
• e.g., Principal Component Analysis (PCA)
• Find directions of maximum variance
• Retain most of the information
0. Standardize data
1. Compute covariance matrix
z =
xik - μk
σik = ∑ (xij - µj) (xik - µk)
σk
1
in -1
σ2
1 σ12 σ13 σ14
σ21 σ2
2 σ23 σ24
σ31 σ32 σ2
3 σ34
σ41 σ42 σ43 σ2
4
∑ =
PCA in 3 Steps
2. Eigendecomposition and sorting eigenvalues
PCA in 3 Steps
X v = λ v
Eigenvectors
[[ 0.52237162 -0.37231836 -0.72101681 0.26199559]
[-0.26335492 -0.92555649 0.24203288 -0.12413481]
[ 0.58125401 -0.02109478 0.14089226 -0.80115427]
[ 0.56561105 -0.06541577 0.6338014 0.52354627]]
Eigenvalues
[ 2.93035378 0.92740362 0.14834223 0.02074601]
(from high to low)
3. Select top k eigenvectors and transform data
PCA in 3 Steps
Eigenvectors
[[ 0.52237162 -0.37231836 -0.72101681 0.26199559]
[-0.26335492 -0.92555649 0.24203288 -0.12413481]
[ 0.58125401 -0.02109478 0.14089226 -0.80115427]
[ 0.56561105 -0.06541577 0.6338014 0.52354627]]
Eigenvalues
[ 2.93035378 0.92740362 0.14834223 0.02074601]
[First 2 PCs of Iris]
Hyperparameter Optimization:
GridSearch in scikit-learn
Non-Linear Problems
- XOR gate
k=11
uniform weights
C=1
C=1000,
gamma=0.1
depth=4
Kernel Trick
Kernel function
Kernel
Map onto high-dimensional space (non-linear combinations)
Kernel Trick
Trick: No explicit dot product!
Radius Basis Function (RBF) Kernel:
Kernel PCA
PC1, linear PCA PC1, kernel PCA
Feature Extraction
Feature Selection
Dimensionality Reduction
Feature Scaling
Raw Data Collection
Pre-Processing
Sampling
Test Dataset
Training Dataset
Learning Algorithm
Training
Post-Processing
Cross Validation
Final Classification/
Regression Model
New DataPre-Processing
Refinement
Prediction
Split
Supervised
Learning
Sebastian Raschka 2014
Missing Data
Performance Metrics
Model Selection
Hyperparameter
Optimization
This work is licensed under a Creative Commons Attribution 4.0 International License.
Final Model
Evaluation
Questions?
mail@sebastianraschka.com
https://github.com/rasbt
@rasbt
Thanks!
Additional Slides
Inspiring Literature
P. N. Klein. Coding the Matrix: Linear
Algebra Through Computer Science
Applications. Newtonian Press, 2013.
R. Schutt and C. O’Neil. Doing Data
Science: Straight Talk from the Frontline.
O’Reilly Media, Inc., 2013.
S. Gutierrez. Data Scientists at Work.
Apress, 2014.
R. O. Duda, P. E. Hart, and D. G. Stork.
Pattern classification. 2nd. Edition. New
York, 2001.
Useful Online Resources
https://www.coursera.org/course/ml
http://stats.stackexchange.com
http://www.kaggle.com
My Favorite Tools
http://stanford.edu/~mwaskom/software/seaborn/
Seaborn
http://www.numpy.org
http://pandas.pydata.org
http://scikit-learn.org/stable/
http://ipython.org/notebook.html
class1
class2
Which one to pick?
class1
class2
The problem of overfitting
Generalization error!

More Related Content

What's hot

Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
Mohit Rajput
 
Random forest
Random forestRandom forest
Random forestUjjawal
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
Megha Sharma
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
CloudxLab
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Prof. Neeta Awasthy
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
Anandha L Ranganathan
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
Milind Gokhale
 
Machine Learning - Dataset Preparation
Machine Learning - Dataset PreparationMachine Learning - Dataset Preparation
Machine Learning - Dataset Preparation
Andrew Ferlitsch
 
Principal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesPrincipal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT Slides
AbhishekKumar4995
 
Feature selection
Feature selectionFeature selection
Feature selection
Dong Guo
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Rahul Jain
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
Neha Kulkarni
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
hktripathy
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentation
AyanaRukasar
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Simplilearn
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
Rashid Ansari
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Marina Santini
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
MachinePulse
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Rahul Kumar
 

What's hot (20)

Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Random forest
Random forestRandom forest
Random forest
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
 
Pattern Recognition
Pattern RecognitionPattern Recognition
Pattern Recognition
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Machine Learning - Dataset Preparation
Machine Learning - Dataset PreparationMachine Learning - Dataset Preparation
Machine Learning - Dataset Preparation
 
Principal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesPrincipal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT Slides
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentation
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 

Viewers also liked

Tutorial on Deep learning and Applications
Tutorial on Deep learning and ApplicationsTutorial on Deep learning and Applications
Tutorial on Deep learning and Applications
NhatHai Phan
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkDEEPASHRI HK
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
joshwills
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
Daniel Tunkelang
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
Daniel Tunkelang
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
Prof. Dr. Diego Kuonen
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
Imry Kissos
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
Xavier Amatriain
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
ryanorban
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013Philip Zheng
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
Devashish Shanker
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
Pier Luca Lanzi
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
David Pittman
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
Owen Zhang
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
Si Haem
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
DataRobot
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentationlpaviglianiti
 

Viewers also liked (20)

Tutorial on Deep learning and Applications
Tutorial on Deep learning and ApplicationsTutorial on Deep learning and Applications
Tutorial on Deep learning and Applications
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
 

Similar to An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture

Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
Xavier Rafael Palou
 
Machine Learning ebook.pdf
Machine Learning ebook.pdfMachine Learning ebook.pdf
Machine Learning ebook.pdf
HODIT12
 
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 11_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
MostafaHazemMostafaa
 
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Marina Santini
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
ShwetaPatil174
 
Clustering
ClusteringClustering
Clustering
Smrutiranjan Sahu
 
Lecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsLecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest Neighbors
Marina Santini
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...butest
 
Presentation on Text Classification
Presentation on Text ClassificationPresentation on Text Classification
Presentation on Text Classification
Sai Srinivas Kotni
 
know Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfknow Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdf
hemangppatel
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
Vahid Mirjalili
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
Stefan Duprey
 
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Association for Computational Linguistics
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning Algorithm
Kaniska Mandal
 
My7class
My7classMy7class
My7class
ketan533
 
Free Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comFree Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.com
Edhole.com
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysisbutest
 

Similar to An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture (20)

Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
 
Machine Learning ebook.pdf
Machine Learning ebook.pdfMachine Learning ebook.pdf
Machine Learning ebook.pdf
 
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 11_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
 
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
 
Clustering
ClusteringClustering
Clustering
 
Lecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest NeighborsLecture 8: Decision Trees & k-Nearest Neighbors
Lecture 8: Decision Trees & k-Nearest Neighbors
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
 
Presentation on Text Classification
Presentation on Text ClassificationPresentation on Text Classification
Presentation on Text Classification
 
know Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfknow Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdf
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning Algorithm
 
My7class
My7classMy7class
My7class
 
Free Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.comFree Ebooks Download ! Edhole.com
Free Ebooks Download ! Edhole.com
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 

Recently uploaded

社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 

An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture

  • 1. Practical Data Science An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture Michigan State University NextGen Bioinformatics Seminars - 2015 Sebastian Raschka Feb. 11, 2015
  • 2. A Little Bit About Myself ... Developing software & methods for - Protein ligand docking - Large scale drug/inhibitor discovery PhD candidate in Dr. L. Kuhn’s Lab: and some other machine learning side-projects …
  • 3. What is Machine Learning? http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram "Field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959) By Phillip Taylor [CC BY 2.0]
  • 5. Examples of Machine Learning http://googleresearch.blogspot.com/2014/11/a-picture-is-worth-thousand-coherent.html By Steve Jurvetson [CC BY 2.0] Self-driving cars Photo search and many, many more ... Recommendation systems http://commons.wikimedia.org/wiki/File:Netflix_logo.svg [public domain]
  • 6. How many of you have used machine learning before?
  • 7. Our Agenda • Concepts and the big picture • Workflow • Practical tips & good habits
  • 8. Learning • Labeled data • Direct feedback • Predict outcome/future • Decision process • Reward system • Learn series of actions • No labels • No feedback • “Find hidden structure” Unsupervised Supervised Reinforcement
  • 9. Unsupervised learning Supervised learning Clustering: [DBSCAN on a toy dataset] Classification: [SVM on 2 classes of the Wine dataset] Regression: [Soccer Fantasy Score prediction] Today’s topic Supervised LearningUnsupervised Learning
  • 10. Instances (samples, observations) Features (attributes, dimensions) Classes (targets) Nomenclature sepal_length sepal_width petal_length petal_width class 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa … … … … … … 50 6.4 3.2 4.5 1.5 veriscolor … … … … … … 150 5.9 3.0 5.1 1.8 virginica https://archive.ics.uci.edu/ml/datasets/Iris IRIS
  • 11. Classification x1 x2 class1 class2 1) Learn from training data 2) Map unseen (new) data ?
  • 12. Feature Extraction Feature Selection Dimensionality Reduction Feature Scaling Raw Data Collection Pre-Processing Sampling Test Dataset Training Dataset Learning Algorithm Training Post-Processing Cross Validation Final Classification/ Regression Model New DataPre-Processing Refinement Prediction Split Supervised Learning Sebastian Raschka 2014 Missing Data Performance Metrics Model Selection Hyperparameter Optimization This work is licensed under a Creative Commons Attribution 4.0 International License. Final Model Evaluation
  • 13. Feature Extraction Feature Selection Dimensionality Reduction Feature Scaling Raw Data Collection Pre-Processing Sampling Test Dataset Training Dataset Learning Algorithm Training Post-Processing Cross Validation Final Classification/ Regression Model New DataPre-Processing Refinement Prediction Split Supervised Learning Sebastian Raschka 2014 Missing Data Performance Metrics Model Selection Hyperparameter Optimization This work is licensed under a Creative Commons Attribution 4.0 International License. Final Model Evaluation
  • 14. A Few Common Classifiers Decision Tree Perceptron Naive Bayes Ensemble Methods: Random Forest, Bagging, AdaBoost Support Vector Machine K-Nearest Neighbor Logistic Regression Artificial Neural Network / Deep Learning
  • 15. Discriminative Algorithms Generative Algorithms • Models a more general problem: how the data was generated. • I.e., the distribution of the class; joint probability distribution p(x,y). • Naive Bayes, Bayesian Belief Network classifier, Restricted Boltzmann Machine … • Map x → y directly. • E.g., distinguish between people speaking different languages without learning the languages. • Logistic Regression, SVM, Neural Networks …
  • 16. Examples of Discriminative Classifiers: Perceptron xi1 xi2 w1 w2 Σ yi y = wTx = w0 + w1x1 + w2x2 1 F. Rosenblatt. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory, 1957. x1 x2 y ∈ {-1,1} w0 wj = weight xi = training sample yi = desired output yi = actual output t = iteration step η = learning rate θ = threshold (here 0) update rule: wj(t+1) = wj(t) + η(yi - yi)xi 1 if wTxi ≥ θ -1 otherwise ^ ^ ^ ^ yi ^ until t+1 = max iter or error = 0
  • 17. Discriminative Classifiers: Perceptron F. Rosenblatt. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory, 1957. - Binary classifier (one vs all, OVA) - Convergence problems (set n iterations) - Modification: stochastic gradient descent - “Modern” perceptron: Support Vector Machine (maximize margin) - Multilayer perceptron (MLP) xi1 xi2 w1 w2 Σ yi 1 y ∈ {-1,1} w0 ^ x1 x2
  • 18. Generative Classifiers: Naive Bayes Bayes Theorem: P(ωj | xi) = P(xi | ωj) P(ωj) P(xi) Posterior probability = Likelihood x Prior probability Evidence Iris example: P(“Setosa"| xi), xi = [4.5 cm, 7.4 cm]
  • 19. Generative Classifiers: Naive Bayes Decision Rule: Bayes Theorem: P(ωj | xi) = P(xi | ωj) P(ωj) P(xi) pred. class label ωj argmax P(ωj | xi) i = 1, …, m e.g., j ∈ {Setosa, Versicolor, Virginica}
  • 20. Class-conditional probability (here Gaussian kernel): Generative Classifiers: Naive Bayes Prior probability: Evidence: (cancels out) (class frequency) P(ωj | xi) = P(xi | ωj) P(ωj) P(xi) P(ωj) = Nωj Nc P(xik |ωj) = 1 √ (2 π σωj 2) exp ( )- (xik - μωj)2 2σωj 2 P(xi |ωj) P(xik |ωj)
  • 21. Generative Classifiers: Naive Bayes - Naive conditional independence assumption typically violated - Works well for small datasets - Multinomial model still quite popular for text classification (e.g., spam filter)
  • 22. Non-Parametric Classifiers: K-Nearest Neighbor - Simple! - Lazy learner - Very susceptible to curse of dimensionality k=3 e.g., k=1
  • 23. Iris Example Setosa Virginica Versicolor k = 3 mahalanobis dist. uniform weights C = 3 depth = 2
  • 24. Decision Tree Entropy = depth = 4 petal length <= 2.45? petal length <= 4.75? Setosa Virginica Versicolor Yes No e.g., 2 (- 0.5 log2(0.5)) = 1 ∑−pi logk pi i Information Gain = entropy(parent) – [avg entropy(children)] NoYes depth = 2
  • 25. "No Free Lunch" :( Roughly speaking: “No one model works best for all possible situations.” Our model is a simplification of reality Simplification is based on assumptions (model bias) Assumptions fail in certain situations D. H. Wolpert. The supervised learning no-free-lunch theorems. In Soft Computing and Industry, pages 25–42. Springer, 2002.
  • 26. Which Algorithm? • What is the size and dimensionality of my training set? • Is the data linearly separable? • How much do I care about computational efficiency? - Model building vs. real-time prediction time - Eager vs. lazy learning / on-line vs. batch learning - prediction performance vs. speed • Do I care about interpretability or should it "just work well?" • ...
  • 27. Feature Extraction Feature Selection Dimensionality Reduction Feature Scaling Raw Data Collection Pre-Processing Sampling Test Dataset Training Dataset Learning Algorithm Training Post-Processing Cross Validation Final Classification/ Regression Model New DataPre-Processing Refinement Prediction Split Supervised Learning Sebastian Raschka 2014 Missing Data Performance Metrics Model Selection Hyperparameter Optimization This work is licensed under a Creative Commons Attribution 4.0 International License. Final Model Evaluation
  • 28. Missing Values: - Remove features (columns) - Remove samples (rows) - Imputation (mean, nearest neighbor, …) Sampling: - Random split into training and validation sets - Typically 60/40, 70/30, 80/20 - Don’t use validation set until the very end! (overfitting) Feature Scaling: e.g., standardization: - Faster convergence (gradient descent) - Distances on same scale (k-NN with Euclidean distance) - Mean centering for free - Normal distributed data - Numerical stability by avoiding small weights z = xik - μk σk (use same parameters for the test/new data!)
  • 29. Categorical Variables color size prize class label 0 green M 10.1 class1 1 red L 13.5 class2 2 blue XL 15.3 class1 ordinalnominal green → (1,0,0) red → (0,1,0) blue → (0,0,1) class label color=blue color=green color=red prize size 0 0 0 1 0 10.1 1 1 1 0 0 1 13.5 2 2 0 1 0 0 15.3 3 M → 1 L → 2 XL → 3
  • 30. Feature Extraction Feature Selection Dimensionality Reduction Feature Scaling Raw Data Collection Pre-Processing Sampling Test Dataset Training Dataset Learning Algorithm Training Post-Processing Cross Validation Final Classification/ Regression Model New DataPre-Processing Refinement Prediction Split Supervised Learning Sebastian Raschka 2014 Missing Data Performance Metrics Model Selection Hyperparameter Optimization This work is licensed under a Creative Commons Attribution 4.0 International License. Final Model Evaluation
  • 31. Generalization Error and Overfitting How well does the model perform on unseen data?
  • 32. Generalization Error and Overfitting
  • 33. Error Metrics: Confusion Matrix TP [Linear SVM on sepal/petal lengths] TN FN FP here: “setosa” = “positive”
  • 34. Error Metrics TP [Linear SVM on sepal/petal lengths] TN FN FP here: “setosa” = “positive” TP + TN FP +FN +TP +TN Accuracy = = 1 - Error FP N TP P False Positive Rate = TP TP + FP Precision = True Positive Rate = (Recall) “micro” and “macro” averaging for multi-class
  • 36. Test set Training dataset Test dataset Complete dataset Test set Test set Test set 1st iteration calc. error calc. error calc. error calc. error calculate avg. error k-fold cross-validation (k=4): 2nd iteration 3rd iteration 4th iteration fold 1 fold 2 fold 3 fold 4 Model Selection
  • 38. Feature Selection - Domain knowledge - Variance threshold - Exhaustive search - Decision trees - … IMPORTANT! (Noise, overfitting, curse of dimensionality, efficiency) X = [x1, x2, x3, x4]start: stop: (if d = k) X = [x1, x3, x4] X = [x1, x3] Simplest example: Greedy Backward Selection
  • 39. Dimensionality Reduction • Transformation onto a new feature subspace • e.g., Principal Component Analysis (PCA) • Find directions of maximum variance • Retain most of the information
  • 40. 0. Standardize data 1. Compute covariance matrix z = xik - μk σik = ∑ (xij - µj) (xik - µk) σk 1 in -1 σ2 1 σ12 σ13 σ14 σ21 σ2 2 σ23 σ24 σ31 σ32 σ2 3 σ34 σ41 σ42 σ43 σ2 4 ∑ = PCA in 3 Steps
  • 41. 2. Eigendecomposition and sorting eigenvalues PCA in 3 Steps X v = λ v Eigenvectors [[ 0.52237162 -0.37231836 -0.72101681 0.26199559] [-0.26335492 -0.92555649 0.24203288 -0.12413481] [ 0.58125401 -0.02109478 0.14089226 -0.80115427] [ 0.56561105 -0.06541577 0.6338014 0.52354627]] Eigenvalues [ 2.93035378 0.92740362 0.14834223 0.02074601] (from high to low)
  • 42. 3. Select top k eigenvectors and transform data PCA in 3 Steps Eigenvectors [[ 0.52237162 -0.37231836 -0.72101681 0.26199559] [-0.26335492 -0.92555649 0.24203288 -0.12413481] [ 0.58125401 -0.02109478 0.14089226 -0.80115427] [ 0.56561105 -0.06541577 0.6338014 0.52354627]] Eigenvalues [ 2.93035378 0.92740362 0.14834223 0.02074601] [First 2 PCs of Iris]
  • 44. Non-Linear Problems - XOR gate k=11 uniform weights C=1 C=1000, gamma=0.1 depth=4
  • 45. Kernel Trick Kernel function Kernel Map onto high-dimensional space (non-linear combinations)
  • 46. Kernel Trick Trick: No explicit dot product! Radius Basis Function (RBF) Kernel:
  • 47. Kernel PCA PC1, linear PCA PC1, kernel PCA
  • 48. Feature Extraction Feature Selection Dimensionality Reduction Feature Scaling Raw Data Collection Pre-Processing Sampling Test Dataset Training Dataset Learning Algorithm Training Post-Processing Cross Validation Final Classification/ Regression Model New DataPre-Processing Refinement Prediction Split Supervised Learning Sebastian Raschka 2014 Missing Data Performance Metrics Model Selection Hyperparameter Optimization This work is licensed under a Creative Commons Attribution 4.0 International License. Final Model Evaluation
  • 51. Inspiring Literature P. N. Klein. Coding the Matrix: Linear Algebra Through Computer Science Applications. Newtonian Press, 2013. R. Schutt and C. O’Neil. Doing Data Science: Straight Talk from the Frontline. O’Reilly Media, Inc., 2013. S. Gutierrez. Data Scientists at Work. Apress, 2014. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. 2nd. Edition. New York, 2001.
  • 55. class1 class2 The problem of overfitting Generalization error!