SUPERVISED AND
UNSUPERVISED LEARNING
TECHNIQUES
- Ms. Keren Evangeline I / AP (O.G)
SRM Valliammai Engineering College
Department of Medical Electronics
Faculty Development Programme
LIST OF COMMON UNSUPERVISED ML
MODELS
• K – Means Clustering
• Principal component analysis (Dimensionality
Reduction)
• Hierarchical clustering
ADVANTAGES OF UNSUPERVISED ML MODELS
DISADVANTAGES OF UNSUPERVISED ML MODELS
DECISION TREE
[Size < 2000 sq.ft?]
├─ Yes (< 2000): [Bedrooms < 3?]
│   ├─ Yes (< 3): [Year < 1990?]
│   │   ├─ Yes: Price $300K
│   │   └─ No:  Price $350K
│   └─ No (>= 3): Price $400K
└─ No (>= 2000): [Lot Size < 5000 sq.ft?]
    ├─ Yes (< 5000): Price $450K
    └─ No (>= 5000): Price $550K
Continuous Variable Decision Tree
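The tree above can be written out directly as nested conditions; the feature names and price values in this minimal sketch are taken from the diagram.

```python
# The continuous-variable decision tree from the diagram, hard-coded
# as nested conditions. Each internal node tests one feature; each
# leaf returns a predicted house price in dollars.
def predict_price(size, bedrooms, lot_size, year):
    """Return the predicted house price ($) for one example."""
    if size < 2000:
        if bedrooms < 3:
            return 300_000 if year < 1990 else 350_000
        return 400_000
    return 450_000 if lot_size < 5000 else 550_000

print(predict_price(size=1500, bedrooms=2, lot_size=4000, year=1985))  # 300000
```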
RANDOM
FOREST
Introduction
• The Random Forest algorithm is a supervised machine learning algorithm consisting of many
decision trees.
• The general method of random decision forests was first proposed by Ho in 1995 and was
later extended by Leo Breiman in 2001.
Decision Tree
Decision Tree – Important Terms
How Decision Tree Works
• It follows a tree-like model of decisions and their possible consequences.
• The algorithm works by recursively splitting the data into subsets based on the most significant feature at each
node of the tree.
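The recursive splitting step can be sketched as follows. This toy example (an assumption for illustration, not the slide's exact procedure) uses a variance-reduction criterion, one common way of choosing the "most significant" split for regression: try every (feature, threshold) pair and keep the one whose two child subsets have the lowest combined target variance.

```python
# Toy sketch of one node's split search in a regression tree:
# pick the (feature, threshold) pair that most reduces target variance.
def variance(ys):
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys) / len(ys)

def best_split(X, y):
    """X: list of feature rows, y: list of targets.
    Return the (feature_index, threshold) minimising the
    size-weighted variance of the two child subsets."""
    best = (None, None, float("inf"))
    for j in range(len(X[0])):                      # every feature
        for t in sorted({row[j] for row in X}):     # every candidate threshold
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            if not left or not right:
                continue
            score = len(left) * variance(left) + len(right) * variance(right)
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

# Splitting on feature 0 at threshold 10 separates the targets perfectly:
print(best_split([[1], [2], [10], [11]], [0, 0, 100, 100]))  # (0, 10)
```

The tree then recurses: the same search runs again on the left and right subsets until a stopping rule (depth, node size) is met.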
How Decision Tree Works
Ensemble Learning
Ensemble learning creates a stronger model by aggregating the predictions of multiple weak models. Random
Forest is an example of ensemble learning in which each model is a decision tree. The idea behind it is the
wisdom of the crowd: the majority-vote aggregate can be more accurate than any individual model.
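For classification, the majority-vote aggregation can be sketched in a few lines:

```python
from collections import Counter

# Minimal sketch of "wisdom of the crowd" aggregation: each weak model
# casts a vote, and the ensemble returns the most common prediction.
def majority_vote(predictions):
    """predictions: one class label per model."""
    return Counter(predictions).most_common(1)[0][0]

votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))  # spam
```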
Dataset
Preparation
Bootstrap Aggregating (Bagging)
Randomly creating different training subsets from the
original training dataset with replacement is called
bagging. "With replacement" means a bootstrap sample
can contain duplicate elements. Bagging reduces variance
and helps avoid overfitting.
Out of Bag (OOB)
The out-of-bag dataset consists of the elements that did
not end up in a given bootstrap sample. It can be used for
cross-validation or to evaluate performance.
Random Subspace Method (Feature Bagging)
While building each tree, a randomly selected subset of
the features is considered at each split, so the trees are
more diverse and less correlated.
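The preparation steps above can be sketched as follows (function names are illustrative):

```python
import random

# Sketch of the dataset preparation for one tree: a bootstrap sample
# (drawn with replacement, so duplicates are expected), its out-of-bag
# (OOB) leftovers, and a random feature subset.
def bootstrap_sample(n, rng):
    """Return (bootstrap indices, OOB indices) for a dataset of size n."""
    boot = [rng.randrange(n) for _ in range(n)]
    oob = [i for i in range(n) if i not in set(boot)]
    return boot, oob

def feature_subset(d, k, rng):
    """Random subspace: choose k of d feature indices without replacement."""
    return rng.sample(range(d), k)

rng = random.Random(42)
boot, oob = bootstrap_sample(10, rng)
# Every index is either in the bootstrap sample or out-of-bag, never both:
assert set(boot) | set(oob) == set(range(10))
```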
Bagging at training time​
Bagging at inference time
Random Subspace Method at training time
Random Subspace Method at inference time
Definition
The Random Forest algorithm is an ensemble learning method consisting of many decision
trees built using bagging and feature bagging, which together create an uncorrelated forest
of trees whose combined prediction is more accurate than that of any single tree. For
classification tasks, the final prediction is the majority vote of the trees; for regression
tasks, it is the average of the individual trees' predictions.
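Putting the pieces together, a toy version of this definition for regression might look like the sketch below. It is a hypothetical illustration, not a library implementation: each "tree" is only a depth-1 stump (splitting on the median of one randomly chosen feature) so the whole forest fits in a few lines, whereas a real random forest grows full trees.

```python
import random
import statistics

# Toy random forest for regression: depth-1 trees (stumps), each fitted
# on a bootstrap sample with one randomly chosen feature, predictions
# averaged across the forest.
def fit_stump(X, y, feat):
    """Split on the median of one feature; predict the mean of each side."""
    t = statistics.median(row[feat] for row in X)
    left = [yi for row, yi in zip(X, y) if row[feat] < t]
    right = [yi for row, yi in zip(X, y) if row[feat] >= t]
    lo = statistics.mean(left) if left else statistics.mean(y)
    hi = statistics.mean(right) if right else statistics.mean(y)
    return feat, t, lo, hi

def fit_forest(X, y, n_trees, rng):
    forest = []
    n, d = len(X), len(X[0])
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bagging
        feat = rng.randrange(d)                      # feature bagging
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feat))
    return forest

def predict(forest, x):
    """Regression: average the individual trees' predictions."""
    return statistics.mean(lo if x[f] < t else hi for f, t, lo, hi in forest)

forest = fit_forest([[0], [1], [2], [3]], [0, 0, 10, 10], n_trees=25,
                    rng=random.Random(0))
# The averaged prediction rises with the feature, as the training data does:
assert predict(forest, [0]) <= predict(forest, [3])
```

For classification the only change is the aggregation step: replace the mean in `predict` with a majority vote over the trees' class labels.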
Random Forest Model
Why Use Random Forest
 Random forests are an effective tool for prediction.
 Forests give results competitive with boosting and adaptive bagging, yet do not
progressively change the training set.
 Random inputs and random features produce good results in classification, less so in
regression.
 For larger datasets, accuracy can be gained by combining random features with boosting.
Advantages and Disadvantages
Advantages
 Versatile uses
 Easy-to-understand hyperparameters
 The classifier does not overfit with enough trees
Disadvantages
 Increased accuracy requires more trees
 More trees slow down the model
 Cannot describe relationships within the data
Random Forest Applications
 Detects reliable debtors and potential fraudsters in finance
 Verifies medicine components and patient data in healthcare
 Gauges whether customers will like products in e-commerce
Thank you
Any Questions?
