ENSEMBLE HYBRID FEATURE SELECTION TECHNIQUE
Name: Disha Sinha
Semester: 6th
Year: 3rd
Section: B
University Roll Number: 10900117090
CONTENTS
➢ Introduction
➢ Feature Selection
➢ Feature Selection vs Dimensionality Reduction
➢ Types of Feature Selection Techniques
➢ Ensembles
➢ Why Ensembles?
➢ Types of Ensemble Methods
➢ Applications of Ensembles
➢ Conclusion
INTRODUCTION
● As the amount of stored information increases, our ability to make use of it does not grow proportionally.
● In high-dimensional datasets, redundant features and the sheer number of dimensions make a learning method take a significant amount of time, and the performance of the model decreases.
● Hence, we use feature selection techniques to select a subset of relevant and non-redundant features.
FEATURE SELECTION
● Feature selection is used to select a subset of relevant and non-redundant features
from a large feature space.
● In many applications of machine learning and pattern recognition, feature selection
is used to select an optimal feature subset to train the learning model.
● The main objectives of feature selection are:
➢ to improve predictive accuracy
➢ to remove redundant features and
➢ to reduce time consumption during analysis.
FEATURE SELECTION VS
DIMENSIONALITY REDUCTION
➢ Feature selection simply selects or excludes given features without transforming them.
➢ Dimensionality reduction transforms the features into a lower-dimensional space.
TYPES OF FEATURE SELECTION
TECHNIQUES
➢ Filter Methods
➢ Wrapper Methods
➢ Embedded Methods
➢ Hybrid Methods
1. Filter Methods
● Filter methods select a subset of features from a dataset without using any machine
learning algorithm.
● An example is eliminating features with null values.
● Filter-based feature selection methods are typically faster, but classifier accuracy is not guaranteed.
● The selected features can be used in any machine learning algorithm.
● They are computationally inexpensive.
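For illustration, here is a minimal filter-style sketch in Python with scikit-learn; the dataset, the variance threshold and k=10 are assumptions chosen for the example, not part of the original slides.

    # A minimal sketch of filter-based feature selection (assumed example data).
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif

    X, y = load_breast_cancer(return_X_y=True)

    # Step 1: drop near-constant features (no learning algorithm involved).
    vt = VarianceThreshold(threshold=0.01)
    X_reduced = vt.fit_transform(X)

    # Step 2: keep the k features with the highest ANOVA F-score w.r.t. the label.
    selector = SelectKBest(score_func=f_classif, k=10)
    X_selected = selector.fit_transform(X_reduced, y)
    print(X.shape, "->", X_selected.shape)

Because no classifier is trained here, the same selected features can then be fed into any downstream learning algorithm.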
2. Wrapper Methods
● Wrapper methods select a subset of features by evaluating candidate subsets with a machine learning algorithm: a search is performed through the space of possible feature subsets, and each subset is scored by the performance of the given algorithm.
● Wrapper methods can give higher classification accuracy than filter methods for particular classifiers, but they are computationally more expensive.
● They detect interactions between variables.
● They find the optimal feature subset for the desired machine learning algorithm.
● Examples: Forward Selection, Backward Elimination, Stepwise Selection.
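As a sketch of forward selection, scikit-learn's SequentialFeatureSelector greedily adds the feature that most improves cross-validated performance; the estimator and target subset size below are assumptions for the example.

    # A minimal sketch of wrapper-based (forward) selection; the dataset and
    # the wrapped estimator are assumptions for the example.
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Greedy forward search: repeatedly add the feature that most improves
    # the cross-validated accuracy of the wrapped classifier.
    sfs = SequentialFeatureSelector(
        KNeighborsClassifier(n_neighbors=5),
        n_features_to_select=10,
        direction="forward",
        cv=5,
    )
    sfs.fit(X, y)
    print("selected feature indices:", sfs.get_support(indices=True))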
3. Embedded Methods
● Performs feature selection during the process of training
● Specific to the applied learning algorithm.
● A learning algorithm takes advantage of its own variable selection process and
performs feature selection and classification/regression at the same time.
● They take interactions between features into consideration, as wrapper methods do.
● They are faster than wrapper methods and more accurate than filter methods.
● They find the feature subset for the algorithm being trained.
● They are much less prone to overfitting.
● Examples: Lasso, Elastic Net.
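For illustration, a minimal embedded-selection sketch using Lasso, whose L1 penalty drives uninformative coefficients exactly to zero during training; the dataset and alpha value are assumptions for the example.

    # A minimal sketch of embedded selection via L1 regularization (Lasso).
    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Lasso

    X, y = load_diabetes(return_X_y=True)

    # Selection happens as a side effect of training: zeroed coefficients
    # correspond to dropped features.
    lasso = Lasso(alpha=0.5).fit(X, y)
    selected = np.flatnonzero(lasso.coef_)
    print("kept features:", selected)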
4. Hybrid Methods
● Combinations of all of the other feature selection methods - filter, wrapper and
embedded methods.
● The exact combination is left to the engineer.
● An area with high scope for research.
● High performance and accuracy.
● Better computational complexity than wrapper methods.
● Yields models that are more flexible and robust against high-dimensional data.
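For illustration, a minimal hybrid sketch that chains a cheap filter stage with a wrapper stage in one pipeline; the dataset, stage sizes and estimators are assumptions for the example.

    # A minimal sketch of a hybrid approach: filter first, then wrapper.
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, f_classif, RFE
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    X, y = load_breast_cancer(return_X_y=True)

    hybrid = Pipeline([
        # Filter stage: cheaply shrink the search space to 15 features.
        ("filter", SelectKBest(score_func=f_classif, k=15)),
        # Wrapper stage: recursive feature elimination with a classifier.
        ("wrapper", RFE(LogisticRegression(max_iter=5000), n_features_to_select=8)),
        ("clf", LogisticRegression(max_iter=5000)),
    ])
    hybrid.fit(X, y)
    print("accuracy on training data:", hybrid.score(X, y))

The filter stage keeps the wrapper's expensive search tractable, which is why hybrid methods can approach wrapper accuracy at lower computational cost.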
ENSEMBLES
● For a given dataset, different feature selection algorithms may select different subsets of features, and hence the results obtained may differ in accuracy. So we use ensemble-based feature selection methods to select a stable feature set, as sketched below.
● Ensembles are sets of learning machines that combine their decisions, or their
learning algorithms, or different views of data, or other specific characteristics to
achieve more reliable and accurate predictions in supervised and unsupervised
learning problems.
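As a sketch of ensemble-based feature selection, one simple scheme runs several selectors and keeps the features chosen by a majority; the particular selectors, dataset and vote threshold below are assumptions for the example.

    # A minimal sketch of ensemble feature selection by majority vote.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import (SelectKBest, SelectFromModel,
                                           f_classif, mutual_info_classif)

    X, y = load_breast_cancer(return_X_y=True)
    k = 10

    votes = np.zeros(X.shape[1], dtype=int)
    selectors = [
        SelectKBest(f_classif, k=k),
        SelectKBest(mutual_info_classif, k=k),
        SelectFromModel(RandomForestClassifier(random_state=0),
                        max_features=k, threshold=-np.inf),
    ]
    for s in selectors:
        votes += s.fit(X, y).get_support().astype(int)

    # Stable set: features selected by at least 2 of the 3 selectors.
    stable = np.flatnonzero(votes >= 2)
    print("stable feature indices:", stable)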
WHY ENSEMBLES ?
● A combination of learning algorithms is not guaranteed to outperform the single best learning algorithm, but on average it gives more accurate results on unseen data samples than a single learning algorithm.
● Ensembles enlarge the margins of large-margin classifiers like SVM in order to
classify data points accurately.
● Ensembles can reduce both bias and variance of the error.
WHY ENSEMBLES ?
● A rigorous mathematical treatment, starting from the "representativeness" of the examples used in machine learning problems, leads to the design of ensembles of weak classifiers whose accuracy is governed by the law of large numbers.
● Predictive performances of single models have been improved by the ensemble
methodology in several application fields, such as information security, astronomy
and astrophysics, geography and remote sensing, image retrieval, finance, medicine
etc.
TYPES OF ENSEMBLE METHODS
➢ Bayes Optimal Classifier
➢ Bootstrap Aggregating (Bagging)
➢ Boosting
➢ Bayesian Model Averaging
➢ Bayesian Model Combination
➢ Bucket of Models
➢ Stacking
1. Bayes Optimal Classifier
● Classification technique.
● Ensemble of all hypotheses in the hypothesis space.
● The naive Bayes optimal classifier is a version of this that assumes the features are conditionally independent given the class.
● Each hypothesis is given a vote proportional to the probability that the training
dataset would be sampled from a system if that hypothesis were true.
● Vote of each hypothesis is multiplied by the prior probability of that hypothesis.
1. Bayes Optimal Classifier
● Equation: y = argmax over cj ∈ C of ∑ over hi ∈ H of P(cj | hi) P(T | hi) P(hi)
where
y : the predicted class
C : the set of all possible classes
hi ∈ H : a hypothesis in the hypothesis space H
P : probability
T : the training data.
● By Bayes' theorem: P(hi | T) ∝ P(T | hi) P(hi)
● Hence,
y = argmax over cj ∈ C of ∑ over hi ∈ H of P(cj | hi) P(hi | T)
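As a worked toy example of this vote (all probabilities below are made up purely for illustration), the sum can be computed directly:

    # Bayes optimal vote over three hypotheses and two classes.
    import numpy as np

    P_c_given_h = np.array([[0.9, 0.1],   # P(c | h1)
                            [0.6, 0.4],   # P(c | h2)
                            [0.2, 0.8]])  # P(c | h3)
    P_T_given_h = np.array([0.5, 0.3, 0.1])  # likelihood of the training data
    P_h = np.array([1/3, 1/3, 1/3])          # prior over hypotheses

    # Posterior-weighted vote: sum_i P(c | h_i) P(T | h_i) P(h_i), then argmax.
    scores = (P_c_given_h * (P_T_given_h * P_h)[:, None]).sum(axis=0)
    print("class scores:", scores, "-> predicted class:", np.argmax(scores))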
2. Bootstrap Aggregating (Bagging)
● Each model in the ensemble votes with equal weight.
● Trains each model in the ensemble using a randomly drawn subset of the training
set to promote model variance.
● It is a general procedure that can be used to reduce the variance of algorithms that have high variance, such as decision trees, e.g. classification and regression trees (CART).
● As an example, the random forest algorithm combines random decision trees with
bagging to achieve very high classification accuracy.
2. Bootstrap Aggregating (Bagging)
Algorithm:
Assume a dataset with 1000 instances and the CART algorithm applied to it.
Bagging of the CART algorithm would work as follows:
➢ Create many (e.g. 100) random sub-samples of the dataset with replacement.
➢ Train a CART model on each sample.
➢ Given a new dataset, calculate the average prediction from each model; for classification, we take the most frequently predicted class.
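A minimal sketch of the above procedure using scikit-learn's BaggingClassifier; the synthetic dataset is an assumption for the example, while the 100 trees follow the numbers above.

    # Bagging 100 CART-style trees, each trained on a bootstrap sample;
    # predictions are combined by majority vote.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                bootstrap=True, random_state=0)
    print("CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())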
3. Boosting
● Incrementally builds an ensemble by training each new model instance to
emphasize the training instances that previous models mis-classified.
● Often more accurate than bagging, but also more prone to over-fitting the training data.
● Most common algorithm: AdaBoost
● Most boosting algorithms consist of iteratively learning weak classifiers with
respect to a distribution and adding them to a final strong classifier.
● While adding, they are weighted in a way that is related to the weak learners'
accuracy.
● After a weak learner is added, the data weights are re-adjusted by re-weighting
which leads to misclassified input data gaining higher weight and correctly
classified data losing weight.
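For illustration, a minimal AdaBoost sketch; the dataset, the depth-1 trees (decision stumps, the classic weak learners) and the number of rounds are assumptions for the example.

    # Boosting: each round re-weights the training data so the next weak
    # learner focuses on previously misclassified points.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                  n_estimators=50, learning_rate=1.0,
                                  random_state=0).fit(X, y)
    print("training accuracy:", boosting.score(X, y))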
4. Bayesian Model Averaging
● An ensemble technique that seeks to approximate the Bayes optimal classifier by
sampling hypotheses from the hypothesis space, and combining them using Bayes'
law.
● Hypotheses are typically sampled using a Monte Carlo sampling technique
such as MCMC.
● Gibbs sampling may be used to draw hypotheses that are representative of the
distribution P(T|H).
● Under certain circumstances, when hypotheses are drawn in this manner and averaged according to Bayes' law, this technique has an expected error bounded above by twice the expected error of the Bayes optimal classifier.
5. Bayesian Model Combination
● An algorithmic correction to Bayesian model averaging (BMA).
● Instead of sampling each model in the ensemble individually, it samples from the
space of possible ensembles. This helps in overcoming the tendency of BMA to
converge toward giving all of the weight to a single model.
● Yields better results than BMA, but is computationally more expensive.
6. Bucket of Models
● An ensemble technique in which a model selection algorithm is used to choose the
best model for each problem.
● When tested with only one problem, a bucket of models can produce no better
results than the best model in the set, but when evaluated across many problems, it
will typically produce much better results, on average, than any model in the set.
● Most common approach used for model-selection : cross-validation selection
● Gating is a generalization of Cross-Validation Selection. It involves training another
learning model (or often a perceptron) to decide which of the models in the bucket
is best-suited to solve the problem.
6. Bucket of Models
Pseudo-code:
For each model m in the bucket:
    Do c times (where 'c' is some constant):
        Randomly divide the training dataset into two datasets, A and B
        Train m with A
        Test m with B
Select the model that obtains the highest average score
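A runnable version of this pseudo-code using cross-validation selection; the three candidate models and the dataset are assumptions for the example.

    # Bucket of models: score every candidate with the same repeated
    # train/test protocol and keep the one with the highest average score.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    bucket = {
        "logreg": LogisticRegression(max_iter=5000),
        "tree": DecisionTreeClassifier(random_state=0),
        "knn": KNeighborsClassifier(),
    }
    scores = {name: cross_val_score(m, X, y, cv=5).mean()
              for name, m in bucket.items()}
    best = max(scores, key=scores.get)
    print("best model:", best, scores)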
7. Stacking
● Involves training a learning algorithm to combine the predictions of several other
learning algorithms.
● First, all of the other algorithms are trained using the available data, then a
combiner algorithm is trained to make a final prediction using all the predictions of
the other algorithms as additional inputs.
● In practice, a logistic regression model is often used as the combiner.
● Successfully used on both supervised learning tasks (regression, classification and
distance learning) and unsupervised learning (density estimation).
● It has also been used to estimate Bagging's error rate.
● Reportedly out-performs Bayesian model averaging.
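A minimal stacking sketch with scikit-learn, using logistic regression as the combiner as noted above; the base learners and the dataset are assumptions for the example.

    # Stacking: base-model predictions are generated out-of-fold and fed
    # as inputs to a logistic regression combiner.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("svm", SVC(probability=True, random_state=0))],
        final_estimator=LogisticRegression(max_iter=5000),  # the combiner
        cv=5,
    )
    stack.fit(X, y)
    print("training accuracy:", stack.score(X, y))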
APPLICATIONS OF ENSEMBLES
➢ image classification
➢ fingerprint classification
➢ weather forecasting
➢ text categorization
➢ image segmentation
➢ visual tracking
➢ change detection in image analysis
➢ protein fold pattern recognition
➢ cancer classification
➢ pedestrian recognition or detection
➢ prediction of software quality
➢ face recognition
APPLICATIONS OF ENSEMBLES
➢ email filtering
➢ prediction of students’ performance
➢ medical image analysis
➢ churn prediction
➢ malware detection
➢ intrusion detection
➢ emotion detection
➢ sentiment analysis
➢ prediction of air quality
➢ land cover mapping
CONCLUSION
● The extent to which the ensemble implementation outperforms the simple version
of a given algorithm is strongly dependent on the intrinsic stability of the algorithm
itself, with larger gains in robustness for the least stable methods.
● It is worth highlighting that even selection methods that are quite different from each other tend to exhibit similar performance, in terms of both accuracy and stability, when used in their ensemble versions.
● As a future line of research, it could be interesting to explore the full potential of
hybrid ensemble approaches, where diversity is injected both at the data level and at
the algorithmic level. This might open the way to the definition of more flexible
selection strategies which leverage multiple heuristics while reducing the degree of
dependence on the specific composition of the training data.
THANK YOU