Optimal Model Complexity (1).pptx

Ways of Attaining Optimal Model
Complexity
Author:
Sudi Murindanyi
2022/HD05/5583X
Presented by - Sudi
Affiliation: Makerere University
Uganda

Important words
❑ Machine Learning (ML): The use and development of computer
systems that are able to learn and adapt without following explicit
instructions, by using algorithms and statistical models to analyse
and draw inferences from patterns in data.
❑ Model training: In ML, the process of feeding an ML algorithm
with data so that it can identify and learn good values for all the
parameters involved.
❑ Model complexity: In ML, model complexity usually refers to the
number of degrees of freedom in a learned model, often measured as
the number of adjustable weights or parameters in the architecture
doing the learning.
❑ Optimization: In ML, the process of iteratively adjusting the
model's parameters until an objective function reaches its maximum
or minimum value.
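To make "model training" and "optimization" concrete, here is a minimal sketch, assuming a one-parameter model y ≈ w * x and a mean squared error objective. The function name `fit_slope`, the learning rate, and the toy data are illustrative assumptions, not part of the original slides.

```python
def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0  # the single adjustable parameter (one degree of freedom)
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        # Step against the gradient to reduce the error
        w -= learning_rate * grad
    return w


# Toy data generated from y = 2x (hypothetical example values)
w = fit_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Each iteration moves `w` closer to the value that minimizes the objective, which for this toy data is approximately 2. A more complex model would simply have more such parameters to adjust.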

Background
❑ ML has grown at a remarkable rate, attracting a great
number of researchers and practitioners. It has become
one of the most popular research directions and plays a
significant role in many fields, such as machine
translation, speech recognition, image recognition, and
recommendation systems.
❑ Optimization is one of the core components of machine
learning. The essence of most machine learning
algorithms is to build an optimization model and learn the
parameters in the objective function from the given data.
❑ To promote the development of machine learning, a
series of effective methods for attaining optimal model
complexity has been put forward, improving the performance
and efficiency of machine learning methods.

5 ways of attaining optimal model complexity
1. Feature Engineering and Selection
2. Data Augmentation
3. Dimensionality Reduction
4. Active Learning
5. Ensemble Models

1. Feature Engineering and Selection
Feature engineering is the process of selecting and transforming the
variables/features in your dataset when creating a predictive model using machine
learning. Feature selection is the process of automatically or manually
selecting the features that contribute the most to your prediction variable or output.
Feature engineering techniques
❑ Handling missing data (N/A, null, NaN, None, etc.):
  ❑ Variable deletion: dropping variables with missing values on a case-by-case
  basis.
  ❑ Mean or median imputation: uses the mean or median of the non-missing
  observations (numerical values).
  ❑ Most common value: replaces the missing values with the most frequently
  occurring value in a column/feature (categorical columns/features).
❑ Handling continuous features (different ranges of values, e.g. age, salary):
  ❑ Min-max normalization: shrinks all values into a fixed range between 0 and 1.
  It subtracts the minimum value of the feature and then divides by its range.
  ❑ Standardization: ensures that each feature has a mean of 0 and a standard
  deviation of 1, bringing all features to the same magnitude.
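The imputation and scaling techniques above can be sketched with the Python standard library alone. This is an illustrative sketch: the helper names `impute`, `min_max`, and `standardize` and the sample values are assumptions, missing values are represented as `None`, and the scaling helpers assume the feature is not constant (otherwise the range or standard deviation would be zero).

```python
from statistics import mean, median, mode, pstdev


def impute(values, strategy="mean"):
    """Replace None entries using the non-missing observations."""
    observed = [v for v in values if v is not None]
    # "most_common" is intended for categorical columns
    fill = {"mean": mean, "median": median, "most_common": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def min_max(values):
    """Subtract the minimum and divide by the range, giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical example columns
ages = impute([20, None, 40, 60], strategy="mean")
cities = impute(["Kampala", None, "Gulu", "Kampala"], strategy="most_common")
scaled = min_max(ages)
z_scores = standardize(ages)
```

Here the missing age is filled with the mean of the observed ages, and the missing city with the most frequent city, before the numerical column is rescaled.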

1. Feature Engineering and Selection (cont'd)
❑ Handling categorical features (values that fall into groups, e.g. gender and
education):
  ❑ Label encoding: simply converts each categorical value in a column
  to a number.
  ❑ One-hot encoding (dummy variables): replaces a categorical
  variable with one or more features that can take the value 0 or 1.
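A minimal sketch of the two encodings, using plain Python lists. The function names and the `education` column are hypothetical, and categories are numbered in alphabetical order purely as an assumption for this example.

```python
def label_encode(values):
    """Map each category to an integer (alphabetical order, by assumption)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Replace each value with a 0/1 vector, one position per category."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories


# Hypothetical categorical column
education = ["primary", "degree", "secondary", "degree"]
codes, categories = label_encode(education)
dummies, _ = one_hot_encode(education)
```

Label encoding is compact but implies an ordering between the numbers, whereas one-hot encoding avoids that at the cost of one extra feature per category.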
Feature selection techniques
❑ Univariate selection: helps select the independent features that have the
strongest relationship with the target feature in your dataset.
❑ Feature importance: gives you a score for each feature of your data.
The higher the score, the more important or relevant that feature is to your
target feature.
❑ Correlation matrix heatmap: shows how the features are related to
each other or to the target feature.
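One simple way to score features against the target is the absolute Pearson correlation. The sketch below uses that as the scoring rule, which is an assumption for illustration (other univariate tests exist); the feature names and values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_features(features, target):
    """Sort features by the strength of their linear relationship to the target."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical dataset: one informative feature, one unrelated feature
features = {
    "years_experience": [1, 2, 3, 4, 5],
    "shoe_size": [42, 38, 44, 39, 41],
}
salary = [30, 35, 41, 44, 50]
ranking = rank_features(features, salary)
```

The highest-ranked features are the candidates to keep. Computing `pearson` for every pair of columns gives the values that a correlation matrix heatmap would display.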

2. Data Augmentation
Data augmentation is the process of modifying or augmenting a dataset with
additional data.
Many models need a huge amount of data, so they benefit the most from data
augmentation techniques.
Types of data augmentation
❑ Real data augmentation: adding real, additional data to a dataset.
This can be done in two ways: merge and fuzzy merge.
❑ Synthetic data augmentation: adding synthetic data, i.e. fake data
that simply looks real.
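Both types can be sketched briefly. The example below assumes that a "fuzzy merge" means joining rows whose keys are similar but not identical (here using `difflib.get_close_matches` from the standard library), and uses small Gaussian noise as one possible way to produce synthetic rows. The function names, cutoff, noise scale, and data are illustrative assumptions.

```python
import random
from difflib import get_close_matches


def fuzzy_merge(left, right, key, cutoff=0.6):
    """Attach fields from `right` rows to `left` rows whose keys nearly match."""
    right_by_key = {row[key].lower(): row for row in right}
    merged = []
    for row in left:
        match = get_close_matches(row[key].lower(), list(right_by_key), n=1, cutoff=cutoff)
        extra = right_by_key[match[0]] if match else {}
        merged.append({**extra, **row})  # values from `left` win on conflicts
    return merged


def jitter(rows, scale=0.05, seed=0):
    """Create one synthetic copy of each numeric row by adding Gaussian noise."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0, scale) for x in row] for row in rows]


# Real data augmentation: enrich one table with another (hypothetical records)
enrolment = [{"school": "Makerere Univ.", "students": 120}]
locations = [
    {"school": "Makerere University", "city": "Kampala"},
    {"school": "Gulu University", "city": "Gulu"},
]
merged = fuzzy_merge(enrolment, locations, key="school")

# Synthetic data augmentation: original rows plus noisy copies
measurements = [[1.0, 2.0], [3.0, 4.0]]
augmented = measurements + jitter(measurements)
```

An exact merge would have failed here because the school names are spelled differently; the fuzzy merge tolerates that, at the risk of occasional wrong matches, which is one reason validation is listed among the challenges.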
Challenges in augmentation methods
There are a number of challenges that need to be solved in order to create effective data
augmentation methods, including scalability, heterogeneous datasets, relevance, data
duplication, and validation.

3. Dimensionality Reduction
When dealing with high-dimensional data, it is often useful to reduce the
dimensionality by projecting the data onto a lower-dimensional subspace
that captures the essence of the data. This is called dimensionality
reduction.
Techniques for dimensionality reduction
❑ Feature selection methods: use scoring or statistical methods to
select which features to keep and which features to delete.
❑ Matrix factorization: uses linear algebra to decompose the dataset
matrix into its constituent parts for dimensionality reduction,
e.g. PCA.
❑ Manifold learning: creates a low-dimensional projection of
high-dimensional data, often for the purpose of data
visualization (e.g. multidimensional scaling).
❑ Autoencoder methods: frame a self-supervised learning
problem in which a model must reproduce its input correctly.
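As an illustration of the matrix factorization approach, the sketch below implements PCA through a singular value decomposition. It assumes NumPy is installed; the function name `pca` and the synthetic three-feature dataset are assumptions made for the example.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using an SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    variances = singular_values ** 2
    explained_ratio = variances[:n_components] / variances.sum()
    return X_centered @ components.T, components, explained_ratio


# Synthetic data: three features that are almost multiples of one hidden factor
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.01 * rng.normal(size=200),
    -t + 0.01 * rng.normal(size=200),
])
Z, components, explained_ratio = pca(X, n_components=1)
```

Because the three features are nearly redundant, a single component retains almost all of the variance, so the 200 x 3 matrix can be replaced by a 200 x 1 projection with little loss.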

4. Active Learning
Active learning is a subset of machine learning in which a
learning algorithm can query a user interactively to label
data with the desired outputs. In active learning, the
algorithm proactively selects the subset of examples to be
labeled next from the pool of unlabeled data.
There are three categories of active learning:
❑ Stream-based selective sampling
❑ Pool-based sampling
❑ Membership query synthesis
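The first two categories can be sketched with a simple uncertainty rule for a binary classifier: ask for a label when the predicted probability is close to 0.5. This rule, the function names, the threshold, and the probabilities below are assumptions for illustration; other query strategies exist.

```python
def pool_query(pool_probs, batch_size=1):
    """Pool-based sampling: pick the unlabeled items the model is least sure about.

    pool_probs holds the model's predicted P(class = 1) for every item in the pool.
    """
    by_uncertainty = sorted(range(len(pool_probs)), key=lambda i: abs(pool_probs[i] - 0.5))
    return by_uncertainty[:batch_size]


def stream_query(prob, margin=0.2):
    """Stream-based selective sampling: decide item by item whether to ask for a label."""
    return abs(prob - 0.5) < margin


# Hypothetical model outputs for five unlabeled examples
probs = [0.95, 0.52, 0.10, 0.40, 0.99]
to_label = pool_query(probs, batch_size=2)
decisions = [stream_query(p) for p in probs]
```

In both cases the confidently classified examples are skipped, so labeling effort goes to the examples expected to teach the model the most.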

5. Ensemble Models
Ensemble learning helps improve machine learning results by combining
several models. This approach can produce better predictive
performance than a single model. The basic idea is to learn a set of
classifiers (experts) and to allow them to vote.
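The voting idea can be sketched in a few lines. The three "experts" below are represented only by their predictions, which are made-up values, and the function name `majority_vote` is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model prediction lists by taking the most common label per example."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical predictions from three classifiers on four examples
model_a = ["spam", "ham", "spam", "ham"]
model_b = ["spam", "spam", "spam", "ham"]
model_c = ["ham", "ham", "spam", "spam"]
combined = majority_vote([model_a, model_b, model_c])
```

Each model makes one mistake on a different example, and the vote overrules it, which is why an ensemble can outperform its individual members when their errors are not strongly correlated.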

5. Ensemble Models (cont'd)
Methods for Independently Constructing Ensembles:
❏ Majority Vote
❏ Bagging and Random Forest
❏ Randomness Injection
❏ Feature-Selection Ensembles
❏ Error-Correcting Output Coding
Methods for Coordinated Construction of Ensembles:
❏ Boosting
❏ Stacking
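As one example of an independently constructed ensemble, bagging trains each model on a bootstrap sample (drawn with replacement) and combines them by vote. The sketch below uses one-dimensional threshold classifiers ("stumps") as the base model, which is an assumption chosen to keep the example short; the function names and data are hypothetical.

```python
import random
from collections import Counter


def train_stump(points):
    """Choose the threshold t (from observed x values) minimizing errors of 'predict 1 if x > t'."""
    best_t, best_errors = None, None
    for t in sorted({x for x, _ in points}):
        errors = sum((x > t) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def bagging(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]  # sample with replacement
        stumps.append(train_stump(sample))
    return stumps


def predict(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(int(x > t) for t in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical (x, label) pairs: small x is class 0, large x is class 1
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
stumps = bagging(data)
low, high = predict(stumps, 0), predict(stumps, 10)
```

Because every stump sees a slightly different sample, the thresholds differ, and the vote averages out that variation. Boosting and stacking differ in that later models depend on the outputs of earlier ones rather than being trained independently.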

