Topic 4
Concepts of Data Transformation, Feature Extraction,
and Feature Selection in Machine Learning
Dr. Sunu Wibirama
Lecture Module for the Artificial Intelligence (Kecerdasan Buatan) Course
Course code: UGMx 001001132012
June 13, 2022
1 Course Learning Outcomes
This topic addresses CLO 4 (CPMK 4): the ability to define the basic concepts of data
transformation and feature selection for machine learning.
The indicator of achieving this outcome is the ability to understand the concepts of data
preparation, data cleansing, and feature selection, as well as the techniques commonly used
in machine learning.
2 Scope of Material
The material in this topic covers the following:
a) Introduction to Data Preparation for Machine Learning: this part explains why
preliminary preparation is needed before a dataset is used in machine learning. It
also describes practical steps for obtaining the data to be used in the machine
learning process.
b) Overview of Data Preparation: this part explains the basic techniques used to
prepare data, such as data cleaning, feature selection, data transforms, feature
engineering, and dimensionality reduction.
c) Data Cleaning: this part explains the basic concepts of data cleaning, namely
identifying and correcting errors in the data. It covers identifying columns that
contain a single value using Python, as well as identifying outliers in the data using
statistical methods such as the standard deviation or the interquartile range.
d) Feature Selection: this part explains the basic techniques for selecting features. An
important consideration in feature selection is the data type of the input and the
output of the machine learning algorithm. This part also explains the Recursive
Feature Elimination (RFE) and Feature Importance techniques for selecting features
in the machine learning process.
e) Data Transforms: this part explains the basic techniques of data transformation,
including data normalization and quantile transforms. Data normalization is used to
normalize data at the level of individual elements of the dataset, whereas quantile
transforms are used to reshape the data distribution into a normal or a uniform
distribution.
f) Dimensionality Reduction: this part is divided into two sections: an introduction to
Principal Component Analysis (PCA) and the implementation of PCA. The first
section explains the basic concepts of PCA, eigenvalues, and eigenvectors. The
second section explains the practical steps of implementing PCA and its application
in Python.
Sunu Wibirama
sunu@ugm.ac.id
Department of Electrical and Information Engineering
Faculty of Engineering
Universitas Gadjah Mada
INDONESIA
Introduction to Data Preparation for Machine Learning
Why data preparation?
• Data preparation / data preprocessing: the act of transforming raw
data into a form that is appropriate for modeling.
• Data preparation is the most important and the most difficult part
of a machine learning project.
• It is the most time-consuming part, yet the least discussed topic.
• The challenge of data preparation is that each dataset is unique
and different for each project:
• The number of variables (tens, hundreds, thousands, or more)
• The types of the variables (numeric, nominal, ordinal, ratio)
• The scale of the variables
• The drift in the values over time
From raw data to insights
“…. the right features can only be defined in the context of both the model and the data; since data and
models are so diverse, it's difficult to generalize the practice of feature engineering across projects”
(Page vii, Feature Engineering for Machine Learning, 2018.)
Courtesy: Sanvendra Singh (2019)
Raw data can’t be used directly
• Machine learning algorithms require data
to be numbers.
• Some machine learning algorithms
impose requirements on the data.
• Statistical noise and errors in the data
may need to be corrected.
• Complex nonlinear relationships may be
teased out of the data.
Courtesy: Sanvendra Singh (2019)
Standard tasks during data preparation
• Data cleaning: identifying and correcting
mistakes or errors in the data.
• Feature selection: identifying those input
variables that are most relevant to the task.
• Data transforms: changing the scale or
distribution of variables.
• Feature engineering: deriving new
variables from available data.
• Dimensionality reduction: creating
compact projections of the data. Courtesy: Akira Takezawa (2019)
Before preparing our data
• Gather data from the problem domain.
• Discuss the project with subject matter experts.
• Select those variables to be used as inputs and
outputs for a predictive model.
• Review the data that has been collected.
• Summarize the collected data using statistical
methods.
• Visualize the collected data using plots and
charts.
Overview of Data Preparation (Part 01)
Data cleaning
• The most useful data cleaning involves
deep domain expertise and could involve
identifying and addressing specific
observations that may be incorrect.
• There are many reasons data may have
incorrect values, such as being mistyped,
corrupted, duplicated, and so on.
• Domain expertise may allow obviously
erroneous observations to be identified, as
they are different from what is expected
(e.g. a person's height of 60 meters).
General data cleaning operations
• Using statistics to define normal data
and identify outliers
• Identifying columns that have the same
value or no variance and removing
them
• Identifying duplicate rows of data and
removing them.
• Marking empty values as missing.
• Imputing missing values using statistics
or a learned model
Courtesy: Jason Brownlee (2020)
Feature selection
• Feature selection refers to techniques for
selecting a subset of input features that are most
relevant to the target variable that is being
predicted.
• This is important as irrelevant and redundant
input variables can distract or mislead learning
algorithms possibly resulting in lower predictive
performance.
• Additionally, it is desirable to develop models
only using the data that is required to make a
prediction, e.g. to favor the simplest possible well
performing model.
Feature selection
• Feature selection techniques may generally be
grouped into those that use the target variable
(supervised) and those that do not
(unsupervised).
• The supervised techniques can be further
divided into:
• models that automatically select features
as part of fitting the model (intrinsic)
• those that explicitly choose features that
result in the best performing model
(wrapper)
• those that score each input feature and
allow a subset to be selected (filter)
Courtesy: Jason Brownlee (2020)
Overview of Data Preparation (Part 02)
Data transforms
• Data transforms are used to change the type
or distribution of data variables.
• Numeric data type: number values.
• Integer: integers with no fractional part.
• Float: floating point values.
• Categorical data type: label values.
• Ordinal: labels with a rank ordering.
• Nominal: labels with no rank ordering.
• Boolean: values True and False.
Some techniques of data transforms
• Discretization transform: encode a numeric
variable as an ordinal variable
• Ordinal transform: encode a categorical
variable into an integer variable
• One hot transform: encode a categorical
variable into binary variables
• Normalization transform: scale a variable to
the range 0 to 1
• Standardization transform: scale a variable to
a standard Gaussian
• Power transform: change the distribution of a
variable to be more Gaussian
Feature engineering
• Feature engineering refers to the
process of creating new input variables
from the available data.
• Engineering new features is highly
specific to your data and data types. As
such, it often requires the collaboration
of a subject matter expert to help identify
new features that could be constructed
from the data.
• This specialization makes it a
challenging topic to generalize to
general methods.
Some techniques of feature engineering
• There are some techniques that can be
used in feature engineering:
• Adding a Boolean flag variable for
some state.
• Adding a group or global summary
statistic, such as a mean.
• Adding new variables for each
component of a compound variable,
such as a date-time.
• Polynomial Transform: Create copies
of numerical input variables that are
raised to a power
Dimensionality reduction
• The number of input features for a dataset may be
considered the dimensionality of the data.
• Two input variables together can define a two-
dimensional area where each row of data defines a
point in that space.
• The problem is, the more dimensions this space has
(e.g. the more input variables), the more likely it is
that the dataset represents a very sparse and likely
unrepresentative sampling of that space (curse of
dimensionality)
• An alternative to feature selection is to create a
projection of the data into a lower-dimensional space
that still preserves the most important properties of
the original data.
Some techniques of dimensionality reduction
• The most common approach to dimensionality reduction is
to use a matrix factorization technique:
• Principal Component Analysis (PCA).
• Singular Value Decomposition (SVD).
• Other approaches with model-based methods:
• linear discriminant analysis
• autoencoders.
• Sometimes manifold learning algorithms can also be used:
• Kohonen self-organizing maps (SOM)
• t-Distributed Stochastic Neighbor Embedding (t-SNE).
A d-dimensional manifold is a part of an n-dimensional space (d<n) that locally resembles a d-
dimensional hyperplane. Manifold Learning can be thought of as an attempt to generalize linear
frameworks like PCA to be sensitive to non-linear structure in data.
Data Cleaning (Part 01)
Data cleaning in machine learning project
• Before jumping to the sophisticated methods,
there are some very basic data cleaning
operations that you probably should perform on
every single machine learning project.
• Although some techniques seem very basic,
they are so critical.
• If you skip this step, models may break or report
overly optimistic performance results.
• Our goal: identifying and correcting mistakes or
errors in the data.
Data cleaning
Our goal: identifying and correcting mistakes or errors in the data.
Our dataset
• For short demonstration, we will use a dataset from Kubat et al.
(1998).
• The paper describes an application of machine learning to an
important environmental problem: detection of oil spills from
radar images of the sea surface.
• The task involves predicting whether the patch contains an oil
spill or not, e.g. from the illegal or accidental dumping of oil in
the ocean, given a vector that describes the contents of a patch
of a satellite image.
• There are 937 cases. Each case comprises 48 numerical
computer-vision-derived features, a patch number, and a class
label.
• The normal case is no oil spill, assigned the class label 0,
whereas an oil spill is indicated by the class label 1. There are
896 cases of no oil spill and 41 cases of an oil spill.
Identifying columns that contain a single value
• Columns that have a single observation or value are probably
useless for modelling.
• These columns or features or predictors are referred to as zero-
variance predictors, because if we measured the variance (the average
squared deviation from the mean), it would be zero.
• A single value means that each row for that column has the same
value.
• Columns that have a single value for all rows do not contain any
information for modelling.
• Depending on the choice of data preparation and modelling
algorithms, variables with a single value can also cause errors or
unexpected results.
Python code
• We will use Python to demonstrate technical steps in
detecting single value columns
• You can detect columns that have this property using the
unique() NumPy function, which reports the number
of unique values in each column.
• A simpler approach is to use the nunique() Pandas
function that does the hard work for you. The example
below uses the Pandas function.
• The example loads the oil-spill classification dataset
that contains 50 variables (48 numerical computer
vision derived features, a patch number, and a class
label).
• The code then summarizes the number of unique
values for each column.
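A minimal sketch of this step (assuming the oil-spill data has been saved locally as oil-spill.csv, a hypothetical file name, with no header row):

```python
# Summarize the number of unique values in each column with Pandas.
from pandas import read_csv

df = read_csv('oil-spill.csv', header=None)  # load the dataset
print(df.nunique())                          # unique values per column
```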
We will see that column
index 22 only has a
single value and should
be removed.
Deleting column with a single unique value
• Columns are relatively easy to remove from a NumPy
array or Pandas DataFrame.
• One approach is to record all columns that have a single
unique value, then delete them from the Pandas
DataFrame by calling the drop() function.
• Running the example first loads the dataset and reports
the number of rows and columns.
• The number of unique values for each column is
calculated, and those columns that have a single unique
value are identified. In this case, column index 22.
• The identified columns are then removed from the
DataFrame, and the number of rows and columns in the
DataFrame are reported to confirm the change.
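A minimal sketch of this deletion step, under the same assumption about the local CSV file name:

```python
# Identify columns with a single unique value and drop them.
from pandas import read_csv

df = read_csv('oil-spill.csv', header=None)
print(df.shape)                                        # shape before deletion
counts = df.nunique()
to_del = [i for i, v in enumerate(counts) if v == 1]   # single-value columns
print(to_del)
df.drop(to_del, axis=1, inplace=True)
print(df.shape)                                        # shape after deletion
```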
Data Cleaning (Part 02)
Outlier identification and removal
• When modelling, it is important to clean the data sample to
ensure that the observations best represent the problem.
• Sometimes a dataset can contain extreme values that are
outside the range of what is expected and unlike the other
data.
• These are called outliers. Machine learning modelling,
and model skill in general, can often be improved by
understanding and even removing these outlier values.
• Outliers can have many causes, such as:
• Measurement or input error.
• Data corruption.
• True outlier observation.
(Courtesy: Ou Zhang)
Detecting outliers
• There is no precise way to define and
identify outliers in general because of the
specifics of each dataset.
• Instead, you, or a domain expert, must
interpret the raw observations and
decide whether a value is an outlier or
not.
• We can use statistical methods to
identify observations that appear to be
rare or unlikely given the available data.
Dataset
• We will generate a population of 10,000 random numbers drawn from a
Gaussian distribution with a mean of 50 and a standard deviation of 5.
• Numbers drawn from a Gaussian distribution will have outliers. That is, by
virtue of the distribution itself, there will be a few values that will be a long
way from the mean, rare values that we can identify as outliers.
• We will use the randn() function to generate random Gaussian values
with a mean of 0 and a standard deviation of 1, then multiply the results
by our own standard deviation and add the mean to shift the values into
the preferred range.
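A minimal sketch of generating such a sample (the seed value is an arbitrary assumption, used only for reproducibility):

```python
# Generate 10,000 Gaussian values with mean 50 and standard deviation 5.
from numpy.random import seed, randn
from numpy import mean, std

seed(1)                       # fix the random seed for reproducibility
data = 5 * randn(10000) + 50  # scale by the std dev, shift by the mean
print('mean=%.3f stdv=%.3f' % (mean(data), std(data)))
```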
Dataset
Running the example generates the sample and then prints the mean and
standard deviation. As expected, the values are very close to the expected values.
Standard deviation of Gaussian distribution
• The Gaussian distribution has the property that the
standard deviation from the mean can be used to reliably
summarize the percentage of values in the sample.
• For example, within one standard deviation of the mean
will cover 68 percent of the data.
• So, if the mean is 50 and the standard deviation is 5, as in
the test dataset above, then all data in the sample
between 45 and 55 will account for about 68 percent of
the data sample.
• A value that falls outside of 3 standard deviations is part
of the distribution, but it is an unlikely or rare event at
approximately 1 in 370 samples.
• Three standard deviations from the mean is a common
cut-off in practice for identifying outliers in a Gaussian or
Gaussian-like distribution.
Removing outliers
• We can calculate the mean and standard
deviation of a given sample, then calculate
the cut-off for identifying outliers as more
than 3 standard deviations from the mean.
• We can then identify outliers as those
examples that fall outside of the defined
lower and upper limits.
• Running the example will first print the
number of identified outliers and then the
number of observations that are not
outliers, demonstrating how to identify and
filter out outliers respectively.
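A minimal sketch of the standard-deviation method, continuing from the `data` sample generated above:

```python
# Identify and remove values more than 3 standard deviations from the mean.
from numpy import mean, std

data_mean, data_std = mean(data), std(data)
cut_off = data_std * 3
lower, upper = data_mean - cut_off, data_mean + cut_off

outliers = [x for x in data if x < lower or x > upper]
print('Identified outliers: %d' % len(outliers))

outliers_removed = [x for x in data if lower <= x <= upper]
print('Non-outlier observations: %d' % len(outliers_removed))
```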
Interquartile Range method
• Not all data is normal or normal enough to treat it as being
drawn from a Gaussian distribution.
• A good statistic for summarizing a non-Gaussian
distribution sample of data is the Interquartile Range
(IQR).
• The IQR is calculated as the difference between the 75th
and the 25th percentiles of the data.
• We refer to the percentiles as quartiles (quart meaning 4)
because the data is divided into four groups via the 25th,
50th and 75th values.
• The IQR can be used to identify outliers by defining limits
on the sample values that are a factor k of the IQR below
the 25th percentile or above the 75th percentile.
• The common value for the factor k is 1.5. A
factor k of 3 or more can be used to identify values that
are extreme outliers.
Courtesy: Perez and Tah
Interquartile Range method
• We can calculate the percentiles of a dataset
using the percentile() NumPy function that
takes the dataset and specification of the
desired percentile.
• The IQR can then be calculated as the
difference between the 75th and 25th
percentiles.
• We can then calculate the cutoff for outliers as
1.5 times the IQR and subtract this cut-off from
the 25th percentile and add it to the 75th
percentile to give the actual limits on the data.
• We can then use these limits to identify the
outlier values.
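A minimal sketch of the IQR method, again reusing the `data` sample from above:

```python
# Identify outliers with the interquartile range (IQR) method.
from numpy import percentile

q25, q75 = percentile(data, 25), percentile(data, 75)
iqr = q75 - q25
cut_off = iqr * 1.5                        # k = 1.5 for standard outliers
lower, upper = q25 - cut_off, q75 + cut_off

outliers = [x for x in data if x < lower or x > upper]
print('Identified outliers: %d' % len(outliers))
```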
Feature Selection (Part 01)
Feature selection
• Feature selection is the process of reducing the
number of input variables when developing a
predictive model.
• This step is important to reduce the computational
cost of modelling and, in many cases, to improve
the performance of the model.
• Statistical-based feature selection methods
involve evaluating the relationship between each
input variable and the target variable using
statistics
• Then, we select those input variables that have
the strongest relationship with the target variable.
Feature selection
• One way to think about feature selection methods is in
terms of supervised and unsupervised methods:
• Unsupervised selection: do not use the target
variable (e.g. remove redundant variables).
• Supervised selection: use the target variable (e.g.
remove irrelevant variables).
• Supervised feature selection methods may further be
classified into three groups:
• Intrinsic: algorithms that perform automatic feature
selection during training.
• Filter: select subsets of features based on their
relationship with the target.
• Wrapper: search subsets of features that perform
according to a predictive model.
Statistics for feature selection
• It is common to use correlation type statistical measures
between input and output variables as the basis for filter
feature selection.
• However, the choice of statistical measures is highly
dependent upon the variable data types.
• Common data types include numerical (such as height) and
categorical (such as a label).
• Input variable: variables used as input to a predictive model.
• Output variable: variables output or predicted by a model:
• Numerical output: regression predictive modelling
problem.
• Categorical output: classification predictive modelling
problem.
Numerical input
• Numerical input and numerical output:
This is a regression predictive modelling problem
with numerical input variables:
• Pearson's correlation coefficient (linear).
• Spearman's rank coefficient (nonlinear).
• Numerical input and categorical output:
This is a classification predictive modelling
problem with numerical input variables. This
might be the most common example of a
classification problem:
• ANOVA correlation coefficient (linear).
• Kendall's rank coefficient (nonlinear).
Categorical input
• Categorical input, numerical output:
• This is a regression predictive
modelling problem with categorical
input variables.
• This is a strange example of a
regression problem (e.g. you would not
encounter it often).
• You can use the same numerical input,
categorical output methods (described
previously), but in reverse.
Categorical input
• Categorical input, categorical output:
• This is a classification predictive
modelling problem with categorical
input variables.
• The most common correlation
measure for categorical data is the chi-
squared test.
• You can also use mutual information
(information gain) from the field of
information theory.
Feature Selection (Part 02)
Recursive Feature Elimination (RFE)
• RFE is a wrapper-type feature selection algorithm.
• This means that a different machine learning
algorithm is given and used in the core of the
method, wrapped by RFE, and used to help
select features.
• RFE works by searching for a subset of features,
starting with all features in the training dataset
and successively removing features until the
desired number remains.
• This is achieved by fitting the given machine
learning algorithm used in the core of the model,
ranking features by importance, discarding the
least important features, and re-fitting the model.
(Source: scikit-learn documentation)
Recursive Feature Elimination (RFE)
Source: Ravishankar, et al. (2016)
RFE with scikit-learn library
• Scikit-learn is a machine learning library with the
following features:
• simple and efficient tools for predictive data
analysis
• accessible to everybody, and reusable in
various contexts
• built on NumPy, SciPy, and matplotlib
• open source, commercially usable - BSD
license
• funded by several companies and universities
RFE with scikit-learn library
• To use it, first the class is configured with the
chosen algorithm specified via the estimator
argument and the number of features to select via
the n_features_to_select argument.
• RFE requires a nested algorithm that is used to
provide the feature importance scores, such as a
Decision Tree.
• The nested algorithm used in RFE does not have
to be the algorithm that is fit on the selected
features; different algorithms can be used.
Decision Tree
• Root node: the first node in a decision tree.
• Splitting: the process of dividing a node into two or
more sub-nodes, starting from the root node.
• Node: the result of splitting the root node into sub-
nodes and splitting sub-nodes into further sub-nodes.
• Leaf or terminal node: the end of a branch, where a
node cannot be split any further.
• Branch / sub-tree: a subsection of the entire tree.
• Parent and child node: a node that is divided into
sub-nodes is called the parent node of those sub-nodes,
whereas the sub-nodes are the children of the parent node.
Source: https://medium.com/@arifromadhan19/the-basics-of-decision-trees-e5837cc2aba7
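The original slides walk through the RFE code on screen; the sketch below is one possible reconstruction, assuming a synthetic dataset from make_classification with five informative features:

```python
# RFE with a decision tree, evaluated with 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)

# RFE selects 5 features, using a decision tree to score feature importance.
rfe = RFE(estimator=DecisionTreeClassifier(), n_features_to_select=5)
model = DecisionTreeClassifier()
pipeline = Pipeline(steps=[('select', rfe), ('model', model)])

scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=10)
print('Mean accuracy: %.3f' % scores.mean())
```

Repeating the evaluation for different values of n_features_to_select (for example, 2 to 10) produces the accuracy distributions summarized in the box-and-whisker plot discussed below.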
RFE with scikit-learn library
• A box and whisker plot is created for the
distribution of accuracy scores for each
configured number of features.
• We can see that performance improves
as the number of features increases.
• The performance peaks around 4-to-7
features, as we might expect, given that
only five features are relevant to the
target variable.
• We can see that using 7 features yields
the highest accuracy score (0.885).
Feature Selection (Part 03)
Feature importance
• Feature importance refers to techniques that
assign a score to input features based on how
useful they are at predicting a target variable.
• There are many types and sources of feature
importance scores.
• Popular examples:
• statistical correlation scores
• coefficients calculated as part of linear models
• decision trees
• permutation importance scores
Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90
Why feature importance score is useful?
Better understanding the data:
• The relative scores can highlight which
features may be most relevant to the
target, and the converse, which features
are the least relevant.
• This may be interpreted by a domain
expert and could be used as the basis for
gathering more or different data.
Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90
Why feature importance score is useful?
Better understanding the model:
• Most importance scores are calculated by
a predictive model that has been fit on the
dataset.
• Inspecting the importance score provides
insight into that specific model and which
features are the most important and least
important to the model when making a
prediction.
• This is a type of model interpretation that
can be performed for those models that
support it.
Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90
Why feature importance score is useful?
Reducing the number of input features:
• This can be achieved by using the
importance scores to select those features
to delete (lowest scores) or those features
to keep (highest scores).
• This is a type of feature selection and can
simplify the problem that is being
modelled, speed up the modelling process
(deleting features is called dimensionality
reduction), and in some cases, improve
the performance of the model.
Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90
Feature importance in Linear Regression
• The scores suggest that the model found
the five important features and marked
all other features with a zero coefficient,
essentially removing them from the
model.
• A bar chart is then created for the feature
importance scores.
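A minimal sketch of this example, assuming a synthetic regression dataset from make_regression with five informative features:

```python
# Linear regression coefficients used as feature importance scores.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       random_state=1)
model = LinearRegression()
model.fit(X, y)

importance = model.coef_                        # one coefficient per feature
for i, v in enumerate(importance):
    print('Feature: %d, Score: %.5f' % (i, v))
pyplot.bar(range(len(importance)), importance)  # bar chart of the scores
pyplot.show()
```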
Feature importance in Decision Tree
• Running the example fits the model, then
reports the importance score for each
feature.
• The results suggest perhaps three of the 10
features as being important to prediction:
features 4, 5, and 6.
• Note: result of each code execution may
vary given the stochastic nature of the
algorithm or evaluation procedure, or
differences in numerical precision.
• The code should be run a few times and we
can compare the average outcome.
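A minimal sketch along the same lines, assuming a decision tree regressor on the same kind of synthetic regression dataset (the original slides do not show which dataset and estimator were used):

```python
# Feature importance scores from a fitted decision tree.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from matplotlib import pyplot

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       random_state=1)
model = DecisionTreeRegressor()
model.fit(X, y)

importance = model.feature_importances_        # impurity-based importance
for i, v in enumerate(importance):
    print('Feature: %d, Score: %.5f' % (i, v))
pyplot.bar(range(len(importance)), importance)
pyplot.show()
```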
Data Transforms (Part 01)
The scale of your data is important
• Machine learning models learn a mapping from
input variables to an output variable.
• Unfortunately, the scale and distribution of the
data drawn from the domain may be different for
each variable.
• Input variables may have different units (e.g.
feet, kilometers, and hours) that, in turn, may
mean the variables have different scales.
• Differences in the scales across input variables
may increase the difficulty of the problem being
modelled.
Courtesy: Lagabrielle et al. (2018)
What ML algorithms affected by scale of data?
• Algorithms that fit a model using a weighted sum of input
variables:
• linear regression
• logistic regression
• artificial neural networks (deep learning).
• Algorithms that use distance measures between examples:
• k-nearest neighbors
• support vector machines.
• It can also be a good idea to scale the target variable for
regression predictive modelling problems to make the problem
easier to learn, most notably in the case of neural network
models.
• A target variable with a large spread of values, in turn, may result
in large error gradient values causing weight values to change
dramatically, making the learning process unstable.
Data normalization
• Normalization is a rescaling of the data from the original range so that all
values are within the new range of 0 and 1.
• Normalization requires that you know or are able to accurately estimate the
minimum and maximum observable values.
• You may be able to estimate these values from your available data.
• A value is normalized as follows:
  y = (x − min) / (max − min)
  where the minimum and maximum values pertain to the variable of the value x being
  normalized.
Data normalization
• For example, for a dataset, we could guesstimate the min and max observable values as
30 and -10. We can then normalize any value, like 18.8, as follows:
  y = (x − min) / (max − min) = (18.8 − (−10)) / (30 − (−10)) = 28.8 / 40 = 0.72
• You can see that if a value is provided that is outside the bounds of the minimum and
maximum values, the resulting value will not be in the range of 0 and 1.
Data normalization in scikit-learn library
• You can normalize your dataset using the scikit-learn object
MinMaxScaler. Good practice usage with the MinMaxScaler and
other scaling techniques is as follows:
• Fit the scaler using available training data: for normalization, this
means the training data will be used to estimate the minimum and
maximum observable values. This is done by calling the fit() function.
• Apply the scale to training data: this means you can use the
normalized data to train your model. This is done by calling the
transform() function.
• Apply the scale to data going forward: this means you can prepare
new data in the future on which you want to make predictions.
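A minimal sketch of MinMaxScaler on a small two-column dataset (the values below are an assumption standing in for the dataset shown on the original slide):

```python
# Normalize a small dataset column by column with MinMaxScaler.
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler

data = asarray([[100, 0.001],
                [8,   0.05],
                [50,  0.005],
                [88,  0.07],
                [4,   0.1]])
print(data)

scaler = MinMaxScaler()                  # default range is (0, 1)
normalized = scaler.fit_transform(data)  # fit() and transform() in one call
print(normalized)
```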
Data normalization in scikit-learn library
• Running the example first reports the raw
dataset, showing 2 columns with 5 rows.
• The values are in scientific notation which can
be hard to read if you’re not used to it.
• Next, the scaler is defined, fit on the whole
dataset and then used to create a transformed
version of the dataset with each column
normalized independently.
• We can see that the largest raw value for each
column now has the value 1.0 and the smallest
value for each column now has the value 0.0.
Data Transforms (Part 02)
What is a quantile?
• Wikipedia:
“In statistics and probability, quantiles are cut
points dividing the range of a probability
distribution into continuous intervals with equal
probabilities, or dividing the observations in a
sample in the same way”.
• 2 quantiles  median
• 4 quantiles  quartiles
• 100 quantiles  percentiles
Probability density of a normal distribution, with quartiles shown. The area below
the red curve is the same in the intervals (−∞,Q1), (Q1,Q2), (Q2,Q3), and (Q3,+∞).
Quantiles on a Cumulative Distribution Function (CDF)
Percent (y-axis) of data that is at or below a given value on x-axis
Source: https://www.youtube.com/watch?v=ByjPLoxQAZk
Quantiles on Iris dataset
Source: https://www.youtube.com/watch?v=ByjPLoxQAZk
Quantile function
The quantile function is the inverse of the cumulative distribution function (CDF).
Non-standard data distribution
• Numerical input variables may have a
highly skewed or non-standard
distribution.
• This could be caused by outliers in the
data, multi-modal distributions, highly
exponential distributions, and so on.
• Many machine learning algorithms
prefer or perform better when numerical
input variables, and even output variables
in the case of regression, have a standard
probability distribution, such as a Gaussian
(normal) or a uniform distribution.
Source: https://www.biologyforlife.com/skew.html
Why quantile transforms?
• Improves the accuracy of a machine learning model.
• Performs a monotonic transformation of features → preserves the rank of
values.
• Robust → less susceptible to outliers.
• Disadvantage: distorts correlations and distances within and across
features.
Quantile transforms in scikit-learn
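A minimal sketch, assuming the skew is introduced by exponentiating a Gaussian sample (the original slide shows its own code):

```python
# Map a skewed sample to a Gaussian shape with QuantileTransformer.
from numpy.random import randn
from numpy import exp
from sklearn.preprocessing import QuantileTransformer
from matplotlib import pyplot

data = randn(1000)                   # 1,000 random Gaussian values
data = exp(data)                     # add a strong right skew
pyplot.hist(data, bins=25)           # histogram of the skewed sample
pyplot.show()

data = data.reshape((len(data), 1))  # the transformer expects a 2D array
quantile = QuantileTransformer(n_quantiles=100, output_distribution='normal')
transformed = quantile.fit_transform(data)

pyplot.hist(transformed, bins=25)    # roughly Gaussian-shaped histogram
pyplot.show()
```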
Results of quantile transforms
• Running the example first creates a sample of
1,000 random Gaussian values and adds a skew
to the dataset.
• A histogram is created from the skewed dataset
and clearly shows the distribution pushed to the
far left.
• Then a QuantileTransformer is used to map the
data to a Gaussian distribution and standardize
the result, centering the values on the mean value
of 0 and a standard deviation of 1.0.
• A histogram of the transform data is created
showing a Gaussian shaped data distribution.
Introduction to PCA (Part 01)
Eigendecomposition and Principal Component Analysis (PCA)
• Principal Component Analysis
(PCA) is used for dimensionality
reduction
• Example:
• We want to reduce data from
2D space to 1D.
• However, we don't want to
lose important information
from our features.
• We transform the data to be
aligned with the most important
direction (red) and remove less
important direction (green).
Source: https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
Why using Principal Component Analysis?
• You find that features in your data are highly
correlated with each other → multicollinearity.
• Ideal case: features should be independent
of each other.
• Solving this issue is super important,
because working with highly correlated
features in multivariate regression models
can lead to inaccurate results.
• To understand PCA, we have to learn about
eigenvalues and eigenvectors
Eigenvectors & Eigenvalues
• Eigen in German means distinctive, characteristic, particular
to person or place.
• Eigenvalue and eigenvector are important for matrix
decomposition aimed at reducing dimensionality without losing
much information and reducing computational cost of matrix
processing.
• An eigenvector of the matrix is a vector that is contracted or
elongated when transformed by the matrix
• The eigenvalue is the scaling factor by which the vector is
contracted or elongated:
(a) If the scaling factor is positive, the directions of the initial
and the transformed vectors are the same.
(b) If the scaling factor is negative, their directions are reversed.
(Figure: the matrix A acts by stretching the vector x without changing its
direction, so x is an eigenvector of A.)
Which one is the eigenvector, red or blue?
Answer: blue
Eigenvectors & Eigenvalues: step-by-step
Mathematical approach (1)
• Let x be an eigenvector of the matrix A. Then there must
exist an eigenvalue λ such that Ax = λx or, equivalently,
Ax − λx = 0, or
  (A − λI)x = 0
• If we define a new matrix B = A − λI, then
  Bx = 0
• If B has an inverse, then x = B⁻¹0 = 0. But an
eigenvector cannot be zero.
• Thus, it follows that x will be an eigenvector of A if and
only if B does not have an inverse, or equivalently
det(B) = 0, or
  det(A − λI) = 0
• This is called the characteristic equation of A.
Its roots determine the eigenvalues of A.
(If the determinant equals zero, the transformation collapses the space onto a line.)
Mathematical approach (2)
• Suppose we have the matrix A = [[2, 1], [1, 2]].
• Applying the eigenvector and eigenvalue equation Ax = λx, with x = (x, y):
  2x + y = λx
  x + 2y = λy
• To solve for x, y, and λ, we rearrange these equations as:
  (2 − λ)x + y = 0
  x + (2 − λ)y = 0
Mathematical approach (3)
• Since det(A − λI) = 0, we find the determinant of the 2-dimensional matrix accordingly:
  det(A − λI) = (2 − λ)(2 − λ) − (1)(1) = λ² − 4λ + 3 = (λ − 3)(λ − 1) = 0
• Thus λ = 3 and λ = 1. Using these values, we can get x and y. We can find the eigenvectors
which correspond to these eigenvalues by plugging each λ back into the equations above and
solving for x and y. To find an eigenvector corresponding to λ = 3, start with
  (2 − 3)x + y = 0, i.e. −x + y = 0.
Mathematical approach (4)
• There are an infinite number of values for x and y which satisfy this equation.
The only restriction is that not all the components of an eigenvector can
equal zero.
• So if x = 1, then y = 1, and an eigenvector corresponding to λ = 3 is [1, 1].
• Finding an eigenvector for λ = 1 works the same way: (2 − 1)x + y = 0 gives x + y = 0.
• So an eigenvector for λ = 1 is [1, −1].
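A quick numerical check of this worked example with NumPy (not shown on the original slides):

```python
# Verify the eigenvalues and eigenvectors of A = [[2, 1], [1, 2]].
import numpy as np

A = np.array([[2, 1],
              [1, 2]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [3. 1.]
print(eigenvectors)  # columns: unit-length eigenvectors, up to sign, of [1, 1] and [1, -1]
```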
Introduction to PCA (Part 02)
Variance and covariance
• Variance: a measure of variability; it simply
measures how spread out the data set is.
Mathematically, it is the average squared
deviation from the mean score.
• Covariance: a measure of the extent to which
corresponding elements from two sets of
ordered data move in the same direction.
Variance and covariance
• Positive covariance means and are positively related i.e. as increases
also increases. Negative covariance depicts the exact opposite relation.
However zero covariance means and are not related.
Step-by-step PCA
The whole process of mathematics in PCA can be divided into 5 parts:
1. Standardizing the data
2. Calculate the covariance matrix
3. Calculating the eigenvectors and eigenvalues
4. Computing the principal components
5. Reducing the dimension of the datasets
Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
Bui H-B, Nguyen H, Choi Y, Bui X-N, Nguyen-Thoi T, Zandi Y. A Novel Artificial Intelligence
Technique to Estimate the Gross Calorific Value of Coal Based on Meta-Heuristic and
Support Vector Regression Algorithms. Applied Sciences. 2019; 9(22):4868.
https://doi.org/10.3390/app9224868
Reducing the dimension of the datasets
1. Find the mean vector.
2. Standardize the data: subtract the mean and divide by the standard deviation, producing the matrix Z.
3. Compute the covariance matrix: A = ZᵀZ.
4. Compute the eigenvalues and eigenvectors of A, then decompose A into PDP⁻¹, where P is the matrix of eigenvectors and D is the diagonal matrix with the eigenvalues on the diagonal and values of zero everywhere else.
5. Sort the eigenvectors from the highest eigenvalues, giving P*.
6. Project the original data onto the eigenvectors: Z* = ZP*.
7. Obtain the projected points in low dimensions and choose the most important principal components.
1. Standardizing the data
• Standardizing is the process of scaling the data in such a way that all the
variables and their values lie within a similar range. The formula for
standardization is shown below:
  z = (x − μ) / σ
• where:
  • x : observation or sample
  • μ : mean
  • σ : standard deviation
• Save the standardized data in a matrix Z.
Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
2. Calculate the covariance matrix
• Take the matrix Z, transpose it, and multiply the transposed matrix by matrix Z:
  A = ZᵀZ
• The result is the covariance matrix, expressing the correlation between the different
variables in the data set.
• It is essential to identify highly dependent variables because they contain biased and
redundant information which can hamper the overall performance of the model.
• If our dataset has more than 2 dimensions, then it can have more than one covariance
measurement. For example, if we have a dataset with 3 dimensions x, y, and z, then the
covariance matrix of this dataset is a 3×3 matrix holding the covariance of each pair of
variables.
Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
3. Calculate eigenvectors and eigenvalues
• Next, calculate the eigenvectors and eigenvalues
of the covariance matrix.
• The eigendecomposition of A = ZᵀZ is where
we decompose A into PDP⁻¹,
• where:
  • P : matrix of eigenvectors
  • D : the diagonal matrix with the eigenvalues
    on the diagonal and values of zero
    everywhere else.
(Deisenroth; Faisal; Ong, 2020)
Note: p1 and p2 are orthogonal because their dot product equals zero. Why?
Because A is a symmetric matrix, and the eigenvectors of a symmetric matrix are always
orthogonal. What if the vectors are not orthogonal? Use the Gram–Schmidt
process to construct orthogonal or orthonormal vectors.
4. Computing the principal components
• Take the eigenvalues λ1, λ2, …, λn and sort
them from largest to smallest.
• Then, sort the eigenvectors in P accordingly (if
λ2 is the largest eigenvalue, then take the 2nd
column of P and place it in the 1st column
position).
• The eigenvector with the highest eigenvalue is
the most significant and therefore forms the
1st principal component (PC 1).
• Call this sorted matrix of eigenvectors P*
(the columns of P* should be the same as the
columns of P, but possibly in a different order).
Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
• PC 1 is the most significant and stores the
maximum possible information
• PC 2 is the second most significant PC and stores
remaining maximum information
5. Reducing the dimension of the datasets
• Re-arrange the original dataset along the final
principal components, which represent the
maximum and most significant information of the
dataset.
• Calculate Z* = ZP*.
• This new matrix, Z*, is a centered/standardized
version of the original data.
• However, each observation in Z* is a combination
of the original variables, where the weights are
determined by the eigenvectors.
• Because our eigenvectors in P* are independent
of one another, each column of Z* is also
independent of one another.
(Figure: the left graph is the original data; the right graph is the transformed data Z*.)
5. Reducing the dimension of the datasets
• Finally, we need to determine how many principal components (PCs) to keep versus how many to drop. Normally we keep the most important PCs and drop the less important ones.
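One common heuristic, offered here as an illustration rather than as the slides' own rule, is to keep the smallest number of PCs whose cumulative explained variance exceeds a threshold such as 95% (the eigenvalues below are made up):

import numpy as np

vals_sorted = np.array([4.2, 2.1, 0.9, 0.3])     # illustrative eigenvalues, largest first
explained_ratio = vals_sorted / vals_sorted.sum()
cumulative = np.cumsum(explained_ratio)

k = int(np.searchsorted(cumulative, 0.95)) + 1   # smallest k whose PCs explain >= 95% of the variance
print(explained_ratio.round(2), cumulative.round(2), k)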
PCA in Python
1. Data preparation—standardizing the data
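The code on this slide is not present in the extracted text; a minimal standardization sketch, assuming a toy dataset in place of the slide's example, would look like this:

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=[5.0, 50.0, 0.5], scale=[2.0, 10.0, 0.1], size=(200, 3))  # stand-in dataset

Z = (X - X.mean(axis=0)) / X.std(axis=0)     # zero mean, unit variance per column
print(Z.mean(axis=0).round(3), Z.std(axis=0).round(3))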
PCA in Python
2. Calculate the covariance matrix, eigenvalues, and eigenvectors,
and compute the principal components
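Again, the original slide's code is not in the extracted text; the step can be sketched compactly as follows (toy data, same notation as above):

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                # toy data in place of the slide's dataset
Z = (X - X.mean(axis=0)) / X.std(axis=0)

A = Z.T @ Z                                  # covariance-style matrix
vals, P = np.linalg.eigh(A)                  # eigenvalues and eigenvectors
order = np.argsort(vals)[::-1]
P_star = P[:, order]                         # eigenvectors sorted by descending eigenvalue
Z_star = Z @ P_star                          # principal components
print(vals[order], Z_star.shape)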
PCA in Python
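For reference, the same pipeline can be reproduced with scikit-learn's PCA class; this is an alternative sketch, not necessarily the code shown on the original slide:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(42).normal(size=(200, 3))   # toy data in place of the slide's dataset
Z = StandardScaler().fit_transform(X)                  # standardize the data

pca = PCA(n_components=2)                              # keep the first 2 principal components
Z_star = pca.fit_transform(Z)
print(Z_star.shape, pca.explained_variance_ratio_)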
End of File
More Related Content

Similar to Modul Topik 4 - Kecerdasan Buatan.pdf

Ijsred v2 i5p95
Ijsred v2 i5p95Ijsred v2 i5p95
Ijsred v2 i5p95IJSRED
 
IRJET- Automated CV Classification using Clustering Technique
IRJET- Automated CV Classification using Clustering TechniqueIRJET- Automated CV Classification using Clustering Technique
IRJET- Automated CV Classification using Clustering TechniqueIRJET Journal
 
Proposing an Interactive Audit Pipeline for Visual Privacy Research
Proposing an Interactive Audit Pipeline for Visual Privacy ResearchProposing an Interactive Audit Pipeline for Visual Privacy Research
Proposing an Interactive Audit Pipeline for Visual Privacy ResearchChristan Grant
 
Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nybergdiannepatricia
 
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNINGCUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNINGIRJET Journal
 
IRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET Journal
 
Data Quality for Machine Learning Tasks
Data Quality for Machine Learning TasksData Quality for Machine Learning Tasks
Data Quality for Machine Learning TasksHima Patel
 
IRJET- Fast Phrase Search for Encrypted Cloud Storage
IRJET- Fast Phrase Search for Encrypted Cloud StorageIRJET- Fast Phrase Search for Encrypted Cloud Storage
IRJET- Fast Phrase Search for Encrypted Cloud StorageIRJET Journal
 
Ignou MCA mini project report
Ignou MCA mini project reportIgnou MCA mini project report
Ignou MCA mini project reportHitesh Jangid
 
Modul Topik 7 - Kecerdasan Buatan
Modul Topik 7 - Kecerdasan BuatanModul Topik 7 - Kecerdasan Buatan
Modul Topik 7 - Kecerdasan BuatanSunu Wibirama
 
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdfe3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdfSILVIUSyt
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersIRJET Journal
 
A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)Editor IJCATR
 
Modul Topik 1 - Kecerdasan Buatan
Modul Topik 1 - Kecerdasan BuatanModul Topik 1 - Kecerdasan Buatan
Modul Topik 1 - Kecerdasan BuatanSunu Wibirama
 
Schooladmissionprocessmanagement 140227084915-phpapp01
Schooladmissionprocessmanagement 140227084915-phpapp01Schooladmissionprocessmanagement 140227084915-phpapp01
Schooladmissionprocessmanagement 140227084915-phpapp01Aarambhi Manke
 
Studentinformationmanagementsystem.pdf iyr
Studentinformationmanagementsystem.pdf iyrStudentinformationmanagementsystem.pdf iyr
Studentinformationmanagementsystem.pdf iyr053VENKADESHKUMARVK
 
Clone of an organization
Clone of an organizationClone of an organization
Clone of an organizationIRJET Journal
 
Monitoring Students Using Different Recognition Techniques for Surveilliance ...
Monitoring Students Using Different Recognition Techniques for Surveilliance ...Monitoring Students Using Different Recognition Techniques for Surveilliance ...
Monitoring Students Using Different Recognition Techniques for Surveilliance ...IRJET Journal
 

Similar to Modul Topik 4 - Kecerdasan Buatan.pdf (20)

Ijsred v2 i5p95
Ijsred v2 i5p95Ijsred v2 i5p95
Ijsred v2 i5p95
 
IRJET- Automated CV Classification using Clustering Technique
IRJET- Automated CV Classification using Clustering TechniqueIRJET- Automated CV Classification using Clustering Technique
IRJET- Automated CV Classification using Clustering Technique
 
Proposing an Interactive Audit Pipeline for Visual Privacy Research
Proposing an Interactive Audit Pipeline for Visual Privacy ResearchProposing an Interactive Audit Pipeline for Visual Privacy Research
Proposing an Interactive Audit Pipeline for Visual Privacy Research
 
Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nyberg
 
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNINGCUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
 
IRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine Learning
 
Data Quality for Machine Learning Tasks
Data Quality for Machine Learning TasksData Quality for Machine Learning Tasks
Data Quality for Machine Learning Tasks
 
IRJET- Fast Phrase Search for Encrypted Cloud Storage
IRJET- Fast Phrase Search for Encrypted Cloud StorageIRJET- Fast Phrase Search for Encrypted Cloud Storage
IRJET- Fast Phrase Search for Encrypted Cloud Storage
 
Ignou MCA mini project report
Ignou MCA mini project reportIgnou MCA mini project report
Ignou MCA mini project report
 
Modul Topik 7 - Kecerdasan Buatan
Modul Topik 7 - Kecerdasan BuatanModul Topik 7 - Kecerdasan Buatan
Modul Topik 7 - Kecerdasan Buatan
 
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdfe3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
e3f55595181f7cad006f26db820fb78ec146e00e-1646623528083 (1).pdf
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting Defaulters
 
A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)A Generic Model for Student Data Analytic Web Service (SDAWS)
A Generic Model for Student Data Analytic Web Service (SDAWS)
 
Modul Topik 1 - Kecerdasan Buatan
Modul Topik 1 - Kecerdasan BuatanModul Topik 1 - Kecerdasan Buatan
Modul Topik 1 - Kecerdasan Buatan
 
CSEIT- ALL.pptx
CSEIT- ALL.pptxCSEIT- ALL.pptx
CSEIT- ALL.pptx
 
Schooladmissionprocessmanagement 140227084915-phpapp01
Schooladmissionprocessmanagement 140227084915-phpapp01Schooladmissionprocessmanagement 140227084915-phpapp01
Schooladmissionprocessmanagement 140227084915-phpapp01
 
Studentinformationmanagementsystem.pdf iyr
Studentinformationmanagementsystem.pdf iyrStudentinformationmanagementsystem.pdf iyr
Studentinformationmanagementsystem.pdf iyr
 
Clone of an organization
Clone of an organizationClone of an organization
Clone of an organization
 
Monitoring Students Using Different Recognition Techniques for Surveilliance ...
Monitoring Students Using Different Recognition Techniques for Surveilliance ...Monitoring Students Using Different Recognition Techniques for Surveilliance ...
Monitoring Students Using Different Recognition Techniques for Surveilliance ...
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 

More from Sunu Wibirama

Modul Topik 9 - Kecerdasan Buatan
Modul Topik 9 - Kecerdasan BuatanModul Topik 9 - Kecerdasan Buatan
Modul Topik 9 - Kecerdasan BuatanSunu Wibirama
 
Modul Topik 8 - Kecerdasan Buatan
Modul Topik 8 - Kecerdasan BuatanModul Topik 8 - Kecerdasan Buatan
Modul Topik 8 - Kecerdasan BuatanSunu Wibirama
 
Modul Topik 6 - Kecerdasan Buatan.pdf
Modul Topik 6 - Kecerdasan Buatan.pdfModul Topik 6 - Kecerdasan Buatan.pdf
Modul Topik 6 - Kecerdasan Buatan.pdfSunu Wibirama
 
Modul Topik 5 - Kecerdasan Buatan
Modul Topik 5 - Kecerdasan BuatanModul Topik 5 - Kecerdasan Buatan
Modul Topik 5 - Kecerdasan BuatanSunu Wibirama
 
Modul Topik 3 - Kecerdasan Buatan
Modul Topik 3 - Kecerdasan BuatanModul Topik 3 - Kecerdasan Buatan
Modul Topik 3 - Kecerdasan BuatanSunu Wibirama
 
Pengantar Mata Kuliah Kecerdasan Buatan.pdf
Pengantar Mata Kuliah Kecerdasan Buatan.pdfPengantar Mata Kuliah Kecerdasan Buatan.pdf
Pengantar Mata Kuliah Kecerdasan Buatan.pdfSunu Wibirama
 
Introduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
Introduction to Artificial Intelligence - Pengenalan Kecerdasan BuatanIntroduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
Introduction to Artificial Intelligence - Pengenalan Kecerdasan BuatanSunu Wibirama
 
Mengenal Eye Tracking (Introduction to Eye Tracking Research)
Mengenal Eye Tracking (Introduction to Eye Tracking Research)Mengenal Eye Tracking (Introduction to Eye Tracking Research)
Mengenal Eye Tracking (Introduction to Eye Tracking Research)Sunu Wibirama
 

More from Sunu Wibirama (8)

Modul Topik 9 - Kecerdasan Buatan
Modul Topik 9 - Kecerdasan BuatanModul Topik 9 - Kecerdasan Buatan
Modul Topik 9 - Kecerdasan Buatan
 
Modul Topik 8 - Kecerdasan Buatan
Modul Topik 8 - Kecerdasan BuatanModul Topik 8 - Kecerdasan Buatan
Modul Topik 8 - Kecerdasan Buatan
 
Modul Topik 6 - Kecerdasan Buatan.pdf
Modul Topik 6 - Kecerdasan Buatan.pdfModul Topik 6 - Kecerdasan Buatan.pdf
Modul Topik 6 - Kecerdasan Buatan.pdf
 
Modul Topik 5 - Kecerdasan Buatan
Modul Topik 5 - Kecerdasan BuatanModul Topik 5 - Kecerdasan Buatan
Modul Topik 5 - Kecerdasan Buatan
 
Modul Topik 3 - Kecerdasan Buatan
Modul Topik 3 - Kecerdasan BuatanModul Topik 3 - Kecerdasan Buatan
Modul Topik 3 - Kecerdasan Buatan
 
Pengantar Mata Kuliah Kecerdasan Buatan.pdf
Pengantar Mata Kuliah Kecerdasan Buatan.pdfPengantar Mata Kuliah Kecerdasan Buatan.pdf
Pengantar Mata Kuliah Kecerdasan Buatan.pdf
 
Introduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
Introduction to Artificial Intelligence - Pengenalan Kecerdasan BuatanIntroduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
Introduction to Artificial Intelligence - Pengenalan Kecerdasan Buatan
 
Mengenal Eye Tracking (Introduction to Eye Tracking Research)
Mengenal Eye Tracking (Introduction to Eye Tracking Research)Mengenal Eye Tracking (Introduction to Eye Tracking Research)
Mengenal Eye Tracking (Introduction to Eye Tracking Research)
 

Recently uploaded

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 

Modul Topik 4 - Kecerdasan Buatan.pdf

  • 1. Topik 4 Konsep Transformasi Data, Ekstraksi Fitur, dan Seleksi Fitur Dalam Machine Learning Dr. Sunu Wibirama Modul Kuliah Kecerdasan Buatan Kode mata kuliah: UGMx 001001132012 June 13, 2022
  • 2. June 13, 2022 1 Capaian Pembelajaran Mata Kuliah Topik ini akan memenuhi CPMK 4, yakni mampu mendefinisikan konsep dasar trans- formasi data dan seleksi fitur (feature selection) untuk machine learning. Adapun indikator tercapainya CPMK tersebut adalah mampu memahami konsep data preparation, data cleansing, dan feature selection serta teknik-teknik yang lazim digunakan dalam machine learning. 2 Cakupan Materi Cakupan materi dalam topik ini sebagai berikut: a) Introduction to Data Preparation for Machine Learning: materi ini menjelaskan alasan- alasan pentingnya melakukan persiapan awal sebelum menggunakan dataset dalam machine learning. Pada materi ini juga dijelaskan langkah-langkah praktis untuk mendapatkan data yang akan digunakan pada proses machine learning. b) Overview of Data Preparation: materi ini menjelaskan teknik-teknik dasar yang akan digunakan dalam mempersiapkan data, misalnya data cleaning, feature selection, data transforms, feature engineering, dan dimensionality reduction. c) Data Cleaning: materi ini menjelaskan konsep-konsep dasar data cleaning, yakni mengidentifikasi dan mengoreksi kesalahan dalam data. Pada materi ini dijelaskan konsep untuk mengidentifikasi kolom yang memiliki single value menggunakan pem- rograman Python. Selain itu, materi ini juga menjelaskan cara-cara mengidentifikasi outliers dalam data dengan menggunakan metode statistika seperti halnya standard deviation atau interquartile range. d) Feature Selection: materi ini menjelaskan teknik-teknik dasar pemilihan fitur. Hal penting yang perlu diperhatikan dalam proses pemilihan fitur adalah melihat tipe data pada masukan (input) dan luaran (output) algoritme machine learning. Pada materi ini juga akan dijelaskan teknik Recursive Feature Elimination (RFE) dan Feature Importance untuk memilih fitur pada proses machine learning. e) Data Transforms: materi ini akan menjelaskan teknik-teknik dasar transformasi data, diantaranya data normalization dan quantile transforms. Data normalization digu- nakan untuk melakukan normalisasi data pada level individu atau elemen dataset. Sementara itu, quantile transforms digunakan untuk mengubah distribusi data men- jadi distribusi normal atau distribusi uniform. f) Dimensionality Reduction: materi ini akan terbagi menjadi dua bagian, yakni penge- nalan Principal Component Analysis (PCA) dan implementasi PCA. Pada bagian per- tama, akan dijelaskan konsep dasar PCA, eigenvalues, dan eigenvector. Pada bagian kedua, akan dijelaskan langkah-langkah praktis implementasi PCA dan aplikasinya dengan pemrograman Python. 1
  • 3. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Introduction to Data Preparation for Machine Learning Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Why data preparation • Data preparation / data preprocessing: the act of transforming raw data into a form that is appropriate for modeling. • Data preparation is the most important part and the most difficult process in machine learning project. • Most time consuming, but it is the least discussed topic. • The challenge of data preparation is that each dataset is unique and different for each project: • The number of variables (tens, hundreds, thousands, or more) • The types of the variables (numeric, nominal, ordinal, ratio) • The scale of the variables • The drift in the values overtime
  • 4. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 From raw data to insights “…. the right features can only be defined in the context of both the model and the data; since data and models are so diverse, it's difficult to generalize the practice of feature engineering across projects” (Page vii, Feature Engineering for Machine Learning, 2018.) Courtesy: Sanvendra Singh (2019) sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Raw data can’t be used directly • Machine learning algorithms require data to be numbers. • Some machine learning algorithms impose requirements on the data. • Statistical noise and errors in the data may need to be corrected. • Complex nonlinear relationships may be teased out of the data. Courtesy: Sanvendra Singh (2019)
  • 5. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Standard tasks during data preparation • Data cleaning: identifying and correcting mistakes or errors in the data. • Feature selection: identifying those input variables that are most relevant to the task. • Data transforms: changing the scale or distribution of variables. • Feature engineering: deriving new variables from available data. • Dimensionality reduction: creating compact projections of the data. Courtesy: Akira Takezawa (2019) sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Before preparing our data • Gather data from the problem domain. • Discuss the project with subject matter experts. • Select those variables to be used as inputs and outputs for a predictive model. • Review the data that has been collected. • Summarize the collected data using statistical methods. • Visualize the collected data using plots and charts.
  • 6. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 7 End of File
  • 7. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Overview of Data Preparation (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Standard tasks during data preparation • Data cleaning: identifying and correcting mistakes or errors in the data. • Feature selection: identifying those input variables that are most relevant to the task. • Data transforms: changing the scale or distribution of variables. • Feature engineering: deriving new variables from available data. • Dimensionality reduction: creating compact projections of the data. Courtesy: Akira Takezawa (2019)
  • 8. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Data cleaning • The most useful data cleaning involves deep domain expertise and could involve identifying and addressing specific observations that may be incorrect. • There are many reasons data may have incorrect values, such as being mistyped, corrupted, duplicated, and so on. • Domain expertise may allow obviously erroneous observations to be identified as they are different from what is expected (a person's height of 60 meters. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 General data cleaning operations • Using statistics to define normal data and identify outliers • Identifying columns that have the same value or no variance and removing them • Identifying duplicate rows of data and removing them. • Marking empty values as missing. • Imputing missing values using statistics or a learned model Courtesy: Jason Brownlee (2020)
  • 9. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Feature selection • Feature selection refers to techniques for selecting a subset of input features that are most relevant to the target variable that is being predicted. • This is important as irrelevant and redundant input variables can distract or mislead learning algorithms possibly resulting in lower predictive performance. • Additionally, it is desirable to develop models only using the data that is required to make a prediction, e.g. to favor the simplest possible well performing model. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Feature selection • Feature selection techniques may generally grouped into those that use the target variable (supervised) and those that do not (unsupervised). • The supervised techniques can be further divided into: • models that automatically select features as part of fitting the model (intrinsic) • those that explicitly choose features that result in the best performing model (wrapper) • those that score each input feature and allow a subset to be selected (filter) Courtesy: Jason Brownlee (2020)
  • 10. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 7 End of File
  • 11. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Overview of Data Preparation (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Data transforms • Data transforms are used to change the type or distribution of data variables. • Numeric data type: number values. • Integer: integers with no fractional part. • Float: floating point values. • Categorical data type: label values. • Ordinal: labels with a rank ordering. • Nominal: labels with no rank ordering. • Boolean: values True and False.
  • 12. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Some techniques of data transforms • Discretization transform: encode a numeric variable as an ordinal variable • Ordinal transform: encode a categorical variable into an integer variable • One hot transform: encode a categorical variable into binary variables • Normalization transform: scale a variable to the range 0 and 1 • Standardization transform: scale a variable to a standard Gaussian • Power transform: change the distribution of a variable to be more Gaussian sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Feature engineering • Feature engineering refers to the process of creating new input variables from the available data. • Engineering new features is highly specific to your data and data types. As such, it often requires the collaboration of a subject matter expert to help identify new features that could be constructed from the data. • This specialization makes it a challenging topic to generalize to general methods.
  • 13. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Some techniques of feature engineering • There are some techniques that can be used in feature engineering: • Adding a Boolean flag variable for some state. • Adding a group or global summary statistic, such as a mean. • Adding new variables for each component of a compound variable, such as a date-time. • Polynomial Transform: Create copies of numerical input variables that are raised to a power sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Dimensionality reduction • The number of input features for a dataset may be considered the dimensionality of the data. • Two input variables together can define a two- dimensional area where each row of data defines a point in that space. • The problem is, the more dimensions this space has (e.g. the more input variables), the more likely it is that the dataset represents a very sparse and likely unrepresentative sampling of that space (curse of dimensionality) • An alternative to feature selection is to create a projection of the data into a lower-dimensional space that still preserves the most important properties of the original data.
  • 14. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Some techniques of dimensionality reduction • The most common approach to dimensionality reduction is to use a matrix factorization technique: • Principal Component Analysis (PCA). • Singular Value Decomposition (SVD). • Other approaches with model-based methods: • linear discriminant analysis • autoencoders. • Sometimes manifold learning algorithms can also be used: • Kohonen self-organizing maps (SOME) • t-Distributed Stochastic Neighbor Embedding (t-SNE). A d-dimensional manifold is a part of an n-dimensional space (d<n) that locally resembles a d- dimensional hyperplane. Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 8 End of File
  • 15. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Data Cleaning (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Data cleaning in machine learning project • Before jumping to the sophisticated methods, there are some very basic data cleaning operations that you probably should perform on every single machine learning project. • Although some techniques seem very basic, they are so critical. • If you skip this step, models may break or report overly optimistic performance results. • Our goal: identifying and correcting mistakes or errors in the data.
  • 16. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Data cleaning Our goal: identifying and correcting mistakes or errors in the data. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Our dataset • For short demonstration, we will use a dataset from Kubat et al. (1998). • The paper describes an application of machine learning to an important environmental problem: detection of oil spills from radar images of the sea surface. • The task involves predicting whether the patch contains an oil spill or not, e.g. from the illegal or accidental dumping of oil in the ocean, given a vector that describes the contents of a patch of a satellite image. • There are 937 cases. Each case is comprised of 48 numerical computer vision derived features, a patch number, and a class label. • The normal case is no oil spill assigned the class label of 0, whereas an oil spill is indicated by a class label of 1. There are 896 cases for no oil spill and 41 cases of an oil spill.
  • 17. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Identifying columns that contain a single value • Columns that have a single observation or value are probably useless for modelling. • These columns or features or predictors are referred to zero- variance predictors as if we measured the variance (average value from the mean), it would be zero. • A single value means that each row for that column has the same value. • Columns that have a single value for all rows do not contain any information for modelling. • Depending on the choice of data preparation and modelling algorithms, variables with a single value can also cause errors or unexpected results. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Python code • We will use Python to demonstrate technical steps in detecting single value columns • You can detect rows that have this property using the unique() NumPy function that will report the number of unique values in each column. • A simpler approach is to use the nunique() Pandas function that does the hard work for you. Below is the same example using the Pandas function. • The example loads the oil-spill classification dataset that contains 50 variables (48 numerical computer vision derived features, a patch number, and a class label). • The the code summarizes the number of unique values for each column.
  • 18. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 We will see that column index 22 only has a single value and should be removed. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 Deleting column with a single unique value • Columns are relatively easy to remove from a NumPy array or Pandas DataFrame. • One approach is to record all columns that have a single unique value, then delete them from the Pandas DataFrame by calling the drop() function. • Running the example first loads the dataset and reports the number of rows and columns. • The number of unique values for each column is calculated, and those columns that have a single unique value are identified. In this case, column index 22. • The identified columns are then removed from the DataFrame, and the number of rows and columns in the DataFrame are reported to confirm the change.
  • 19. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 9 End of File
  • 20. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Data Cleaning (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Outlier identification and removal • When modelling, it is important to clean the data sample to ensure that the observations best represent the problem. • Sometimes a dataset can contain extreme values that are outside the range of what is expected and unlike the other data. • These are called outliers and often machine learning modelling and model skill in general can be improved by understanding and even removing these outlier values. • Outliers can have many causes, such as: • Measurement or input error. • Data corruption. • True outlier observation. (Courtesy: Ou Zhang)
  • 21. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Detecting outliers • There is no precise way to define and identify outliers in general because of the specifics of each dataset. • Instead, you, or a domain expert, must interpret the raw observations and decide whether a value is an outlier or not. • We can use statistical methods to identify observations that appear to be rare or unlikely given the available data. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Dataset • We will generate a population 10,000 random numbers drawn from a Gaussian distribution with a mean of 50 and a standard deviation of 5. • Numbers drawn from a Gaussian distribution will have outliers. That is, by virtue of the distribution itself, there will be a few values that will be a long way from the mean, rare values that we can identify as outliers. • We will use the randn() function to generate random Gaussian values with a mean of 0 and a standard deviation of 1, then multiply the results by our own standard deviation and add the mean to shift the values into the preferred range.
  • 22. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Dataset Running the example generates the sample and then prints the mean and standard deviation. As expected, the values are very close to the expected values. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Standard deviation of Gaussian distribution • The Gaussian distribution has the property that the standard deviation from the mean can be used to reliably summarize the percentage of values in the sample. • For example, within one standard deviation of the mean will cover 68 percent of the data. • So, if the mean is 50 and the standard deviation is 5, as in the test dataset above, then all data in the sample between 45 and 55 will account for about 68 percent of the data sample. • A value that falls outside of 3 standard deviations is part of the distribution, but it is an unlikely or rare event at approximately 1 in 370 samples. • Three standard deviations from the mean is a common cut-off in practice for identifying outliers in a Gaussian or Gaussian-like distribution.
  • 23. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Removing outliers • We can calculate the mean and standard deviation of a given sample, then calculate the cut-off for identifying outliers as more than 3 standard deviations from the mean. • We can then identify outliers as those examples that fall outside of the defined lower and upper limits. • Running the example will first print the number of identified outliers and then the number of observations that are not outliers, demonstrating how to identify and filter out outliers respectively. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 Interquartile Range method • Not all data is normal or normal enough to treat it as being drawn from a Gaussian distribution. • A good statistic for summarizing a non-Gaussian distribution sample of data is the Interquartile Range (IQR). • The IQR is calculated as the difference between the 75th and the 25th percentiles of the data. • We refer to the percentiles as quartiles (quart meaning 4) because the data is divided into four groups via the 25th, 50th and 75th values. • The IQR can be used to identify outliers by defining limits on the sample values that are a factor k of the IQR below the 25th percentile or above the 75th percentile. • The common value for the factor k is the value 1.5. A factor k of 3 or more can be used to identify values that are extreme outliers Courtesy: Perez and Tah
  • 24. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 Interquartile Range method • We can calculate the percentiles of a dataset using the percentile() NumPy function that takes the dataset and specification of the desired percentile. • The IQR can then be calculated as the difference between the 75th and 25th percentiles. • We can then calculate the cutoff for outliers as 1.5 times the IQR and subtract this cut-off from the 25th percentile and add it to the 75th percentile to give the actual limits on the data. • We can then use these limits to identify the outlier values. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 10 End of File
  • 25. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Feature Selection (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Feature selection • Feature selection is the process of reducing the number of input variables when developing a predictive model. • This step is important to reduce the computational cost of modelling and, in many cases, to improve the performance of the model. • Statistical-based feature selection methods involve evaluating the relationship between each input variable and the target variable using statistics • Then, we select those input variables that have the strongest relationship with the target variable.
  • 26. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Feature selection • One way to think about feature selection methods are in terms of supervised and unsupervised methods: • Unsupervised selection: do not use the target variable (e.g. remove redundant variables). • Supervised selection: use the target variable (e.g. remove irrelevant variables). • Supervised feature selection methods may further be classified into three groups: • Intrinsic: algorithms that perform automatic feature selection during training. • Filter: select subsets of features based on their relationship with the target. • Wrapper: search subsets of features that perform according to a predictive model. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Statistics for feature selection • It is common to use correlation type statistical measures between input and output variables as the basis for filter feature selection. • However, the choice of statistical measures is highly dependent upon the variable data types. • Common data types include numerical (such as height) and categorical (such as a label). • Input variable: variables used as input to a predictive model. • Output variable: variables output or predicted by a model: • Numerical output: regression predictive modelling problem. • Categorical output: classification predictive modelling problem.
  • 27. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Numerical input • Numerical input and numerical output: This is a regression predictive modelling problem with numerical input variables: • Pearson's correlation coefficient (linear). • Spearman's rank coefficient (nonlinear). • Numerical input and categorical output: This is a classification predictive modelling problem with numerical input variables. This might be the most common example of a classification problem: • ANOVA correlation coefficient (linear). • Kendall's rank coefficient (nonlinear). sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Categorical input • Categorical input, numerical output: • This is a regression predictive modelling problem with categorical input variables. • This is a strange example of a regression problem (e.g. you would not encounter it often). • You can use the same numerical input, categorical output methods (described previously), but in reverse.
  • 28. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Categorical input • Categorical input, categorical output: • This is a classification predictive modelling problem with categorical input variables. • The most common correlation measure for categorical data is the chi- squared test. • You can also use mutual information (information gain) from the field of information theory. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 8 End of File
  • 29. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Feature Selection (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Recursive Feature Elimination (RFE) • RFE is a wrapper-type feature selection algorithm. • This means that a different machine learning algorithm is given and used in the core of the method, is wrapped by RFE, and used to help select features. • RFE works by searching for a subset of features by starting with all features in the training dataset and successfully removing features until the desired number remains. • This is achieved by fitting the given machine learning algorithm used in the core of the model, ranking features by importance, discarding the least important features, and re-fitting the model. (Source: scikit-learn documentation)
  • 30. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Recursive Feature Elimination (RFE) Source: Ravishankar, et al. (2016) sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 RFE with scikit-learn library • Scikit-learn is a machine learning library with the following features: • simple and efficient tools for predictive data analysis • accessible to everybody, and reusable in various contexts • built on NumPy, SciPy, and matplotlib • open source, commercially usable - BSD license • funded by several companies and universities
  • 31. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 RFE with scikit-learn library • To use it, first the class is configured with the chosen algorithm specified via the estimator argument and the number of features to select via the n_features_to_select argument. • RFE requires a nested algorithm that is used to provide the feature importance scores, such as a Decision Tree. • The nested algorithm used in RFE does not have to be the algorithm that is fit on the selected features; different algorithms can be used. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Decision Tree • Root node : is the first node in decision trees • Splitting : is a process of dividing node into two or more sub-nodes, starting from the root node • Node : splitting results from the root node into sub- nodes and splitting sub-nodes into further sub- nodes • Leaf or terminal node : end of a node, since node cannot be split anymore • Branch / Sub-Tree : A subsection of the entire tree is called branch or sub-tree. • Parent and Child Node: A node, which is divided into sub-nodes is called parent node of sub-nodes whereas sub-nodes are the child of parent node. Source: https://medium.com/@arifromadhan19/the-basics-of-decision-trees-e5837cc2aba7
  • 32. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 RFE with scikit-learn library sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 RFE with scikit-learn library
  • 33. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 RFE with scikit-learn library sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 RFE with scikit-learn library
  • 34. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 11 RFE with scikit-learn library sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 12 RFE with scikit-learn library • A box and whisker plot is created for the distribution of accuracy scores for each configured number of features. • We can see that performance improves as the number of features increase • The performance peaks around 4-to-7 features as we might expect, given that only five features are relevant to the target variable. • We can see that using 7 features yields most accurate accuracy score (0.885) Accuracy Number of features
  • 35. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 13 13 End of File
  • 36. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Feature Selection (Part 03) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 Feature importance • Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable. • There are many types and sources of feature importance scores. • Popular examples: • statistical correlation scores • coefficients calculated as part of linear models • decision trees • permutation importance scores Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90
  • 37. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 3 Why feature importance score is useful? Better understanding the data: • The relative scores can highlight which features may be most relevant to the target, and the converse, which features are the least relevant. • This may be interpreted by a domain expert and could be used as the basis for gathering more or different data. Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 4 Why feature importance score is useful? Better understanding the model: • Most importance scores are calculated by a predictive model that has been fit on the dataset. • Inspecting the importance score provides insight into that specific model and which features are the most important and least important to the model when making a prediction. • This is a type of model interpretation that can be performed for those models that support it. Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90
  • 38. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 5 Why feature importance score is useful? Reducing the number of input features: • This can be achieved by using the importance scores to select those features to delete (lowest scores) or those features to keep (highest scores). • This is a type of feature selection and can simplify the problem that is being modelled, speed up the modelling process (deleting features is called dimensionality reduction), and in some cases, improve the performance of the model. Source: https://medium.com/analytics-vidhya/ranking-features-based-on-importance-predictive-power-with-respect-to-the-class-labels-of-the-25afaed71e90 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 6 Feature importance in Linear Regression
  • 39. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 7 Feature importance in Linear Regression • The scores suggest that the model found the five important features and marked all other features with a zero coefficient, essentially removing them from the model. • A bar chart is then created for the feature importance scores. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 8 Feature importance in Decision Tree
  • 40. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 9 Feature importance in Decision Tree • Running the example fits the model, then reports the coefficient value for each feature. • The results suggest perhaps three of the 10 features as being important to prediction: feature 4, 5, and 6. • Note: result of each code execution may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. • The code should be run a few times and we can compare the average outcome. sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 10 10 End of File
  • 41. 07/06/2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 1 Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Data Transforms (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022 sunu@ugm.ac.id Copyright © 2022 Sunu Wibirama | Do not distribute without permission @sunu_wibirama 2 The scale of your data is important • Machine learning models learn a mapping from input variables to an output variable. • Unfortunately, the scale and distribution of the data drawn from the domain may be different for each variable. • Input variables may have different units (e.g. feet, kilometers, and hours) that, in turn, may mean the variables have different scales. • Differences in the scales across input variables may increase the difficulty of the problem being modelled. Courtesy: Lagabrielle et al. (2018)
  • 42. What ML algorithms are affected by the scale of data? • Algorithms that fit a model using a weighted sum of input variables: linear regression, logistic regression, artificial neural networks (deep learning). • Algorithms that use distance measures between examples: k-nearest neighbors, support vector machines. • It can also be a good idea to scale the target variable for regression predictive modelling problems to make the problem easier to learn, most notably in the case of neural network models. • A target variable with a large spread of values may result in large error gradient values, causing weight values to change dramatically and making the learning process unstable.
  Data normalization • Normalization is a rescaling of the data from the original range so that all values are within the new range of 0 and 1. • Normalization requires that you know, or are able to accurately estimate, the minimum and maximum observable values. • You may be able to estimate these values from your available data. • A value is normalized as follows: x' = (x - min) / (max - min), where min and max are the minimum and maximum observable values of the variable to which x belongs.
  • 43. Data normalization • For example, for a dataset we could guesstimate the minimum and maximum observable values as -10 and 30. We can then normalize any value, such as 18.8, as follows: x' = (x - min) / (max - min) = (18.8 - (-10)) / (30 - (-10)) = 28.8 / 40 = 0.72. • You can see that if a value outside the bounds of the minimum and maximum values is provided, the resulting value will not be in the range of 0 and 1.
  Data normalization in scikit-learn library • You can normalize your dataset using the scikit-learn object MinMaxScaler. Good practice usage with the MinMaxScaler and other scaling techniques is as follows: • Fit the scaler using available training data: for normalization, this means the training data will be used to estimate the minimum and maximum observable values. This is done by calling the fit() function. • Apply the scale to training data: this means you can use the normalized data to train your model. This is done by calling the transform() function. • Apply the scale to data going forward: this means you can prepare new data in the future on which you want to make predictions. A minimal usage sketch is shown below.
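  A minimal MinMaxScaler sketch along the lines of the example described on the next slide; the small two-column, five-row dataset is an illustrative assumption.

```python
# Sketch of MinMaxScaler usage; the 5x2 dataset is an illustrative assumption.
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler

# raw data: two columns with very different scales
data = asarray([[100, 0.001],
                [8,   0.05],
                [50,  0.005],
                [88,  0.07],
                [4,   0.1]])
print(data)

# fit() estimates the per-column min and max, transform() applies the rescaling
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)
print(scaled)
```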
  • 44. Data normalization in scikit-learn library (example code shown on the slide)
  Data normalization in scikit-learn library • Running the example first reports the raw dataset, showing 2 columns with 5 rows. • The values are in scientific notation, which can be hard to read if you are not used to it. • Next, the scaler is defined, fit on the whole dataset, and then used to create a transformed version of the dataset with each column normalized independently. • We can see that the largest raw value in each column now has the value 1.0 and the smallest value in each column now has the value 0.0. (Figure panels: results of scaling; results of normalization)
  • 45. End of File
  • 46. Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Data Transforms (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022
  What is a quantile? • Wikipedia: "In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way". • 2 quantiles → median • 4 quantiles → quartiles • 100 quantiles → percentiles. Probability density of a normal distribution, with quartiles shown. The area below the red curve is the same in the intervals (-∞, Q1), (Q1, Q2), (Q2, Q3), and (Q3, +∞).
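  A small numeric illustration of quantiles with NumPy; the synthetic sample and the chosen cut points are illustrative assumptions.

```python
# Sketch: computing quantiles numerically with NumPy.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=50, scale=5, size=1000)   # illustrative sample

print("median (2-quantile):", np.quantile(data, 0.5))
print("quartiles (4-quantiles):", np.quantile(data, [0.25, 0.5, 0.75]))
print("5th / 95th percentiles:", np.quantile(data, [0.05, 0.95]))
```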
  • 47. Quantiles on a Cumulative Distribution Function (CDF): the percent of data (y-axis) that is at or below a given value (x-axis). Source: https://www.youtube.com/watch?v=ByjPLoxQAZk
  Quantiles on Iris dataset. Source: https://www.youtube.com/watch?v=ByjPLoxQAZk
  • 48. Quantiles on Iris dataset. Source: https://www.youtube.com/watch?v=ByjPLoxQAZk
  Quantiles on Iris dataset. Source: https://www.youtube.com/watch?v=ByjPLoxQAZk
  • 49. Quantile function • The quantile function is the inverse of the cumulative distribution function (CDF).
  Non-standard data distribution • Numerical input variables may have a highly skewed or non-standard distribution. • This could be caused by outliers in the data, multi-modal distributions, highly exponential distributions, and so on. • Many machine learning algorithms prefer or perform better when numerical input variables, and even the output variable in the case of regression, have a standard probability distribution, such as a Gaussian (normal) or a uniform distribution. Source: https://www.biologyforlife.com/skew.html
  • 50. Quantile transforms (illustrated on the slide)
  Why quantile transforms? • Improving the accuracy of a machine learning model. • Performs a monotonic transformation of features → preserves the rank of the values. • Robust → less susceptible to outliers. • Disadvantage: distorts correlations and distances within and across features. A minimal code sketch is shown below.
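  A hedged sketch of the quantile-transform example described on the next slide; the way the skew is introduced (exponentiating Gaussian noise) and the parameter values are illustrative assumptions.

```python
# Sketch: mapping a skewed variable to a Gaussian shape with QuantileTransformer.
# The skewed sample and parameter choices are illustrative assumptions.
import numpy as np
from sklearn.preprocessing import QuantileTransformer
from matplotlib import pyplot

rng = np.random.default_rng(1)
skewed = np.exp(rng.normal(size=1000)).reshape(-1, 1)   # 1,000 values, heavily skewed
pyplot.hist(skewed, bins=25)
pyplot.show()

# monotonic mapping to a standard normal distribution (mean 0, std 1)
quantile = QuantileTransformer(n_quantiles=100, output_distribution='normal')
transformed = quantile.fit_transform(skewed)
pyplot.hist(transformed, bins=25)
pyplot.show()
```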
  • 51. Quantile transforms in scikit-learn (example code shown on the slide)
  Results of quantile transforms • Running the example first creates a sample of 1,000 random Gaussian values and adds a skew to the dataset. • A histogram is created from the skewed dataset and clearly shows the distribution pushed to the far left. • Then a QuantileTransformer is used to map the data to a Gaussian distribution and standardize the result, centering the values on the mean value of 0 and a standard deviation of 1.0. • A histogram of the transformed data is created, showing a Gaussian-shaped data distribution.
  • 52. End of File
  • 53. Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Introduction to PCA (Part 01) Kecerdasan Buatan | Artificial Intelligence Version: January 2022
  Eigendecomposition and Principal Component Analysis (PCA) • Principal Component Analysis (PCA) is used for dimensionality reduction. • Example: we want to reduce data from a 2D space to 1D, but we do not want to lose important information from our features. • We transform the data to be aligned with the most important direction (red) and remove the less important direction (green). Source: https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
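  A quick sketch of the 2D-to-1D idea using scikit-learn's PCA; the correlated synthetic data are an illustrative assumption.

```python
# Sketch: reducing correlated 2D data to 1D with PCA.
# The synthetic data are an illustrative assumption.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # strongly correlated with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)                  # project onto the most important direction
print(pca.explained_variance_ratio_)         # fraction of the variance kept in 1D
print(X_1d.shape)                            # (200, 1)
```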
  • 54. Why use Principal Component Analysis? • You find that features in your data are highly correlated with each other → multicollinearity. • Ideal case: features should be independent of each other. • Solving this issue is important, because working with highly correlated features in multivariate regression models can lead to inaccurate results. • To understand PCA, we have to learn about eigenvalues and eigenvectors.
  Eigenvectors & Eigenvalues • "Eigen" in German means distinctive, characteristic, particular to a person or place. • Eigenvalues and eigenvectors are important for matrix decomposition aimed at reducing dimensionality without losing much information and at reducing the computational cost of matrix processing. • An eigenvector of a matrix A is a vector that is only contracted or elongated (not rotated) when transformed by the matrix. • The eigenvalue is the scaling factor by which the vector is contracted or elongated: (a) if the scaling factor is positive, the directions of the initial and the transformed vectors are the same; (b) if the scaling factor is negative, their directions are reversed. The matrix A acts by stretching the vector x without changing its direction, so x is an eigenvector of A.
  • 55. Which one is the eigenvector, red or blue? Answer: blue.
  Eigenvectors & Eigenvalues: step-by-step. Mathematical approach (1) • Let x be an eigenvector of the matrix A. Then there must exist an eigenvalue λ such that Ax = λx or, equivalently, Ax − λx = 0, i.e. (A − λI)x = 0. • If we define a new matrix B = A − λI, then Bx = 0. • If B has an inverse, then x = B⁻¹0 = 0. But an eigenvector cannot be zero. • Thus, x will be an eigenvector of A if and only if B does not have an inverse, or equivalently det(B) = 0, i.e. det(A − λI) = 0. • This is called the characteristic equation of A. Its roots determine the eigenvalues of A. (If the determinant equals zero, the transformation collapses the space onto a line.)
  • 56. Mathematical approach (2) • Suppose we have the matrix A = [[2, 1], [1, 2]]. • Applying the eigenvector and eigenvalue equation Ax = λx with x = [x, y]ᵀ: [[2, 1], [1, 2]] [x, y]ᵀ = λ [x, y]ᵀ. • To solve for x, y, and λ, we rearrange this equation: 2x + y = λx and x + 2y = λy, which we can further rearrange as: (2 − λ)x + y = 0 and x + (2 − λ)y = 0.
  Mathematical approach (3) • Since det(A − λI) = 0, we find the determinant accordingly (the determinant of a 2×2 matrix [[a, b], [c, d]] is ad − bc): det([[2 − λ, 1], [1, 2 − λ]]) = (2 − λ)² − 1 = λ² − 4λ + 3 = 0. • Thus λ = 3 and λ = 1. Using these values, we can get x and y: we find the eigenvectors corresponding to these eigenvalues by plugging λ back into the equations above and solving for x and y. • To find an eigenvector corresponding to λ = 3, start with (2 − 3)x + y = 0, i.e. −x + y = 0.
  • 57. Mathematical approach (4) • There are an infinite number of values of x and y which satisfy this equation. The only restriction is that not all the components of an eigenvector can equal zero. • So if x = 1, then y = 1, and an eigenvector corresponding to λ = 3 is [1, 1]. • Finding an eigenvector for λ = 1 works the same way: (2 − 1)x + y = 0 gives y = −x. • So an eigenvector for λ = 1 is [1, −1].
  End of File
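  The worked example can be checked numerically; here is a small verification sketch with NumPy (the printed values are what this particular matrix yields, up to sign and ordering):

```python
# Verify the eigenvalues and eigenvectors of A = [[2, 1], [1, 2]] with NumPy.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)       # 3 and 1 (ordering may differ)
print(eigenvectors)      # columns are unit-length versions of [1, 1] and [1, -1]

# check that A v = lambda v holds for the first eigenpair
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))   # True
```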
  • 58. Sunu Wibirama sunu@ugm.ac.id Department of Electrical and Information Engineering Faculty of Engineering Universitas Gadjah Mada INDONESIA Introduction to PCA (Part 02) Kecerdasan Buatan | Artificial Intelligence Version: January 2022
  Variance and covariance • Variance: a measure of variability; it measures how spread out the data set is. Mathematically, it is the average squared deviation from the mean. • Covariance: a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction.
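  A small numeric illustration of variance and covariance with NumPy; the two short series are illustrative assumptions.

```python
# Sketch: variance of one variable and covariance between two variables.
import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0, 4.8])
y = np.array([8.0, 10.0, 12.0, 14.0, 16.0])

print(np.var(x, ddof=1))   # sample variance of x
print(np.cov(x, y))        # 2x2 matrix: variances on the diagonal,
                           # covariance of x and y off the diagonal
```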
  • 59. Variance and covariance • Positive covariance means X and Y are positively related, i.e. as X increases, Y also increases. Negative covariance depicts the exact opposite relation. Zero covariance means X and Y are not (linearly) related.
  Step-by-step PCA • The whole mathematical process of PCA can be divided into 5 parts: 1. Standardizing the data 2. Calculating the covariance matrix 3. Calculating the eigenvectors and eigenvalues 4. Computing the principal components 5. Reducing the dimension of the dataset. Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
  • 60. Reducing the dimension of the datasets: workflow overview (figure from Bui H-B, Nguyen H, Choi Y, Bui X-N, Nguyen-Thoi T, Zandi Y. A Novel Artificial Intelligence Technique to Estimate the Gross Calorific Value of Coal Based on Meta-Heuristic and Support Vector Regression Algorithms. Applied Sciences. 2019; 9(22):4868. https://doi.org/10.3390/app9224868). 1. Find the mean vector. 2. Standardize the data: subtract the mean and divide by the standard deviation. 3. Compute the covariance matrix C = ZᵀZ. 4. Compute the eigenvalues and eigenvectors of C, then decompose C into PDP⁻¹, where P is the matrix of eigenvectors and D is the diagonal matrix with the eigenvalues on the diagonal and zeros everywhere else. 5. Sort the eigenvectors from the highest eigenvalue down (P*). 6. Project the original data onto the sorted eigenvectors: Z* = ZP*. 7. Obtain the projected points in low dimensions and choose the most important principal components.
  1. Standardizing the data • Standardizing is the process of scaling the data in such a way that all the variables and their values lie within a similar range. The formula for standardization is z = (x − μ) / σ, where x is an observation (sample), μ is the mean, and σ is the standard deviation. • Save the standardized data in a matrix Z (a minimal sketch follows below). Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
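  A minimal sketch of step 1 (standardization); the small data matrix is an illustrative assumption.

```python
# Sketch: standardize each column (z = (x - mean) / std) and store the result in Z.
import numpy as np

X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.mean(axis=0))   # approximately 0 for each column
print(Z.std(axis=0))    # 1 for each column
```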
  • 61. 2. Calculate the covariance matrix • Take the matrix Z, transpose it, and multiply the transposed matrix by Z: C = ZᵀZ. • The result is the covariance matrix, expressing the correlation between the different variables in the data set. • It is essential to identify highly dependent variables, because they contain biased and redundant information which can hamper the overall performance of the model. • If our dataset has more than 2 dimensions, it can have more than one covariance measurement. For example, if we have a dataset with 3 dimensions x, y, and z, the covariance matrix has the entries cov(x,x), cov(x,y), cov(x,z) in the first row, cov(y,x), cov(y,y), cov(y,z) in the second row, and cov(z,x), cov(z,y), cov(z,z) in the third row. Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
  3. Calculate eigenvectors and eigenvalues • Next, calculate the eigenvectors and eigenvalues of the covariance matrix. • The eigendecomposition of C is C = PDP⁻¹, where P is the matrix of eigenvectors and D is the diagonal matrix with the eigenvalues on the diagonal and zeros everywhere else. (Deisenroth, Faisal, Ong, 2020)
  • 62. 3. Calculate eigenvectors and eigenvalues (continued) • The eigendecomposition of C is C = PDP⁻¹, where P is the matrix of eigenvectors and D is the diagonal matrix with the eigenvalues on the diagonal and zeros everywhere else. (Deisenroth, Faisal, Ong, 2020) • Note: the eigenvectors p1 and p2 are orthogonal because their dot product equals zero. Why? Because the matrix is symmetric, and the eigenvectors of a symmetric matrix are always orthogonal. What if the vectors are not orthogonal? Use Gram–Schmidt orthogonalization to construct orthogonal or orthonormal vectors.
  4. Computing the principal components • Take the eigenvalues λ1, λ2, …, λn and sort them from largest to smallest. • Then sort the eigenvectors in P accordingly (for example, if λ2 is the largest eigenvalue, take the 2nd column of P and place it in the 1st column position). • The eigenvector with the highest eigenvalue is the most significant and therefore forms the 1st principal component (PC 1). • Call this sorted matrix of eigenvectors P* (the columns of P* are the same as the columns of P, but possibly in a different order). • PC 1 is the most significant component and stores the maximum possible information; PC 2 is the second most significant and stores the remaining maximum information. Source: https://medium.com/analytics-vidhya/principal-component-analysis-pca-558969e63613
  • 63. 5. Reducing the dimension of the datasets • Re-arrange the original dataset along the final principal components, which represent the maximum and most significant information of the dataset. • Calculate Z* = ZP*. • This new matrix, Z*, is a centered/standardized version of the original data expressed in the principal-component basis. • Each observation in Z* is a combination of the original variables, where the weights are determined by the eigenvectors. • Because the eigenvectors in P* are independent of one another, each column of Z* is also independent of the others. (The left graph shows the original data; the right graph shows the transformed data Z*.)
  5. Reducing the dimension of the datasets • Finally, we need to determine how many principal components (PCs) to keep and how many to drop. Normally we keep the most important PCs and drop the less important ones.
  • 64. PCA in Python • 1. Data preparation: standardizing the data (code shown on the slide).
  PCA in Python • 2. Calculate the covariance matrix, eigenvalues, and eigenvectors, and compute the principal components (code shown on the slide; a consolidated sketch follows below).
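  The code on these slides is shown as images; below is a consolidated sketch of the step-by-step procedure with NumPy. Loading the Iris data through scikit-learn and keeping two components are illustrative assumptions, not necessarily the choices made on the original slides.

```python
# Sketch: step-by-step PCA with NumPy (standardize, covariance, eigendecomposition,
# sort, project). Dataset and number of kept components are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                                  # 150 samples, 4 features

# 1. Standardize the data: Z = (X - mean) / std
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data
C = np.cov(Z, rowvar=False)

# 3. Eigenvalues and eigenvectors of the symmetric covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(C)

# 4. Sort eigenvectors by decreasing eigenvalue -> sorted matrix P*
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
P_star = eigenvectors[:, order]

# 5. Project the data and keep the two most important principal components: Z* = Z P*
Z_star = Z @ P_star[:, :2]
print(Z_star.shape)                                   # (150, 2)
print(eigenvalues / eigenvalues.sum())                # explained variance per component
```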
  • 65. PCA in Python (continued). End of File