Excerpts from Data Science (Chapter 4)
by Kelleher, J.D. and Tierney, B. (2018)
Data Science
Machine Learning 101
Mpumelelo Ndlovu
January 7, 2023
“The real challenge in using ML is to
find the algorithm whose learning
bias is the best match for a particular
data set”
Kelleher & Tierney (2018)
Mpumelelo Ndlovu | Data Science: Machine Learning 101 | Kelleher & Tierney (2018) | Chapter 4
Plan of Talk
Introduction
Classification of Algorithms
Prediction Models
Regression Models
Neural Networks and Deep Learning
Decision Trees
Bias in Data Science
Conclusion
References and Bibliography
Introduction
☞ Chapter 3 introduced the computing infrastructure used by
data scientists.
☞ Chapter 4 introduces data science as a partnership between the
data scientist and the computer.
☞ As shown in Figure 1 below, the data scientist does most of the
heavy lifting.
☞ The sequence of decisions taken by the data scientist is
determined by the CRISP-DM¹ model described in Chapter 2 of
this book.
☞ Machine learning is the field of study that develops and evaluates
the algorithms used by the computer to identify and extract
patterns in data.
☞ Machine learning algorithms are mainly applied during the
Modelling phase of the CRISP-DM framework.
¹ https://www.ibm.com/docs/en/spss-modeler/saas?topic=dm-crisp-help-overview
The Data Scientist - Computer Partnership
Data Science
The Data Scientist:
- Defines the problem
- Designs the data set
- Prepares the data
- Decides on the type of data analysis
- Evaluates & interprets the result
The Computer:
- Processes data
- Searches for patterns in the data
Figure 1 : The Scientist and the Computer
The Modelling Phase
☞ The Modelling stage of the CRISP-DM process is split into two phases:
Phase 1 of the Modelling Stage
1. The algorithm is applied to the data set to identify useful patterns.
2. The patterns can be represented in many ways, called models, which
give this stage of the CRISP-DM framework its name.
3. These models include decision trees, regression models, and neural
networks.
Phase 2 of the Modelling Stage
1. The models, the output of Phase 1, are used for data analysis.
2. Sometimes the structure of the model itself can reveal what the
important attributes are, for example the factors that correlate strongly with
stroke.
3. A model can also be used to classify or label new examples, such as new
types of spam email.
Supervised Learning
The majority of ML algorithms can be classified as either
supervised learning or unsupervised learning.
Supervised Learning Algorithms
- learn a function that maps the values of the attributes
that describe an instance to the value of a target attribute;
- the output pattern is a function that maps
the input attributes to the values of the
target attribute;
- each instance in the data set must be labelled
with the value of the target attribute;
- search through many candidate functions to find the one that
best maps inputs to outputs;
- learning bias is used to limit the set of preferred functions.
Examples:
- Regression (linear, polynomial)
- Decision trees
- Random forests
- Classification (KNN, logistic regression, Naive Bayes, SVM)
Unsupervised Learning
Unsupervised Learning Algorithms
- there is no target attribute, so the instances in the data
set do not need to be labelled;
- without a target attribute, the learning task is more difficult;
- these algorithms search for regularities in
the data;
- the main challenge for clustering is to measure
the similarity between the instances in the
data set;
- Table 3 below lists some of the common similarity
measures.
Examples:
- Clustering (K-Means)
- Association-rule mining (Apriori algorithm)
- Dimensionality reduction (PCA)
- Gaussian Mixture Models (GMM)
Table 2 : Unsupervised Learning
Unsupervised Learning
Common Similarity Measures
Similarity Measure | Notes
Euclidean distance | straight-line distance; used where all attributes are numeric and have similar ranges
Jaccard similarity | measures the similarity between two sets of data by comparing the members they share with the members that are distinct
Cosine similarity / adjusted cosine similarity | suitable for numeric attributes with different ranges, which need to be normalized before calculating similarity
Weighted similarity | takes into account the importance of attributes by ranking them before calculating similarity
Table 3 : Similarity Measures
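As a rough illustration of the first three measures in Table 3, the sketch below implements them in plain Python. The function names and the sample values are assumptions made for this example; they do not come from the book.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a, b):
    """Shared members divided by all distinct members of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative values only
print(euclidean_distance([0, 0], [3, 4]))
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
print(cosine_similarity([1, 2], [2, 4]))
```

Euclidean distance is a distance, so smaller values mean more similar instances, whereas for the Jaccard and cosine measures larger values mean more similar instances.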
Learning Prediction Models
✍ Prediction algorithms estimate the value of a target attribute
based on the values of other attributes
✍ Prediction models are produced by supervised learning
algorithms.
✍ Prediction is the most popular type of problem that ML is used for.
✍ One concept that is fundamental to Prediction problems is
correlation analysis
Figure 2 : Correlation Analysis (Source: Scribbr²)
² https://www.scribbr.com/statistics/correlation-coefficient/
Correlation Analysis
✍ A correlation describes the strength of the association between 2 attributes.
✍ The Pearson correlation coefficient (r) is the most common measure of
the strength of the linear relationship between 2 numeric attributes; its value ranges
from −1 to 1.
✍ A coefficient of r = 0 means that the attributes are not linearly correlated;
r = +1 means a perfect positive correlation; and r = −1 means
the 2 attributes have a perfect negative correlation.
✍ Identifying the attributes that are highly correlated with the target
attribute is key to understanding what may be contributing to an issue.
✍ Like correlation analysis, prediction techniques involve analysing
the relationship between attributes.
✍ If a strong correlation exists between an input attribute and a target
attribute, then the ML algorithm is likely to generate an accurate
prediction model; if the correlation is weak, an accurate model is less likely.
Pearson Correlation Analysis
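- To calculate the correlation for the population:
$$\rho = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{1}$$
- To calculate the estimate from a sample:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{2}$$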
- Table 4 below shows the general guidelines for interpreting Pearson
coefficients:
Coefficient | Interpretation
r ≈ ±0.7 | Strong linear relationship
r ≈ ±0.5 | Moderate linear relationship
r ≈ ±0.3 | Weak relationship
r ≈ 0 | No relationship
Table 4 : Interpreting Pearson Coefficients
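The sample Pearson coefficient can be computed directly from its definition. The sketch below is a minimal plain-Python version; the function name `pearson_r` and the sample values are assumptions made for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of the deviations from the means
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    # It is zero when either list is constant, in which case r is undefined
    # and the division below raises ZeroDivisionError.
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator


# Illustrative values only: y rises (or falls) exactly in step with x
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```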
The BMI Example
The Data Set
ID | Height (m) | Weight (kg) | Shoe Size | Exercise (minutes/day) | Diabetes (% likelihood)
1 | 1.70 | 70 | 5 | 130 | 0.05
2 | 1.77 | 88 | 9 | 80 | 0.11
3 | 1.85 | 112 | 11 | 0 | 0.18
Table 5 : Diabetes Data Set
The BMI Example
✍ A very popular real-life example in correlation analysis is the
body mass index (BMI), which is used to classify people as
underweight, normal weight, or overweight.
✍ BMI:
- takes a number of attributes and maps them to a target
value, which is a new derived value.
- It is easy to calculate the correlation between BMI and a
person’s other attributes.
✍ Diabetes has a higher correlation with BMI than with weight or
height taken independently;
✍ During data preparation, it is also important to check the effect of
a combination of attributes such as BMI.
✍ Another benefit of ML is that ML algorithms can learn interactions
between attributes and create useful derived attributes.
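To make the idea of a derived attribute concrete, the sketch below adds a BMI column to the three rows of Table 5. The slides do not state the formula, so the standard definition (weight in kilograms divided by the square of height in metres) is assumed here, and the dictionary keys are names chosen for this example.

```python
def bmi(height_m, weight_kg):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


# Rows copied from Table 5; the key names are assumptions for this sketch
rows = [
    {"id": 1, "height_m": 1.70, "weight_kg": 70, "diabetes": 0.05},
    {"id": 2, "height_m": 1.77, "weight_kg": 88, "diabetes": 0.11},
    {"id": 3, "height_m": 1.85, "weight_kg": 112, "diabetes": 0.18},
]

# Derive the new attribute from two existing ones
for row in rows:
    row["bmi"] = bmi(row["height_m"], row["weight_kg"])
    print(row["id"], round(row["bmi"], 2), row["diabetes"])
```

Three rows are far too few to compare correlations meaningfully; the sketch only shows how the derived attribute is built.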
Linear Regression
✍ Regression models are preferred when the data set consists of
numeric attributes
✍ The first step is to hypothesize the structure of the relationship between the
attributes, which is then expressed as a parameterized mathematical model
called a regression function.
Figure 3 : Regression Functions
The Regression Function
✍ The Regression Function converts inputs into outputs
✍ The best approach in linear regression analysis is to assume a
simple model first before considering one with more parameters.
✍ A simple regression function with a single input models the
relationship between 2 attributes, X and Y:
$$Y = w_0 + w_1 X \tag{3}$$
✍ The variables w0 and w1 are the parameters of the regression
function, where w0 is the Y intercept and w1 is the gradient of the
line.
✍ Modifying these parameters changes how the function maps
from the input X to the output Y.
✍ Finding the parameters is equivalent to defining the line that best fits
our data by reducing the overall error.
✍ In high school mathematics this equation is usually written as:
$$y = mx + c \tag{4}$$
Regression Analysis
Calculating overall error
Sum of Squared Errors (SSE)
➊ The regression function is applied to the data set to estimate the target
attribute using the input attribute(s).
➋ The error of the function is calculated per instance by subtracting the
estimated value of the target attribute from the actual target value
➌ The error of the function for each instance is squared to eliminate
negative values and the squared values are summed up.
➥ Equation 5 below shows the formula for the Sum of Squared
Errors (SSE) for a data set with n instances; target_i is the value of the target attribute
for instance i and prediction_i is the value predicted by the
function for the same instance i.
$$SSE = \sum_{i=1}^{n} (\mathit{target}_i - \mathit{prediction}_i)^2 \tag{5}$$
➥ The strategy of fitting a linear function by minimizing the SSE is known
as least squares.
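The least-squares strategy can be sketched in a few lines of Python. For a single input attribute the parameters that minimize the SSE have a well-known closed form, which is what the hypothetical `fit_least_squares` function below uses; the data values are made up for illustration.

```python
def sse(targets, predictions):
    """Sum of squared errors between actual and predicted target values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


def fit_least_squares(xs, ys):
    """Return (w0, w1) of the line y = w0 + w1 * x that minimizes the SSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # zero if all x values are equal
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Illustrative data that lies close to, but not exactly on, a straight line
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

w0, w1 = fit_least_squares(xs, ys)
predictions = [w0 + w1 * x for x in xs]
print(w0, w1, sse(ys, predictions))
```

Any other line through the same data, for example y = 1 + 2x, gives an SSE at least as large as the fitted one.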
The Regression Function
An Implementation Example
✍ Replacing X in equation 3 with the BMI attribute and Y with the diabetes attribute from
Table 5, and finding the best-fit line using the least-squares
approach, produces equation 6:
$$\mathit{Diabetes} = -7.38431 + 0.55593 \times \mathit{BMI} \tag{6}$$
✍ Here −7.38431 is the Y intercept w0 and 0.55593 is the gradient w1.
✍ If BMI is 24, the model (Diabetes = −7.38431 + 0.55593 × 24) produces
a prediction of 5.96%.
✍ The least-squares method calculates a weighted average over the
instances based on their distance from the best-fit line.
✍ The farther an instance is from the line, the larger its squared residual
and the more weight is applied to the instance. This skews the
algorithm towards outliers if they were not removed during data
preparation.
✍ Multiple linear regression functions extend the linear regression model
by taking more input attributes, each with its own parameter.
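Applying the fitted model is a single line of arithmetic. The sketch below hard-codes the two parameters from equation 6; the function and constant names are chosen for this example.

```python
# Parameters taken from equation 6
W0 = -7.38431  # Y intercept
W1 = 0.55593   # gradient


def predict_diabetes(bmi):
    """Predicted diabetes likelihood (%) for a given BMI, using equation 6."""
    return W0 + W1 * bmi


print(round(predict_diabetes(24), 2))
```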
Neural Networks
✍ A neural network is a set of interconnected neurons, each of which takes a
number of numeric values as input and maps them to a single output.
✍ Unlike a multiple linear regression model, a neuron passes its output
through an activation function.
✍ Figure 4 below shows a neural network with a single hidden layer.
✍ Table 6 below lists some of the most common non-linear activation
functions.
✍ The activation function takes the single-value output of the multi-input
linear regression function and maps it to a non-linear output.
✍ Each neuron in a neural network:
➊ Multiplies each input by a weight
➋ Adds together the results of the multiplication
➌ Pushes the result to the activation function
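The three steps a neuron performs can be written out almost word for word. In the sketch below the function name, the default choice of tanh as the activation function, and the input values are all assumptions made for illustration.

```python
import math


def neuron(inputs, weights, activation=math.tanh):
    """Compute the output of a single neuron."""
    # Step 1: multiply each input by its weight
    weighted = [x * w for x, w in zip(inputs, weights)]
    # Step 2: add together the results of the multiplication
    total = sum(weighted)
    # Step 3: push the result through the activation function
    return activation(total)


# Illustrative inputs and weights
print(neuron([1.0, 2.0], [0.5, -0.25]))
print(neuron([1.0, 1.0], [1.0, 1.0], activation=lambda z: z))
```

Passing the identity function as the activation, as in the second call, reduces the neuron to a linear regression function without an intercept.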
A Simple Neural Network
[Diagram: inputs x1 to x4 enter the input layer, pass through one hidden layer, and reach an output layer with a single output]
Figure 4 : Neural Network
Common Activation Functions
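Name | Function | Derivative
Logistic | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | $\sigma'(x) = \sigma(x)(1 - \sigma(x))$
Tanh | $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | $\tanh'(x) = 1 - \tanh(x)^2$
ReLU | $f(x) = 0$ if $x < 0$; $f(x) = x$ if $x \ge 0$ | $f'(x) = 0$ if $x < 0$; $f'(x) = 1$ if $x > 0$
Softmax | $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ | $\frac{\partial f(x_i)}{\partial x_i} = \frac{e^{x_i}}{\sum_j e^{x_j}} - \frac{(e^{x_i})^2}{(\sum_j e^{x_j})^2}$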
Table 6 : Non-linear Activation Functions.
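The activation functions in Table 6 translate into short Python functions. This is a minimal sketch using only the standard library; the function names are chosen for this example, and the softmax subtracts the largest input before exponentiating, which is a common numerical-stability step rather than something taken from the slides.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_derivative(x):
    s = logistic(x)
    return s * (1.0 - s)


def tanh_derivative(x):
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return x if x >= 0 else 0.0


def relu_derivative(x):
    # The derivative is undefined at exactly 0; returning 0 there is a convention
    return 1.0 if x > 0 else 0.0


def softmax(xs):
    """Map a list of numbers to positive values that sum to 1."""
    largest = max(xs)
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(logistic(0.0), logistic_derivative(0.0))
print(math.tanh(0.5), tanh_derivative(0.5))
print(relu(-2.0), relu(3.0))
print(softmax([1.0, 2.0, 3.0]))
```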
Understanding a Neural Network
✍ The neural network in figure 5 is organised into 3 layers: the input layer,
the hidden layer, and the output layer.
✍ Nodes h1 to hn are neurons that make up the hidden layer, a layer which
is neither the input nor the output;
✍ The arrows represent the flow of information. Feed-forward neural
networks have no loops; all the connections point forward;
✍ A fully-connected network is one where every neuron is connected to all
the neurons in the next layer;
✍ Most of the work in developing neural networks involves finding the best
network layout, number of hidden layers, types of activation functions
used and the direction of the connections;
✍ The labels on each arrow (ωn) represent the weights and the f node
represents the activation function.
✍ The output of figure 5 with a tanh activation function would be
$$\mathit{output} = \tanh(\omega_1 h_1 + \omega_2 h_2 + \omega_3 h_3 + \dots + \omega_n h_n)$$
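A forward pass through a small fully-connected, feed-forward network can be sketched as follows. The slides do not say which activation the hidden neurons use, so tanh is assumed throughout; the function names and the weights are illustrative only.

```python
import math


def layer(inputs, weight_rows, activation):
    """Output of one fully-connected layer: one weight row per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)))
        for row in weight_rows
    ]


def forward(inputs, hidden_weights, output_weights):
    """Return the hidden activations h1..hn and the single network output."""
    hidden = layer(inputs, hidden_weights, math.tanh)
    # output = tanh(w1*h1 + w2*h2 + ... + wn*hn)
    output = math.tanh(sum(w * h for w, h in zip(output_weights, hidden)))
    return hidden, output


# Illustrative network: 2 inputs, 3 hidden neurons, 1 output
hidden_weights = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
output_weights = [0.6, -0.2, 0.9]
hidden, output = forward([1.0, 2.0], hidden_weights, output_weights)
print(hidden, output)
```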
Weights in a Neural Network
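[Diagram: inputs x1 … xn, hidden neurons h1 … hn, and weights ω1 … ωn on the connections into a summation node Σ, followed by the activation function f and the output]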
Figure 5 : Neural Network with Weights
Training Neural Networks
The Weight-update Rule
✍ Training a neural network involves finding the correct weights
(ω1, ω2, ω3, ..., ωn) using the weight-update rule.
✍ At a high level, the weight-update rule works like this:
❶ If the error is 0, then don’t change the weights.
❷ If the error is positive, increase the weights for all connections
where the input is positive and reduce the weights for connections
where the input is negative.
❸ If the error is negative, increase the weights for all connections
where the input is negative and reduce the weights for connections
where the input is positive.
✍ The major challenge with the weight-update rule is that it is difficult to
calculate the error for neurons in the earlier layers of a deep neural network.
✍ The standard way to train a neural network is to use the
backpropagation algorithm, a supervised learning algorithm illustrated in
figure 6.
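One simple rule that behaves exactly as the three cases of the weight-update rule describe is to add `learning_rate * error * input` to each weight. The sketch below assumes the error is the target minus the prediction; the `learning_rate` parameter and its default value are assumptions introduced for this example, not something the slides specify.

```python
def update_weights(weights, inputs, error, learning_rate=0.1):
    """Return new weights after one application of the weight-update rule.

    error > 0: weights on positive inputs go up, on negative inputs go down.
    error < 0: the opposite happens.
    error == 0: the weights are unchanged.
    """
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


# Illustrative values: one positive and one negative input
print(update_weights([0.0, 0.0], [1.0, -1.0], error=1.0))
print(update_weights([0.0, 0.0], [1.0, -1.0], error=-1.0))
print(update_weights([0.3, 0.7], [1.0, -1.0], error=0.0))
```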
The Backpropagation Algorithm
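[Diagram: inputs x1 to x4 in the input layer, five hidden neurons (h) in the hidden layer, and a summation node Σ producing the output in the output layer, with an arrow labelled "Back propagation"]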
Figure 6 : Backpropagation Simplified
Backpropagation in Action
✍ The algorithm propagates the error resulting from the training of each
instance back through the network, starting from the output layer, as shown in
figure 6.
✍ The main steps of the algorithm are as follows:
❶ Calculate the error for the neurons in the output layer and use the
weight-update rule to update the weights coming into these neurons
❷ Share the error calculated in a neuron with the neurons in the
preceding layer
❸ Work back through the layers repeating steps 1 and 2 above
✍ The idea is to reduce the error, not eliminate it, in order to avoid overfitting and allow
the network to generalize to new instances that are not in the data set;
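The steps above can be sketched for a tiny network with one hidden layer and one output neuron. This is a minimal illustration under stated assumptions, not the book's implementation: it assumes logistic activations, a squared-error loss, no bias terms, and a learning rate of 0.5, and all names and starting weights are made up.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output."""
    h = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = logistic(sum(w * hi for w, hi in zip(w_out, h)))
    return h, y


def backprop_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One training step on a single instance; returns the updated weights."""
    h, y = forward(x, w_hidden, w_out)
    # Step 1: error signal for the output neuron (error times logistic slope)
    delta_out = (target - y) * y * (1.0 - y)
    # Step 2: share that error with each hidden neuron, in proportion to the
    # weight that connects it to the output (old weights are used here)
    delta_hidden = [
        delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(len(h))
    ]
    # Update the weights coming into the output neuron
    new_w_out = [w_out[j] + learning_rate * delta_out * h[j] for j in range(len(h))]
    # Step 3: repeat the update one layer back, for the hidden neurons
    new_w_hidden = [
        [w_hidden[j][i] + learning_rate * delta_hidden[j] * x[i] for i in range(len(x))]
        for j in range(len(h))
    ]
    return new_w_hidden, new_w_out


# Illustrative instance and starting weights
x, target = [1.0, 0.5], 1.0
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]

_, before = forward(x, w_hidden, w_out)
w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)
print(abs(target - before), abs(target - after))
```

A single step only nudges the output towards the target; training repeats such steps over many instances.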
Deep Learning
✍ Deep Learning networks are neural networks with more than one
hidden layer.
✍ Figure 7 below has 3 hidden layers of 5 neurons each, and 5 layers
overall;
✍ You don’t need to have the same number of neurons in each layer: as
shown in figure 7, the input layer has 3 neurons, each of the 3 hidden layers has 5
neurons, and the output layer has 3 neurons.
✍ Figure 7 is also a feed-forward network since it has no loops.
✍ Visit Kaggle³ for a comprehensive deep learning cheat sheet.
✍ Table 7 gives a summary of some of the common deep learning
networks and their applications.
³ https://www.kaggle.com/getting-started/151100
Deep Neural Networks
Figure 7 : Deep Neural Network
Common Deep Learning Networks
Network | Key Features | Applications
Convolutional Neural Network (CNN) | one or more convolutional layers; one or more fully connected layers | image recognition; classification
Recurrent Neural Network (RNN) | connections between units form a directed cycle | time series prediction; text generation
Long Short-Term Memory (LSTM) | type of RNN; remembers for longer than a standard RNN | time series prediction; text generation
Deep Belief Network (DBN) | has connections between layers but not within a layer | unsupervised learning tasks to reduce the dimensionality of features
Self-Organising Map (SOM) | converts input data to a low-dimensional space | visualization
Table 7 : Deep Learning Networks
Decision Trees
✍ Linear regression and neural networks work best with numeric inputs,
not with nominal or ordinal data.
✍ Decision trees work well with nominal and ordinal data types
✍ Figure 8 shows a decision tree for deciding whether an email is spam
or not. The rounded rectangles are attributes.
✍ A decision tree encodes a set of if-then-else rules in a tree structure.
✍ Each path in a decision tree, from root to leaf, defines a classification
rule.
✍ The Iterative Dichotomiser 3 (ID3)⁴ algorithm is considered the father of
all decision tree algorithms;
✍ Decision trees are very sensitive to noise in the data set. It is
recommended to keep them shallow.
✍ A random forest model is made up of a set of decision trees.
⁴ https://en.wikipedia.org/wiki/ID3_algorithm
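Because a decision tree is a set of nested if-then-else rules, a small one can be written directly as code. The tree below is hypothetical: its attribute names and structure are invented for illustration and are not the tree shown in Figure 8.

```python
def classify_email(email):
    """Classify an email with a hypothetical two-level decision tree.

    Each route from the first test to a return statement corresponds to one
    root-to-leaf path, that is, one classification rule.
    """
    if email["contains_suspicious_words"]:      # root node
        if email["sender_unknown"]:             # internal node
            return "spam"                       # leaf
        return "not spam"                       # leaf
    else:
        if email["images_only"]:                # internal node
            return "spam"                       # leaf
        return "not spam"                       # leaf


# Illustrative instance
example = {
    "contains_suspicious_words": True,
    "sender_unknown": True,
    "images_only": False,
}
print(classify_email(example))
```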
Decision Tree Example
Figure 8 : A sample decision tree
The ID3 Algorithm
✍ The ID3 algorithm recursively builds the decision tree in a depth-first
manner, adding one node at a time, starting with the root node;
✍ ID3 chooses the attribute to test at each node so that the split produces sets
that are as pure as possible, where a pure set is one in which every instance has the
same value for the target attribute;
✍ The entropy metric can be used to measure the purity of a set.
✍ At each node, ID3 selects the attribute that results in the lowest weighted entropy
after the data set at the node is split using that attribute.
✍ To calculate the weighted entropy of an attribute at a node:
❶ split the data set using the attribute;
❷ calculate the entropy of the resulting sets;
❸ weight each entropy by the fraction of data in the set;
❹ sum up the results.
✍ Decision trees are easy to understand.
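The four steps for calculating weighted entropy can be sketched as follows. The entropy used is the standard Shannon entropy in bits; the function names, the attribute names and the four sample emails are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of target values; 0 for a pure set."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def weighted_entropy(rows, attribute, target):
    """Weighted entropy of the sets produced by splitting rows on attribute."""
    # Step 1: split the data set using the attribute
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Steps 2 to 4: entropy of each set, weighted by its share of the data, summed
    total = len(rows)
    return sum(len(labels) / total * entropy(labels) for labels in partitions.values())


def best_attribute(rows, attributes, target):
    """Attribute with the lowest weighted entropy, as ID3 would select."""
    return min(attributes, key=lambda a: weighted_entropy(rows, a, target))


# Illustrative data set of four emails
rows = [
    {"suspicious_words": "yes", "has_attachment": "yes", "label": "spam"},
    {"suspicious_words": "yes", "has_attachment": "no", "label": "spam"},
    {"suspicious_words": "no", "has_attachment": "yes", "label": "ham"},
    {"suspicious_words": "no", "has_attachment": "no", "label": "ham"},
]
print(best_attribute(rows, ["suspicious_words", "has_attachment"], "label"))
```

In this toy data set, splitting on `suspicious_words` yields two pure sets, while splitting on `has_attachment` leaves both sets evenly mixed.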
Bias in Data Science
✍ The major objective of ML is to create models that encode appropriate
generalizations from the data;
✍ Two key factors determine the quality of an ML model:
❶ The data set the algorithm is run on. If the data set is not a true
reflection of real-life events, the model will not be accurate. This is
referred to as sampling bias.
❷ The choice of ML algorithm. ML algorithms use learning bias or
modelling/selection bias to generalize from a data set. A wrong
choice of algorithm will result in an inappropriate learning bias.
✍ While sampling bias is bad, without learning bias there can be no
learning and the algorithm will only memorize the data.
✍ There is no single best ML algorithm, so the Modelling phase of the CRISP-DM
process involves building multiple models using different algorithms and
choosing the best in terms of accuracy, generalization and other
performance metrics.
“The golden rule for evaluating
models is that models should
never be tested on the same data
they were trained on”
Kelleher & Tierney (2018)
Evaluating Models
✍ After generating models, the next step is to create a test plan;
✍ A model that simply memorizes training data will not perform well on test
and any other previously unseen data.
✍ The normal practice is to split the data into 3 sets: training data, validation
data, and testing data;
✍ The other important aspect of a test plan is choosing the appropriate
evaluation metrics to use during testing;
✍ Table 8 below shows some of the most commonly used metrics and
their applications.
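A three-way split can be sketched with the standard library alone. The 60/20/20 proportions, the fixed seed and the function name are assumptions chosen for this example; the slides do not prescribe particular proportions.

```python
import random


def split_data(instances, train=0.6, validation=0.2, seed=42):
    """Shuffle the instances and split them into training, validation and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    training_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_validation]
    test_set = shuffled[n_train + n_validation:]  # whatever remains
    return training_set, validation_set, test_set


# Illustrative data set of 10 instance IDs
training_set, validation_set, test_set = split_data(range(10))
print(len(training_set), len(validation_set), len(test_set))
```

Every instance lands in exactly one of the three sets, so no model is tested on data it was trained on.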
Common Evaluation Metrics
Metric | Applications | Equation
Mean Absolute Error (MAE) | Regression | $MAE = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert$
Root Mean Squared Error (RMSE) | Regression | $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$
Recall/AUC | Classification | $Recall = \frac{TP}{TP + FN}$
Precision | Classification | $Precision = \frac{TP}{TP + FP}$
Accuracy | Classification | $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$
Table 8 : Common Evaluation Metrics
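The metrics in Table 8 are short enough to compute by hand. In the sketch below TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives; the function names and the sample labels are assumptions made for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def confusion_counts(actual, predicted, positive=True):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn


def recall(tp, fn):
    return tp / (tp + fn)


def precision(tp, fp):
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative labels: True marks the positive class (for example, spam)
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)
print(recall(tp, fn), precision(tp, fp), accuracy(tp, tn, fp, fn))
```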
Conclusion
✓ Chapter 4 started by asserting the partnership between a data scientist
and a computer;
✓ The computer generates a model from a data set prepared by the data
scientist.
✓ The data scientist interprets and evaluates the model;
✓ Model evaluation follows the golden rule
✓ The best model is chosen based on its accuracy, but in the future
data usage and privacy may also affect model selection;
✓ Chapter 5 discusses converting a business problem to a data science
problem and Chapter 6 will discuss the impact of Privacy Laws on data
science.
References and Bibliography
Kelleher, J.D. & Tierney, B. (2018). Data Science.
MIT Press, pp. 101–150.
Thank you!

Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Call Girls in G.T.B. Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in G.T.B. Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in G.T.B. Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in G.T.B. Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 

Data Science Chapter 4: Machine Learning 101

  • 1. Excerpts from Data Science (Chapter 4) by Kelleher, J.D. and Tierney, B. (2018) Data Science Machine Learning 101 Mpumelelo Ndlovu January 7, 2023
  • 2. “The real challenge in using ML is to find the algorithm whose learning bias is the best match for a particular data set” Kelleher & Tierney (2018)
  • 3. Plan of Talk
      - Introduction
      - Classification of Algorithms
      - Prediction Models
      - Regression Models
      - Neural Networks and Deep Learning
      - Decision Trees
      - Bias in Data Science
      - Conclusion
      - References and Bibliography
  • 4. Introduction
      ☞ Chapter 3 introduced the computing infrastructure used by data scientists.
      ☞ Chapter 4 introduces data science as a partnership between the data scientist and the computer.
      ☞ As shown in Figure 1 below, the data scientist does most of the heavy lifting.
      ☞ The sequence of decisions taken by the data scientist is determined by the CRISP-DM model (https://www.ibm.com/docs/en/spss-modeler/saas?topic=dm-crisp-help-overview) described in Chapter 2 of the book.
      ☞ Machine learning is the field of study that develops and evaluates the algorithms used by the computer to identify and extract patterns in data.
      ☞ Machine learning algorithms are mainly applied during the Modelling phase of the CRISP-DM framework.
  • 5. The Data Scientist - Computer Partnership
      Data Science
        The Data Scientist: defines the problem; designs the data set; prepares the data; decides on the type of data analysis; evaluates and interprets the result
        The Computer: processes data; searches for patterns in the data
      Figure 1: The Scientist and the Computer
  • 6. The Modelling Phase
      ☞ The Modelling stage of the CRISP-DM process is split into two phases:
      Phase 1 of the Modelling Stage
        1. The algorithm is applied to the data set to identify useful patterns.
        2. The patterns can be represented in many ways, called models, which give this stage of the CRISP-DM framework its name.
        3. These models include decision trees, regression models, and neural networks.
      Phase 2 of the Modelling Stage
        1. The models, the output of Phase 1, are used for data analysis.
        2. Sometimes the structure of the model itself can reveal what the important attributes are, for example the factors that strongly correlate with stroke.
        3. A model can also be used to classify or label new examples, such as new types of spam email.
  • 7. Supervised Learning
      The majority of algorithms can be classified as either supervised learning or unsupervised learning.
      Supervised Learning Algorithms
        - learn a function that maps the values of the attributes describing an instance to the value of a target attribute;
        - the output pattern is this function, which maps the input attributes to the values of the target attribute;
        - each instance in the data set must be labelled with the value of the target attribute;
        - the algorithm searches through many candidate functions to find the one that best maps inputs to outputs;
        - a learning bias is used to limit the set of functions the algorithm prefers.
      Examples
        - Regression (linear, polynomial)
        - Decision trees
        - Random Forests
        - Classification (KNN, Logistic regression, Naive Bayes, SVM)
  • 8. Unsupervised Learning
      Unsupervised Learning Algorithms
        - there is no target attribute, so no time has to be spent labelling the instances in the data set with a target value;
        - without a target attribute, the learning task itself is more difficult;
        - these algorithms search for regularities in the data;
        - the main challenge for clustering is to measure the similarity between the instances in the data set;
        - Table 3 below lists some of the common similarity measures.
      Examples
        - Clustering (K-Means)
        - Association-rule Mining (Apriori algorithm)
        - Dimensionality Reduction (PCA)
        - Gaussian Mixture Models (GMM)
      Table 2: Unsupervised Learning
  • 9. Unsupervised Learning: Common Similarity Measures
      Similarity Measure: Notes
        Euclidean Distance: straight-line distance, used where all attributes are numeric and have similar ranges
        Jaccard Similarity: measures the similarity between two sets of data to see which members are shared and which are distinct
        Cosine Similarity / Adjusted Cosine Similarity: suitable for numeric attributes with different ranges, which need to be normalized before calculating similarity
        Weighted Similarity: takes into account the importance of attributes by ranking them before calculating similarity
      Table 3: Similarity Measures
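Three of the measures in Table 3 can be sketched in a few lines of Python. This is a minimal illustration using their standard textbook definitions, not code from the book; the function names are chosen here for readability.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a, b):
    """Shared members divided by all distinct members of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Note that Euclidean distance is a dissimilarity (0 means identical), while the Jaccard and cosine measures grow as instances become more alike.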
  • 10. Learning Prediction Models
      ✍ Prediction algorithms estimate the value of a target attribute based on the values of other attributes.
      ✍ Prediction models are produced by supervised learning algorithms.
      ✍ Prediction is the most popular type of problem that ML is used for.
      ✍ One concept that is fundamental to prediction problems is correlation analysis.
      Figure 2: Correlation Analysis (Source: Scribbr, https://www.scribbr.com/statistics/correlation-coefficient/)
  • 11. Correlation Analysis
      ✍ A correlation is the strength of association between 2 attributes.
      ✍ The Pearson correlation coefficient (r) is the most common measure of the strength of the linear relationship between 2 numeric attributes; its value ranges from −1 to 1.
      ✍ A coefficient of r = 0 means that the attributes are not linearly correlated; r = +1 means perfect positive correlation; and r = −1 indicates that the 2 attributes have a perfect negative correlation.
      ✍ Identifying attributes that are highly correlated with the target attribute is key to understanding the cause of an issue.
      ✍ Like correlation analysis, prediction techniques involve analysing the relationship between attributes.
      ✍ If a strong correlation exists between an input attribute and a target attribute, then the ML algorithm is likely to generate an accurate prediction model, and vice versa.
  • 12. Pearson Correlation Analysis
      - To calculate the correlation for the population:
        $\rho = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$ (1)
      - To calculate the estimate from a sample:
        $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$ (2)
      - Table 4 below shows the general guidelines for interpreting Pearson coefficients:
        r ≈ ±0.7: Strong linear relationship
        r ≈ ±0.5: Moderate linear relationship
        r ≈ ±0.3: Weak relationship
        r ≈ 0: No relationship
      Table 4: Interpreting Pearson Coefficients
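The sample estimate in equation 2 translates almost directly into code. The sketch below is a plain-Python illustration that assumes two equal-length lists of numbers, neither of them constant; `pearson_r` is a name chosen here, not one from the book.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of the deviations from each mean.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations.
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)
```

A result near +1 or −1 indicates a strong linear relationship, and a result near 0 indicates none, in line with the guidelines in Table 4.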
  • 13. The BMI Example: The Data Set
      ID   Height (m)   Weight (kg)   Shoe Size   Exercise (minutes/day)   Diabetes (% Likelihood)
      1    1.70         70            5           130                      0.05
      2    1.77         88            9           80                       0.11
      3    1.85         112           11          0                        0.18
      Table 5: Diabetes Data Set
  • 14. The BMI Example
      ✍ A popular real-life example used to illustrate correlation is the body mass index (BMI), which classifies people as underweight, normal weight, or overweight.
      ✍ BMI:
        - takes a number of attributes and maps them to a target value, which is a new derived value;
        - makes it easy to calculate the correlation between BMI and a person's other attributes.
      ✍ Diabetes has a higher correlation with BMI than with weight or height taken independently.
      ✍ During data preparation, it is therefore important to check the effect of combinations of attributes such as BMI.
      ✍ Another benefit of ML is that ML algorithms can learn interactions between attributes and create useful derived attributes.
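As a small illustration of a derived attribute, the sketch below computes BMI for the three instances in Table 5. It assumes the standard definition of BMI (weight in kilograms divided by the square of height in metres), which the slides do not spell out; the variable names are chosen here.

```python
# Rows copied from Table 5: (height in metres, weight in kilograms).
table_5 = [(1.70, 70), (1.77, 88), (1.85, 112)]


def bmi(height_m, weight_kg):
    """Body mass index: weight divided by height squared (standard definition)."""
    return weight_kg / height_m ** 2


# One derived BMI value per instance, in the same order as Table 5.
bmi_values = [bmi(height, weight) for height, weight in table_5]
```

The derived values rise from instance 1 to instance 3, in the same order as the diabetes likelihoods in Table 5.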
  • 15. Linear Regression
      ✍ Regression models are preferred when the data set consists of numeric attributes.
      ✍ The first step is to hypothesize the structure of the relationship between the attributes, expressed as a parameterized mathematical model called a regression function.
      Figure 3: Regression Functions
  • 16. The Regression Function
      ✍ The regression function converts inputs into outputs.
      ✍ The best approach in linear regression analysis is to start with a simple model before considering one with several inputs.
      ✍ A simple regression function with a single input models the relationship between 2 attributes, X and Y:
        $Y = w_0 + w_1 X$ (3)
      ✍ The variables $w_0$ and $w_1$ are the parameters of the regression function, where $w_0$ is the Y intercept and $w_1$ is the gradient of the line.
      ✍ Modifying these parameters changes how the function maps from the input X to the output Y.
      ✍ Finding the parameters is equivalent to defining the line that best fits the data by minimizing the overall error.
      ✍ In high school mathematics this equation is usually written as:
        $y = mx + c$ (4)
  • 17. Regression Analysis: Calculating the Overall Error with the Sum of Squared Errors (SSE)
      ➊ The regression function is applied to the data set to estimate the target attribute using the input attribute(s).
      ➋ The error of the function is calculated per instance by subtracting the estimated value of the target attribute from the actual target value.
      ➌ The error for each instance is squared to eliminate negative values, and the squared values are summed.
      ➥ Equation 5 below shows the formula for the Sum of Squared Errors (SSE) for a data set with n instances; $target_i$ is the target attribute for instance i and $prediction_i$ is the value predicted by the function for the same instance i.
        $SSE = \sum_{i=1}^{n} (target_i - prediction_i)^2$ (5)
      ➥ The strategy of fitting a linear function by minimizing the SSE is known as least squares.
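The three steps above, and the least-squares fit itself, can be sketched as follows. This is a minimal illustration for a single input attribute using the standard closed-form least-squares solution; the function names `sse` and `fit_line` are chosen here and do not come from the book.

```python
def sse(targets, predictions):
    """Sum of squared errors between actual and predicted target values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


def fit_line(xs, ys):
    """Return (w0, w1) of the least-squares line Y = w0 + w1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Toy data lying exactly on Y = 1 + 2X, so the fitted line has zero error.
xs, ys = [0, 1, 2], [1, 3, 5]
w0, w1 = fit_line(xs, ys)
predictions = [w0 + w1 * x for x in xs]
total_error = sse(ys, predictions)
```

Among all straight lines, the one returned by `fit_line` has the smallest SSE on the data it was fitted to, which is what "least squares" refers to.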
  • 18. The Regression Function: An Implementation Example
      ✍ Substituting the BMI attribute for X and the diabetes attribute from Table 5 for Y in equation 3, and finding the best-fit line with the least-squares approach, produces equation 6:
        $Diabetes = -7.38431 + 0.55593 \times BMI$ (6)
      ✍ Here −7.38431 is the Y intercept $w_0$ and 0.55593 is the gradient $w_1$.
      ✍ If BMI is 24, the model ($Diabetes = -7.38431 + 0.55593 \times 24$) produces a prediction of 5.96%.
      ✍ Because each error is squared, the least-squares method in effect weights instances by their distance from the best-fit line.
      ✍ The farther an instance is from the line, the larger its squared residual and the more weight it carries. This skews the fitted line towards outliers if they were not removed during data preparation.
      ✍ Multiple linear regression functions extend the linear regression model by taking more input attributes, each with its own parameter.
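Applying equation 6 is a one-line calculation. The sketch below uses the intercept and gradient quoted on the slide; the function name is chosen here for illustration.

```python
W0 = -7.38431  # Y intercept from equation 6
W1 = 0.55593   # gradient from equation 6


def predict_diabetes(bmi):
    """Predicted diabetes likelihood (in %) for a given BMI, using equation 6."""
    return W0 + W1 * bmi


prediction = predict_diabetes(24)
```

Rounded to two decimal places, `prediction` matches the 5.96% quoted on the slide.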
  • 19. Neural Networks
      ✍ A neural network is a set of interconnected neurons, each of which takes numeric values as input and maps them to a single output.
      ✍ Unlike a multiple linear regression model, a neuron passes its output through an activation function.
      ✍ Figure 4 below shows a neural network with a single hidden layer.
      ✍ Table 6 below lists some of the most common non-linear activation functions.
      ✍ The activation function takes the single-value output of the multi-input linear regression function and maps it to a non-linear output.
      ✍ Each neuron in a neural network:
        ➊ multiplies each input by a weight;
        ➋ adds together the results of the multiplications;
        ➌ passes the result to the activation function.
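The three steps a neuron performs can be sketched as a single function. This is an illustrative sketch, not the book's code: it uses tanh as the default activation and leaves out a bias term for brevity.

```python
import math


def neuron(inputs, weights, activation=math.tanh):
    """Output of one neuron for a list of numeric inputs."""
    # Step 1: multiply each input by its weight.
    weighted = [x * w for x, w in zip(inputs, weights)]
    # Step 2: add the results together.
    total = sum(weighted)
    # Step 3: pass the sum through the activation function.
    return activation(total)


output = neuron([1.0, 2.0], [0.5, -0.25])
```

Passing the identity function as `activation` reduces the neuron to a multiple linear regression function without an intercept, which is the comparison the slide draws.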
  • 20. A Simple Neural Network
      Figure 4: Neural Network (inputs x1 to x4 in the input layer, one hidden layer, and an output layer producing a single output)
  • 21. Common Activation Functions
      Name: Function; Derivative
        Logistic: $f(x) = \frac{1}{1 + e^{-x}}$; $f'(x) = f(x)(1 - f(x))$
        Tanh: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$; $f'(x) = 1 - f(x)^2$
        ReLU: $f(x) = 0$ if $x < 0$, $x$ if $x \ge 0$; $f'(x) = 0$ if $x < 0$, $1$ if $x > 0$
        Softmax: $f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$; $\frac{\partial f(x_i)}{\partial x_i} = \frac{e^{x_i}}{\sum_j e^{x_j}} - \frac{(e^{x_i})^2}{(\sum_j e^{x_j})^2}$
      Table 6: Non-linear Activation Functions
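The functions in Table 6 can be written out in plain Python to make their shapes concrete. This sketch uses the standard definitions; the ReLU derivative is taken as 1 for positive inputs and 0 otherwise (its value at exactly 0 is a convention), and the softmax subtracts the maximum input before exponentiating, a common numerical-stability step that does not change the result.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_derivative(x):
    f = logistic(x)
    return f * (1.0 - f)


def tanh_derivative(x):
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return x if x >= 0 else 0.0


def relu_derivative(x):
    return 1.0 if x > 0 else 0.0


def softmax(xs):
    """Map a list of numbers to positive values that sum to 1."""
    largest = max(xs)
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

The logistic and tanh functions squash any input into (0, 1) and (−1, 1) respectively, while softmax turns a whole vector of scores into values that can be read as probabilities.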
  • 22. Understanding a Neural Network
      ✍ The neural network in Figure 5 is organised into 3 layers: the input layer, the hidden layer, and the output layer.
      ✍ Nodes h1 to hn are the neurons that make up the hidden layer, a layer which is neither the input nor the output.
      ✍ The arrows represent the flow of information. Feed-forward neural networks have no loops; all the connections point forward.
      ✍ A fully-connected network is one where every neuron is connected to all the neurons in the next layer.
      ✍ Most of the work in developing neural networks involves finding the best network layout, number of hidden layers, types of activation functions, and direction of the connections.
      ✍ The labels on each arrow (ωn) represent the weights, and the f node represents the activation function.
      ✍ The output of Figure 5 with a tanh activation function would be:
        $output = \tanh(\omega_1 h_1 + \omega_2 h_2 + \omega_3 h_3 + \dots + \omega_n h_n)$
  • 23. Weights in a Neural Network
      Figure 5: Neural Network with Weights (inputs x1 to xn feed hidden neurons h1 to hn; their outputs, weighted by ω1 to ωn, are summed (Σ) and passed through the activation function f to give the output)
  • 24. Training Neural Networks: The Weight-update Rule
      ✍ Training a neural network involves finding the correct weights (ω1, ω2, ω3, ..., ωn) using the weight-update rule.
      ✍ At a high level, the weight-update rule works like this:
        ❶ If the error is 0, do not change the weights.
        ❷ If the error is positive, increase the weights for all connections where the input is positive and reduce the weights for connections where the input is negative.
        ❸ If the error is negative, increase the weights for all connections where the input is negative and reduce the weights for connections where the input is positive.
      ✍ The major challenge with the weight-update rule is that it is difficult to calculate the error for neurons in the earlier layers of a deep neural network.
      ✍ The standard way to train a neural network is to use the backpropagation algorithm, a supervised learning algorithm illustrated in Figure 6.
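The three cases of the weight-update rule collapse into one line of arithmetic if each weight is moved by the product of the error and its input. The sketch below is one common way to write this, under two assumptions that are not stated on the slide: the error is defined as target minus output, and a small `learning_rate` (a hypothetical parameter here) scales each step.

```python
def update_weights(weights, inputs, error, learning_rate=0.1):
    """Return new weights after one application of the weight-update rule.

    error is assumed to be (target - output). A zero error leaves the
    weights unchanged; otherwise each weight moves in the direction of
    error * input, which reproduces cases 2 and 3 of the rule.
    """
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


weights = [0.5, 0.5]
inputs = [1.0, -1.0]
after_positive_error = update_weights(weights, inputs, error=1.0)
after_zero_error = update_weights(weights, inputs, error=0.0)
```

With a positive error, the weight on the positive input grows and the weight on the negative input shrinks, as case 2 of the rule describes.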
  • 25. The Backpropagation Algorithm
      Figure 6: Backpropagation Simplified (inputs x1 to x4 in the input layer, hidden neurons h in the hidden layer, a summed output in the output layer, and an arrow showing the error being propagated back from the output)
  • 26. Backpropagation in Action
      ✍ The algorithm propagates the error resulting from each training instance back through the network, starting from the output layer, as shown in Figure 6.
      ✍ The main steps of the algorithm are as follows:
        ❶ Calculate the error for the neurons in the output layer and use the weight-update rule to adjust the weights on the connections coming into them.
        ❷ Share the error calculated in a neuron with the neurons in the preceding layer.
        ❸ Work back through the layers, repeating steps 1 and 2 above.
      ✍ The idea is to reduce, not eliminate, the error in order to avoid overfitting and allow the network to generalize to new instances that are not in the data set.
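The steps above can be sketched for a very small network. The code below is a simplified illustration, not the book's algorithm verbatim: it assumes one hidden layer, logistic activations, a squared-error loss, no bias terms, and a hypothetical learning rate `lr`.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, network output) for one instance x."""
    # w_hidden[j] holds the weights from every input to hidden neuron j.
    hidden = [logistic(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = logistic(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One backpropagation step; returns the updated weights."""
    hidden, output = forward(x, w_hidden, w_out)
    # Step 1: error at the output neuron, scaled by the logistic derivative.
    delta_out = (target - output) * output * (1.0 - output)
    # Step 2: share that error with each hidden neuron through its weight.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    # Weight-update rule applied layer by layer, working backwards.
    new_w_out = [w + lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w + lr * delta_hidden[j] * xi
                     for w, xi in zip(w_hidden[j], x)]
                    for j in range(len(w_hidden))]
    return new_w_hidden, new_w_out
```

Calling `train_step` repeatedly on the training instances gradually reduces the difference between the output and the target; in practice training is stopped before the error on the training data reaches zero, for the overfitting reason given on the slide.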
  • 27. Deep Learning
      ✍ Deep learning networks are neural networks with more than one hidden layer.
      ✍ The network in Figure 7 below has 3 hidden layers of 5 neurons each, and 5 layers overall.
      ✍ The layers do not need to have the same number of neurons: in Figure 7 the input layer has 3 neurons, each of the 3 hidden layers has 5 neurons, and the output layer has 3 neurons.
      ✍ Figure 7 is also a feed-forward network since it has no loops.
      ✍ Visit Kaggle (https://www.kaggle.com/getting-started/151100) for a comprehensive deep learning cheat sheet.
      ✍ Table 7 gives a summary of some of the common deep learning networks and their applications.
  • 28. Deep Neural Networks
      Figure 7: Deep Neural Network
  • 29. Common Deep Learning Networks
      Network: Key Features; Applications
        Convolutional Neural Network (CNN): one or more convolutional layers and one or more fully connected layers; image recognition, classification
        Recurrent Neural Network (RNN): connections between units form a directed cycle; time series prediction, text generation
        Long Short-Term Memory (LSTM): a type of RNN that remembers for longer than a plain RNN; time series prediction, text generation
        Deep Belief Network (DBN): has connections between layers but not within a layer; unsupervised learning tasks to reduce the dimensionality of features
        Self-Organising Map (SOM): converts input data to a low-dimensional space; visualization
      Table 7: Deep Learning Networks
  • 30. Decision Trees
      ✍ Linear regression and neural networks work best with numeric inputs, not with nominal or ordinal data.
      ✍ Decision trees work well with nominal and ordinal data types.
      ✍ Figure 8 shows a decision tree for deciding whether an email is spam or not. The rounded rectangles are attributes.
      ✍ A decision tree encodes a set of if-then-else rules in a tree structure.
      ✍ Each path in a decision tree, from root to leaf, defines a classification rule.
      ✍ The Iterative Dichotomiser 3 (ID3) algorithm (https://en.wikipedia.org/wiki/ID3_algorithm) is considered the father of all decision tree algorithms.
      ✍ Decision trees are very sensitive to noise in the data set. It is recommended to keep them shallow.
      ✍ A random forest model is made up of a set of decision trees.
  • 31. Decision Tree Example
      Figure 8: A sample decision tree
  • 32. The ID3 Algorithm
      ✍ The ID3 algorithm recursively builds the decision tree in a depth-first manner, adding one node at a time, starting with the root node.
      ✍ ID3 chooses the attribute to test at each node so as to split the data into sets that are as pure as possible; a pure set is one in which all instances have the same value for the target attribute.
      ✍ The entropy metric can be used to measure the purity of a set.
      ✍ At each node, ID3 selects the attribute that results in the lowest weighted entropy after the data set at that node is split using this attribute.
      ✍ To calculate the weighted entropy for an attribute at a node:
        ❶ split the data set using the attribute;
        ❷ calculate the entropy of each of the resulting sets;
        ❸ weight each entropy by the fraction of the data in the set;
        ❹ sum the results.
      ✍ Decision trees are easy to understand.
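The four steps for the weighted entropy can be sketched as follows. This illustration uses the standard Shannon entropy with base-2 logarithms; the tiny email data set and its attribute names (`suspicious_words`, `label`) are hypothetical and exist only to exercise the code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of target values; 0 for a pure set."""
    n = len(labels)
    counts = Counter(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def weighted_entropy(rows, attribute, target):
    """Weighted entropy of the target after splitting rows on an attribute."""
    # Step 1: split the data set using the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Steps 2 to 4: entropy of each set, weighted by its share of the data, summed.
    n = len(rows)
    return sum(len(part) / n * entropy(part) for part in partitions.values())


# Hypothetical instances, made up for illustration only.
emails = [
    {"suspicious_words": "yes", "label": "spam"},
    {"suspicious_words": "yes", "label": "spam"},
    {"suspicious_words": "no", "label": "ham"},
    {"suspicious_words": "no", "label": "ham"},
]
before_split = entropy([e["label"] for e in emails])
after_split = weighted_entropy(emails, "suspicious_words", "label")
```

In this toy data the split produces two pure sets, so the weighted entropy drops to 0; ID3 would prefer this attribute over any attribute that leaves the sets mixed.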
  • 33. Bias in Data Science
      ✍ The major objective of ML is to create models that encode appropriate generalizations from the data.
      ✍ Two key factors determine the quality of an ML model:
        ❶ The data set the algorithm is run on. If the data set is not a true reflection of real-life events, the model will not be accurate. This is referred to as sampling bias.
        ❷ The choice of ML algorithm. ML algorithms use a learning bias (also called modelling or selection bias) to generalize from a data set. A wrong choice of algorithm will result in an inappropriate learning bias.
      ✍ While sampling bias is bad, without a learning bias there can be no learning and the algorithm will only memorize the data.
      ✍ There is no single best ML algorithm, so the Modelling phase of the CRISP-DM process involves building multiple models using different algorithms and choosing the best in terms of accuracy, generalization, and other performance metrics.
  • 34. “The golden rule for evaluating models is that models should never be tested on the same data they were trained on” Kelleher & Tierney (2018)
  • 35. Evaluating Models
      ✍ After generating models, the next step is to create a test plan.
      ✍ A model that simply memorizes the training data will not perform well on test data or any other previously unseen data.
      ✍ The normal practice is to split the data into 3 sets: training data, validation data, and testing data.
      ✍ The other important aspect of a test plan is choosing the appropriate evaluation metrics to use during testing.
      ✍ Table 8 below shows some of the most commonly used metrics and their applications.
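A three-way split can be sketched with the standard library alone. The 60/20/20 proportions and the fixed random seed below are assumptions made for this illustration; the slides do not prescribe particular proportions.

```python
import random


def split_data(instances, train=0.6, validation=0.2, seed=0):
    """Shuffle instances and return (training, validation, testing) lists."""
    shuffled = list(instances)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_validation = int(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]  # whatever remains
    return training, validating, testing


training, validating, testing = split_data(range(10))
```

Because the three lists never share an instance, a model fitted on `training` is evaluated only on data it has not seen, which is the point of the golden rule quoted above.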
  • 36. Common Evaluation Metrics
      Metric: Applications; Equation
        Mean Absolute Error (MAE): Regressions; $MAE = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|$
        Root Mean Squared Error (RMSE): Regressions; $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$
        Recall/AUC: Classification; $Recall = \frac{TP}{TP + FN}$
        Precision: Classification; $Precision = \frac{TP}{TP + FP}$
        Accuracy: Classification; $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$
      Table 8: Common Evaluation Metrics
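The metrics in Table 8 can be computed directly from lists of actual and predicted values. The sketch below uses the standard definitions (MAE here includes the division by n that makes it a mean); the function names and the choice of 1 as the positive class are assumptions made for this illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error for a regression model."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error for a regression model."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary classifier."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    return tp, tn, fp, fn


def recall(tp, fn):
    return tp / (tp + fn)


def precision(tp, fp):
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)
```

Recall and precision answer different questions (how many real positives were found versus how many predicted positives were right), which is why a test plan usually reports more than one metric.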
  • 37. Conclusion
      ✓ Chapter 4 started by asserting the partnership between a data scientist and a computer.
      ✓ The computer generates a model from a data set prepared by the data scientist.
      ✓ The data scientist interprets and evaluates the model.
      ✓ Model evaluation follows the golden rule.
      ✓ The best model is chosen based on its accuracy, but in the future data usage and privacy may also affect model selection.
      ✓ Chapter 5 discusses converting a business problem into a data science problem, and Chapter 6 discusses the impact of privacy laws on data science.
  • 38. References and Bibliography
      Kelleher, J.D. & Tierney, B. (2018). Data Science. MIT Press, pp. 101–150.