SlideShare a Scribd company logo
1 of 100
Download to read offline
Machine Learning Overview
Huawei Confidential
1
Foreword
๏ฌ Machine learning is a core research field of AI, and it is also a necessary
knowledge for deep learning. Therefore, this chapter mainly introduces the
main concepts of machine learning, the classification of machine learning,
the overall process of machine learning, and the common algorithms of
machine learning.
Huawei Confidential
2
Objectives
Upon completion of this course, you will be able to:
๏ฐ Master the learning algorithm definition and machine learning process.
๏ฐ Know common machine learning algorithms.
๏ฐ Understand concepts such as hyperparameters, gradient descent, and cross
validation.
Huawei Confidential
3
Contents
1. Machine Learning Definition
2. Machine Learning Types
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case Study
Huawei Confidential
4
Machine Learning Algorithms (1)
๏ฌ Machine learning (including deep learning) is a study of learning algorithms. A
computer program is said to learn from experience ๐ธ with respect to some class
of tasks ๐‘‡ and performance measure ๐‘ƒ if its performance at tasks in ๐‘‡, as
measured by ๐‘ƒ, improves with experience ๐ธ.
Data
(Experience E)
Learning
algorithms
(Task T)
Basic
understanding
(Measure P)
Huawei Confidential
5
Machine Learning Algorithms (2)
Experience
Regularity
Induction
Prediction
Input
New
problems
Future
Historical
data
Model
Training
Prediction
Input
New
data
Future
attributes
Huawei Confidential
6
Created by: Jim Liang
Differences Between Machine Learning Algorithms and
Traditional Rule-Based Algorithms
โ€ข Samples are used for training.
โ€ข The decision-making rules are complex or
difficult to describe.
โ€ข Rules are automatically learned by machines.
Rule-based algorithms Machine learning
Training
data
Machine
learning
Model Prediction
New data
โ€ข Explicit programming is used to solve problems.
โ€ข Rules can be manually specified.
Huawei Confidential
7
Application Scenarios of Machine Learning (1)
๏ฌ The solution to a problem is complex, or the problem may involve a large
amount of data without a clear data distribution function.
๏ฌ Machine learning can be used in the following scenarios:
Rules are complex or
cannot be described, such
as facial recognition and
voice recognition.
Task rules change over time.
For example, in the part-of-
speech tagging task, new
words or meanings are
generated at any time.
Data distribution changes
over time, requiring constant
readaptation of programs,
such as predicting the trend of
commodity sales.
Huawei Confidential
8
Application Scenarios of Machine Learning (2)
Manual rules
Simple problems
Rule-based
algorithms
Small Large
Simple
Complex
Scale of the problem
Rule
complexity
Machine learning
algorithms
Huawei Confidential
9
Rational Understanding of Machine Learning Algorithms
๏ฌ Target function f is unknown. Learning algorithms cannot obtain a perfect function f.
๏ฌ Assume that hypothesis function g approximates function f, but may be different from
function f.
Target equation
๐‘“: ๐‘‹ โ†’ ๐‘Œ
Learning algorithms
Training data
๐ท: {(๐‘ฅ1, ๐‘ฆ1) โ‹ฏ , (๐‘ฅ๐‘›, ๐‘ฆ๐‘›)}
Hypothesis function
๐‘” โ‰ˆ ๐‘“
Ideal
Actual
Huawei Confidential
10
Main Problems Solved by Machine Learning
๏ฌ Machine learning can deal with many types of tasks. The following describes the most typical and common
types of tasks.
๏ฐ Classification: A computer program needs to specify which of the k categories some input belongs to. To accomplish this
task, learning algorithms usually output a function ๐‘“: ๐‘…๐‘›
โ†’ (1,2, โ€ฆ , ๐‘˜). For example, the image classification algorithm in
computer vision is developed to handle classification tasks.
๏ฐ Regression: For this type of task, a computer program predicts the output for the given input. Learning algorithms
typically output a function ๐‘“: ๐‘…๐‘›
โ†’ ๐‘…. An example of this task type is to predict the claim amount of an insured person (to
set the insurance premium) or predict the security price.
๏ฐ Clustering: A large amount of data from an unlabeled dataset is divided into multiple categories according to internal
similarity of the data. Data in the same category is more similar than that in different categories. This feature can be
used in scenarios such as image retrieval and user profile management.
๏ฌ Classification and regression are two main types of prediction, accounting from 80% to 90%. The output of
classification is discrete category values, and the output of regression is continuous numbers.
Huawei Confidential
11
Contents
1. Machine Learning Definition
2. Machine Learning Types
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case study
Huawei Confidential
12
Machine Learning Classification
๏ฌ Supervised learning: Obtain an optimal model with required performance through training and learning
based on the samples of known categories. Then, use the model to map all inputs to outputs and check the
output for the purpose of classifying unknown data.
๏ฌ Unsupervised learning: For unlabeled samples, the learning algorithms directly model the input datasets.
Clustering is a common form of unsupervised learning. We only need to put highly similar samples together,
calculate the similarity between new samples and existing ones, and classify them by similarity.
๏ฌ Semi-supervised learning: In one task, a machine learning model that automatically uses a large amount of
unlabeled data to assist learning directly of a small amount of labeled data.
๏ฌ Reinforcement learning: It is an area of machine learning concerned with how agents ought to take actions
in an environment to maximize some notion of cumulative reward. The difference between reinforcement
learning and supervised learning is the teacher signal. The reinforcement signal provided by the environment
in reinforcement learning is used to evaluate the action (scalar signal) rather than telling the learning system
how to perform correct actions.
Huawei Confidential
13
Supervised Learning
Feature 1
Feature 1
Feature 1
Feature n
Feature n
Feature n
...
...
...
Goal
Goal
Goal
Data feature Label
Supervised learning
algorithm
Weather Temperature
Wind
Speed
Sunny Warm Strong
Rainy Cold Fair
Sunny Cold Weak
Enjoy
Sports
Yes
No
Yes
Huawei Confidential
14
Supervised Learning - Regression Questions
๏ฌ Regression: reflects the features of attribute values of samples in a sample dataset. The
dependency between attribute values is discovered by expressing the relationship of
sample mapping through functions.
๏ฐ How much will I benefit from the stock next week?
๏ฐ What's the temperature on Tuesday?
Huawei Confidential
15
Supervised Learning - Classification Questions
๏ฌ Classification: maps samples in a sample dataset to a specified category by
using a classification model.
๏ฐ Will there be a traffic jam on XX road during
the morning rush hour tomorrow?
๏ฐ Which method is more attractive to customers:
5 yuan voucher or 25% off?
Huawei Confidential
16
Unsupervised Learning
Feature 1
Feature 1
Feature 1
Feature n
Feature n
Feature n
...
...
...
Data Feature
Unsupervised
learning algorithm
Monthly
Consumption
Commodity
Consumption
Time
1000โ€“2000
Badminton
racket
6:00โ€“12:00
500โ€“1000 Basketball 18:00โ€“24:00
1000โ€“2000 Game console 00:00โ€“6:00
Internal
similarity
Category
Cluster 1
Cluster 2
Huawei Confidential
17
Unsupervised Learning - Clustering Questions
๏ฌ Clustering: classifies samples in a sample dataset into several categories based
on the clustering model. The similarity of samples belonging to the same
category is high.
๏ฐ Which audiences like to watch movies
of the same subject?
๏ฐ Which of these components are
damaged in a similar way?
Huawei Confidential
18
Semi-Supervised Learning
Feature 1
Feature 1
Feature 1
Feature n
Feature n
Feature n
...
...
...
Goal
Unknown
Unknown
Data Feature Label
Semi-supervised
learning algorithms
Weather Temperature
Wind
Speed
Sunny Warm Strong
Rainy Cold Fair
Sunny Cold Weak
Enjoy
Sports
Yes
/
/
Huawei Confidential
19
Reinforcement Learning
๏ฌ The model perceives the environment, takes actions, and makes adjustments
and choices based on the status and award or punishment.
Model
Environment
๐‘Ÿ๐‘ก+1
๐‘ ๐‘ก+1
Award or
punishment ๐‘Ÿ๐‘ก
Status ๐‘ ๐‘ก Action ๐‘Ž๐‘ก
Huawei Confidential
20
Reinforcement Learning - Best Behavior
๏ฌ Reinforcement learning: always looks for best behaviors. Reinforcement learning
is targeted at machines or robots.
๏ฐ Autopilot: Should it brake or accelerate when the yellow light starts to flash?
๏ฐ Cleaning robot: Should it keep working or go back for charging?
Huawei Confidential
21
Contents
1. Machine learning algorithm
2. Machine Learning Classification
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case study
Huawei Confidential
22
Machine Learning Process
Data
collection
Data
cleansing
Model
training
Model
evaluation
Feature
extraction and
selection
Model
deployment
and integration
Feedback and iteration
Huawei Confidential
23
Basic Machine Learning Concept โ€” Dataset
๏ฌ Dataset: a collection of data used in machine learning tasks. Each data record is
called a sample. Events or attributes that reflect the performance or nature of a
sample in a particular aspect are called features.
๏ฌ Training set: a dataset used in the training process, where each sample is
referred to as a training sample. The process of creating a model from data is
called learning (training).
๏ฌ Test set: Testing refers to the process of using the model obtained after learning
for prediction. The dataset used is called a test set, and each sample is called a
test sample.
Huawei Confidential
24
Checking Data Overview
๏ฌ Typical dataset form
No. Area School Districts Direction House Price
1 100 8 South 1000
2 120 9 Southwest 1300
3 60 6 North 700
4 80 9 Southeast 1100
5 95 3 South 850
Feature 2
Feature 1 Feature 3 Label
Training
set
Test set
Huawei Confidential
25
Data
preprocessing
Importance of Data Processing
๏ฌ Data is crucial to models. It is the ceiling of model capabilities. Without good
data, there is no good model.
Fill in missing values,
and detect and
eliminate causes of
dataset exceptions.
Data cleansing
Simplify data
attributes to avoid
dimension explosion.
Data dimension
reduction
Normalize data to
reduce noise and
improve model accuracy.
Data
normalization
Huawei Confidential
26
Workload of Data Cleansing
๏ฌ Statistics on data scientists' work in machine learning
3% Remodeling training datasets
9% Mining modes from data
19% Collecting datasets
60% Cleansing and sorting data
CrowdFlower Data Science Report 2016
5% Others
4% Optimizing models
Huawei Confidential
27
Data Cleansing
๏ฌ Most machine learning models process features, which are usually numeric
representations of input variables that can be used in the model.
๏ฌ In most cases, the collected data can be used by algorithms only after being
preprocessed. The preprocessing operations include the following:
๏ฐ Data filtering
๏ฐ Processing of lost data
๏ฐ Processing of possible exceptions, errors, or abnormal values
๏ฐ Combination of data from multiple data sources
๏ฐ Data consolidation
Huawei Confidential
28
Dirty Data (1)
๏ฌ Generally, real data may have some quality problems.
๏ฐ Incompleteness: contains missing values or the data that lacks attributes
๏ฐ Noise: contains incorrect records or exceptions.
๏ฐ Inconsistency: contains inconsistent records.
Huawei Confidential
29
Dirty Data (2)
# Id Name Birthday Gender
IsTe
ach
er
#Stu
dent
s
Country City
1 111 John 31/12/1990 M 0 0 Ireland Dublin
2 222 Mery 15/10/1978 F 1 15 Iceland
3 333 Alice 19/04/2000 F 0 0 Spain
Madri
d
4 444 Mark 01/11/1997 M 0 0 France Paris
5 555 Alex 15/03/2000 A 1 23 Germany Berlin
6 555 Peter 1983-12-01 M 1 10 Italy Rome
7 777 Calvin 05/05/1995 M 0 0 Italy Italy
8 888 Roxane 03/08/1948 F 0 0 Portugal Lisbon
9 999 Anne 05/09/1992 F 0 5
Switzerlan
d
Genev
a
10 101010 Paul 14/11/1992 M 1 26 Ytali Rome
Invalid duplicate item
Incorrect format Attribute dependency
Misspelling
Value that
should be in
another column
Missing value
Invalid value
Huawei Confidential
30
Data Conversion
๏ฌ After being preprocessed, the data needs to be converted into a representation form suitable for
the machine learning model. Common data conversion forms include the following:
๏ฐ With respect to classification, category data is encoded into a corresponding numerical representation.
๏ฐ Value data is converted to category data to reduce the value of variables (for age segmentation).
๏ฐ Other data
๏ฎ In the text, the word is converted into a word vector through word embedding (generally using the word2vec model,
BERT model, etc).
๏ฎ Process image data (color space, grayscale, geometric change, Haar feature, and image enhancement)
๏ฐ Feature engineering
๏ฎ Normalize features to ensure the same value ranges for input variables of the same model.
๏ฎ Feature expansion: Combine or convert existing variables to generate new features, such as the average.
Huawei Confidential
31
Necessity of Feature Selection
๏ฌ Generally, a dataset has many features, some of which may be redundant or
irrelevant to the value to be predicted.
๏ฌ Feature selection is necessary in the following aspects:
Simplify
models to
make them
easy for users
to interpret
Reduce the
training time
Avoid
dimension
explosion
Improve
model
generalization
and avoid
overfitting
Huawei Confidential
32
Feature Selection Methods - Filter
๏ฌ Filter methods are independent of the model during feature selection.
Traverse all
features
Select the
optimal
feature subset
Train
models
Evaluate the
performance
Procedure of a filter method
By evaluating the correlation between each
feature and the target attribute, these methods
use a statistical measure to assign a value to
each feature. Features are then sorted by score,
which is helpful for preserving or eliminating
specific features.
Common methods
โ€ข Pearson correlation coefficient
โ€ข Chi-square coefficient
โ€ข Mutual information
Limitations
โ€ข The filter method tends to select redundant
variables as the relationship between features
is not considered.
Huawei Confidential
33
Feature Selection Methods - Wrapper
๏ฌ Wrapper methods use a prediction model to score feature subsets.
Procedure of a
wrapper method
Wrapper methods consider feature selection as a
search issue for which different combinations are
evaluated and compared. A predictive model is
used to evaluate a combination of features and
assign a score based on model accuracy.
Common methods
โ€ข Recursive feature elimination (RFE)
Limitations
โ€ข Wrapper methods train a new model for each
subset, resulting in a huge number of
computations.
โ€ข A feature set with the best performance is
usually provided for a specific type of model.
Traverse all
features
Generate
a feature
subset
Train
models
Evaluate
models
Select the optimal
feature subset
Huawei Confidential
34
Feature Selection Methods - Embedded
๏ฌ Embedded methods consider feature selection as a part of model construction.
The most common type of embedded feature
selection method is the regularization method.
Regularization methods are also called penalization
methods that introduce additional constraints into
the optimization of a predictive algorithm that bias
the model toward lower complexity and reduce the
number of features.
Common methods
โ€ข Lasso regression
โ€ข Ridge regression
Procedure of an embedded method
Traverse all
features
Generate a
feature subset
Train models
+ Evaluate the
effect
Select the optimal feature subset
Huawei Confidential
35
Overall Procedure of Building a Model
Model Building Procedure
Data splitting:
Divide data into training
sets, test sets, and
validation sets.
Model training:
Use data that has been
cleaned up and feature
engineering to train a model.
Model verification:
Use validation sets to
validate the model
validity.
Model fine-tuning:
Continuously tune the
model based on the
actual data of a service
scenario.
Model deployment:
Deploy the model in
an actual
production scenario.
Model test:
Use test data to evaluate
the generalization
capability of the model in
a real environment.
1 2 3
4
5
6
Huawei Confidential
36
Examples of Supervised Learning - Learning Phase
๏ฌ Use the classification model to predict whether a person is a basketball player.
Task: Use a classification model to predict
whether a person is a basketball player
under a specific feature.
Splitting
(Cleansed features and tags)
Model
training
Name City Age Label
Mike Miami 42 yes
Jerry New York 32 no
Bryan Orlando 18 no
Patricia Miami 45 yes
Elodie Phoenix 35 no
Remy Chicago 72 yes
John New York 48 yes
Target
Feature
(attribute)
Training set
The model searches
for the relationship
between features
and targets.
Test set
Use new data to
verify the model
validity.
Each feature or a combination of several features can
provide a basis for a model to make a judgment.
Service
data
Huawei Confidential
37
Unknown data
Recent data, it is not
known whether the
people are basketball
players.
Possibility prediction
Apply the model to the
new data to predict
whether the customer
will change the supplier.
Application
model
Examples of Supervised Learning - Prediction Phase
New
data
New
data
Prediction
data
IF city = Miami โ†’ Probability = +0.7
IF city= Orlando โ†’ Probability = +0.2
IF age > 42 โ†’ Probability = +0.05*age + 0.06
IF age โ‰ค 42 โ†’ Probability = +0.01*age + 0.02
Name City Age Label
Marine Miami 45 ?
Julien Miami 52 ?
Fred Orlando 20 ?
Michelle Boston 34 ?
Nicolas Phoenix 90 ?
Name City Age Prediction
Marine Miami 45 0.3
Julien Miami 52 0.9
Fred Orlando 20 0.6
Michelle Boston 34 0.5
Nicolas Phoenix 90 0.4
Huawei Confidential
38
What Is a Good Model?
โ€ข Generalization capability
Can it accurately predict the actual service data?
โ€ข Interpretability
Is the prediction result easy to interpret?
โ€ข Prediction speed
How long does it take to predict each piece of data?
โ€ข Practicability
Is the prediction rate still acceptable when the
service volume increases with a huge data volume?
Huawei Confidential
39
Model Validity (1)
๏ฌ Generalization capability: The goal of machine learning is that the model obtained after learning
should perform well on new samples, not just on samples used for training. The capability of
applying a model to new samples is called generalization or robustness.
๏ฌ Error: difference between the sample result predicted by the model obtained after learning and
the actual sample result.
๏ฐ Training error: error that you get when you run the model on the training data.
๏ฐ Generalization error: error that you get when you run the model on new samples. Obviously, we prefer a
model with a smaller generalization error.
๏ฌ Underfitting: occurs when the model or the algorithm does not fit the data well enough.
๏ฌ Overfitting: occurs when the training error of the model obtained after learning is small but the
generalization error is large (poor generalization capability).
Huawei Confidential
40
Model Validity (2)
๏ฌ Model capacity: model's capability of fitting functions, which is also called model complexity.
๏ฐ When the capacity suits the task complexity and the amount of training data provided, the algorithm
effect is usually optimal.
๏ฐ Models with insufficient capacity cannot solve complex tasks and underfitting may occur.
๏ฐ A high-capacity model can solve complex tasks, but overfitting may occur if the capacity is higher than
that required by a task.
Underfitting
Not all features are learned.
Good fitting Overfitting
Noises are learned.
Huawei Confidential
41
Overfitting Cause โ€” Error
๏ฌ Total error of final prediction = Bias2 + Variance + Irreducible error
๏ฌ Generally, the prediction error can be divided into two types:
๏ฐ Error caused by "bias"
๏ฐ Error caused by "variance"
๏ฌ Variance:
๏ฐ Offset of the prediction result from the average value
๏ฐ Error caused by the model's sensitivity to small fluctuations
in the training set
๏ฌ Bias:
๏ฐ Difference between the expected (or average) prediction value and the
correct value we are trying to predict.
Variance
Bias
Huawei Confidential
42
Variance and Bias
๏ฌ Combinations of variance and bias are as
follows:
๏ฐ Low bias & low variance โ€“> Good model
๏ฐ Low bias & high variance
๏ฐ High bias & low variance
๏ฐ High bias & high variance โ€“> Poor model
๏ฌ Ideally, we want a model that can accurately
capture the rules in the training data and
summarize the invisible data (new data).
However, it is usually impossible for the model
to complete both tasks at the same time.
Huawei Confidential
43
High bias &
low variance
Low bias &
high variance
Model Complexity and Error
๏ฌ As the model complexity increases, the training error decreases.
๏ฌ As the model complexity increases, the test error decreases to a certain point
and then increases in the reverse direction, forming a convex curve.
Testing error
Training error
Error
Model Complexity
Huawei Confidential
44
Machine Learning Performance Evaluation - Regression
๏ฌ The closer the Mean Absolute Error (MAE) is to 0, the better the model can fit the training data.
๐‘€๐ด๐ธ =
1
m
๐‘–=1
๐‘š
๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘–
๏ฌ Mean Square Error (MSE)
๐‘€๐‘†๐ธ =
1
m
๐‘–=1
m
๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘–
2
๏ฌ The value range of R2
is (โ€“โˆž, 1]. A larger value indicates that the model can better fit the training
data. TSS indicates the difference between samples. RSS indicates the difference between the
predicted value and sample value.
๐‘…2 = 1 โˆ’
๐‘…๐‘†๐‘†
๐‘‡๐‘†๐‘†
= 1 โˆ’
๐‘–=1
๐‘š
๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘–
2
๐‘–=1
๐‘š
๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘–
2
Huawei Confidential
45
Machine Learning Performance Evaluation - Classification (1)
๏ฌ Terms and definitions:
๏ฐ ๐‘ƒ: positive, indicating the number of real positive cases
in the data.
๏ฐ ๐‘: negative, indicating the number of real negative cases
in the data.
๏ฐ ๐‘‡P : true positive, indicating the number of positive cases that are correctly
classified by the classifier.
๏ฐ ๐‘‡๐‘: true negative, indicating the number of negative cases that are correctly classified by the classifier.
๏ฐ ๐น๐‘ƒ: false positive, indicating the number of positive cases that are incorrectly classified by the classifier.
๏ฐ ๐น๐‘: false negative, indicating the number of negative cases that are incorrectly classified by the classifier.
๏ฌ Confusion matrix: at least an ๐‘š ร— ๐‘š table. ๐ถ๐‘€๐‘–,๐‘— of the first ๐‘š rows and ๐‘š columns indicates the number of
cases that actually belong to class ๐‘– but are classified into class ๐‘— by the classifier.
๏ฐ Ideally, for a high accuracy classifier, most prediction values should be located in the diagonal from ๐ถ๐‘€1,1 to ๐ถ๐‘€๐‘š,๐‘š of
the table while values outside the diagonal are 0 or close to 0. That is, ๐น๐‘ƒ and ๐น๐‘ƒ are close to 0.
Confusion matrix
Estimated
amount
Actual amount
yes no Total
yes ๐‘‡๐‘ƒ ๐น๐‘ ๐‘ƒ
no ๐น๐‘ƒ ๐‘‡๐‘ ๐‘
Total ๐‘ƒโ€ฒ
๐‘โ€ฒ
๐‘ƒ + ๐‘
Huawei Confidential
46
Machine Learning Performance Evaluation - Classification (2)
Measurement Ratio
Accuracy and recognition rate
๐‘‡๐‘ƒ + ๐‘‡๐‘
๐‘ƒ + ๐‘
Error rate and misclassification rate
๐น๐‘ƒ + ๐น๐‘
๐‘ƒ + ๐‘
Sensitivity, true positive rate, and
recall
๐‘‡๐‘ƒ
๐‘ƒ
Specificity and true negative rate
๐‘‡๐‘
๐‘
Precision
๐‘‡๐‘ƒ
๐‘‡๐‘ƒ + ๐น๐‘ƒ
๐น1, harmonic mean of the recall rate
and precision
2 ร— ๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› ร— ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™
๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› + ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™
๐น๐›ฝ, where ๐›ฝ is a non-negative real
number
(1 + ๐›ฝ2
) ร— ๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› ร— ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™
๐›ฝ2 ร— ๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› + ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™
Huawei Confidential
47
Example of Machine Learning Performance Evaluation
๏ฌ We have trained a machine learning model to identify whether the object in an image is
a cat. Now we use 200 pictures to verify the model performance. Among the 200
images, objects in 170 images are cats, while others are not. The identification result of
the model is that objects in 160 images are cats, while others are not.
Precision: ๐‘ƒ =
๐‘‡๐‘ƒ
๐‘‡๐‘ƒ+๐น๐‘ƒ
=
140
140+20
= 87.5%
Recall: ๐‘… =
๐‘‡๐‘ƒ
๐‘ƒ
=
140
170
= 82.4%
Accuracy: ๐ด๐ถ๐ถ =
๐‘‡๐‘ƒ+๐‘‡๐‘
๐‘ƒ+๐‘
=
140+10
170+30
= 75%
Estimated
amount
Actual
amount
๐’š๐’†๐’” ๐’๐’ Total
๐‘ฆ๐‘’๐‘  140 30 170
๐‘›๐‘œ 20 10 30
Total 160 40 200
Huawei Confidential
48
Contents
1. Machine Learning Definition
2. Machine Learning Types
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case study
Huawei Confidential
49
Machine Learning Training Method - Gradient Descent (1)
๏ฌ The gradient descent method uses the negative gradient
direction of the current position as the search direction,
which is the steepest direction. The formula is as follows:
๏ฌ In the formula, ๐œ‚ indicates the learning rate and ๐‘–
indicates the data record number ๐‘– . The weight
parameter w indicates the change in each iteration.
๏ฌ Convergence: The value of the objective function
changes very little, or the maximum number of
iterations is reached.
1 ( )
k
i
k k w
w w f x
๏จ
๏€ซ ๏€ฝ ๏€ญ ๏ƒ‘
Cost surface
Huawei Confidential
50
Machine Learning Training Method - Gradient Descent (2)
๏ฌ Batch Gradient Descent (BGD) uses the samples (m in total) in all datasets to
update the weight parameter based on the gradient value at the current point.
๏ฌ Stochastic Gradient Descent (SGD) randomly selects a sample in a dataset to
update the weight parameter based on the gradient value at the current point.
๏ฌ Mini-Batch Gradient Descent (MBGD) combines the features of BGD and SGD
and selects the gradients of n samples in a dataset to update the weight
parameter.
1
1
1
( )
๏จ
๏€ซ
๏€ฝ
๏€ฝ ๏€ญ ๏ƒ‘
๏ƒฅ k
m
i
k k w
i
w w f x
m
1 ( )
k
i
k k w
w w f x
๏จ
๏€ซ ๏€ฝ ๏€ญ ๏ƒ‘
1
1
t
1
( )
k
t n
i
k k w
i
w w f x
n
๏จ
๏€ซ ๏€ญ
๏€ซ
๏€ฝ
๏€ฝ ๏€ญ ๏ƒ‘
๏ƒฅ
Huawei Confidential
51
Machine Learning Training Method - Gradient Descent (3)
๏ฌ Comparison of three gradient descent methods
๏ฐ In the SGD, samples selected for each training are stochastic. Such instability causes the loss function to
be unstable or even causes reverse displacement when the loss function decreases to the lowest point.
๏ฐ BGD has the highest stability but consumes too many computing resources. MBGD is a method that
balances SGD and BGD.
BGD
Uses all training samples for training each time.
SGD
Uses one training sample for training each time.
MBGD
Uses a certain number of training samples for
training each time.
Huawei Confidential
52
Parameters and Hyperparameters in Models
๏ฌ The model contains not only parameters but also hyperparameters. The purpose
is to enable the model to learn the optimal parameters.
๏ฐ Parameters are automatically learned by models.
๏ฐ Hyperparameters are manually set.
Training
Use
hyperparameters to
control training.
Model
Model parameters are
"distilled" from data.
Huawei Confidential
53
Hyperparameters of a Model
โ€ข Often used in model parameter
estimation processes.
โ€ข Often specified by the practitioner.
โ€ข Can often be set using heuristics.
โ€ข Often tuned for a given predictive
modeling problem.
Model hyperparameters are
external configurations of
models.
โ€ข ฮป during Lasso/Ridge regression
โ€ข Learning rate for training a
neural network, number of
iterations, batch size, activation
function, and number of neurons
โ€ข ๐ถ and ๐œŽ in support vector
machines (SVM)
โ€ข K in k-nearest neighbor (KNN)
โ€ข Number of trees in a random
forest
Common model
hyperparameters
Huawei Confidential
54
Hyperparameter Search Procedure and Method
Procedure for
searching
hyperparameters
1. Dividing a dataset into a training set, validation set, and test set.
2. Optimizing the model parameters using the training set based on the model
performance indicators.
3. Searching for the model hyper-parameters using the validation set based on the model
performance indicators.
4. Perform step 2 and step 3 alternately. Finally, determine the model parameters and
hyperparameters and assess the model using the test set.
Search algorithm
(step 3)
โ€ข Grid search
โ€ข Random search
โ€ข Heuristic intelligent search
โ€ข Bayesian search
Huawei Confidential
55
Hyperparameter Searching Method - Grid Search
๏ฌ Grid search attempts to exhaustively search all possible
hyperparameter combinations to form a hyperparameter
value grid.
๏ฌ In practice, the range of hyperparameter values to search is
specified manually.
๏ฌ Grid search is an expensive and time-consuming method.
๏ฐ This method works well when the number of hyperparameters
is relatively small. Therefore, it is applicable to generally
machine learning algorithms but inapplicable to neural
networks
(see the deep learning part).
Hyperparameter
1
0 1
Hyperparameter 2
5
1
2
3
4
5
Grid search
2 3 4
Huawei Confidential
56
Hyperparameter Searching Method - Random Search
๏ฌ When the hyperparameter search space is large, random
search is better than grid search.
๏ฌ In random search, each setting is sampled from the
distribution of possible parameter values, in an attempt
to find the best subset of hyperparameters.
๏ฌ Note:
๏ฐ Search is performed within a coarse range, which then will
be narrowed based on where the best result appears.
๏ฐ Some hyperparameters are more important than others, and
the search deviation will be affected during random search.
Random search
Parameter
1
Parameter 2
Huawei Confidential
57
Cross Validation (1)
๏ฌ Cross validation: It is a statistical analysis method used to validate the performance of a
classifier. The basic idea is to divide the original dataset into two parts: training set and validation
set. Train the classifier using the training set and test the model using the validation set to check
the classifier performance.
๏ฌ k-fold cross validation (๐‘ฒ โˆ’ ๐‘ช๐‘ฝ):
๏ฐ Divide the raw data into ๐‘˜ groups (generally, evenly divided).
๏ฐ Use each subset as a validation set, and use the other ๐‘˜ โˆ’ 1 subsets as the training set. A total of ๐‘˜ models
can be obtained.
๏ฐ Use the mean classification accuracy of the final validation sets of ๐‘˜ models as the performance indicator
of the ๐พ โˆ’ ๐ถ๐‘‰ classifier.
Huawei Confidential
58
Cross Validation (2)
๏ฌ Note: The K value in K-fold cross validation is also a hyperparameter.
Entire dataset
Training set Test set
Test set
Training set Validation set
Huawei Confidential
59
Contents
1. Machine Learning Definition
2. Machine Learning Types
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case study
Huawei Confidential
60
Machine Learning Algorithm Overview
Supervised learning Unsupervised learning
Clustering
Classification Regression
Machine learning
Others
SVM
Naive Bayes
KNN
Linear regression
Decision tree
Neural network
K-means
Hierarchical
clustering
Neural network
Logistic regression
Decision tree
Correlation rule
Random forest Random forest
Density-based
clustering
GBDT GBDT
SVM
Principal component
analysis (PCA)
Gaussian mixture
model (GMM)
Huawei Confidential
61
Linear Regression (1)
๏ฌ Linear regression: a statistical analysis method to determine the quantitative
relationships between two or more variables through regression analysis in
mathematical statistics.
๏ฌ Linear regression is a type of supervised learning.
Unary linear regression Multi-dimensional linear regression
Huawei Confidential
62
Linear Regression (2)
๏ฌ The model function of linear regression is as follows, where ๐‘ค indicates the weight parameter, ๐‘ indicates the
bias, and ๐‘ฅ indicates the sample attribute.
๏ฌ The relationship between the value predicted by the model and actual value is as follows, where ๐‘ฆ indicates
the actual value, and ๐œ€ indicates the error.
๏ฌ The error ๐œ€ is influenced by many factors independently. According to the central limit theorem, the error ๐œ€
follows normal distribution. According to the normal distribution function and maximum likelihood
estimation, the loss function of linear regression is as follows:
๏ฌ To make the predicted value close to the actual value, we need to minimize the loss value. We can use the
gradient descent method to calculate the weight parameter ๐‘ค when the loss function reaches the minimum,
and then complete model building.
( ) T
w
h x w x b
๏€ฝ ๏€ซ
๏ฅ
๏€ฝ ๏€ซ ๏€ซ
T
y w x b
๏€จ ๏€ฉ
2
1
( ) ( )
2
w
J w h x y
m
๏€ฝ ๏€ญ
๏ƒฅ
Huawei Confidential
63
Linear Regression Extension - Polynomial Regression
๏ฌ Polynomial regression is an extension of linear regression. Generally, the complexity of
a dataset exceeds the possibility of fitting by a straight line. That is, obvious underfitting
occurs if the original linear regression model is used. The solution is to use polynomial
regression.
Comparison between linear regression
and polynomial regression
2
1 2
( ) n
w n
h x w x w x w x b
๏€ฝ ๏€ซ ๏€ซ ๏€ซ ๏€ซ
L
๏ฌ where, the nth power is a polynomial regression
dimension (degree).
๏ฌ Polynomial regression belongs to linear
regression as the relationship between its weight
parameters ๐‘ค is still linear while its nonlinearity
is reflected in the feature dimension.
Huawei Confidential
64
Linear Regression and Overfitting Prevention
๏ฌ Regularization terms can be used to reduce overfitting. The value of ๐‘ค cannot be too
large or too small in the sample space. You can add a square sum loss on the target
function.
๏ฌ Regularization terms (norm): The regularization term here is called L2-norm. Linear
regression that uses this loss function is also called Ridge regression.
๏ฌ Linear regression with absolute loss is called Lasso regression.
๏€จ ๏€ฉ
2
2
2
1
( ) ( ) +
2
๏ฌ
๏€ฝ ๏€ญ
๏ƒฅ ๏ƒฅ
w
J w h x y w
m
๏€จ ๏€ฉ
2
1
1
( ) ( ) +
2
๏ฌ
๏€ฝ ๏€ญ
๏ƒฅ ๏ƒฅ
w
J w h x y w
m
Huawei Confidential
65
Logistic Regression (1)
๏ฌ Logistic regression: The logistic regression model is used to solve classification problems.
The model is defined as follows:
๐‘ƒ ๐‘Œ = 1 ๐‘ฅ =
๐‘’๐‘ค๐‘ฅ+๐‘
1 + ๐‘’๐‘ค๐‘ฅ+๐‘
๐‘ƒ ๐‘Œ = 0 ๐‘ฅ =
1
1 + ๐‘’๐‘ค๐‘ฅ+๐‘
where ๐‘ค indicates the weight, ๐‘ indicates the bias, and ๐‘ค๐‘ฅ + ๐‘ is regarded as the linear function of ๐‘ฅ.
Compare the preceding two probability values. The class with a higher probability value is the class of ๐‘ฅ.
Huawei Confidential
66
Logistic Regression (2)
๏ฌ Both the logistic regression model and linear regression model are generalized linear
models. Logistic regression introduces nonlinear factors (the sigmoid function) based on
linear regression and sets thresholds, so it can deal with binary classification problems.
๏ฌ According to the model function of logistic regression, the loss function of logistic
regression can be estimated as follows by using the maximum likelihood estimation:
๏ฌ where ๐‘ค indicates the weight parameter, ๐‘š indicates the number of samples, ๐‘ฅ indicates
the sample, and ๐‘ฆ indicates the real value. The values of all the weight parameters ๐‘ค
can also be obtained through the gradient descent algorithm.
๏€จ ๏€ฉ
1
( ) ln ( ) (1 )ln(1 ( ))
๏€ฝ ๏€ญ ๏€ซ ๏€ญ ๏€ญ
๏ƒฅ w w
J w y h x y h x
m
Huawei Confidential
67
Logistic Regression Extension - Softmax Function (1)
๏ฌ Logistic regression applies only to binary classification problems. For multi-class
classification problems, use the Softmax function.
Binary classification problem
Female?
Grape?
Apple?
Banana?
Multi-class classification problem
Male? Orange?
Huawei Confidential
68
Logistic Regression Extension - Softmax Function (2)
๏ฌ Softmax regression is a generalization of logistic regression that we can use for
K-class classification.
๏ฌ The Softmax function is used to map a K-dimensional vector of arbitrary real
values to another K-dimensional vector of real values, where each vector
element is in the interval (0, 1).
๏ฌ The regression probability function of Softmax is as follows:
1
( | ; ) , 1,2 ,
T
k
T
l
l
w
w
x
K
x
e
p y k x w k K
e
๏€ฝ
๏€ฝ ๏€ฝ ๏€ฝ
๏ƒฅ
L
Huawei Confidential
69
Logistic Regression Extension - Softmax Function (3)
๏ฌ Softmax assigns a probability to each class in a multi-class problem. These probabilities
must add up to 1.
๏ฐ Softmax may produce a form belonging to a particular class. Example:
0.09
0.22
0.68
0.01
Category
โ€ข Sum of all probabilities:
โ€ข 0.09 + 0.22 + 0.68 + 0.01 =1
โ€ข Most probably, this picture is
an apple.
Grape?
Apple?
Banana?
Orange?
Probability
Huawei Confidential
70
Decision Tree
๏ฌ A decision tree is a tree structure (a binary tree or a non-binary tree). Each non-leaf node represents a test on
a feature attribute. Each branch represents the output of a feature attribute in a certain value range, and
each leaf node stores a category. To use the decision tree, start from the root node, test the feature attributes
of the items to be classified, select the output branches, and use the category stored on the leaf node as the
final result.
Root
Short Tall
Long
neck
Short
neck
Can
squeak
Might be a
squirrel
Short
nose
Long
nose
In water
On land
Cannot
squeak
Might be
a rat
Might be a
giraffe
Might be an
elephant
Might be a
rhinoceros
Might be
a hippo
Huawei Confidential
71
Decision Tree Structure
Root Node
Internal
Node
Internal
Node
Internal
Node
Leaf Node
Leaf Node Leaf Node
Leaf Node
Leaf Node
Leaf Node
Huawei Confidential
72
Key Points of Decision Tree Construction
๏ฌ To create a decision tree, we need to select attributes and determine the tree structure
between feature attributes. The key step of constructing a decision tree is to divide data
of all feature attributes, compare the result sets in terms of 'purity', and select the
attribute with the highest 'purity' as the data point for dataset division.
๏ฌ The metrics to quantify the 'purity' include the information entropy and GINI Index. The
formula is as follows:
๏ฌ where ๐‘๐‘˜ indicates the probability that the sample belongs to class k (there are K
classes in total). A greater difference between purity before segmentation and that after
segmentation indicates a better decision tree.
๏ฌ Common decision tree algorithms include ID3, C4.5, and CART.
2
1
( )= - log ( )
K
k k
k
H X p p
๏€ฝ
๏ƒฅ 2
1
1
K
k
k
Gini p
๏€ฝ
๏€ฝ ๏€ญ ๏ƒฅ
Huawei Confidential
73
Decision Tree Construction Process
๏ฌ Feature selection: Select a feature from the features of the training data as the
split standard of the current node. (Different standards generate different
decision tree algorithms.)
๏ฌ Decision tree generation: Generate internal node upside down based on the
selected features and stop until the dataset can no longer be split.
๏ฌ Pruning: The decision tree may easily become overfitting unless necessary
pruning (including pre-pruning and post-pruning) is performed to reduce the
tree size and optimize its node structure.
Huawei Confidential
74
Decision Tree Example
๏ฌ The following figure shows a classification when a decision tree is used. The classification result is
impacted by three attributes: Refund, Marital Status, and Taxable Income.
Tid Refund
Marital
Status
Taxable
Income
Cheat
1 Yes Single 125,000 No
2 No Married 100,000 No
3 No Single 70,000 No
4 Yes Married 120,000 No
5 No Divorced 95,000 Yes
6 No Married 60,000 No
7 Yes Divorced 220,000 No
8 No Single 85,000 Yes
9 No Married 75,000 No
10 No Single 90,000 Yes
Yes
No
No
No
Refund
Marital
Status
Taxable
Income
Huawei Confidential
75
SVM
๏ฌ SVM is a binary classification model whose basic model is a linear classifier defined in the
eigenspace with the largest interval. SVMs also include kernel tricks that make them nonlinear
classifiers. The SVM learning algorithm is the optimal solution to convex quadratic programming.
Projection
Complex
segmentation in low-
dimensional space
Easy segmentation in
high-dimensional space
Huawei Confidential
76
With binary classification
Two-dimensional dataset
Both the left and right methods can be used to
divide datasets. Which of them is correct?
or
Linear SVM (1)
๏ฌ How do we split the red and blue datasets by a straight line?
Huawei Confidential
77
Linear SVM (2)
๏ฌ Straight lines are used to divide data into different classes. Actually, we can use multiple straight
lines to divide data. The core idea of the SVM is to find a straight line and keep the point close to
the straight line as far as possible from the straight line. This can enable strong generalization
capability of the model. These points are called support vectors.
๏ฌ In two-dimensional space, we use straight lines for segmentation. In high-dimensional space, we
use hyperplanes for segmentation.
Distance between
support vectors
is as far as
possible
Huawei Confidential
78
Linear SVM can function well
for linear separable datasets.
Nonlinear SVM (1)
๏ฌ How do we classify a nonlinear separable dataset?
Nonlinear datasets cannot be
split with straight lines.
Huawei Confidential
79
Nonlinear SVM (2)
๏ฌ Kernel functions are used to construct nonlinear SVMs.
๏ฌ Kernel functions allow algorithms to fit the largest hyperplane in a transformed high-
dimensional feature space.
Input space High-dimensional
feature space
Linear
kernel
function
Polynomial
kernel
function
Gaussian
kernel
function
Sigmoid
kernel
function
Common kernel functions
Huawei Confidential
80
KNN Algorithm (1)
๏ฌ The KNN classification algorithm is a
theoretically mature method and one of the
simplest machine learning algorithms.
According to this method, if the majority of
k samples most similar to one sample
(nearest neighbors in the eigenspace)
belong to a specific category, this sample
also belongs to this category.
?
The target category of point ? varies with
the number of the most adjacent nodes.
Huawei Confidential
81
KNN Algorithm (2)
๏ฌ As the prediction result is determined based on
the number and weights of neighbors in the
training set, the KNN algorithm has a simple logic.
๏ฌ KNN is a non-parametric method which is usually
used in datasets with irregular decision
boundaries.
๏ฐ The KNN algorithm generally adopts the majority
voting method for classification prediction and the
average value method for regression prediction.
๏ฌ KNN requires a huge number of computations.
Huawei Confidential
82
KNN Algorithm (3)
๏ฌ Generally, a larger k value reduces the impact of noise on classification, but obfuscates the
boundary between classes.
๏ฐ A larger k value means a higher probability of underfitting because the segmentation is too rough. A
smaller k value means a higher probability of overfitting because the segmentation is too refined.
โ€ข The boundary becomes smoother as
the value of k increases.
โ€ข As the k value increases to infinity, all
data points will eventually become all
blue or all red.
Huawei Confidential
83
Naive Bayes (1)
๏ฌ Naive Bayes algorithm: a simple multi-class classification algorithm based on the Bayes theorem.
It assumes that features are independent of each other. For a given sample feature ๐‘‹, the
probability that a sample belongs to a category ๐ป is:
๏ฐ ๐‘‹1, โ€ฆ , ๐‘‹๐‘› are data features, which are usually described by measurement values of m attribute sets.
๏ฎ For example, the color feature may have three attributes: red, yellow, and blue.
๏ฐ ๐ถ๐‘˜ indicates that the data belongs to a specific category ๐ถ
๏ฐ ๐‘ƒ ๐ถ๐‘˜|๐‘‹1, โ€ฆ , ๐‘‹๐‘› is a posterior probability, or a posterior probability of under condition ๐ถ๐‘˜.
๏ฐ ๐‘ƒ ๐ถ๐‘˜ is a prior probability that is independent of ๐‘‹1, โ€ฆ , ๐‘‹๐‘›
๏ฐ ๐‘ƒ ๐‘‹1, โ€ฆ , ๐‘‹๐‘› is the priori probability of ๐‘‹.
๏€จ ๏€ฉ
๏€จ ๏€ฉ ๏€จ ๏€ฉ
๏€จ ๏€ฉ
1 n
1 n
1 n
, , |
| , ,
, ,
k k
k
P X X C P C
P C X X
P X X
๏‚ผ
๏‚ผ ๏€ฝ
๏‚ผ
Huawei Confidential
84
Naive Bayes (2)
๏ฌ Independent assumption of features.
๏ฐ For example, if a fruit is red, round, and about 10 cm (3.94 in.) in diameter, it can be
considered an apple.
๏ฐ A Naive Bayes classifier considers that each feature independently contributes to the
probability that the fruit is an apple, regardless of any possible correlation between
the color, roundness, and diameter.
Huawei Confidential
85
Ensemble Learning
๏ฌ Ensemble learning is a machine learning paradigm in which multiple learners are trained and combined to
solve the same problem. When multiple learners are used, the integrated generalization capability can be
much stronger than that of a single learner.
๏ฌ If you ask a complex question to thousands of people at random and then summarize their answers, the
summarized answer is better than an expert's answer in most cases. This is the wisdom of the masses.
Dataset m
Model 1 Model 2 Model m
Model
synthesis
Large
model
Dataset 1 Dataset 2
Training set
Huawei Confidential
86
Classification of Ensemble Learning
Ensemble learning
Bagging
Boosting
Bagging (Random Forest)
โ€ข Independently builds several basic learners and then
averages their predictions.
โ€ข On average, a composite learner is usually better than
a single-base learner because of a smaller variance.
Boosting (Adaboost, GBDT, and XGboost)
Constructs basic learners in sequence to
gradually reduce the bias of a composite learner.
The composite learner can fit data well, which
may also cause overfitting.
Huawei Confidential
87
Ensemble Methods in Machine Learning (1)
๏ฌ Random forest = Bagging + CART decision tree
๏ฌ Random forests build multiple decision trees and merge them together to make predictions more accurate
and stable.
๏ฐ Random forests can be used for classification and regression problems.
All training data
Data subset 1
Bootstrap sampling Decision tree building Aggregation
prediction result
โ€ข Category:
majority voting
โ€ข Regression:
average value
Final prediction
Prediction 2
Prediction
Prediction n
Prediction 1
Data subset 2
Data subset
Huawei Confidential
88
Ensemble Methods in Machine Learning (2)
๏ฌ GBDT is a type of boosting algorithm.
๏ฌ For an aggregative mode, the sum of the results of all the basic learners equals the predicted
value. In essence, the residual of the error function to the predicted value is fit by the next basic
learner. (The residual is the error between the predicted value and the actual value.)
๏ฌ During model training, GBDT requires that the sample loss for model prediction be as small as
possible.
30 years old 20 years old
Prediction
10 years old
Residual
calculation
9 years old
Prediction
1 year old
Residual
calculation
Prediction
1 year old
Huawei Confidential
89
Unsupervised Learning - K-means
๏ฌ K-means clustering aims to partition n observations into k clusters in which each observation
belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
๏ฌ For the k-means algorithm, specify the final number of clusters (k). Then, divide n data objects
into k clusters. The clusters obtained meet the following conditions: (1) Objects in the same
cluster are highly similar. (2) The similarity of objects in different clusters is small.
x2
x1
x2
x1
The data is not tagged. K-
means clustering can
automatically classify datasets.
K-means clustering
Huawei Confidential
90
Unsupervised Learning - Hierarchical Clustering
๏ฌ Hierarchical clustering divides a dataset at different layers and forms a tree-like clustering
structure. The dataset division may use a "bottom-up" aggregation policy, or a "top-down"
splitting policy. The hierarchy of clustering is represented in a tree graph. The root is the unique
cluster of all samples, and the leaves are the cluster of only a sample.
Agglomerative
hierarchical
clustering
Split
hierarchical
clustering
Huawei Confidential
91
Contents
1. Machine Learning Definition
2. Machine Learning Types
3. Machine Learning Process
4. Other Key Machine Learning Methods
5. Common Machine Learning Algorithms
6. Case study
Huawei Confidential
92
Comprehensive Case
๏ฌ Assume that there is a dataset containing the house areas and prices of 21,613
housing units sold in a city. Based on this data, we can predict the prices of
other houses in the city.
Dataset
House Area Price
1,180 221,900
2,570 538,000
770 180,000
1,960 604,000
1,680 510,000
5,420 1,225,000
1,715 257,500
1,060 291,850
1,160 468,000
1,430 310,000
1,370 400,000
1,810 530,000
โ€ฆ โ€ฆ
Huawei Confidential
93 Price
House area
Problem Analysis
๏ฌ This case contains a large amount of data, including input x (house area), and output y (price), which is a
continuous value. We can use regression of supervised learning. Draw a scatter chart based on the data and
use linear regression.
๏ฌ Our goal is to build a model function h(x) that infinitely approximates the function that expresses true
distribution of the dataset.
๏ฌ Then, use the model to predict unknown price data.
Unary linear regression function
1
( ) o
h x w w x
๏€ฝ ๏€ซ
Learning
algorithm
h(x)
Output
x
Feature: house area
Dataset
Input
y
Label: price
Huawei Confidential
94
Price
House area
Goal of Linear Regression
๏ฌ Linear regression aims to find a straight line that best fits the dataset.
๏ฌ Linear regression is a parameter-based model. Here, we need learning parameters ๐‘ค0
and ๐‘ค1. When these two parameters are found, the best model appears.
Which line is the best parameter?
1
( ) o
h x w w x
๏€ฝ ๏€ซ
Price
House area
Huawei Confidential
95
Loss Function of Linear Regression
๏ฌ To find the optimal parameter, construct a loss function and find the parameter
values when the loss function becomes the minimum.
โ€ข where, m indicates the number of samples,
โ€ข h(x) indicates the predicted value, and y
indicates the actual value.
Loss function of
linear regression:
Error
Error
Price
House area
Error Error
๏€จ ๏€ฉ
2
1
( ) ( )
2
J w h x y
m
๏€ฝ ๏€ญ
๏ƒฅ
Goal:
๏€จ ๏€ฉ
2
1
argmin ( ) ( )
2
w
J w h x y
m
๏€ฝ ๏€ญ
๏ƒฅ
Huawei Confidential
96
Gradient Descent Method
๏ฌ The gradient descent algorithm finds the minimum value of a function through iteration.
๏ฌ It aims to randomize an initial point on the loss function, and then find the global minimum value
of the loss function based on the negative gradient direction. Such parameter value is the optimal
parameter value.
๏ฐ Point A: the position of ๐‘ค0 and ๐‘ค1 after random initialization.
๐‘ค0 and ๐‘ค1 are the required parameters.
๏ฐ A-B connection line: a track formed based on descents in
a negative gradient direction. Upon each descent, values ๐‘ค0
and ๐‘ค1 change, and the regression line also changes.
๏ฐ Point B: global minimum value of the loss function.
Final values of ๐‘ค0 and ๐‘ค1 are also found.
Cost surface
Huawei Confidential
97
Iteration Example
๏ฌ The following is an example of a gradient descent iteration. We can see that as red
points on the loss function surface gradually approach a lowest point, fitting of the
linear regression red line with data becomes better and better. At this time, we can get
the best parameters.
Huawei Confidential
98
Model Debugging and Application
๏ฌ After the model is trained, test it with the test
set to ensure the generalization capability.
๏ฌ If overfitting occurs, use Lasso regression or
Ridge regression with regularization terms
and tune the hyperparameters.
๏ฌ If underfitting occurs, use a more complex
regression model, such as GBDT.
๏ฌ Note:
๏ฐ For real data, pay attention to the functions of
data cleansing and feature engineering.
Price
House area
( ) 280.62 43581
h x x
๏€ฝ ๏€ญ
The final model result is as follows:
Huawei Confidential
99
Summary
๏ฌ First, this course describes the definition and classification of machine learning, as
well as problems machine learning solves. Then, it introduces key knowledge
points of machine learning, including the overall procedure (data collection, data
cleansing, feature extraction, model training, model training and evaluation, and
model deployment), common algorithms (linear regression, logistic regression,
decision tree, SVM, naive Bayes, KNN, ensemble learning, K-means, etc.), gradient
descent algorithm, parameters and hyper-parameters.
๏ฌ Finally, a complete machine learning process is presented by a case of using linear
regression to predict house prices.

More Related Content

What's hot

Pertemuan 14 Jaringan Syaraf (Neural Network)
Pertemuan 14 Jaringan Syaraf (Neural Network)Pertemuan 14 Jaringan Syaraf (Neural Network)
Pertemuan 14 Jaringan Syaraf (Neural Network)Endang Retnoningsih
ย 
Modul Praktikum Sistem Basis Data
Modul Praktikum Sistem Basis Data Modul Praktikum Sistem Basis Data
Modul Praktikum Sistem Basis Data Wahyu Widodo
ย 
Java membuat form data mahasiswa
Java   membuat form data mahasiswaJava   membuat form data mahasiswa
Java membuat form data mahasiswahermawanawang
ย 
(Jst)hebb dan delta rule
(Jst)hebb dan delta rule(Jst)hebb dan delta rule
(Jst)hebb dan delta ruleJunior Iqfar
ย 
Representasi Pengetahuan
Representasi PengetahuanRepresentasi Pengetahuan
Representasi PengetahuanSherly Uda
ย 
Jaringan Kohonen (Unsupervised Learning)
Jaringan Kohonen (Unsupervised Learning)Jaringan Kohonen (Unsupervised Learning)
Jaringan Kohonen (Unsupervised Learning)DindaAyuYunitasari
ย 
Social engineering techniques and solutions
Social engineering techniques and solutionsSocial engineering techniques and solutions
Social engineering techniques and solutionsicalredhat
ย 
Socket Programming UDP Echo Client Server (Python)
Socket Programming  UDP Echo Client Server  (Python)Socket Programming  UDP Echo Client Server  (Python)
Socket Programming UDP Echo Client Server (Python)Lusiana Diyan
ย 
MAKALAH CLOUD COMPUTING
MAKALAH CLOUD COMPUTINGMAKALAH CLOUD COMPUTING
MAKALAH CLOUD COMPUTINGHanny Maharani
ย 
Mp 3 arsitektur-mikroprosesor
Mp 3 arsitektur-mikroprosesorMp 3 arsitektur-mikroprosesor
Mp 3 arsitektur-mikroprosesorOlbers Letfaar
ย 
Requirement Engineering
Requirement EngineeringRequirement Engineering
Requirement EngineeringFebryci Legirian
ย 
Array searching sorting_pert_11,12,13,14,15
Array searching sorting_pert_11,12,13,14,15Array searching sorting_pert_11,12,13,14,15
Array searching sorting_pert_11,12,13,14,15doudomblogspot
ย 
Pertemuan 11-aritmatika
Pertemuan 11-aritmatikaPertemuan 11-aritmatika
Pertemuan 11-aritmatikaFrance Rhezhek
ย 
Modul 8 - Jaringan Syaraf Tiruan (JST)
Modul 8 - Jaringan Syaraf Tiruan (JST)Modul 8 - Jaringan Syaraf Tiruan (JST)
Modul 8 - Jaringan Syaraf Tiruan (JST)ahmad haidaroh
ย 
Slide infringement of privacy
Slide infringement of privacySlide infringement of privacy
Slide infringement of privacyIin Muslichah
ย 
Analisis leksikal tugas
Analisis leksikal tugasAnalisis leksikal tugas
Analisis leksikal tugasAminah Rahayu
ย 
Sistem Pakar Certainty factor
Sistem Pakar Certainty factor Sistem Pakar Certainty factor
Sistem Pakar Certainty factor Adi Ginanjar Kusuma
ย 
[PBO] Pertemuan 5 - Inheritance
[PBO] Pertemuan 5 - Inheritance[PBO] Pertemuan 5 - Inheritance
[PBO] Pertemuan 5 - Inheritancerizki adam kurniawan
ย 

What's hot (20)

Modul io
Modul ioModul io
Modul io
ย 
Pertemuan 14 Jaringan Syaraf (Neural Network)
Pertemuan 14 Jaringan Syaraf (Neural Network)Pertemuan 14 Jaringan Syaraf (Neural Network)
Pertemuan 14 Jaringan Syaraf (Neural Network)
ย 
Modul Praktikum Sistem Basis Data
Modul Praktikum Sistem Basis Data Modul Praktikum Sistem Basis Data
Modul Praktikum Sistem Basis Data
ย 
Java membuat form data mahasiswa
Java   membuat form data mahasiswaJava   membuat form data mahasiswa
Java membuat form data mahasiswa
ย 
(Jst)hebb dan delta rule
(Jst)hebb dan delta rule(Jst)hebb dan delta rule
(Jst)hebb dan delta rule
ย 
Representasi Pengetahuan
Representasi PengetahuanRepresentasi Pengetahuan
Representasi Pengetahuan
ย 
Jaringan Kohonen (Unsupervised Learning)
Jaringan Kohonen (Unsupervised Learning)Jaringan Kohonen (Unsupervised Learning)
Jaringan Kohonen (Unsupervised Learning)
ย 
Social engineering techniques and solutions
Social engineering techniques and solutionsSocial engineering techniques and solutions
Social engineering techniques and solutions
ย 
Socket Programming UDP Echo Client Server (Python)
Socket Programming  UDP Echo Client Server  (Python)Socket Programming  UDP Echo Client Server  (Python)
Socket Programming UDP Echo Client Server (Python)
ย 
MAKALAH CLOUD COMPUTING
MAKALAH CLOUD COMPUTINGMAKALAH CLOUD COMPUTING
MAKALAH CLOUD COMPUTING
ย 
Mp 3 arsitektur-mikroprosesor
Mp 3 arsitektur-mikroprosesorMp 3 arsitektur-mikroprosesor
Mp 3 arsitektur-mikroprosesor
ย 
Requirement Engineering
Requirement EngineeringRequirement Engineering
Requirement Engineering
ย 
Array searching sorting_pert_11,12,13,14,15
Array searching sorting_pert_11,12,13,14,15Array searching sorting_pert_11,12,13,14,15
Array searching sorting_pert_11,12,13,14,15
ย 
Pertemuan 11-aritmatika
Pertemuan 11-aritmatikaPertemuan 11-aritmatika
Pertemuan 11-aritmatika
ย 
Modul 8 - Jaringan Syaraf Tiruan (JST)
Modul 8 - Jaringan Syaraf Tiruan (JST)Modul 8 - Jaringan Syaraf Tiruan (JST)
Modul 8 - Jaringan Syaraf Tiruan (JST)
ย 
Slide infringement of privacy
Slide infringement of privacySlide infringement of privacy
Slide infringement of privacy
ย 
Analisis leksikal tugas
Analisis leksikal tugasAnalisis leksikal tugas
Analisis leksikal tugas
ย 
Sistem Pakar Certainty factor
Sistem Pakar Certainty factor Sistem Pakar Certainty factor
Sistem Pakar Certainty factor
ย 
[PBO] Pertemuan 5 - Inheritance
[PBO] Pertemuan 5 - Inheritance[PBO] Pertemuan 5 - Inheritance
[PBO] Pertemuan 5 - Inheritance
ย 
Jaringan hebb
Jaringan hebbJaringan hebb
Jaringan hebb
ย 

Similar to ch_02 Machine Learning Overview.pdf

03 Machine Learning Overview.pptx
03 Machine Learning Overview.pptx03 Machine Learning Overview.pptx
03 Machine Learning Overview.pptxSagarBurnah
ย 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
ย 
Machine Learning Contents.pptx
Machine Learning Contents.pptxMachine Learning Contents.pptx
Machine Learning Contents.pptxNaveenkushwaha18
ย 
machine learning
machine learningmachine learning
machine learningMounisha A
ย 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningSujith Jayaprakash
ย 
Understanding Mahout classification documentation
Understanding Mahout  classification documentationUnderstanding Mahout  classification documentation
Understanding Mahout classification documentationNaveen Kumar
ย 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxiaeronlineexm
ย 
Machine Learning Presentation
Machine Learning PresentationMachine Learning Presentation
Machine Learning PresentationSk Samiul Islam
ย 
Introduction to Machine Learning.pptx
Introduction to Machine Learning.pptxIntroduction to Machine Learning.pptx
Introduction to Machine Learning.pptxDr. Amanpreet Kaur
ย 
What is Machine Learning4-converted.pptx
What is Machine Learning4-converted.pptxWhat is Machine Learning4-converted.pptx
What is Machine Learning4-converted.pptxvinod756504
ย 
Name.pptx
Name.pptxName.pptx
Name.pptxAyan974999
ย 
MACHINE LEARNING(R17A0534).pdf
MACHINE LEARNING(R17A0534).pdfMACHINE LEARNING(R17A0534).pdf
MACHINE LEARNING(R17A0534).pdfFayyoOlani
ย 
Machine Learning Landscape
Machine Learning LandscapeMachine Learning Landscape
Machine Learning LandscapeEng Teong Cheah
ย 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applicationsBenjaminlapid1
ย 
An-Overview-of-Machine-Learning.pptx
An-Overview-of-Machine-Learning.pptxAn-Overview-of-Machine-Learning.pptx
An-Overview-of-Machine-Learning.pptxsomeyamohsen3
ย 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationAnkit Gupta
ย 
Rahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptx
Rahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptxRahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptx
Rahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptxRahulKirtoniya
ย 
Artificial Intelligence & QA
Artificial Intelligence & QAArtificial Intelligence & QA
Artificial Intelligence & QAMalihaAshraf
ย 
Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack BigDataExpo
ย 

Similar to ch_02 Machine Learning Overview.pdf (20)

03 Machine Learning Overview.pptx
03 Machine Learning Overview.pptx03 Machine Learning Overview.pptx
03 Machine Learning Overview.pptx
ย 
Machine Learning by Rj
Machine Learning by RjMachine Learning by Rj
Machine Learning by Rj
ย 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
ย 
Machine Learning Contents.pptx
Machine Learning Contents.pptxMachine Learning Contents.pptx
Machine Learning Contents.pptx
ย 
machine learning
machine learningmachine learning
machine learning
ย 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
ย 
Understanding Mahout classification documentation
Understanding Mahout  classification documentationUnderstanding Mahout  classification documentation
Understanding Mahout classification documentation
ย 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
ย 
Machine Learning Presentation
Machine Learning PresentationMachine Learning Presentation
Machine Learning Presentation
ย 
Introduction to Machine Learning.pptx
Introduction to Machine Learning.pptxIntroduction to Machine Learning.pptx
Introduction to Machine Learning.pptx
ย 
What is Machine Learning4-converted.pptx
What is Machine Learning4-converted.pptxWhat is Machine Learning4-converted.pptx
What is Machine Learning4-converted.pptx
ย 
Name.pptx
Name.pptxName.pptx
Name.pptx
ย 
MACHINE LEARNING(R17A0534).pdf
MACHINE LEARNING(R17A0534).pdfMACHINE LEARNING(R17A0534).pdf
MACHINE LEARNING(R17A0534).pdf
ย 
Machine Learning Landscape
Machine Learning LandscapeMachine Learning Landscape
Machine Learning Landscape
ย 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applications
ย 
An-Overview-of-Machine-Learning.pptx
An-Overview-of-Machine-Learning.pptxAn-Overview-of-Machine-Learning.pptx
An-Overview-of-Machine-Learning.pptx
ย 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning Presentation
ย 
Rahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptx
Rahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptxRahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptx
Rahul_Kirtoniya_11800121032_CSE_Machine_Learning.pptx
ย 
Artificial Intelligence & QA
Artificial Intelligence & QAArtificial Intelligence & QA
Artificial Intelligence & QA
ย 
Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack
ย 

More from AhmedSalah48055

Black and Blue Tech Theme for Career Day Presentation.pptx
Black and Blue Tech Theme for Career Day Presentation.pptxBlack and Blue Tech Theme for Career Day Presentation.pptx
Black and Blue Tech Theme for Career Day Presentation.pptxAhmedSalah48055
ย 
Scientific findings presentation.pptx
Scientific findings presentation.pptxScientific findings presentation.pptx
Scientific findings presentation.pptxAhmedSalah48055
ย 
Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...
Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...
Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...AhmedSalah48055
ย 
Decision Tree Diagram by Slidesgo.pptx
Decision Tree Diagram by Slidesgo.pptxDecision Tree Diagram by Slidesgo.pptx
Decision Tree Diagram by Slidesgo.pptxAhmedSalah48055
ย 
Kate ยท SlidesCarnival.pptx
Kate ยท SlidesCarnival.pptxKate ยท SlidesCarnival.pptx
Kate ยท SlidesCarnival.pptxAhmedSalah48055
ย 
Midterm revision 2022 without answer.pdf
Midterm revision 2022  without answer.pdfMidterm revision 2022  without answer.pdf
Midterm revision 2022 without answer.pdfAhmedSalah48055
ย 
Midterm revision - without answer edit.pdf
Midterm revision - without answer edit.pdfMidterm revision - without answer edit.pdf
Midterm revision - without answer edit.pdfAhmedSalah48055
ย 

More from AhmedSalah48055 (7)

Black and Blue Tech Theme for Career Day Presentation.pptx
Black and Blue Tech Theme for Career Day Presentation.pptxBlack and Blue Tech Theme for Career Day Presentation.pptx
Black and Blue Tech Theme for Career Day Presentation.pptx
ย 
Scientific findings presentation.pptx
Scientific findings presentation.pptxScientific findings presentation.pptx
Scientific findings presentation.pptx
ย 
Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...
Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...
Copy of Turquoise and Black Dark Neon Modern Tech Thesis Defense Presentation...
ย 
Decision Tree Diagram by Slidesgo.pptx
Decision Tree Diagram by Slidesgo.pptxDecision Tree Diagram by Slidesgo.pptx
Decision Tree Diagram by Slidesgo.pptx
ย 
Kate ยท SlidesCarnival.pptx
Kate ยท SlidesCarnival.pptxKate ยท SlidesCarnival.pptx
Kate ยท SlidesCarnival.pptx
ย 
Midterm revision 2022 without answer.pdf
Midterm revision 2022  without answer.pdfMidterm revision 2022  without answer.pdf
Midterm revision 2022 without answer.pdf
ย 
Midterm revision - without answer edit.pdf
Midterm revision - without answer edit.pdfMidterm revision - without answer edit.pdf
Midterm revision - without answer edit.pdf
ย 

Recently uploaded

Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
ย 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
ย 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
ย 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
ย 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
ย 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
ย 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
ย 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
ย 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
ย 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
ย 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
ย 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
ย 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
ย 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
ย 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
ย 
18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdfssuser54595a
ย 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
ย 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
ย 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
ย 

Recently uploaded (20)

Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
ย 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
ย 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
ย 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
ย 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
ย 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
ย 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
ย 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
ย 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
ย 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
ย 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
ย 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
ย 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
ย 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
ย 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
ย 
18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAะกY_INDEX-DM_23-1-final-eng.pdf
ย 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
ย 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
ย 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ย 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
ย 

ch_02 Machine Learning Overview.pdf

  • 2. Huawei Confidential 1 Foreword ๏ฌ Machine learning is a core research field of AI, and it is also a necessary knowledge for deep learning. Therefore, this chapter mainly introduces the main concepts of machine learning, the classification of machine learning, the overall process of machine learning, and the common algorithms of machine learning.
  • 3. Huawei Confidential 2 Objectives Upon completion of this course, you will be able to: ๏ฐ Master the learning algorithm definition and machine learning process. ๏ฐ Know common machine learning algorithms. ๏ฐ Understand concepts such as hyperparameters, gradient descent, and cross validation.
  • 4. Huawei Confidential 3 Contents 1. Machine Learning Definition 2. Machine Learning Types 3. Machine Learning Process 4. Other Key Machine Learning Methods 5. Common Machine Learning Algorithms 6. Case Study
  • 5. Huawei Confidential 4 Machine Learning Algorithms (1) ๏ฌ Machine learning (including deep learning) is a study of learning algorithms. A computer program is said to learn from experience ๐ธ with respect to some class of tasks ๐‘‡ and performance measure ๐‘ƒ if its performance at tasks in ๐‘‡, as measured by ๐‘ƒ, improves with experience ๐ธ. Data (Experience E) Learning algorithms (Task T) Basic understanding (Measure P)
  • 6. Huawei Confidential 5 Machine Learning Algorithms (2) Experience Regularity Induction Prediction Input New problems Future Historical data Model Training Prediction Input New data Future attributes
  • 7. Huawei Confidential 6 Created by: Jim Liang Differences Between Machine Learning Algorithms and Traditional Rule-Based Algorithms โ€ข Samples are used for training. โ€ข The decision-making rules are complex or difficult to describe. โ€ข Rules are automatically learned by machines. Rule-based algorithms Machine learning Training data Machine learning Model Prediction New data โ€ข Explicit programming is used to solve problems. โ€ข Rules can be manually specified.
  • 8. Huawei Confidential 7 Application Scenarios of Machine Learning (1) ๏ฌ The solution to a problem is complex, or the problem may involve a large amount of data without a clear data distribution function. ๏ฌ Machine learning can be used in the following scenarios: Rules are complex or cannot be described, such as facial recognition and voice recognition. Task rules change over time. For example, in the part-of- speech tagging task, new words or meanings are generated at any time. Data distribution changes over time, requiring constant readaptation of programs, such as predicting the trend of commodity sales.
  • 9. Huawei Confidential 8 Application Scenarios of Machine Learning (2) Manual rules Simple problems Rule-based algorithms Small Large Simple Complex Scale of the problem Rule complexity Machine learning algorithms
  • 10. Huawei Confidential 9 Rational Understanding of Machine Learning Algorithms ๏ฌ Target function f is unknown. Learning algorithms cannot obtain a perfect function f. ๏ฌ Assume that hypothesis function g approximates function f, but may be different from function f. Target equation ๐‘“: ๐‘‹ โ†’ ๐‘Œ Learning algorithms Training data ๐ท: {(๐‘ฅ1, ๐‘ฆ1) โ‹ฏ , (๐‘ฅ๐‘›, ๐‘ฆ๐‘›)} Hypothesis function ๐‘” โ‰ˆ ๐‘“ Ideal Actual
  • 11. Huawei Confidential 10 Main Problems Solved by Machine Learning ๏ฌ Machine learning can deal with many types of tasks. The following describes the most typical and common types of tasks. ๏ฐ Classification: A computer program needs to specify which of the k categories some input belongs to. To accomplish this task, learning algorithms usually output a function ๐‘“: ๐‘…๐‘› โ†’ (1,2, โ€ฆ , ๐‘˜). For example, the image classification algorithm in computer vision is developed to handle classification tasks. ๏ฐ Regression: For this type of task, a computer program predicts the output for the given input. Learning algorithms typically output a function ๐‘“: ๐‘…๐‘› โ†’ ๐‘…. An example of this task type is to predict the claim amount of an insured person (to set the insurance premium) or predict the security price. ๏ฐ Clustering: A large amount of data from an unlabeled dataset is divided into multiple categories according to internal similarity of the data. Data in the same category is more similar than that in different categories. This feature can be used in scenarios such as image retrieval and user profile management. ๏ฌ Classification and regression are two main types of prediction, accounting from 80% to 90%. The output of classification is discrete category values, and the output of regression is continuous numbers.
  • 12. Huawei Confidential 11 Contents 1. Machine Learning Definition 2. Machine Learning Types 3. Machine Learning Process 4. Other Key Machine Learning Methods 5. Common Machine Learning Algorithms 6. Case study
  • 13. Huawei Confidential 12 Machine Learning Classification ๏ฌ Supervised learning: Obtain an optimal model with required performance through training and learning based on the samples of known categories. Then, use the model to map all inputs to outputs and check the output for the purpose of classifying unknown data. ๏ฌ Unsupervised learning: For unlabeled samples, the learning algorithms directly model the input datasets. Clustering is a common form of unsupervised learning. We only need to put highly similar samples together, calculate the similarity between new samples and existing ones, and classify them by similarity. ๏ฌ Semi-supervised learning: In one task, a machine learning model that automatically uses a large amount of unlabeled data to assist learning directly of a small amount of labeled data. ๏ฌ Reinforcement learning: It is an area of machine learning concerned with how agents ought to take actions in an environment to maximize some notion of cumulative reward. The difference between reinforcement learning and supervised learning is the teacher signal. The reinforcement signal provided by the environment in reinforcement learning is used to evaluate the action (scalar signal) rather than telling the learning system how to perform correct actions.
  • 14. Huawei Confidential 13 Supervised Learning Feature 1 Feature 1 Feature 1 Feature n Feature n Feature n ... ... ... Goal Goal Goal Data feature Label Supervised learning algorithm Weather Temperature Wind Speed Sunny Warm Strong Rainy Cold Fair Sunny Cold Weak Enjoy Sports Yes No Yes
  • 15. Huawei Confidential 14 Supervised Learning - Regression Questions ๏ฌ Regression: reflects the features of attribute values of samples in a sample dataset. The dependency between attribute values is discovered by expressing the relationship of sample mapping through functions. ๏ฐ How much will I benefit from the stock next week? ๏ฐ What's the temperature on Tuesday?
  • 16. Huawei Confidential 15 Supervised Learning - Classification Questions ๏ฌ Classification: maps samples in a sample dataset to a specified category by using a classification model. ๏ฐ Will there be a traffic jam on XX road during the morning rush hour tomorrow? ๏ฐ Which method is more attractive to customers: 5 yuan voucher or 25% off?
  • 17. Huawei Confidential 16 Unsupervised Learning Feature 1 Feature 1 Feature 1 Feature n Feature n Feature n ... ... ... Data Feature Unsupervised learning algorithm Monthly Consumption Commodity Consumption Time 1000โ€“2000 Badminton racket 6:00โ€“12:00 500โ€“1000 Basketball 18:00โ€“24:00 1000โ€“2000 Game console 00:00โ€“6:00 Internal similarity Category Cluster 1 Cluster 2
  • 18. Huawei Confidential 17 Unsupervised Learning - Clustering Questions ๏ฌ Clustering: classifies samples in a sample dataset into several categories based on the clustering model. The similarity of samples belonging to the same category is high. ๏ฐ Which audiences like to watch movies of the same subject? ๏ฐ Which of these components are damaged in a similar way?
  • 19. Huawei Confidential 18 Semi-Supervised Learning Feature 1 Feature 1 Feature 1 Feature n Feature n Feature n ... ... ... Goal Unknown Unknown Data Feature Label Semi-supervised learning algorithms Weather Temperature Wind Speed Sunny Warm Strong Rainy Cold Fair Sunny Cold Weak Enjoy Sports Yes / /
  • 20. Huawei Confidential 19 Reinforcement Learning ๏ฌ The model perceives the environment, takes actions, and makes adjustments and choices based on the status and award or punishment. Model Environment ๐‘Ÿ๐‘ก+1 ๐‘ ๐‘ก+1 Award or punishment ๐‘Ÿ๐‘ก Status ๐‘ ๐‘ก Action ๐‘Ž๐‘ก
  • 21. Huawei Confidential 20 Reinforcement Learning - Best Behavior ๏ฌ Reinforcement learning: always looks for best behaviors. Reinforcement learning is targeted at machines or robots. ๏ฐ Autopilot: Should it brake or accelerate when the yellow light starts to flash? ๏ฐ Cleaning robot: Should it keep working or go back for charging?
  • 22. Huawei Confidential 21 Contents 1. Machine learning algorithm 2. Machine Learning Classification 3. Machine Learning Process 4. Other Key Machine Learning Methods 5. Common Machine Learning Algorithms 6. Case study
  • 23. Huawei Confidential 22 Machine Learning Process Data collection Data cleansing Model training Model evaluation Feature extraction and selection Model deployment and integration Feedback and iteration
  • 24. Huawei Confidential 23 Basic Machine Learning Concept โ€” Dataset ๏ฌ Dataset: a collection of data used in machine learning tasks. Each data record is called a sample. Events or attributes that reflect the performance or nature of a sample in a particular aspect are called features. ๏ฌ Training set: a dataset used in the training process, where each sample is referred to as a training sample. The process of creating a model from data is called learning (training). ๏ฌ Test set: Testing refers to the process of using the model obtained after learning for prediction. The dataset used is called a test set, and each sample is called a test sample.
  • 25. Huawei Confidential 24 Checking Data Overview ๏ฌ Typical dataset form No. Area School Districts Direction House Price 1 100 8 South 1000 2 120 9 Southwest 1300 3 60 6 North 700 4 80 9 Southeast 1100 5 95 3 South 850 Feature 2 Feature 1 Feature 3 Label Training set Test set
  • 26. Huawei Confidential 25 Data preprocessing Importance of Data Processing ๏ฌ Data is crucial to models. It is the ceiling of model capabilities. Without good data, there is no good model. Fill in missing values, and detect and eliminate causes of dataset exceptions. Data cleansing Simplify data attributes to avoid dimension explosion. Data dimension reduction Normalize data to reduce noise and improve model accuracy. Data normalization
  • 27. Huawei Confidential 26 Workload of Data Cleansing ๏ฌ Statistics on data scientists' work in machine learning 3% Remodeling training datasets 9% Mining modes from data 19% Collecting datasets 60% Cleansing and sorting data CrowdFlower Data Science Report 2016 5% Others 4% Optimizing models
  • 28. Huawei Confidential 27 Data Cleansing ๏ฌ Most machine learning models process features, which are usually numeric representations of input variables that can be used in the model. ๏ฌ In most cases, the collected data can be used by algorithms only after being preprocessed. The preprocessing operations include the following: ๏ฐ Data filtering ๏ฐ Processing of lost data ๏ฐ Processing of possible exceptions, errors, or abnormal values ๏ฐ Combination of data from multiple data sources ๏ฐ Data consolidation
  • 29. Huawei Confidential 28 Dirty Data (1) ๏ฌ Generally, real data may have some quality problems. ๏ฐ Incompleteness: contains missing values or the data that lacks attributes ๏ฐ Noise: contains incorrect records or exceptions. ๏ฐ Inconsistency: contains inconsistent records.
  • 30. Huawei Confidential 29 Dirty Data (2) # Id Name Birthday Gender IsTe ach er #Stu dent s Country City 1 111 John 31/12/1990 M 0 0 Ireland Dublin 2 222 Mery 15/10/1978 F 1 15 Iceland 3 333 Alice 19/04/2000 F 0 0 Spain Madri d 4 444 Mark 01/11/1997 M 0 0 France Paris 5 555 Alex 15/03/2000 A 1 23 Germany Berlin 6 555 Peter 1983-12-01 M 1 10 Italy Rome 7 777 Calvin 05/05/1995 M 0 0 Italy Italy 8 888 Roxane 03/08/1948 F 0 0 Portugal Lisbon 9 999 Anne 05/09/1992 F 0 5 Switzerlan d Genev a 10 101010 Paul 14/11/1992 M 1 26 Ytali Rome Invalid duplicate item Incorrect format Attribute dependency Misspelling Value that should be in another column Missing value Invalid value
  • 31. Huawei Confidential 30 Data Conversion ๏ฌ After being preprocessed, the data needs to be converted into a representation form suitable for the machine learning model. Common data conversion forms include the following: ๏ฐ With respect to classification, category data is encoded into a corresponding numerical representation. ๏ฐ Value data is converted to category data to reduce the value of variables (for age segmentation). ๏ฐ Other data ๏ฎ In the text, the word is converted into a word vector through word embedding (generally using the word2vec model, BERT model, etc). ๏ฎ Process image data (color space, grayscale, geometric change, Haar feature, and image enhancement) ๏ฐ Feature engineering ๏ฎ Normalize features to ensure the same value ranges for input variables of the same model. ๏ฎ Feature expansion: Combine or convert existing variables to generate new features, such as the average.
  • 32. Huawei Confidential 31 Necessity of Feature Selection ๏ฌ Generally, a dataset has many features, some of which may be redundant or irrelevant to the value to be predicted. ๏ฌ Feature selection is necessary in the following aspects: Simplify models to make them easy for users to interpret Reduce the training time Avoid dimension explosion Improve model generalization and avoid overfitting
  • 33. Huawei Confidential 32 Feature Selection Methods - Filter ๏ฌ Filter methods are independent of the model during feature selection. Traverse all features Select the optimal feature subset Train models Evaluate the performance Procedure of a filter method By evaluating the correlation between each feature and the target attribute, these methods use a statistical measure to assign a value to each feature. Features are then sorted by score, which is helpful for preserving or eliminating specific features. Common methods โ€ข Pearson correlation coefficient โ€ข Chi-square coefficient โ€ข Mutual information Limitations โ€ข The filter method tends to select redundant variables as the relationship between features is not considered.
  • 34. Huawei Confidential 33 Feature Selection Methods - Wrapper ๏ฌ Wrapper methods use a prediction model to score feature subsets. Procedure of a wrapper method Wrapper methods consider feature selection as a search issue for which different combinations are evaluated and compared. A predictive model is used to evaluate a combination of features and assign a score based on model accuracy. Common methods โ€ข Recursive feature elimination (RFE) Limitations โ€ข Wrapper methods train a new model for each subset, resulting in a huge number of computations. โ€ข A feature set with the best performance is usually provided for a specific type of model. Traverse all features Generate a feature subset Train models Evaluate models Select the optimal feature subset
  • 35. Huawei Confidential 34 Feature Selection Methods - Embedded ๏ฌ Embedded methods consider feature selection as a part of model construction. The most common type of embedded feature selection method is the regularization method. Regularization methods are also called penalization methods that introduce additional constraints into the optimization of a predictive algorithm that bias the model toward lower complexity and reduce the number of features. Common methods โ€ข Lasso regression โ€ข Ridge regression Procedure of an embedded method Traverse all features Generate a feature subset Train models + Evaluate the effect Select the optimal feature subset
  • 36. Huawei Confidential 35 Overall Procedure of Building a Model Model Building Procedure Data splitting: Divide data into training sets, test sets, and validation sets. Model training: Use data that has been cleaned up and feature engineering to train a model. Model verification: Use validation sets to validate the model validity. Model fine-tuning: Continuously tune the model based on the actual data of a service scenario. Model deployment: Deploy the model in an actual production scenario. Model test: Use test data to evaluate the generalization capability of the model in a real environment. 1 2 3 4 5 6
  • 37. Huawei Confidential 36 Examples of Supervised Learning - Learning Phase ๏ฌ Use the classification model to predict whether a person is a basketball player. Task: Use a classification model to predict whether a person is a basketball player under a specific feature. Splitting (Cleansed features and tags) Model training Name City Age Label Mike Miami 42 yes Jerry New York 32 no Bryan Orlando 18 no Patricia Miami 45 yes Elodie Phoenix 35 no Remy Chicago 72 yes John New York 48 yes Target Feature (attribute) Training set The model searches for the relationship between features and targets. Test set Use new data to verify the model validity. Each feature or a combination of several features can provide a basis for a model to make a judgment. Service data
  • 38. Huawei Confidential 37 Unknown data Recent data, it is not known whether the people are basketball players. Possibility prediction Apply the model to the new data to predict whether the customer will change the supplier. Application model Examples of Supervised Learning - Prediction Phase New data New data Prediction data IF city = Miami โ†’ Probability = +0.7 IF city= Orlando โ†’ Probability = +0.2 IF age > 42 โ†’ Probability = +0.05*age + 0.06 IF age โ‰ค 42 โ†’ Probability = +0.01*age + 0.02 Name City Age Label Marine Miami 45 ? Julien Miami 52 ? Fred Orlando 20 ? Michelle Boston 34 ? Nicolas Phoenix 90 ? Name City Age Prediction Marine Miami 45 0.3 Julien Miami 52 0.9 Fred Orlando 20 0.6 Michelle Boston 34 0.5 Nicolas Phoenix 90 0.4
  • 39. Huawei Confidential 38 What Is a Good Model? โ€ข Generalization capability Can it accurately predict the actual service data? โ€ข Interpretability Is the prediction result easy to interpret? โ€ข Prediction speed How long does it take to predict each piece of data? โ€ข Practicability Is the prediction rate still acceptable when the service volume increases with a huge data volume?
  • 40. Huawei Confidential 39 Model Validity (1) ๏ฌ Generalization capability: The goal of machine learning is that the model obtained after learning should perform well on new samples, not just on samples used for training. The capability of applying a model to new samples is called generalization or robustness. ๏ฌ Error: difference between the sample result predicted by the model obtained after learning and the actual sample result. ๏ฐ Training error: error that you get when you run the model on the training data. ๏ฐ Generalization error: error that you get when you run the model on new samples. Obviously, we prefer a model with a smaller generalization error. ๏ฌ Underfitting: occurs when the model or the algorithm does not fit the data well enough. ๏ฌ Overfitting: occurs when the training error of the model obtained after learning is small but the generalization error is large (poor generalization capability).
  • 41. Huawei Confidential 40 Model Validity (2) ๏ฌ Model capacity: model's capability of fitting functions, which is also called model complexity. ๏ฐ When the capacity suits the task complexity and the amount of training data provided, the algorithm effect is usually optimal. ๏ฐ Models with insufficient capacity cannot solve complex tasks and underfitting may occur. ๏ฐ A high-capacity model can solve complex tasks, but overfitting may occur if the capacity is higher than that required by a task. Underfitting Not all features are learned. Good fitting Overfitting Noises are learned.
  • 42. Huawei Confidential 41 Overfitting Cause โ€” Error ๏ฌ Total error of final prediction = Bias2 + Variance + Irreducible error ๏ฌ Generally, the prediction error can be divided into two types: ๏ฐ Error caused by "bias" ๏ฐ Error caused by "variance" ๏ฌ Variance: ๏ฐ Offset of the prediction result from the average value ๏ฐ Error caused by the model's sensitivity to small fluctuations in the training set ๏ฌ Bias: ๏ฐ Difference between the expected (or average) prediction value and the correct value we are trying to predict. Variance Bias
  • 43. Huawei Confidential 42 Variance and Bias ๏ฌ Combinations of variance and bias are as follows: ๏ฐ Low bias & low variance โ€“> Good model ๏ฐ Low bias & high variance ๏ฐ High bias & low variance ๏ฐ High bias & high variance โ€“> Poor model ๏ฌ Ideally, we want a model that can accurately capture the rules in the training data and summarize the invisible data (new data). However, it is usually impossible for the model to complete both tasks at the same time.
  • 44. Huawei Confidential 43 High bias & low variance Low bias & high variance Model Complexity and Error ๏ฌ As the model complexity increases, the training error decreases. ๏ฌ As the model complexity increases, the test error decreases to a certain point and then increases in the reverse direction, forming a convex curve. Testing error Training error Error Model Complexity
  • 45. Huawei Confidential 44 Machine Learning Performance Evaluation - Regression ๏ฌ The closer the Mean Absolute Error (MAE) is to 0, the better the model can fit the training data. ๐‘€๐ด๐ธ = 1 m ๐‘–=1 ๐‘š ๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘– ๏ฌ Mean Square Error (MSE) ๐‘€๐‘†๐ธ = 1 m ๐‘–=1 m ๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘– 2 ๏ฌ The value range of R2 is (โ€“โˆž, 1]. A larger value indicates that the model can better fit the training data. TSS indicates the difference between samples. RSS indicates the difference between the predicted value and sample value. ๐‘…2 = 1 โˆ’ ๐‘…๐‘†๐‘† ๐‘‡๐‘†๐‘† = 1 โˆ’ ๐‘–=1 ๐‘š ๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘– 2 ๐‘–=1 ๐‘š ๐‘ฆ๐‘– โˆ’ ๐‘ฆ๐‘– 2
  • 46. Huawei Confidential 45 Machine Learning Performance Evaluation - Classification (1) ๏ฌ Terms and definitions: ๏ฐ ๐‘ƒ: positive, indicating the number of real positive cases in the data. ๏ฐ ๐‘: negative, indicating the number of real negative cases in the data. ๏ฐ ๐‘‡P : true positive, indicating the number of positive cases that are correctly classified by the classifier. ๏ฐ ๐‘‡๐‘: true negative, indicating the number of negative cases that are correctly classified by the classifier. ๏ฐ ๐น๐‘ƒ: false positive, indicating the number of positive cases that are incorrectly classified by the classifier. ๏ฐ ๐น๐‘: false negative, indicating the number of negative cases that are incorrectly classified by the classifier. ๏ฌ Confusion matrix: at least an ๐‘š ร— ๐‘š table. ๐ถ๐‘€๐‘–,๐‘— of the first ๐‘š rows and ๐‘š columns indicates the number of cases that actually belong to class ๐‘– but are classified into class ๐‘— by the classifier. ๏ฐ Ideally, for a high accuracy classifier, most prediction values should be located in the diagonal from ๐ถ๐‘€1,1 to ๐ถ๐‘€๐‘š,๐‘š of the table while values outside the diagonal are 0 or close to 0. That is, ๐น๐‘ƒ and ๐น๐‘ƒ are close to 0. Confusion matrix Estimated amount Actual amount yes no Total yes ๐‘‡๐‘ƒ ๐น๐‘ ๐‘ƒ no ๐น๐‘ƒ ๐‘‡๐‘ ๐‘ Total ๐‘ƒโ€ฒ ๐‘โ€ฒ ๐‘ƒ + ๐‘
  • 47. Huawei Confidential 46 Machine Learning Performance Evaluation - Classification (2) Measurement Ratio Accuracy and recognition rate ๐‘‡๐‘ƒ + ๐‘‡๐‘ ๐‘ƒ + ๐‘ Error rate and misclassification rate ๐น๐‘ƒ + ๐น๐‘ ๐‘ƒ + ๐‘ Sensitivity, true positive rate, and recall ๐‘‡๐‘ƒ ๐‘ƒ Specificity and true negative rate ๐‘‡๐‘ ๐‘ Precision ๐‘‡๐‘ƒ ๐‘‡๐‘ƒ + ๐น๐‘ƒ ๐น1, harmonic mean of the recall rate and precision 2 ร— ๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› ร— ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™ ๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› + ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™ ๐น๐›ฝ, where ๐›ฝ is a non-negative real number (1 + ๐›ฝ2 ) ร— ๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› ร— ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™ ๐›ฝ2 ร— ๐‘๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› + ๐‘Ÿ๐‘’๐‘๐‘Ž๐‘™๐‘™
  • 48. Huawei Confidential 47 Example of Machine Learning Performance Evaluation ๏ฌ We have trained a machine learning model to identify whether the object in an image is a cat. Now we use 200 pictures to verify the model performance. Among the 200 images, objects in 170 images are cats, while others are not. The identification result of the model is that objects in 160 images are cats, while others are not. Precision: ๐‘ƒ = ๐‘‡๐‘ƒ ๐‘‡๐‘ƒ+๐น๐‘ƒ = 140 140+20 = 87.5% Recall: ๐‘… = ๐‘‡๐‘ƒ ๐‘ƒ = 140 170 = 82.4% Accuracy: ๐ด๐ถ๐ถ = ๐‘‡๐‘ƒ+๐‘‡๐‘ ๐‘ƒ+๐‘ = 140+10 170+30 = 75% Estimated amount Actual amount ๐’š๐’†๐’” ๐’๐’ Total ๐‘ฆ๐‘’๐‘  140 30 170 ๐‘›๐‘œ 20 10 30 Total 160 40 200
  • 49. Huawei Confidential 48 Contents 1. Machine Learning Definition 2. Machine Learning Types 3. Machine Learning Process 4. Other Key Machine Learning Methods 5. Common Machine Learning Algorithms 6. Case study
  • 50. Huawei Confidential 49 Machine Learning Training Method - Gradient Descent (1) ๏ฌ The gradient descent method uses the negative gradient direction of the current position as the search direction, which is the steepest direction. The formula is as follows: ๏ฌ In the formula, ๐œ‚ indicates the learning rate and ๐‘– indicates the data record number ๐‘– . The weight parameter w indicates the change in each iteration. ๏ฌ Convergence: The value of the objective function changes very little, or the maximum number of iterations is reached. 1 ( ) k i k k w w w f x ๏จ ๏€ซ ๏€ฝ ๏€ญ ๏ƒ‘ Cost surface
  • 51. Huawei Confidential 50 Machine Learning Training Method - Gradient Descent (2) ๏ฌ Batch Gradient Descent (BGD) uses the samples (m in total) in all datasets to update the weight parameter based on the gradient value at the current point. ๏ฌ Stochastic Gradient Descent (SGD) randomly selects a sample in a dataset to update the weight parameter based on the gradient value at the current point. ๏ฌ Mini-Batch Gradient Descent (MBGD) combines the features of BGD and SGD and selects the gradients of n samples in a dataset to update the weight parameter. 1 1 1 ( ) ๏จ ๏€ซ ๏€ฝ ๏€ฝ ๏€ญ ๏ƒ‘ ๏ƒฅ k m i k k w i w w f x m 1 ( ) k i k k w w w f x ๏จ ๏€ซ ๏€ฝ ๏€ญ ๏ƒ‘ 1 1 t 1 ( ) k t n i k k w i w w f x n ๏จ ๏€ซ ๏€ญ ๏€ซ ๏€ฝ ๏€ฝ ๏€ญ ๏ƒ‘ ๏ƒฅ
  • 52. Huawei Confidential 51 Machine Learning Training Method - Gradient Descent (3) ๏ฌ Comparison of three gradient descent methods ๏ฐ In the SGD, samples selected for each training are stochastic. Such instability causes the loss function to be unstable or even causes reverse displacement when the loss function decreases to the lowest point. ๏ฐ BGD has the highest stability but consumes too many computing resources. MBGD is a method that balances SGD and BGD. BGD Uses all training samples for training each time. SGD Uses one training sample for training each time. MBGD Uses a certain number of training samples for training each time.
  • 53. Huawei Confidential 52 Parameters and Hyperparameters in Models ๏ฌ The model contains not only parameters but also hyperparameters. The purpose is to enable the model to learn the optimal parameters. ๏ฐ Parameters are automatically learned by models. ๏ฐ Hyperparameters are manually set. Training Use hyperparameters to control training. Model Model parameters are "distilled" from data.
  • 54. Huawei Confidential 53 Hyperparameters of a Model โ€ข Often used in model parameter estimation processes. โ€ข Often specified by the practitioner. โ€ข Can often be set using heuristics. โ€ข Often tuned for a given predictive modeling problem. Model hyperparameters are external configurations of models. โ€ข ฮป during Lasso/Ridge regression โ€ข Learning rate for training a neural network, number of iterations, batch size, activation function, and number of neurons โ€ข ๐ถ and ๐œŽ in support vector machines (SVM) โ€ข K in k-nearest neighbor (KNN) โ€ข Number of trees in a random forest Common model hyperparameters
  • 55. Huawei Confidential 54 Hyperparameter Search Procedure and Method Procedure for searching hyperparameters 1. Dividing a dataset into a training set, validation set, and test set. 2. Optimizing the model parameters using the training set based on the model performance indicators. 3. Searching for the model hyper-parameters using the validation set based on the model performance indicators. 4. Perform step 2 and step 3 alternately. Finally, determine the model parameters and hyperparameters and assess the model using the test set. Search algorithm (step 3) โ€ข Grid search โ€ข Random search โ€ข Heuristic intelligent search โ€ข Bayesian search
  • 56. Huawei Confidential 55 Hyperparameter Searching Method - Grid Search ๏ฌ Grid search attempts to exhaustively search all possible hyperparameter combinations to form a hyperparameter value grid. ๏ฌ In practice, the range of hyperparameter values to search is specified manually. ๏ฌ Grid search is an expensive and time-consuming method. ๏ฐ This method works well when the number of hyperparameters is relatively small. Therefore, it is applicable to generally machine learning algorithms but inapplicable to neural networks (see the deep learning part). Hyperparameter 1 0 1 Hyperparameter 2 5 1 2 3 4 5 Grid search 2 3 4
  • 57. Huawei Confidential 56 Hyperparameter Searching Method - Random Search ๏ฌ When the hyperparameter search space is large, random search is better than grid search. ๏ฌ In random search, each setting is sampled from the distribution of possible parameter values, in an attempt to find the best subset of hyperparameters. ๏ฌ Note: ๏ฐ Search is performed within a coarse range, which then will be narrowed based on where the best result appears. ๏ฐ Some hyperparameters are more important than others, and the search deviation will be affected during random search. Random search Parameter 1 Parameter 2
  • 58. Huawei Confidential 57 Cross Validation (1) ๏ฌ Cross validation: It is a statistical analysis method used to validate the performance of a classifier. The basic idea is to divide the original dataset into two parts: training set and validation set. Train the classifier using the training set and test the model using the validation set to check the classifier performance. ๏ฌ k-fold cross validation (๐‘ฒ โˆ’ ๐‘ช๐‘ฝ): ๏ฐ Divide the raw data into ๐‘˜ groups (generally, evenly divided). ๏ฐ Use each subset as a validation set, and use the other ๐‘˜ โˆ’ 1 subsets as the training set. A total of ๐‘˜ models can be obtained. ๏ฐ Use the mean classification accuracy of the final validation sets of ๐‘˜ models as the performance indicator of the ๐พ โˆ’ ๐ถ๐‘‰ classifier.
  • 59. Huawei Confidential 58 Cross Validation (2) ๏ฌ Note: The K value in K-fold cross validation is also a hyperparameter. Entire dataset Training set Test set Test set Training set Validation set
  • 60. Huawei Confidential 59 Contents 1. Machine Learning Definition 2. Machine Learning Types 3. Machine Learning Process 4. Other Key Machine Learning Methods 5. Common Machine Learning Algorithms 6. Case study
  • 61. Huawei Confidential 60 Machine Learning Algorithm Overview Supervised learning Unsupervised learning Clustering Classification Regression Machine learning Others SVM Naive Bayes KNN Linear regression Decision tree Neural network K-means Hierarchical clustering Neural network Logistic regression Decision tree Correlation rule Random forest Random forest Density-based clustering GBDT GBDT SVM Principal component analysis (PCA) Gaussian mixture model (GMM)
  • 62. Huawei Confidential 61 Linear Regression (1) ๏ฌ Linear regression: a statistical analysis method to determine the quantitative relationships between two or more variables through regression analysis in mathematical statistics. ๏ฌ Linear regression is a type of supervised learning. Unary linear regression Multi-dimensional linear regression
  • 63. Huawei Confidential 62 Linear Regression (2) ๏ฌ The model function of linear regression is as follows, where ๐‘ค indicates the weight parameter, ๐‘ indicates the bias, and ๐‘ฅ indicates the sample attribute. ๏ฌ The relationship between the value predicted by the model and actual value is as follows, where ๐‘ฆ indicates the actual value, and ๐œ€ indicates the error. ๏ฌ The error ๐œ€ is influenced by many factors independently. According to the central limit theorem, the error ๐œ€ follows normal distribution. According to the normal distribution function and maximum likelihood estimation, the loss function of linear regression is as follows: ๏ฌ To make the predicted value close to the actual value, we need to minimize the loss value. We can use the gradient descent method to calculate the weight parameter ๐‘ค when the loss function reaches the minimum, and then complete model building. ( ) T w h x w x b ๏€ฝ ๏€ซ ๏ฅ ๏€ฝ ๏€ซ ๏€ซ T y w x b ๏€จ ๏€ฉ 2 1 ( ) ( ) 2 w J w h x y m ๏€ฝ ๏€ญ ๏ƒฅ
  • 64. Huawei Confidential 63 Linear Regression Extension - Polynomial Regression ๏ฌ Polynomial regression is an extension of linear regression. Generally, the complexity of a dataset exceeds the possibility of fitting by a straight line. That is, obvious underfitting occurs if the original linear regression model is used. The solution is to use polynomial regression. Comparison between linear regression and polynomial regression 2 1 2 ( ) n w n h x w x w x w x b ๏€ฝ ๏€ซ ๏€ซ ๏€ซ ๏€ซ L ๏ฌ where, the nth power is a polynomial regression dimension (degree). ๏ฌ Polynomial regression belongs to linear regression as the relationship between its weight parameters ๐‘ค is still linear while its nonlinearity is reflected in the feature dimension.
  • 65. Huawei Confidential 64 Linear Regression and Overfitting Prevention ๏ฌ Regularization terms can be used to reduce overfitting. The value of ๐‘ค cannot be too large or too small in the sample space. You can add a square sum loss on the target function. ๏ฌ Regularization terms (norm): The regularization term here is called L2-norm. Linear regression that uses this loss function is also called Ridge regression. ๏ฌ Linear regression with absolute loss is called Lasso regression. ๏€จ ๏€ฉ 2 2 2 1 ( ) ( ) + 2 ๏ฌ ๏€ฝ ๏€ญ ๏ƒฅ ๏ƒฅ w J w h x y w m ๏€จ ๏€ฉ 2 1 1 ( ) ( ) + 2 ๏ฌ ๏€ฝ ๏€ญ ๏ƒฅ ๏ƒฅ w J w h x y w m
  • 66. Huawei Confidential 65 Logistic Regression (1) ๏ฌ Logistic regression: The logistic regression model is used to solve classification problems. The model is defined as follows: ๐‘ƒ ๐‘Œ = 1 ๐‘ฅ = ๐‘’๐‘ค๐‘ฅ+๐‘ 1 + ๐‘’๐‘ค๐‘ฅ+๐‘ ๐‘ƒ ๐‘Œ = 0 ๐‘ฅ = 1 1 + ๐‘’๐‘ค๐‘ฅ+๐‘ where ๐‘ค indicates the weight, ๐‘ indicates the bias, and ๐‘ค๐‘ฅ + ๐‘ is regarded as the linear function of ๐‘ฅ. Compare the preceding two probability values. The class with a higher probability value is the class of ๐‘ฅ.
  • 67. Huawei Confidential 66 Logistic Regression (2) ๏ฌ Both the logistic regression model and linear regression model are generalized linear models. Logistic regression introduces nonlinear factors (the sigmoid function) based on linear regression and sets thresholds, so it can deal with binary classification problems. ๏ฌ According to the model function of logistic regression, the loss function of logistic regression can be estimated as follows by using the maximum likelihood estimation: ๏ฌ where ๐‘ค indicates the weight parameter, ๐‘š indicates the number of samples, ๐‘ฅ indicates the sample, and ๐‘ฆ indicates the real value. The values of all the weight parameters ๐‘ค can also be obtained through the gradient descent algorithm. ๏€จ ๏€ฉ 1 ( ) ln ( ) (1 )ln(1 ( )) ๏€ฝ ๏€ญ ๏€ซ ๏€ญ ๏€ญ ๏ƒฅ w w J w y h x y h x m
  • 68. Huawei Confidential 67 Logistic Regression Extension - Softmax Function (1) ๏ฌ Logistic regression applies only to binary classification problems. For multi-class classification problems, use the Softmax function. Binary classification problem Female? Grape? Apple? Banana? Multi-class classification problem Male? Orange?
  • 69. Huawei Confidential 68 Logistic Regression Extension - Softmax Function (2) ๏ฌ Softmax regression is a generalization of logistic regression that we can use for K-class classification. ๏ฌ The Softmax function is used to map a K-dimensional vector of arbitrary real values to another K-dimensional vector of real values, where each vector element is in the interval (0, 1). ๏ฌ The regression probability function of Softmax is as follows: 1 ( | ; ) , 1,2 , T k T l l w w x K x e p y k x w k K e ๏€ฝ ๏€ฝ ๏€ฝ ๏€ฝ ๏ƒฅ L
  • 70. Huawei Confidential 69 Logistic Regression Extension - Softmax Function (3) ๏ฌ Softmax assigns a probability to each class in a multi-class problem. These probabilities must add up to 1. ๏ฐ Softmax may produce a form belonging to a particular class. Example: 0.09 0.22 0.68 0.01 Category โ€ข Sum of all probabilities: โ€ข 0.09 + 0.22 + 0.68 + 0.01 =1 โ€ข Most probably, this picture is an apple. Grape? Apple? Banana? Orange? Probability
  • 71. Huawei Confidential 70 Decision Tree ๏ฌ A decision tree is a tree structure (a binary tree or a non-binary tree). Each non-leaf node represents a test on a feature attribute. Each branch represents the output of a feature attribute in a certain value range, and each leaf node stores a category. To use the decision tree, start from the root node, test the feature attributes of the items to be classified, select the output branches, and use the category stored on the leaf node as the final result. Root Short Tall Long neck Short neck Can squeak Might be a squirrel Short nose Long nose In water On land Cannot squeak Might be a rat Might be a giraffe Might be an elephant Might be a rhinoceros Might be a hippo
  • 72. Huawei Confidential 71 Decision Tree Structure Root Node Internal Node Internal Node Internal Node Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node
  • 73. Huawei Confidential 72 Key Points of Decision Tree Construction ๏ฌ To create a decision tree, we need to select attributes and determine the tree structure between feature attributes. The key step of constructing a decision tree is to divide data of all feature attributes, compare the result sets in terms of 'purity', and select the attribute with the highest 'purity' as the data point for dataset division. ๏ฌ The metrics to quantify the 'purity' include the information entropy and GINI Index. The formula is as follows: ๏ฌ where ๐‘๐‘˜ indicates the probability that the sample belongs to class k (there are K classes in total). A greater difference between purity before segmentation and that after segmentation indicates a better decision tree. ๏ฌ Common decision tree algorithms include ID3, C4.5, and CART. 2 1 ( )= - log ( ) K k k k H X p p ๏€ฝ ๏ƒฅ 2 1 1 K k k Gini p ๏€ฝ ๏€ฝ ๏€ญ ๏ƒฅ
  • 74. Huawei Confidential 73 Decision Tree Construction Process ๏ฌ Feature selection: Select a feature from the features of the training data as the split standard of the current node. (Different standards generate different decision tree algorithms.) ๏ฌ Decision tree generation: Generate internal node upside down based on the selected features and stop until the dataset can no longer be split. ๏ฌ Pruning: The decision tree may easily become overfitting unless necessary pruning (including pre-pruning and post-pruning) is performed to reduce the tree size and optimize its node structure.
  • 75. Huawei Confidential 74 Decision Tree Example ๏ฌ The following figure shows a classification when a decision tree is used. The classification result is impacted by three attributes: Refund, Marital Status, and Taxable Income. Tid Refund Marital Status Taxable Income Cheat 1 Yes Single 125,000 No 2 No Married 100,000 No 3 No Single 70,000 No 4 Yes Married 120,000 No 5 No Divorced 95,000 Yes 6 No Married 60,000 No 7 Yes Divorced 220,000 No 8 No Single 85,000 Yes 9 No Married 75,000 No 10 No Single 90,000 Yes Yes No No No Refund Marital Status Taxable Income
  • 76. Huawei Confidential 75 SVM ๏ฌ SVM is a binary classification model whose basic model is a linear classifier defined in the eigenspace with the largest interval. SVMs also include kernel tricks that make them nonlinear classifiers. The SVM learning algorithm is the optimal solution to convex quadratic programming. Projection Complex segmentation in low- dimensional space Easy segmentation in high-dimensional space
  • 77. Huawei Confidential 76 With binary classification Two-dimensional dataset Both the left and right methods can be used to divide datasets. Which of them is correct? or Linear SVM (1) ๏ฌ How do we split the red and blue datasets by a straight line?
  • 78. Huawei Confidential 77 Linear SVM (2) ๏ฌ Straight lines are used to divide data into different classes. Actually, we can use multiple straight lines to divide data. The core idea of the SVM is to find a straight line and keep the point close to the straight line as far as possible from the straight line. This can enable strong generalization capability of the model. These points are called support vectors. ๏ฌ In two-dimensional space, we use straight lines for segmentation. In high-dimensional space, we use hyperplanes for segmentation. Distance between support vectors is as far as possible
  • 79. Huawei Confidential 78 Linear SVM can function well for linear separable datasets. Nonlinear SVM (1) ๏ฌ How do we classify a nonlinear separable dataset? Nonlinear datasets cannot be split with straight lines.
  • 80. Huawei Confidential 79 Nonlinear SVM (2) ๏ฌ Kernel functions are used to construct nonlinear SVMs. ๏ฌ Kernel functions allow algorithms to fit the largest hyperplane in a transformed high- dimensional feature space. Input space High-dimensional feature space Linear kernel function Polynomial kernel function Gaussian kernel function Sigmoid kernel function Common kernel functions
  • 81. Huawei Confidential 80 KNN Algorithm (1) ๏ฌ The KNN classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. According to this method, if the majority of k samples most similar to one sample (nearest neighbors in the eigenspace) belong to a specific category, this sample also belongs to this category. ? The target category of point ? varies with the number of the most adjacent nodes.
  • 82. Huawei Confidential 81 KNN Algorithm (2) ๏ฌ As the prediction result is determined based on the number and weights of neighbors in the training set, the KNN algorithm has a simple logic. ๏ฌ KNN is a non-parametric method which is usually used in datasets with irregular decision boundaries. ๏ฐ The KNN algorithm generally adopts the majority voting method for classification prediction and the average value method for regression prediction. ๏ฌ KNN requires a huge number of computations.
  • 83. Huawei Confidential 82 KNN Algorithm (3) ๏ฌ Generally, a larger k value reduces the impact of noise on classification, but obfuscates the boundary between classes. ๏ฐ A larger k value means a higher probability of underfitting because the segmentation is too rough. A smaller k value means a higher probability of overfitting because the segmentation is too refined. โ€ข The boundary becomes smoother as the value of k increases. โ€ข As the k value increases to infinity, all data points will eventually become all blue or all red.
  • 84. Huawei Confidential 83 Naive Bayes (1) ๏ฌ Naive Bayes algorithm: a simple multi-class classification algorithm based on the Bayes theorem. It assumes that features are independent of each other. For a given sample feature ๐‘‹, the probability that a sample belongs to a category ๐ป is: ๏ฐ ๐‘‹1, โ€ฆ , ๐‘‹๐‘› are data features, which are usually described by measurement values of m attribute sets. ๏ฎ For example, the color feature may have three attributes: red, yellow, and blue. ๏ฐ ๐ถ๐‘˜ indicates that the data belongs to a specific category ๐ถ ๏ฐ ๐‘ƒ ๐ถ๐‘˜|๐‘‹1, โ€ฆ , ๐‘‹๐‘› is a posterior probability, or a posterior probability of under condition ๐ถ๐‘˜. ๏ฐ ๐‘ƒ ๐ถ๐‘˜ is a prior probability that is independent of ๐‘‹1, โ€ฆ , ๐‘‹๐‘› ๏ฐ ๐‘ƒ ๐‘‹1, โ€ฆ , ๐‘‹๐‘› is the priori probability of ๐‘‹. ๏€จ ๏€ฉ ๏€จ ๏€ฉ ๏€จ ๏€ฉ ๏€จ ๏€ฉ 1 n 1 n 1 n , , | | , , , , k k k P X X C P C P C X X P X X ๏‚ผ ๏‚ผ ๏€ฝ ๏‚ผ
  • 85. Huawei Confidential 84 Naive Bayes (2) ๏ฌ Independent assumption of features. ๏ฐ For example, if a fruit is red, round, and about 10 cm (3.94 in.) in diameter, it can be considered an apple. ๏ฐ A Naive Bayes classifier considers that each feature independently contributes to the probability that the fruit is an apple, regardless of any possible correlation between the color, roundness, and diameter.
  • 86. Huawei Confidential 85 Ensemble Learning ๏ฌ Ensemble learning is a machine learning paradigm in which multiple learners are trained and combined to solve the same problem. When multiple learners are used, the integrated generalization capability can be much stronger than that of a single learner. ๏ฌ If you ask a complex question to thousands of people at random and then summarize their answers, the summarized answer is better than an expert's answer in most cases. This is the wisdom of the masses. Dataset m Model 1 Model 2 Model m Model synthesis Large model Dataset 1 Dataset 2 Training set
  • 87. Huawei Confidential 86 Classification of Ensemble Learning Ensemble learning Bagging Boosting Bagging (Random Forest) โ€ข Independently builds several basic learners and then averages their predictions. โ€ข On average, a composite learner is usually better than a single-base learner because of a smaller variance. Boosting (Adaboost, GBDT, and XGboost) Constructs basic learners in sequence to gradually reduce the bias of a composite learner. The composite learner can fit data well, which may also cause overfitting.
  • 88. Huawei Confidential 87 Ensemble Methods in Machine Learning (1) ๏ฌ Random forest = Bagging + CART decision tree ๏ฌ Random forests build multiple decision trees and merge them together to make predictions more accurate and stable. ๏ฐ Random forests can be used for classification and regression problems. All training data Data subset 1 Bootstrap sampling Decision tree building Aggregation prediction result โ€ข Category: majority voting โ€ข Regression: average value Final prediction Prediction 2 Prediction Prediction n Prediction 1 Data subset 2 Data subset
  • 89. Huawei Confidential 88 Ensemble Methods in Machine Learning (2) ๏ฌ GBDT is a type of boosting algorithm. ๏ฌ For an aggregative mode, the sum of the results of all the basic learners equals the predicted value. In essence, the residual of the error function to the predicted value is fit by the next basic learner. (The residual is the error between the predicted value and the actual value.) ๏ฌ During model training, GBDT requires that the sample loss for model prediction be as small as possible. 30 years old 20 years old Prediction 10 years old Residual calculation 9 years old Prediction 1 year old Residual calculation Prediction 1 year old
  • 90. Huawei Confidential 89 Unsupervised Learning - K-means ๏ฌ K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. ๏ฌ For the k-means algorithm, specify the final number of clusters (k). Then, divide n data objects into k clusters. The clusters obtained meet the following conditions: (1) Objects in the same cluster are highly similar. (2) The similarity of objects in different clusters is small. x2 x1 x2 x1 The data is not tagged. K- means clustering can automatically classify datasets. K-means clustering
  • 91. Huawei Confidential 90 Unsupervised Learning - Hierarchical Clustering ๏ฌ Hierarchical clustering divides a dataset at different layers and forms a tree-like clustering structure. The dataset division may use a "bottom-up" aggregation policy, or a "top-down" splitting policy. The hierarchy of clustering is represented in a tree graph. The root is the unique cluster of all samples, and the leaves are the cluster of only a sample. Agglomerative hierarchical clustering Split hierarchical clustering
  • 92. Huawei Confidential 91 Contents 1. Machine Learning Definition 2. Machine Learning Types 3. Machine Learning Process 4. Other Key Machine Learning Methods 5. Common Machine Learning Algorithms 6. Case study
  • 93. Huawei Confidential 92 Comprehensive Case ๏ฌ Assume that there is a dataset containing the house areas and prices of 21,613 housing units sold in a city. Based on this data, we can predict the prices of other houses in the city. Dataset House Area Price 1,180 221,900 2,570 538,000 770 180,000 1,960 604,000 1,680 510,000 5,420 1,225,000 1,715 257,500 1,060 291,850 1,160 468,000 1,430 310,000 1,370 400,000 1,810 530,000 โ€ฆ โ€ฆ
  • 94. Huawei Confidential 93 Price House area Problem Analysis ๏ฌ This case contains a large amount of data, including input x (house area), and output y (price), which is a continuous value. We can use regression of supervised learning. Draw a scatter chart based on the data and use linear regression. ๏ฌ Our goal is to build a model function h(x) that infinitely approximates the function that expresses true distribution of the dataset. ๏ฌ Then, use the model to predict unknown price data. Unary linear regression function 1 ( ) o h x w w x ๏€ฝ ๏€ซ Learning algorithm h(x) Output x Feature: house area Dataset Input y Label: price
  • 95. Huawei Confidential 94 Price House area Goal of Linear Regression ๏ฌ Linear regression aims to find a straight line that best fits the dataset. ๏ฌ Linear regression is a parameter-based model. Here, we need learning parameters ๐‘ค0 and ๐‘ค1. When these two parameters are found, the best model appears. Which line is the best parameter? 1 ( ) o h x w w x ๏€ฝ ๏€ซ Price House area
  • 96. Huawei Confidential 95 Loss Function of Linear Regression ๏ฌ To find the optimal parameter, construct a loss function and find the parameter values when the loss function becomes the minimum. โ€ข where, m indicates the number of samples, โ€ข h(x) indicates the predicted value, and y indicates the actual value. Loss function of linear regression: Error Error Price House area Error Error ๏€จ ๏€ฉ 2 1 ( ) ( ) 2 J w h x y m ๏€ฝ ๏€ญ ๏ƒฅ Goal: ๏€จ ๏€ฉ 2 1 argmin ( ) ( ) 2 w J w h x y m ๏€ฝ ๏€ญ ๏ƒฅ
  • 97. Huawei Confidential 96 Gradient Descent Method ๏ฌ The gradient descent algorithm finds the minimum value of a function through iteration. ๏ฌ It aims to randomize an initial point on the loss function, and then find the global minimum value of the loss function based on the negative gradient direction. Such parameter value is the optimal parameter value. ๏ฐ Point A: the position of ๐‘ค0 and ๐‘ค1 after random initialization. ๐‘ค0 and ๐‘ค1 are the required parameters. ๏ฐ A-B connection line: a track formed based on descents in a negative gradient direction. Upon each descent, values ๐‘ค0 and ๐‘ค1 change, and the regression line also changes. ๏ฐ Point B: global minimum value of the loss function. Final values of ๐‘ค0 and ๐‘ค1 are also found. Cost surface
  • 98. Huawei Confidential 97 Iteration Example ๏ฌ The following is an example of a gradient descent iteration. We can see that as red points on the loss function surface gradually approach a lowest point, fitting of the linear regression red line with data becomes better and better. At this time, we can get the best parameters.
  • 99. Huawei Confidential 98 Model Debugging and Application ๏ฌ After the model is trained, test it with the test set to ensure the generalization capability. ๏ฌ If overfitting occurs, use Lasso regression or Ridge regression with regularization terms and tune the hyperparameters. ๏ฌ If underfitting occurs, use a more complex regression model, such as GBDT. ๏ฌ Note: ๏ฐ For real data, pay attention to the functions of data cleansing and feature engineering. Price House area ( ) 280.62 43581 h x x ๏€ฝ ๏€ญ The final model result is as follows:
  • 100. Huawei Confidential 99 Summary ๏ฌ First, this course describes the definition and classification of machine learning, as well as problems machine learning solves. Then, it introduces key knowledge points of machine learning, including the overall procedure (data collection, data cleansing, feature extraction, model training, model training and evaluation, and model deployment), common algorithms (linear regression, logistic regression, decision tree, SVM, naive Bayes, KNN, ensemble learning, K-means, etc.), gradient descent algorithm, parameters and hyper-parameters. ๏ฌ Finally, a complete machine learning process is presented by a case of using linear regression to predict house prices.