Chapter 4 Microsoft Azure Machine Learning Studio
- 1. Chapter 4
Microsoft Azure
Machine Learning Studio
The Presentation Slides for Teaching
Financial Regulations and Regulatory Technology
Website: https://sites.google.com/site/quanrisk
E-mail: quanrisk@gmail.com
Copyright © 2021 Dr. LAM Yat-fai
- 2. Declaration
Copyright © 2021 Dr. LAM Yat-fai
All rights reserved. No part of this presentation file may be
reproduced, in any form or by any means, without written
permission from Dr. LAM Yat-fai.
Authored by Dr. LAM Yat-fai (林日辉),
Chief Data Scientist, CapitaLogic Limited,
Adjunct Professor of Finance, City University of Hong Kong,
Doctor of Business Administration,
CFA, CAIA, CAMS, CFE, FRM, PRM, MCSE, MCNE.
- 3. Outline
Monotonic causal relationship
Sample data set
Feature selection
Two class model
Prediction model
Regression model
- 4. Monotonic causal relationship
Label
Response variable
Features
Explanatory variables
Noise
Unexplainable effect
$$y = F(x_1, x_2, x_3, \ldots, x_N) + \text{Noise}$$
- 5. Label
y
The value largely determined by the features
To be predicted today
Two-class: up, down
Multi-class: A, B, C, D
Any value: from -∞ to ∞
Observable some time after the prediction
- 6. Features
x1, x2, x3, … xN
Numeric
Largely determine the value of the label
Observable
Measurable
Monotonically related to the label
- 7. Noise
Unobservable and/or immeasurable
Small noise
Most critical features are available
Can explain the majority of the label
Large noise
Some critical features are missing
Fail to explain the majority of the label
- 9. Not a monotonic relationship
x1    x2    y
↑     +     ↑
↑     −     ↓
$$y = x_1 \times x_2$$
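The table can be checked numerically. The sketch below uses a hypothetical helper, `monotonic_direction` (not an Azure ML Studio function), that compares every pair of records and reports whether y only rises, only falls, or does both as x1 increases. With y = x1 × x2, the direction depends on the sign of x2, so pooling records with positive and negative x2 leaves no monotonic relationship between x1 and y.

```python
def monotonic_direction(xs, ys):
    """Report how ys moves as xs increases, using all pairs of records."""
    ups = downs = False
    for i in range(len(xs)):
        for j in range(len(xs)):
            if xs[j] > xs[i]:          # record j has the larger x
                if ys[j] > ys[i]:
                    ups = True
                elif ys[j] < ys[i]:
                    downs = True
    if ups and not downs:
        return "increasing"
    if downs and not ups:
        return "decreasing"
    if not ups and not downs:
        return "constant"
    return "not monotonic"


if __name__ == "__main__":
    x1 = [1, 2, 3, 4, 5]
    y_pos = [a * 2 for a in x1]    # x2 = +2
    y_neg = [a * -2 for a in x1]   # x2 = -2
    print(monotonic_direction(x1, y_pos))
    print(monotonic_direction(x1, y_neg))
    print(monotonic_direction(x1 + x1, y_pos + y_neg))  # pooled records
```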
- 11. Example
Label
The chance that a student can graduate successfully
from a university master's programme
Features
Undergraduate results ↑
Financial resources ↑
Disability ↓
Noise
Pressure
Sickness
Luck
- 12. Machine learning
Historical records
A set of data recording the label and features in the
past
Monotonic causal relationship
A hypothetical assumption
To be discovered by machine learning algorithms
Prediction
To estimate the label before it becomes observable
- 16. Microsoft Machine Learning
Azure ML
Paid service
Integrated with all Azure products
Highly technical
For real-life applications
Azure ML Studio (Classic)
Free service
Standalone
Easy to use
Easy to make mistakes
For proof of concept
- 17. Outline
Monotonic causal relationship
Sample data set
Feature selection
Two class model
Prediction model
Regression model
- 18. Full data set
Data
A lot of historical records with correct values
Good records
Label
Major features monotonically impacting the label
Bad records
With extreme values of label and/or features
With missing label and/or feature values
Duplicated records
- 19. Record preparation
Outliers
Largest or smallest 1% of the values of a feature/label
To be replaced with a missing value “”
Missing values
To delete records with missing values
Duplicated records
To delete duplicated records
Random sample
Stratified sampling
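The first three preparation steps can be sketched in plain Python. This is a minimal illustration, not the Azure ML Studio implementation: records are assumed to be dicts, a missing value is represented by `None`, and the 1% thresholds use linear-interpolation percentiles, which is an assumption (Azure's exact percentile rule may differ). The function names `percentile` and `prepare_records` are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list (assumed rule)."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def prepare_records(records, clip_columns, all_columns, pct=0.01):
    """Clip outliers to missing, delete incomplete records, delete duplicates."""
    records = [dict(r) for r in records]  # work on copies, not the caller's data
    # 1. Outliers: values in the largest or smallest 1% become missing (None).
    for col in clip_columns:
        vals = sorted(r[col] for r in records if r[col] is not None)
        if not vals:
            continue
        low, high = percentile(vals, pct), percentile(vals, 1 - pct)
        for r in records:
            if r[col] is not None and not (low <= r[col] <= high):
                r[col] = None
    # 2. Missing values: delete any record with a missing label or feature.
    complete = [r for r in records if all(r[c] is not None for c in all_columns)]
    # 3. Duplicated records: keep the first copy of each (label, features) tuple.
    seen, unique = set(), []
    for r in complete:
        key = tuple(r[c] for c in all_columns)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


if __name__ == "__main__":
    data = [{"y": i % 2, "x1": float(i)} for i in range(101)]
    cleaned = prepare_records(data, clip_columns=["x1"], all_columns=["y", "x1"])
    print(len(cleaned), cleaned[0], cleaned[-1])
```

Passing `clip_columns` separately mirrors the two-class setup, where only the features are clipped and the 0/1 label is left alone.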
- 22. From Full data set
To Sample data set (1)
- 23. From Full data set
To Sample data set (2)
- 24. Prepare sample data set (1)
Dataset
  Chapter 4a1 – Full data set.csv
Clip Values
  Threshold: Percentile
  Substitute value: Missing
  List of columns: x1,x2,x3,x4,x5,x6
Clean Missing Data
  Columns to be cleaned: y,x1,x2,x3,x4,x5,x6
  Cleaning mode: Remove entire row
- 25. Prepare sample data set (2)
Remove Duplicate Rows
  Key column selection: y,x1,x2,x3,x4,x5,x6
Split Data
  Fraction of rows: 1
Split Data
  Splitting mode: Regular Expression
  Regular expression: \"y" ^0
- 26. Prepare sample data set (3)
Partition and Sample
  Number of rows: 400
Add Rows
Chapter 4a2 – Sample data set
Convert to CSV
- 27. Sample size
400 records in each label category (0 and 1)
200 records for training data set
To build the model
100 records for validation data set
To calibrate the best set of model parameters
100 records for testing data set
To assess the accuracy of the model
Example 4.a.2
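The 200/100/100 split per category can be sketched as a stratified split: each label class is shuffled and cut separately, so training, validation and testing sets keep the same 0/1 balance. This is a hypothetical helper under stated assumptions (dict records, label column `y`, fractions 0.5/0.25/0.25, fixed random seed), not the Split Data module itself.

```python
import random


def stratified_split(records, label="y", fractions=(0.5, 0.25, 0.25), seed=0):
    """Split records into train/validation/test, class by class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for r in records:
        by_class.setdefault(r[label], []).append(r)
    train, valid, test = [], [], []
    for cls in sorted(by_class):
        group = by_class[cls][:]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_valid = round(len(group) * fractions[1])
        train += group[:n_train]
        valid += group[n_train:n_train + n_valid]
        test += group[n_train + n_valid:]  # the remainder goes to testing
    return train, valid, test


if __name__ == "__main__":
    sample = [{"y": c, "x1": i} for c in (0, 1) for i in range(400)]
    tr, va, te = stratified_split(sample)
    print(len(tr), len(va), len(te))
```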
- 28. Outline
Monotonic causal relationship
Sample data set
Feature selection
Two class model
Prediction model
Regression model
- 29. Monotonicity
Between the label and a feature
Quantified by the p-value
A smaller p-value suggests stronger
monotonicity
To exclude weakly monotonic features
Example 4.a.3
- 30. p-value
2-mean t-test
p-value
< 5% suggests good monotonicity in general
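Subscripts 0 and 1 refer to the feature values of records with label 0 and label 1.
$$\text{Standard error} = \sqrt{\frac{SD_0^2}{N_0} + \frac{SD_1^2}{N_1}}$$
$$\text{t-statistic} = \frac{\bar{x}_0 - \bar{x}_1}{\text{Standard error}}$$
$$df = \frac{\left(\dfrac{SD_0^2}{N_0} + \dfrac{SD_1^2}{N_1}\right)^2}{\dfrac{SD_0^4}{N_0^2 (N_0 - 1)} + \dfrac{SD_1^4}{N_1^2 (N_1 - 1)}}$$
p-value = TDIST(ABS(t-statistic), df, 2)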
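These formulas are Welch's unequal-variance t-test. The sketch below reproduces them with the Python standard library. Excel's TDIST(t, df, 2) is replaced by the identity P(|T| > t) = I_x(df/2, 1/2) with x = df/(df + t²), where I is the regularized incomplete beta function, evaluated here by a continued fraction. The helper names are hypothetical, and SD is assumed to be the sample standard deviation (n − 1 denominator). Excel's documentation states that TDIST truncates df to an integer, so pass `int(df)` if you need to match the spreadsheet exactly.

```python
import math
import statistics


def _nz(v):
    # Keep the continued fraction away from division by zero.
    return v if abs(v) > 1e-300 else 1e-300


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def tdist_two_tailed(t, df):
    """Two-tailed Student t probability P(|T| > t) for t >= 0."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(group0, group1):
    """Return (t-statistic, df, two-tailed p-value) as on the slide."""
    n0, n1 = len(group0), len(group1)
    v0 = statistics.variance(group0) / n0   # SD0^2 / N0
    v1 = statistics.variance(group1) / n1   # SD1^2 / N1
    se = math.sqrt(v0 + v1)                 # standard error
    t = (statistics.mean(group0) - statistics.mean(group1)) / se
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return t, df, tdist_two_tailed(abs(t), df)


if __name__ == "__main__":
    x_label0 = [1, 2, 3, 4, 5]   # hypothetical feature values, label 0
    x_label1 = [2, 3, 4, 5, 6]   # hypothetical feature values, label 1
    print(welch_t_test(x_label0, x_label1))
```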
- 31. Principal components
Principal components
Linearly transformed, mutually uncorrelated features
How many principal components are
sufficient?
Retained eigenvalues summing to > 95% of the total is good in general
Example 4.a.4
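Given the eigenvalues reported by a PCA tool, the 95% rule can be applied as below. This is a sketch with a hypothetical function name; the eigenvalues in the usage line are made up for illustration, not taken from Example 4.a.4.

```python
def components_needed(eigenvalues, threshold=0.95):
    """Smallest k whose k largest eigenvalues exceed `threshold` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return k
    return len(ordered)


if __name__ == "__main__":
    # Hypothetical eigenvalues for five normalized features.
    print(components_needed([2.5, 1.2, 0.8, 0.4, 0.1]))
```

With these made-up values the cumulative shares are 50%, 74%, 90% and 98%, so four components clear the 95% bar.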
- 32. Features and principal components
Label
0, 1
Features
x1, x2, x4, x5, x6
Principal components
4
- 33. Outline
Monotonic causal relationship
Sample data set
Feature selection
Two class model
Prediction model
Regression model
- 34. From Sample data set
To Prediction model (1)
- 35. From Sample data set
To Prediction model (2)
- 36. Prediction model (1)
Dataset
  Chapter 4a3 – Sample data set
Select Columns in Dataset
  Select columns: y,x1,x2,x4,x5,x6
Normalize Data
  Columns to transform: x1,x2,x4,x5,x6
- 37. Prediction model (2)
Principal Component Analysis
  Selected columns: x1,x2,x4,x5,x6
  Number of dimensions: 4
  Normalize dense columns: Blank
Split Data × 2
  Stratified split: True
  Stratification key column: y
Two-Class Neural Network
- 38. Prediction model (3)
Tune Model Hyperparameters
  Label column: y
Score Model
Evaluate Model
- 39. Normalization
To transform all features into a comparable
range
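$$z_i^k = \frac{x_i^k - \text{Average}(\text{All } x^k)}{\text{S.D.}(\text{All } x^k)}$$
where $x_i^k$ is the value of feature $k$ in record $i$.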
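The normalization formula is a z-score applied to one feature column at a time. A minimal sketch, assuming the sample standard deviation (`statistics.stdev`); whether the Normalize Data module uses the sample or population S.D. is not stated on the slide.

```python
import statistics


def zscore(values):
    """Normalize one feature column: subtract the average, divide by the S.D."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample S.D.; needs >= 2 values
    if sd == 0:
        raise ValueError("constant column cannot be normalized")
    return [(v - mean) / sd for v in values]


if __name__ == "__main__":
    print(zscore([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```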
- 45. Cutoff scores
Score
Probability of Label = 1
Unbiased cutoff score
50%
Upper cutoff score
The minimum score above which the false positive rate is < 1%
Lower cutoff score
The maximum score below which the false negative rate is < 1%
Data quality
Good: Large positive and negative zones, small noise
Bad: Large middle zone, large noise
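One way to read the cutoff definitions on scored validation data is sketched below. The interpretation is an assumption: the upper cutoff is taken as the lowest score such that, for every cutoff from the top down to it, fewer than 1% of the records scoring at or above the cutoff have label 0; the lower cutoff mirrors this for label 1. The function names and sample scores are hypothetical.

```python
def upper_cutoff(scores, labels, max_fp=0.01):
    """Lowest score keeping the false positive share above it below max_fp."""
    best = None
    for c in sorted(set(scores), reverse=True):
        above = [y for s, y in zip(scores, labels) if s >= c]
        if above.count(0) / len(above) >= max_fp:
            break  # a label-0 record has entered the positive zone
        best = c
    return best


def lower_cutoff(scores, labels, max_fn=0.01):
    """Highest score keeping the false negative share below it below max_fn."""
    best = None
    for c in sorted(set(scores)):
        below = [y for s, y in zip(scores, labels) if s <= c]
        if below.count(1) / len(below) >= max_fn:
            break  # a label-1 record has entered the negative zone
        best = c
    return best


if __name__ == "__main__":
    # Hypothetical scores (probability of label = 1) and actual labels.
    scores = [0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    print(lower_cutoff(scores, labels), upper_cutoff(scores, labels))
```

Scores between the two cutoffs form the middle zone; noisier data pushes the cutoffs apart and widens it.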
- 46. Four group classification
Upper group
Above upper cutoff score
Upper-middle group
Between upper and middle cutoff scores
Lower-middle group
Between middle and lower cutoff scores
Lower group
Below lower cutoff score
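The four groups can be assigned from a score and the three cutoffs. This sketch assumes the middle cutoff is the unbiased 50% cutoff and that a score exactly on a boundary falls into the lower of the two neighbouring groups at the upper cutoff and the higher one elsewhere; both choices are assumptions, and `four_group` is a hypothetical name.

```python
def four_group(score, lower, upper, middle=0.5):
    """Assign a score to one of the four groups (boundary handling assumed)."""
    if score > upper:
        return "Upper"
    if score >= middle:
        return "Upper-middle"
    if score >= lower:
        return "Lower-middle"
    return "Lower"


if __name__ == "__main__":
    for s in (0.9, 0.6, 0.3, 0.1):
        print(s, four_group(s, lower=0.2, upper=0.8))
```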
- 48. Outline
Monotonic causal relationship
Sample data set
Feature selection
Two class model
Prediction model
Regression model
- 54. Prediction with Excel for web
Register for Microsoft OneDrive with a personal
e-mail account (Gmail, Hotmail, QQ)
Upload the Excel prediction model to OneDrive
Click the prediction model to open it in Excel for the web
Warning
Never register for Microsoft OneDrive with your
company or school e-mail address
Never use the Microsoft Excel desktop edition to
make predictions
- 57. Outline
Monotonic causal relationship
Sample data set
Feature selection
Two class model
Prediction model
Regression model
- 59. Prepare sample data set (1)
Dataset
  Chapter 4b1 – Full data set.csv
Clip Values
  Threshold: Percentile
  Substitute value: Missing
  List of columns: y,x1,x2,x3,x4,x5,x6
Clean Missing Data
  Columns to be cleaned: y,x1,x2,x3,x4,x5,x6
  Cleaning mode: Remove entire row
- 60. Prepare sample data set (2)
Remove Duplicate Rows
  Key column selection: y,x1,x2,x3,x4,x5,x6
Split Data
  Fraction of rows: 1
Partition and Sample
  Number of rows: 400
Convert to CSV
- 61. p-value
Correlation t-test
p-value
< 5% suggests good monotonicity in general
$$\text{Standard error} = \sqrt{\frac{1 - \rho^2}{N - 2}}$$
$$\text{t-statistic} = \frac{\rho}{\text{Standard error}}$$
$$df = N - 2$$
p-value = TDIST(ABS(t-statistic), df, 2)
- 62. Two correlation tests
Pearson correlation coefficient
Rank correlation coefficient
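Both correlation tests feed the same t-statistic. The sketch below computes the Pearson coefficient, a rank coefficient, and the t-statistic and df from the formulas above; the p-value then follows from TDIST(ABS(t-statistic), df, 2). The rank coefficient is assumed to be Spearman's (Pearson correlation of average ranks, ties sharing the mean rank); the slide does not say which rank coefficient is meant. All function names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Rank correlation, assumed here to be Spearman's coefficient."""
    return pearson(ranks(x), ranks(y))


def correlation_t_test(rho, n):
    """t-statistic and df for a correlation; |rho| must be below 1."""
    se = math.sqrt((1 - rho ** 2) / (n - 2))
    return rho / se, n - 2


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]          # hypothetical feature values
    y = [1.1, 3.9, 9.2, 15.8, 25.3, 35.9]  # hypothetical label values
    for rho in (pearson(x, y), spearman(x, y)):
        if abs(rho) < 1:
            print(rho, correlation_t_test(rho, len(x)))
        else:
            print(rho, "perfect correlation: t-statistic is unbounded")
```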
- 63. Sample size
400 records
200 records for training data set
To build the model
100 records for validation data set
To calibrate the best set of model parameters
100 records for testing data set
To assess the accuracy of the model
Example 4.c.2
- 64. From Sample data set
To Prediction model (1)
- 65. From Sample data set
To Prediction model (2)
- 66. Prediction model (1)
Dataset
  Chapter 4b3 – Sample data set
Select Columns in Dataset
  Select columns: y,x1,x2,x4,x5,x6
Normalize Data
  Columns to transform: y,x1,x2,x4,x5,x6
- 67. Prediction model (2)
Principal Component Analysis
  Selected columns: x1,x2,x4,x5,x6
  Number of dimensions: 4
  Normalize dense columns: Blank
Split Data
  Stratified split: False
Linear Regression
- 68. Prediction model (3)
Tune Model Hyperparameters
  Label column: y
Score Model
Evaluate Model
- 71. What is a linear regression?
Only a best-fit straight line
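For a single feature, the best-fit straight line is the ordinary least squares fit sketched below. This is an illustration only: the Linear Regression module in the pipeline fits several inputs (the four principal components) at once, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```

Whatever the data look like, the result is still a straight line; curvature in the true relationship ends up in the residuals.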
- 72. Common issues
Excel prediction model does not work
Use the Excel web edition
Download a new prediction model Excel file