Chapter 4
Microsoft Azure
Machine Learning Studio
The Presentation Slides for Teaching
Financial Regulations and Regulatory Technology
Website : https://sites.google.com/site/quanrisk
E-mail : quanrisk@gmail.com
Copyright © 2021 Dr. LAM Yat-fai
Declaration
 Copyright © 2021 Dr. LAM Yat-fai
 All rights reserved. No part of this presentation file may be
reproduced, in any form or by any means, without written
permission from Dr. LAM Yat-fai.
 Authored by Dr. LAM Yat-fai (林日辉),
Chief Data Scientist, CapitaLogic Limited,
Adjunct Professor of Finance, City University of Hong Kong,
Doctor of Business Administration,
CFA, CAIA, CAMS, CFE, FRM, PRM, MCSE, MCNE.
Outline
 Monotonic causal relationship
 Sample data set
 Feature selection
 Two class model
 Prediction model
 Regression model
Monotonic causal relationship
 Label
 Response variable
 Features
 Explanatory variables
 Noise
 Unexplainable effect
$$ y = F(x_1, x_2, x_3, \ldots, x_N) + \text{Noise} $$
Label
 y
 The value largely determined by the features
 To be predicted today
 Two class: Up, down
 Multiple class: A, B, C, D
 Any value: from -∞ to ∞
 To become observable some time after the prediction
Features
 x1, x2, x3, … xN
 Numeric
 Largely determine the value of the label
 Observable
 Measurable
 Monotonically related to the label
Noise
 Unobservable and/or immeasurable
 Small noise
 Most critical features are ready
 Can explain the majority of the label
 Large noise
 Some critical features are missed
 Fail to explain the majority of the label
A monotonic relationship
x1 | x2 | y
+ | ↑ | ↑
- | ↑ | ↑
$$ y = x_1 + x_2 $$
Not a monotonic relationship
x1 | x2 | y
+ | ↑ | ↑
- | ↑ | ↓
$$ y = x_1 \times x_2 $$
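The contrast between y = x1 + x2 and y = x1 × x2 can be spot-checked numerically. The sketch below is only an illustration: the helper names and the sample values are assumptions, and testing a few points does not prove monotonicity in general. It raises x2 while x1 is held at a negative or a positive value and reports whether y moves in the same direction each time.

```python
def direction_of_change(f, x1, x2, step=1.0):
    """Return 'up' if y rises when x2 increases by `step`, 'down' if it falls."""
    before = f(x1, x2)
    after = f(x1, x2 + step)
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "flat"


def additive(x1, x2):
    return x1 + x2


def multiplicative(x1, x2):
    return x1 * x2


def same_direction_for_all_x1(f, x1_values=(-2.0, 3.0), x2=1.0):
    """True when y moves the same way for every tested (hypothetical) x1 value."""
    directions = {direction_of_change(f, x1, x2) for x1 in x1_values}
    return len(directions) == 1


if __name__ == "__main__":
    for name, f in (("y = x1 + x2", additive), ("y = x1 * x2", multiplicative)):
        print(name, "same direction for all tested x1:", same_direction_for_all_x1(f))
```

With a negative x1, the product falls as x2 rises, which is why the second relationship is not monotonic.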
A monotonic relationship
x2 | x2 | y
+ | ↑ | ↑
- | ↑ | ↓
$$ y = x_1^2 + x_2^2 $$
Example
 Label
 The chance that a student graduates successfully from a university master's programme
 Features
 Undergraduate results ↑
 Financial resources ↑
 Disability ↓
 Noise
 Pressure
 Sickness
 Luck
Machine learning
 Historical records
 A set of data recording the label and features in the
past
 Monotonic causal relationship
 A hypothetical assumption
 To be discovered by machine learning algorithms
 Prediction
 To estimate the label before it becomes observable
Machine learning framework
Copyright © 2020 CapitaLogic Limited
[Figure: machine learning framework linking historical records, machine learning, the monotonic causal relationship, and prediction]
Machine learning approaches
Approach | Traditional | Rapid development | Web service
Development | Personal computer | Personal computer | Web
Execution | Personal computer | Personal computer / Web | Web
Programming | Yes | No | No
Platform | Python, R, Scikit-learn | RapidMiner, Weka, Orange | Google, Amazon, Microsoft, IBM
Target user | Programmer | Data scientist | Risk manager and compliance officer
Microsoft Machine Learning
Azure ML
 Paid service
 Integrates with all Azure products
 Highly technical
 For real-life applications
Azure ML Studio (Classic)
 Free service
 Standalone
 Easy to use
 Easy to make mistakes
 For proof of concept
Outline
 Monotonic causal relationship
 Sample data set
 Feature selection
 Two class model
 Prediction model
 Regression model
Full data set
 Data
 A lot of historical records with correct values
 Good records
 Label
 Major features monotonically impacting the label
 Bad records
 With extreme values of label and/or features
 With missing label and/or features
 Duplicated records
Record preparation
 Outliers
 Largest or smallest 1% values of a feature/label
 To be replaced with missing value “”
 Missing values
 To delete records with missing value
 Duplicated records
 To delete duplicated records
 Random sample
 Stratified sampling
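The first three record preparation steps can be sketched in plain Python. This is a minimal illustration, not the Azure ML Studio implementation: the function name, the list-of-dicts record layout and the way the 1% tails are located (by sorted position) are all assumptions. Missing values are represented by `None`, and sampling is left out.

```python
def prepare_records(records, columns, clip_columns, tail=0.01):
    """Blank out outliers, drop incomplete records, drop duplicates."""
    rows = [dict(r) for r in records]  # work on copies

    # 1. Outliers: the smallest and largest `tail` share of each clipped
    #    column is replaced with a missing value (None).
    for col in clip_columns:
        values = sorted(r[col] for r in rows if r[col] is not None)
        k = int(len(values) * tail)
        if k == 0:
            continue  # too few records to cut a 1% tail
        lower, upper = values[k], values[-k - 1]
        for r in rows:
            if r[col] is not None and not (lower <= r[col] <= upper):
                r[col] = None

    # 2. Missing values: delete any record with a missing column.
    rows = [r for r in rows if all(r[c] is not None for c in columns)]

    # 3. Duplicated records: keep the first occurrence only.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


if __name__ == "__main__":
    demo = [{"y": i % 2, "x1": float(i)} for i in range(200)]
    demo.append({"y": 0, "x1": 50.0})   # duplicate of an existing record
    demo.append({"y": 1, "x1": None})   # record with a missing feature
    cleaned = prepare_records(demo, columns=["y", "x1"], clip_columns=["x1"])
    print(len(demo), "records in,", len(cleaned), "records out")
```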
Sign in
Datasets
From Full data set
To Sample data set (1)
From Full data set
To Sample data set (2)
Prepare sample data set (1)
 Dataset
 Chapter 4a1 – Full data set.csv
 Clip Values
 Threshold: Percentile
 Substitute value: Missing
 List of columns: x1,x2,x3,x4,x5,x6
 Clean Missing Data
 Columns to be cleaned: y,x1,x2,x3,x4,x5,x6
 Cleaning mode: Remove entire row
Prepare sample data set (2)
 Remove Duplicate Rows
 Key column selection: y,x1,x2,x3,x4,x5,x6
 Split Data
 Fraction of rows: 1
 Split Data
 Splitting mode: Regular Expression
 Regular Expression: \"y" ^0
Prepare sample data set (3)
 Partition and Sample
 Number of rows: 400
 Add Rows
 Chapter 4a2 – Sample data set
 Convert to CSV
Sample size
 400 records in each category [0, 1]
 200 records for training data set
 To build the model
 100 records for validation data set
 To calibrate the best set of model parameters
 100 records for testing data set
 To assess the accuracy of the model
Example 4.a.2
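The sample sizes above can be drawn as follows. This sketch assumes that the 200 / 100 / 100 split is applied within each category, so that every data set stays balanced between label 0 and label 1; the slide does not state this explicitly. The function name, the record layout and the fixed seed are hypothetical.

```python
import random


def stratified_sample_and_split(records, label="y", per_class=400,
                                sizes=(200, 100, 100), seed=0):
    """Draw `per_class` records per label value, then split each class
    into training / validation / testing parts of the given sizes."""
    if sum(sizes) != per_class:
        raise ValueError("sizes must add up to per_class")
    rng = random.Random(seed)

    # Group the records by label value (the strata).
    by_class = {}
    for r in records:
        by_class.setdefault(r[label], []).append(r)

    train, valid, test = [], [], []
    for value, group in by_class.items():
        if len(group) < per_class:
            raise ValueError(f"class {value!r} has only {len(group)} records")
        picked = rng.sample(group, per_class)  # random sample without replacement
        a, b = sizes[0], sizes[0] + sizes[1]
        train += picked[:a]
        valid += picked[a:b]
        test += picked[b:]
    return train, valid, test
```

The training part builds the model, the validation part calibrates its parameters, and the testing part is held back to assess accuracy.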
Outline
 Monotonic causal relationship
 Sample data set
 Feature selection
 Two class model
 Prediction model
 Regression model
Monotonicity
 Between the label and a feature
 Quantified by the p-value
 A smaller p-value suggests a stronger
monotonicity
 To exclude weakly monotonic features
Example 4.a.3
p-value
 2-mean t-test
 p-value
 < 5% suggests good monotonicity in general
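$$ \text{Standard error} = \sqrt{\frac{SD_0^2}{N_0} + \frac{SD_1^2}{N_1}} $$
$$ t\text{-statistic} = \frac{\bar{x}_0 - \bar{x}_1}{\text{Standard error}} $$
$$ df = \frac{\left( \frac{SD_0^2}{N_0} + \frac{SD_1^2}{N_1} \right)^2}{\frac{SD_0^4}{N_0^2 (N_0 - 1)} + \frac{SD_1^4}{N_1^2 (N_1 - 1)}} $$
$$ p\text{-value} = \text{TDIST}(\text{ABS}(t\text{-statistic}),\ df,\ 2) $$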
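The 2-mean t-test can be reproduced outside Excel. The sketch below follows the standard error, t-statistic and df formulas for one feature split by label 0 and label 1. It stands in for TDIST with a numerical integration of the Student's t density, which is an assumption of this example rather than the method used in the slides; the function names are hypothetical, and the fractional df is used as is, so results may differ slightly from a spreadsheet.

```python
import math
from statistics import mean, stdev


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value of Student's t via Simpson integration of the density."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # `steps` must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def two_mean_p_value(x0, x1):
    """p-value of the 2-mean t-test for one feature, grouped by label 0 / 1."""
    n0, n1 = len(x0), len(x1)
    v0 = stdev(x0) ** 2 / n0  # SD_0^2 / N_0
    v1 = stdev(x1) ** 2 / n1  # SD_1^2 / N_1
    standard_error = math.sqrt(v0 + v1)
    t_stat = (mean(x0) - mean(x1)) / standard_error
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return t_two_tailed_p(t_stat, df)
```

A feature whose p-value falls below 5% would be kept under the rule of thumb above.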
Principal components
 Principal components
 Linearly transformed independent features
 How many principal components are
sufficient?
 A cumulative share of eigenvalues > 95% is good in general
Example 4.a.4
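The 95% rule can be applied once the eigenvalues of the principal components are known. The sketch below assumes the eigenvalues have already been computed elsewhere; the function name and the eigenvalues in the demo are hypothetical and are not taken from Example 4.a.4.

```python
def components_needed(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative
    eigenvalue share exceeds `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total > threshold:
            return count
    return len(ordered)


if __name__ == "__main__":
    # Hypothetical eigenvalues for five normalized features.
    print(components_needed([2.5, 1.2, 0.7, 0.45, 0.15]))
```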
Features and principal components
 Label
 0, 1
 Features
 x1, x2, x4, x5, x6
 Principal components
 4
Outline
 Monotonic causal relationship
 Sample data set
 Feature selection
 Two class model
 Prediction model
 Regression model
From Sample data set
To Prediction model (1)
From Sample data set
To Prediction model (2)
Prediction model (1)
 Dataset
 Chapter 4a3 – Sample data set
 Select Columns in Dataset
 Select columns: y,x1,x2,x4,x5,x6
 Normalize Data
 Columns to transform: x1,x2,x4,x5,x6
Prediction model (2)
 Principal Component Analysis
 Selected columns: x1,x2,x4,x5,x6
 Number of dimensions: 4
 Normalize dense columns: Blank
 Split Data × 2
 Stratified split: True
 Stratification key column: y
 Two-Class Neural Network
Prediction model (3)
 Tune Model Hyperparameters
 Label columns: y
 Score Model
 Evaluate Model
Normalization
 To transform all features into a compatible range
$$ z_k(i) = \frac{x_k(i) - \text{Average}(\text{All } x_k)}{\text{S.D.}(\text{All } x_k)} $$
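The normalization formula can be written as a short function. This sketch assumes the population standard deviation (`statistics.pstdev`); the slide does not say whether the population or sample version is meant, and `statistics.stdev` can be swapped in for the latter. The function name is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize one feature: subtract its average, divide by its S.D."""
    average = mean(values)
    sd = pstdev(values)  # assumption: population S.D.
    return [(v - average) / sd for v in values]


if __name__ == "__main__":
    print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

After this step every feature has an average of 0 and a standard deviation of 1, so features measured in different units become comparable.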
Score Model
Two group classification
[Figure: two group classification, split at the middle cutoff]
Evaluate Model
Model performance
Testing data | Model prediction: 1 | Model prediction: 0
1 | True positive | False negative
0 | False positive | True negative
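The four cells of the model performance table can be counted directly from the testing data. The sketch below assumes each record has an actual label (0 or 1) and a model score between 0 and 1, and that a score at or above a cutoff of 0.5 is predicted as 1; the cutoff value, the tie handling and the function name are assumptions of this example.

```python
def confusion_counts(actual, scores, cutoff=0.5):
    """Count true/false positives and negatives at a given cutoff score."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for label, score in zip(actual, scores):
        predicted = 1 if score >= cutoff else 0
        if label == 1:
            counts["TP" if predicted == 1 else "FN"] += 1
        else:
            counts["FP" if predicted == 1 else "TN"] += 1
    return counts


if __name__ == "__main__":
    print(confusion_counts([1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.1, 0.5]))
```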
Three group classification
[Figure: three group classification, split at the lower and upper cutoffs]
Cutoff scores
 Score
 Probability of Label = 1
 Unbiased cutoff score
 50%
 Upper cutoff score
 The minimum score above which false positive is < 1%
 Lower cutoff score
 The maximum score below which false negative is < 1%
 Data quality
 Good: Large positive and negative zones, small noise
 Bad: Large middle zone, large noise
Four group classification
 Upper group
 Above upper cutoff score
 Upper-middle group
 Between upper and middle cutoff scores
 Lower-middle group
 Between middle and lower cutoff scores
 Lower group
 Below lower cutoff score
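The four groups can be assigned from a score and the three cutoff scores. In this sketch the cutoff values are inputs supplied by the user, and the treatment of a score that falls exactly on a cutoff is an assumption, since the slides do not specify it. The function name is hypothetical.

```python
def four_group(score, lower, middle, upper):
    """Assign a score to one of the four groups defined by three cutoffs."""
    if not lower <= middle <= upper:
        raise ValueError("cutoffs must satisfy lower <= middle <= upper")
    if score > upper:
        return "Upper"
    if score > middle:
        return "Upper-middle"
    if score >= lower:
        return "Lower-middle"
    return "Lower"


if __name__ == "__main__":
    # Hypothetical cutoffs: lower 0.2, middle 0.5 (unbiased), upper 0.8.
    for s in (0.9, 0.6, 0.3, 0.1):
        print(s, four_group(s, lower=0.2, middle=0.5, upper=0.8))
```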
Four group classification
[Figure: four group classification, split at the lower, middle and upper cutoffs]
Outline
 Monotonic causal relationship
 Sample data set
 Feature selection
 Two class model
 Prediction model
 Regression model
Predictive Web Service
Select Inputs
Select Outputs
Deploy Web Service
Download Excel prediction model
Prediction with Excel for web
 Register for Microsoft OneDrive with a personal e-mail account (Gmail, Hotmail, QQ)
 Upload the Excel prediction model to OneDrive
 Click on the uploaded file to open the Excel prediction model
 Warning
 Never register for Microsoft OneDrive using your company or school e-mail address
 Never use the Microsoft Excel desktop edition to conduct prediction
Microsoft OneDrive
Prediction
Outline
 Monotonic causal relationship
 Sample data set
 Feature selection
 Two class model
 Prediction model
 Regression model
Datasets
Prepare sample data set (1)
 Dataset
 Chapter 4b1 – Full data set.csv
 Clip Values
 Threshold: Percentile
 Substitute value: Missing
 List of columns: y,x1,x2,x3,x4,x5,x6
 Clean Missing Data
 Columns to be cleaned: y,x1,x2,x3,x4,x5,x6
 Cleaning mode: Remove entire row
Prepare sample data set (2)
 Remove Duplicate Rows
 Key column selection: y,x1,x2,x3,x4,x5,x6
 Split Data
 Fraction of rows: 1
 Partition and Sample
 Number of rows: 400
 Convert to CSV
p-value
 Correlation t-test
 p-value
 < 5% suggests a good monotonicity in general
$$ \text{Standard error} = \sqrt{\frac{1 - \rho^2}{N - 2}} $$
$$ t\text{-statistic} = \frac{\rho}{\text{Standard error}} $$
$$ df = N - 2 $$
$$ p\text{-value} = \text{TDIST}(\text{ABS}(t\text{-statistic}),\ df,\ 2) $$
Two correlation tests
 Pearson correlation coefficient
 Rank correlation coefficient
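Both correlation tests feed the same t-statistic formula; only the correlation coefficient differs. The sketch below computes the Pearson coefficient on the raw values and the rank coefficient as the Pearson coefficient of the ranks, then returns the t-statistic and df. The function names are hypothetical, the rank coefficient is assumed to be Spearman's, and the final TDIST step is left to a spreadsheet or a statistics library.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rank_correlation(x, y):
    """Rank correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_t_statistic(rho, n):
    """Return (t-statistic, df) for a correlation rho from n records."""
    standard_error = math.sqrt((1 - rho ** 2) / (n - 2))
    return rho / standard_error, n - 2
```

The rank version is the more direct check of monotonicity, because it is unchanged by any increasing transformation of a feature.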
Sample size
 400 records
 200 records for training data set
 To build the model
 100 records for validation data set
 To calibrate the best set of model parameters
 100 records for testing data set
 To assess the accuracy of the model
Example 4.c.2
From Sample data set
To Prediction model (1)
From Sample data set
To Prediction model (2)
Prediction model (1)
 Dataset
 Chapter 4b3 – Sample data set
 Select Columns in Dataset
 Select columns: y, x1, x2, x4, x5, x6
 Normalize Data
 Columns to transform: y, x1, x2, x4, x5, x6
Prediction model (2)
 Principal Component Analysis
 Selected columns: x1,x2,x4,x5,x6
 Number of dimensions: 4
 Normalize dense columns: Blank
 Split Data
 Stratified split: False
 Linear Regression
Prediction model (3)
 Tune Model Hyperparameters
 Label columns: y
 Score Model
 Evaluate Model
Score Model
Evaluate Model
What is a linear regression?
A best fit straight line only
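For a single feature, the best fit straight line has a closed-form least squares solution. The sketch below is an illustration with one feature only, whereas the model in this chapter is fitted on several principal components inside Azure ML Studio; the function name and the demo data are hypothetical.

```python
def fit_line(x, y):
    """Least squares slope and intercept of the line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(f"y = {slope} * x + {intercept}")
```

Because the fitted relationship is a straight line, the model cannot capture curvature in the label, however much data is supplied.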
Common issues
 Excel prediction model does not work
 Use the Excel web edition
 Download a new prediction model Excel file

The Economic History of the U.S. Lecture 19.pdfGale Pooley
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Roomdivyansh0kumar0
 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptxFinTech Belgium
 
Q3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast SlidesQ3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast SlidesMarketing847413
 
The Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdfThe Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdfGale Pooley
 
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...Suhani Kapoor
 
The Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfThe Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfGale Pooley
 
The Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfThe Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfGale Pooley
 
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service NashikHigh Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...
VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...
VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...Call Girls in Nagpur High Profile
 
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure serviceCall US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure servicePooja Nehwal
 
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdfFinTech Belgium
 
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...Pooja Nehwal
 
The Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdfThe Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdfGale Pooley
 

Recently uploaded (20)

Dividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptxDividend Policy and Dividend Decision Theories.pptx
Dividend Policy and Dividend Decision Theories.pptx
 
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
Independent Lucknow Call Girls 8923113531WhatsApp Lucknow Call Girls make you...
 
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
 
20240417-Calibre-April-2024-Investor-Presentation.pdf
20240417-Calibre-April-2024-Investor-Presentation.pdf20240417-Calibre-April-2024-Investor-Presentation.pdf
20240417-Calibre-April-2024-Investor-Presentation.pdf
 
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
 
Lundin Gold April 2024 Corporate Presentation v4.pdf
Lundin Gold April 2024 Corporate Presentation v4.pdfLundin Gold April 2024 Corporate Presentation v4.pdf
Lundin Gold April 2024 Corporate Presentation v4.pdf
 
The Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfThe Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdf
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx
 
Q3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast SlidesQ3 2024 Earnings Conference Call and Webcast Slides
Q3 2024 Earnings Conference Call and Webcast Slides
 
The Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdfThe Economic History of the U.S. Lecture 21.pdf
The Economic History of the U.S. Lecture 21.pdf
 
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
 
The Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfThe Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdf
 
The Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfThe Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdf
 
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service NashikHigh Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
 
VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...
VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...
VVIP Pune Call Girls Katraj (7001035870) Pune Escorts Nearby with Complete Sa...
 
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure serviceCall US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
Call US 📞 9892124323 ✅ Kurla Call Girls In Kurla ( Mumbai ) secure service
 
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
 
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
 
The Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdfThe Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdf
 

Chapter 4 Microsoft Azure Machine Learning Studio

  • 1. Chapter 4 Microsoft Azure Machine Learning Studio The Presentation Slides for Teaching Financial Regulations and Regulatory Technology Website : https://sites.google.com/site/quanrisk E-mail : quanrisk@gmail.com Copyright © 2021 Dr. LAM Yat-fai
  • 2. Declaration
      - Copyright © 2021 Dr. LAM Yat-fai
      - All rights reserved. No part of this presentation file may be reproduced, in any form or by any means, without written permission from Dr. LAM Yat-fai.
      - Authored by Dr. LAM Yat-fai (林日辉), Chief Data Scientist, CapitaLogic Limited; Adjunct Professor of Finance, City University of Hong Kong; Doctor of Business Administration; CFA, CAIA, CAMS, CFE, FRM, PRM, MCSE, MCNE.
  • 3. Outline: Monotonic causal relationship; Sample data set; Feature selection; Two class model; Prediction model; Regression model
  • 4. Monotonic causal relationship
      - Label: response variable
      - Features: explanatory variables
      - Noise: unexplainable effect
      - $y = F(x_1, x_2, x_3, \ldots, x_N) + \text{Noise}$
  • 5. Label
      - y
      - The value largely determined by the features
      - To be predicted today
      - Two class: up, down
      - Multiple class: A, B, C, D
      - Any value: from -∞ to ∞
      - Becomes observable some time after the prediction
  • 6. Features
      - x1, x2, x3, …, xN
      - Numeric
      - Largely determine the value of the label
      - Observable
      - Measurable
      - Monotonically related to the label
  • 7. Noise
      - Unobservable and/or immeasurable
      - Small noise: most critical features are available; the features can explain the majority of the label
      - Large noise: some critical features are missing; the features fail to explain the majority of the label
  • 8. A monotonic relationship: $y = x_1 + x_2$
      x1 | x2 | y
      +  | ↑  | ↑
      -  | ↑  | ↑
  • 9. Not a monotonic relationship: $y = x_1 \times x_2$
      x1 | x2 | y
      +  | ↑  | ↑
      -  | ↑  | ↓
  • 10. A monotonic relationship: $y = x_1^2 + x_2^2$
      x2 | x2 | y
      +  | ↑  | ↑
      -  | ↑  | ↓
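As a rough check of slides 8 and 9, the sketch below holds x1 at a fixed positive or negative value, raises x2, and reports which way y moves. The helper name `response_direction` and the sample values are illustrative assumptions, not part of the slides.

```python
def response_direction(f, x1, x2_values):
    """Report how y = f(x1, x2) moves as x2 increases with x1 held fixed."""
    ys = [f(x1, x2) for x2 in x2_values]
    steps = list(zip(ys, ys[1:]))
    if all(b > a for a, b in steps):
        return "up"
    if all(b < a for a, b in steps):
        return "down"
    return "mixed"


def add(x1, x2):
    return x1 + x2      # slide 8: y = x1 + x2


def mul(x1, x2):
    return x1 * x2      # slide 9: y = x1 * x2


x2_values = [1, 2, 3, 4]            # x2 increasing (assumed sample values)
for x1 in (+2, -2):                 # one positive and one negative x1
    print(f"x1={x1:+d}  sum: {response_direction(add, x1, x2_values)}"
          f"  product: {response_direction(mul, x1, x2_values)}")
```

The sum moves up in both rows, while the product changes direction with the sign of x1, which is why only the first counts as monotonic.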
  • 11. Example
      - Label: the chance that a student graduates successfully from a university master's programme
      - Features: undergraduate results ↑; financial resources ↑; disability ↓
      - Noise: pressure; sickness; luck
  • 12. Machine learning
      - Historical records: a set of data recording the label and features in the past
      - Monotonic causal relationship: a hypothetical assumption, to be discovered by machine learning algorithms
      - Prediction: to estimate the label before it becomes observable
  • 13. Machine learning framework (diagram labels: Machine learning; Historical records; Monotonic causal relationship; Prediction). Copyright © 2020 CapitaLogic Limited
  • 14. Machine learning approaches
                  | Traditional             | Rapid development        | Web service
      Development | Personal computer       | Personal computer        | Web
      Execution   | Personal computer       | Personal computer / Web  | Web
      Programming | Yes                     | No                       | No
      Platform    | Python, R, Scikit-learn | RapidMiner, Weka, Orange | Google, Amazon, Microsoft, IBM
      Target user | Programmer              | Data scientist           | Risk manager and compliance officer
  • 16. Microsoft Machine Learning
      - Azure ML: paid service; integrates with all Azure products; highly technical; for real-life applications
      - Azure ML Studio (Classic): free service; standalone; easy to use; easy to make mistakes; for proof of concept
  • 17. Outline: Monotonic causal relationship; Sample data set; Feature selection; Two class model; Prediction model; Regression model
  • 18. Full data set
      - Data: a lot of historical records with correct values
      - Good records: contain the label and the major features monotonically impacting the label
      - Bad records: records with extreme values of the label and/or features; records with a missing label and/or features; duplicated records
  • 19. Record preparation
      - Outliers: the largest or smallest 1% of values of a feature/label, to be replaced with the missing value ""
      - Missing values: delete records with any missing value
      - Duplicated records: delete the duplicates
      - Random sample: stratified sampling
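The first three preparation steps can be sketched in plain Python. This is a minimal illustration outside Azure ML Studio, assuming records are held as a list of dictionaries, that `None` stands in for the missing value, and that "1%" means the `int(n * 0.01)` most extreme records on each side. The names `percentile_bounds` and `prepare` are hypothetical.

```python
def percentile_bounds(values, pct=0.01):
    """Return (low, high) so that int(n * pct) values lie strictly outside on each side,
    provided there are no ties at the thresholds."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * pct)                      # number of extreme records per side
    return ordered[k], ordered[n - 1 - k]


def prepare(records, columns, pct=0.01):
    cleaned = [dict(r) for r in records]  # work on copies, leave the input untouched
    # 1. Outliers: replace values beyond the thresholds with the missing value.
    for col in columns:
        present = [r[col] for r in records if r[col] is not None]
        low, high = percentile_bounds(present, pct)
        for r in cleaned:
            if r[col] is not None and not (low <= r[col] <= high):
                r[col] = None
    # 2. Missing values: drop every record that has any missing value.
    cleaned = [r for r in cleaned if all(r[c] is not None for c in columns)]
    # 3. Duplicated records: keep only the first occurrence of each row.
    seen, unique = set(), []
    for r in cleaned:
        key = tuple(r[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Hypothetical data: 200 records, one duplicate, one record with a missing feature.
records = [{"y": i % 2, "x1": float(i)} for i in range(200)]
records.append({"y": 0, "x1": 50.0})
records.append({"y": 1, "x1": None})
clean = prepare(records, ["y", "x1"])
print(len(records), "->", len(clean))
```

Clipping before deleting missing values matters: the clipped outliers become missing values, so the second step removes those records as well.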
  • 20. Sign in
  • 21. Datasets
  • 22. From Full data set To Sample data set (1)
  • 23. From Full data set To Sample data set (2)
  • 24. Prepare sample data set (1)
      - Dataset: Chapter 4a1 – Full data set.csv
      - Clip Values
          - Threshold: Percentile
          - Substitute value: Missing
          - List of columns: x1,x2,x3,x4,x5,x6
      - Clean Missing Data
          - Columns to be cleaned: y,x1,x2,x3,x4,x5,x6
          - Cleaning mode: Remove entire row
  • 25. Prepare sample data set (2)
      - Remove Duplicate Rows
          - Key column selection: y,x1,x2,x3,x4,x5,x6
      - Split Data
          - Fraction of rows: 1
      - Split Data
          - Splitting mode: Regular Expression
          - Regular Expression: "y"^0
  • 26. Prepare sample data set (3)
      - Partition and Sample
          - Number of rows: 400
      - Add Rows
      - Chapter 4a2 – Sample data set
      - Convert to CSV
  • 27. Sample size (Example 4.a.2)
      - 400 records in each category [0, 1]
      - 200 records for the training data set: to build the model
      - 100 records for the validation data set: to calibrate the best set of model parameters
      - 100 records for the testing data set: to assess the accuracy of the model
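A stratified split along these lines can be sketched as follows. The slide does not say whether the 200/100/100 figures are per category or in total, so the sketch assumes a 50% / 25% / 25% split applied within each category. The function name `stratified_split` and the sample records are hypothetical.

```python
import random


def stratified_split(records, label_key, fractions=(0.5, 0.25, 0.25), seed=0):
    """Split records into training, validation and testing sets, class by class."""
    rng = random.Random(seed)
    by_label = {}
    for r in records:
        by_label.setdefault(r[label_key], []).append(r)
    splits = [[] for _ in fractions]
    for group in by_label.values():
        rng.shuffle(group)                          # random order within the class
        start = 0
        for i, frac in enumerate(fractions):
            is_last = i == len(fractions) - 1
            # The last split takes the remainder so that no record is lost to rounding.
            end = len(group) if is_last else start + round(len(group) * frac)
            splits[i].extend(group[start:end])
            start = end
    return splits


# Hypothetical sample: 400 records in each of the two categories.
sample = [{"y": i % 2, "x1": i} for i in range(800)]
train, valid, test = stratified_split(sample, "y")
print(len(train), len(valid), len(test))
```

Splitting within each category keeps the share of 0 and 1 labels identical across the three data sets.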
  • 28. Outline: Monotonic causal relationship; Sample data set; Feature selection; Two class model; Prediction model; Regression model
  • 29. Monotonicity (Example 4.a.3)
      - Between the label and a feature
      - Quantified by the p-value
      - A smaller p-value suggests stronger monotonicity
      - Features with weak monotonicity are to be excluded
  • 30. p-value
      - 2-mean t-test
      - A p-value < 5% suggests good monotonicity in general
      - $\text{Standard error} = \sqrt{\dfrac{SD_0^2}{N_0} + \dfrac{SD_1^2}{N_1}}$
      - $t\text{-statistic} = \dfrac{\bar{x}_0 - \bar{x}_1}{\text{Standard error}}$
      - $df = \dfrac{\left(\dfrac{SD_0^2}{N_0} + \dfrac{SD_1^2}{N_1}\right)^2}{\dfrac{SD_0^4}{N_0^2 (N_0 - 1)} + \dfrac{SD_1^4}{N_1^2 (N_1 - 1)}}$
      - $p\text{-value} = \mathrm{TDIST}(\mathrm{ABS}(t\text{-statistic}),\, df,\, 2)$
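The 2-mean t-test above can be reproduced without a spreadsheet. The sketch below is one possible implementation using only the Python standard library. It assumes the sample standard deviation for SD and approximates the two-tailed t probability (the role of Excel's TDIST) by Simpson's rule. The names `t_two_tailed_p` and `two_mean_t_test` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed Student t probability, found by integrating the density from 0 to |t|."""
    t = abs(t)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps                                   # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def two_mean_t_test(group0, group1):
    """Compare a feature's mean between label-0 and label-1 records."""
    n0, n1 = len(group0), len(group1)
    v0 = variance(group0) / n0                      # SD0^2 / N0
    v1 = variance(group1) / n1                      # SD1^2 / N1
    standard_error = math.sqrt(v0 + v1)
    t = (mean(group0) - mean(group1)) / standard_error
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return t, df, t_two_tailed_p(t, df)


# Hypothetical feature values for records with label 0 and label 1.
x_label0 = [1, 2, 3, 4, 5]
x_label1 = [2, 3, 4, 5, 6]
t, df, p = two_mean_t_test(x_label0, x_label1)
print(f"t = {t:.3f}, df = {df:.1f}, p-value = {p:.3f}")
```

With these toy values the p-value is well above 5%, so the feature would not pass the monotonicity screen.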
  • 31. Principal components (Example 4.a.4)
      - Principal components: linearly transformed independent features
      - How many principal components are sufficient?
      - Keeping the components whose eigenvalues add up to more than 95% of the sum of all eigenvalues is good in general
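The 95% rule can be expressed as a short helper. This sketch assumes the eigenvalues have already been computed elsewhere; the values shown are invented for illustration and the name `components_needed` is hypothetical.

```python
def components_needed(eigenvalues, threshold=0.95):
    """Smallest number of components whose eigenvalues exceed `threshold` of the total."""
    ordered = sorted(eigenvalues, reverse=True)     # largest eigenvalue first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total > threshold:
            return k
    return len(ordered)


# Invented eigenvalues for five normalized features (they sum to 5.0).
eigenvalues = [2.4, 1.3, 0.7, 0.45, 0.15]
print(components_needed(eigenvalues))
```

For these invented eigenvalues the first three components cover 88% of the total and the first four cover 97%, so four components would be kept.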
  • 32. Features and principal components
      - Label: 0, 1
      - Features: x1, x2, x4, x5, x6
      - Principal components: 4
  • 33. Outline: Monotonic causal relationship; Sample data set; Feature selection; Two class model; Prediction model; Regression model
  • 34. From Sample data set To Prediction model (1)
  • 35. From Sample data set To Prediction model (2)
  • 36. Prediction model (1)
      - Dataset: Chapter 4a3 – Sample data set
      - Select Columns in Dataset
          - Select columns: y,x1,x2,x4,x5,x6
      - Normalize Data
          - Columns to transform: x1,x2,x4,x5,x6
  • 37. Prediction model (2)
      - Principal Component Analysis
          - Selected columns: x1,x2,x4,x5,x6
          - Number of dimensions: 4
          - Normalize dense columns: Blank
      - Split Data × 2
          - Stratified split: True
          - Stratification key column: y
      - Two-Class Neural Network
  • 38. Prediction model (3)
      - Tune Model Hyperparameters
          - Label columns: y
      - Score Model
      - Evaluate Model
  • 39. Normalization
      - To transform all features into a compatible range
      - $z_k(i) = \dfrac{x_k(i) - \text{Average}(\text{All } x_k)}{\text{S.D.}(\text{All } x_k)}$
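A minimal sketch of this z-score transformation for a single feature column follows. The slide does not say whether S.D. is the population or the sample standard deviation; the population version (`pstdev`) is assumed here, and the name `z_scores` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Subtract the column average and divide by the column standard deviation."""
    average = mean(values)
    sd = pstdev(values)          # population S.D. (an assumption, see the text above)
    return [(v - average) / sd for v in values]


x_k = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values of one feature
print(z_scores(x_k))
```

After the transformation every feature has an average of 0 and a standard deviation of 1, so features measured in different units become comparable.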
  • 40. Score Model
  • 41. Two group classification (figure label: Middle cutoff)
  • 42. Evaluate Model
  • 43. Model performance
                      | Model prediction: 1 | Model prediction: 0
      Testing data: 1 | True positive       | False negative
      Testing data: 0 | False positive      | True negative
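The four cells of this table can be counted directly from the testing data. The sketch below is illustrative only; the function name `confusion_counts` and the label lists are invented.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for labels coded as 0 and 1."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


actual = [1, 1, 1, 0, 0, 0, 1, 0]      # labels in the testing data (invented)
predicted = [1, 0, 1, 0, 1, 0, 1, 0]   # model predictions (invented)
counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```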
  • 44. Three group classification (figure labels: Lower cutoff; Upper cutoff)
  • 45. Cutoff scores
      - Score: probability that Label = 1
      - Unbiased cutoff score: 50%
      - Upper cutoff score: the minimum score above which the false positive rate is < 1%
      - Lower cutoff score: the maximum score below which the false negative rate is < 1%
      - Data quality
          - Good: large positive and negative zones, small noise
          - Bad: large middle zone, large noise
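One way to search for the two cutoff scores is sketched below. The slide does not define the denominator of the 1% figures, so this sketch assumes the false positive rate is the share of label-0 records among those scoring at or above a candidate cutoff, and the false negative rate is the share of label-1 records among those scoring at or below it. The function names and the sample data are hypothetical.

```python
def upper_cutoff(scores, labels, max_error=0.01):
    """Lowest candidate score at or above which label-0 records are under max_error."""
    for cut in sorted(set(scores)):
        above = [lab for s, lab in zip(scores, labels) if s >= cut]
        if above.count(0) / len(above) < max_error:
            return cut
    return None                       # no score satisfies the condition


def lower_cutoff(scores, labels, max_error=0.01):
    """Highest candidate score at or below which label-1 records are under max_error."""
    for cut in sorted(set(scores), reverse=True):
        below = [lab for s, lab in zip(scores, labels) if s <= cut]
        if below.count(1) / len(below) < max_error:
            return cut
    return None


# Invented scores (probability of Label = 1) and true labels.
scores = [0.05, 0.10, 0.20, 0.40, 0.45, 0.55, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(lower_cutoff(scores, labels), upper_cutoff(scores, labels))
```

The records between the two cutoffs form the middle zone, where the model's prediction is least reliable.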
  • 46. Four group classification
      - Upper group: above the upper cutoff score
      - Upper-middle group: between the upper and middle cutoff scores
      - Lower-middle group: between the middle and lower cutoff scores
      - Lower group: below the lower cutoff score
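The four groups map onto a simple chain of comparisons. In this sketch the lower and upper cutoffs of 0.2 and 0.8 are invented, the middle cutoff is the unbiased 50%, and the treatment of scores that fall exactly on a cutoff is an assumption. The name `classify` is hypothetical.

```python
def classify(score, lower, upper, middle=0.5):
    """Assign a score (probability of Label = 1) to one of the four groups."""
    if score > upper:
        return "Upper"
    if score > middle:
        return "Upper-middle"
    if score >= lower:
        return "Lower-middle"
    return "Lower"


for s in (0.90, 0.60, 0.30, 0.10):
    print(s, classify(s, lower=0.2, upper=0.8))
```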
  • 47. Four group classification (figure labels: Lower cutoff; Upper cutoff; Middle cutoff)
  • 48. Outline: Monotonic causal relationship; Sample data set; Feature selection; Two class model; Prediction model; Regression model
  • 49. Predictive Web Service
  • 50. Select Inputs
  • 51. Select Outputs
  • 52. Deploy Web Service
  • 53. Download Excel prediction model
  • 54. Prediction with Excel for web
      - Register for Microsoft OneDrive with a personal e-mail account (Gmail, Hotmail, QQ)
      - Upload the Excel prediction model to OneDrive
      - Click on the prediction model to open it in Excel for the web
      - Warning
          - Never register for Microsoft OneDrive using your company or school e-mail address
          - Never use the Microsoft Excel desktop edition to conduct prediction
  • 55. Microsoft OneDrive
  • 56. Prediction
  • 57. Outline: Monotonic causal relationship; Sample data set; Feature selection; Two class model; Prediction model; Regression model
  • 58. Datasets
  • 59. Prepare sample data set (1)
      - Dataset: Chapter 4b1 – Full data set.csv
      - Clip Values
          - Threshold: Percentile
          - Substitute value: Missing
          - List of columns: y,x1,x2,x3,x4,x5,x6
      - Clean Missing Data
          - Columns to be cleaned: y,x1,x2,x3,x4,x5,x6
          - Cleaning mode: Remove entire row
  • 60. Prepare sample data set (2)
      - Remove Duplicate Rows
          - Key column selection: y,x1,x2,x3,x4,x5,x6
      - Split Data
          - Fraction of rows: 1
      - Partition and Sample
          - Number of rows: 400
      - Convert to CSV
  • 61. p-value
      - Correlation t-test
      - A p-value < 5% suggests good monotonicity in general
      - $\text{Standard error} = \sqrt{\dfrac{1 - \rho^2}{N - 2}}$
      - $t\text{-statistic} = \dfrac{\rho}{\text{Standard error}}$
      - $df = N - 2$
      - $p\text{-value} = \mathrm{TDIST}(\mathrm{ABS}(t\text{-statistic}),\, df,\, 2)$
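The correlation t-test can be sketched in the same way as the 2-mean test. The code below uses only the Python standard library and approximates the two-tailed t probability (the role of Excel's TDIST) by Simpson's rule. The names `t_two_tailed_p` and `correlation_t_test` are hypothetical, and the correlation of 0.1 is an invented input.

```python
import math


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed Student t probability, found by integrating the density from 0 to |t|."""
    t = abs(t)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps                                   # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def correlation_t_test(rho, n):
    """t-statistic, degrees of freedom and p-value for a correlation rho over n records."""
    standard_error = math.sqrt((1 - rho ** 2) / (n - 2))
    t = rho / standard_error
    df = n - 2
    return t, df, t_two_tailed_p(t, df)


# Invented input: a correlation of 0.1 measured on a 400-record sample.
t, df, p = correlation_t_test(0.1, 400)
print(f"t = {t:.3f}, df = {df}, p-value = {p:.4f}")
```

With 400 records even a correlation as small as 0.1 falls under the 5% threshold, so the p-value should be read together with the size of the correlation itself.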
  • 62. Two correlation tests
      - Pearson correlation coefficient
      - Rank correlation coefficient
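The difference between the two coefficients shows up on data that is monotonic but not linear. The sketch below assumes that the rank correlation on the slide is Spearman's, computed as the Pearson correlation of average ranks. The function names and the cubic sample data are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]          # monotonic in x, but not a straight line
print(round(pearson(xs, ys), 3), round(spearman(xs, ys), 3))
```

The rank coefficient reaches 1 because y always rises with x, while the Pearson coefficient stays below 1 because the relationship is curved.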
  • 63. Sample size (Example 4.c.2)
      - 400 records
      - 200 records for the training data set: to build the model
      - 100 records for the validation data set: to calibrate the best set of model parameters
      - 100 records for the testing data set: to assess the accuracy of the model
  • 64. From Sample data set To Prediction model (1)
  • 65. From Sample data set To Prediction model (2)
  • 66. Prediction model (1)
      - Dataset: Chapter 4b3 – Sample data set
      - Select Columns in Dataset
          - Select columns: y,x1,x2,x4,x5,x6
      - Normalize Data
          - Columns to transform: y,x1,x2,x4,x5,x6
  • 67. Prediction model (2)
      - Principal Component Analysis
          - Selected columns: x1,x2,x4,x5,x6
          - Number of dimensions: 4
          - Normalize dense columns: Blank
      - Split Data
          - Stratified split: False
      - Linear Regression
  • 68. Prediction model (3)
      - Tune Model Hyperparameters
          - Label columns: y
      - Score Model
      - Evaluate Model
  • 69. Score Model
  • 70. Evaluate Model
  • 71. What is a linear regression? A best-fit straight line only
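For a single feature, the best-fit straight line can be found by ordinary least squares. The sketch below is a generic illustration, not the Linear Regression module of Azure ML Studio; the name `fit_line` and the data points are invented.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of the straight line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]                  # invented points lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Because the model is a straight line only, a curved relationship between a feature and the label leaves systematic errors that no choice of slope and intercept can remove.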
  • 72. Common issues
      - The Excel prediction model does not work
          - Use the Excel web edition
          - Download a new prediction model Excel file