GEN101
Introductory
Artificial Intelligence
Artificial Intelligence Performance Evaluation
College of Engineering
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Performance Evaluation
How to obtain reliable estimates?
Methods for Model Comparison
How to compare the relative performance among competing models?
Metrics for Performance Evaluation
We will focus on the predictive capability of a model, rather than how long it takes to classify or build models, scalability, etc.
We have already seen the Confusion Matrix:

                          PREDICTED CLASS (AI)
                          Class=Yes   Class=No
ACTUAL CLASS   Class=Yes  a (TP)      b (FN)
(Human)        Class=No   c (FP)      d (TN)

Most widely-used metric: Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + FP + TN + FN)
TP: True Positive
FP: False Positive
TN: True Negative
FN: False Negative
Limitation of Accuracy
Consider a 2-class problem
Number of Class 0 (e.g., Benign) examples = 9990
Number of Class 1 (e.g., Malignant) examples = 10
If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
Accuracy is misleading because the model does not detect a single class 1 example
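To make this concrete, here is a minimal Python sketch of the same calculation (the always-class-0 "model" is hypothetical, with the 9990/10 split from the example above):

```python
# Hypothetical imbalanced test set: 9990 benign (class 0), 10 malignant (class 1)
y_true = [0] * 9990 + [1] * 10

# A trivial "model" that predicts everything to be class 0
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
print(f"Accuracy: {correct / len(y_true):.1%}")  # 99.9%, yet no malignant case is detected
```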
Other Measures

                          PREDICTED CLASS
                          Class=Yes   Class=No
ACTUAL CLASS   Class=Yes  a (TP)      b (FN)
               Class=No   c (FP)      d (TN)
Precision-Recall
Two other metrics that are often used to quantify model
performance are precision and recall.
Precision is defined as the number of true positives divided by
the total number of positive predictions.
It quantifies what percentage of the positive predictions were
correct
How correct your model’s positive predictions were.
Recall/Sensitivity is defined as the number of true positives
divided by the total number of true positives and false negatives
(all actual positives)
It quantifies what percentage of the actual positives you were
able to identify
How sensitive your model was in identifying positives.
https://medium.com/@douglaspsteen/precision-recall-curves-d32e5b290248
Precision-Recall
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Out of all the examples that were predicted as positive, how
many are correct? How precise was the AI?
Out of all the positive examples, how many were caught (recalled)? Did the AI miss any?
Out of all the people that do have the disease, how many got
positive test results? Did the AI miss anyone?
Out of all the people that do NOT have the disease, how many
got negative results?
https://towardsdatascience.com/should-i-look-at-precision-recall-or-specificity-sensitivity-3946158aace1
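As a minimal sketch of the four formulas above (the TP/FP/TN/FN counts are invented for illustration):

```python
# Hypothetical confusion-matrix counts
TP, FP, TN, FN = 30, 10, 50, 10

precision   = TP / (TP + FP)  # how precise were the positive predictions?
recall      = TP / (TP + FN)  # how many actual positives were caught (recalled)?
sensitivity = TP / (TP + FN)  # sensitivity is another name for recall
specificity = TN / (TN + FP)  # how many actual negatives got negative results?

print(f"Precision = {precision:.2f}, Recall = {recall:.2f}, "
      f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")
```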
Precision-Recall
Example
In this example:
If the classifier predicts negative, you can trust it: the example is negative. Our AI does not make mistakes on negatives. It is sensitive. However, pay attention: if the example is negative, you can't be sure it will be predicted as negative (specificity = 78%).
If the classifier predicts positive, you can't trust it (precision = 33%).
However, if the example is positive, you can trust the classifier to find it (it will not miss it) (recall = 100%).
Precision: Is the green box pure?
Recall/Sensitivity: Did I lose any of the green balls to the red
box? Did I miss any greens? Am I so sensitive to greens that I
do not miss them?
Specificity: Are all the red balls in the red box?
Precision-Recall
Example
In this example:
Since the population is imbalanced:
The precision is relatively high
The recall is 100% because all the positive examples are
predicted as positive.
The specificity is 0% because no negative example is predicted
as negative.
Precision-Recall
Example
In this example:
If it predicts that an example is positive, you can trust it — it is
positive.
However, if it predicts it is negative, you can't trust it; the chances are that it is still positive.
It can still be a useful classifier.
Precision-Recall
Example
In this example:
The classifier detects all the positive examples as positive
It also detects all negative examples as negative.
All the measures are at 100%.
Why so many measures? Which is more important?
Because it depends:
If a false positive is expensive (a false alarm is a nightmare!), precision is more important than recall.
If a false negative is expensive (missing one is a nightmare!), recall is more important than precision.
Precision-Recall Curves
The precision-recall curve is used for evaluating the performance of binary classification algorithms.
It provides a graphical representation of a classifier's performance across many thresholds, rather than a single value.
It is constructed by calculating and plotting the precision
against the recall for a single classifier at a variety of
thresholds.
It helps to visualize how the choice of threshold affects
classifier performance and can even help us select the best
threshold for a specific problem.
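As a sketch of how such a curve is produced in practice, assuming scikit-learn and matplotlib are available (the labels and scores reuse the ten-instance example that appears later in the ROC section):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# True labels (1 = positive) and classifier scores P(+|A)
y_true  = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]
y_score = [0.85, 0.53, 0.87, 0.85, 0.85, 0.95, 0.76, 0.93, 0.43, 0.25]

# Precision and recall at every distinct threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve")
plt.show()
```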
Precision-Recall Curves Interpretation
https://medium.com/@douglaspsteen/precision-recall-curves-d32e5b290248
Precision-Recall Curves
A model that produces a precision-recall curve that is closer to
the top-right corner is better than a model that produces a
precision-recall curve that is skewed towards the bottom of the
plot.
https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and-recall-visually-explained/
Precision-Recall Curves
Example
Which of the following P-R curves represents a perfect classifier?
(a)
(b)
(c)
https://towardsdatascience.com/gaining-an-intuitive-understanding-of-precision-and-recall-3b9df37804a7
(b) -> perfect classifier
(a) -> very bad classifier (random classifier)
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Performance Evaluation
How to obtain reliable estimates?
Methods for Model Comparison
How to compare the relative performance among competing models?
Methods for Performance Evaluation
Performance of a model may depend on other factors besides the learning algorithm:
Class distribution
Size of training and test sets
How to obtain a reliable estimate of performance?
Methods of Estimation:
Holdout
Reserve 2/3 for training and 1/3 for testing
Random subsampling
Repeated holdout
Cross validation
Partition data into k disjoint subsets
k-fold: train on k-1 partitions, test on the remaining one
Leave-one-out: k=n
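A minimal sketch of these estimation methods, assuming scikit-learn (holdout via train_test_split, then 5-fold cross validation; the data set and classifier are placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Holdout: reserve 2/3 for training and 1/3 for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("Holdout accuracy:", round(model.score(X_te, y_te), 3))

# k-fold cross validation: train on k-1 partitions, test on the remaining one
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kfold)
print("5-fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```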
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Performance Evaluation
How to obtain reliable estimates?
Methods for Model Comparison
How to compare the relative performance among competing models?
ROC (Receiver Operating Characteristic)
Developed in the 1950s for signal detection theory to analyze noisy signals
Characterizes the trade-off between positive hits and false alarms
ROC curve plots TP rate (y-axis) against FP rate (x-axis)
Performance of each classifier is represented as a point on the ROC curve
Changing the threshold of the algorithm or the sample distribution changes the location of the point
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)
ROC (Receiver Operating Characteristic)
1-dimensional data set containing 2 classes (positive and negative)
Any point located at x > t is classified as positive
At threshold t: TPR = 0.5, FPR = 0.12
ROC (Receiver Operating Characteristic)
(TPR,FPR):
(0,0): declare everything
to be negative class
(1,1): declare everything
to be positive class
(1,0): ideal
Diagonal line:
Random guessing
Below diagonal line:
prediction is opposite of the true class
Using ROC for Model Comparison
No model consistently outperforms the other
M1 is better for small FPR
M2 is better for large FPR
Area Under the ROC curve
Ideal:
Area = 1
Random guess:
Area = 0.5
How to construct an ROC curve

Instance  P(+|A)  True Class
1         0.85    +
2         0.53    +
3         0.87    -
4         0.85    -
5         0.85    -
6         0.95    +
7         0.76    -
8         0.93    +
9         0.43    -
10        0.25    +
Use a classifier that produces a probability for each test
instance P(+|A) for each test A
Sort the instances according to P(+|A) in decreasing order
Apply threshold at each unique value of P(+|A) and count the
number of TP, FP, TN, FN at each threshold
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)
How to construct an ROC curve

Instance  P(+|A)  True Class
1         0.95    +
2         0.93    +
3         0.87    -
4         0.85    -
5         0.85    -
6         0.85    +
7         0.76    -
8         0.53    +
9         0.43    -
10        0.25    +
Use a classifier that produces a probability for each test
instance P(+|A) for each test A
Sort the instances according to P(+|A) in decreasing order
Apply threshold at each unique value of P(+|A) and count the
number of TP, FP, TN, FN at each threshold
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)
How to construct an ROC curve
Use a classifier that produces a probability for each test
instance P(+|A) for each test A
Sort the instances according to P(+|A) in decreasing order
Apply threshold at each unique value of P(+|A) and count the
number of TP, FP, TN, FN at each threshold
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)

#   Threshold >=  True Class (Human)  AI  TP  FP  TN  FN
1   0.95          +                   +   1   0   5   4
2   0.93          +                   -   2   0   5   3
3   0.87          -                   -   2   1   4   3
4   0.85          -                   -
5   0.85          -                   -
6   0.85          +                   -   3   3   2   2
7   0.76          -                   -   3   4   1   2
8   0.53          +                   -   4   4   1   1
9   0.43          -                   -   4   5   0   1
10  0.25          +                   -   5   5   0   0
How to construct an ROC curve

Instance  P(+|A)  True Class  TP  FP  TN  FN  FPR  TPR
1         0.95    +           1   0   5   4   0    1/5
2         0.93    +           2   0   5   3   0    2/5
3         0.87    -           2   1   4   3   1/5  2/5
4         0.85    -
5         0.85    -
6         0.85    +           3   3   2   2   3/5  3/5
7         0.76    -           3   4   1   2   4/5  3/5
8         0.53    +           4   4   1   1   4/5  4/5
9         0.43    -           4   5   0   1   1    4/5
10        0.25    +           5   5   0   0   1    1
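The same construction as a short Python sketch, using the ten instances from the table above: apply a threshold at each unique value of P(+|A), predict positive for every score at or above it, and count TP and FP to get TPR and FPR.

```python
# (P(+|A), true class) pairs, already sorted by score in decreasing order
scored = [(0.95, '+'), (0.93, '+'), (0.87, '-'), (0.85, '-'), (0.85, '-'),
          (0.85, '+'), (0.76, '-'), (0.53, '+'), (0.43, '-'), (0.25, '+')]

P = sum(1 for _, c in scored if c == '+')  # total positives (5)
N = sum(1 for _, c in scored if c == '-')  # total negatives (5)

for t in sorted({s for s, _ in scored}, reverse=True):
    tp = sum(1 for s, c in scored if s >= t and c == '+')  # hits at this threshold
    fp = sum(1 for s, c in scored if s >= t and c == '-')  # false alarms
    print(f"threshold >= {t:.2f}: TPR = {tp}/{P}, FPR = {fp}/{N}")
```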
Breakout Session
How to construct an ROC curve

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +
2         0.93    +
3         0.87    +
4         0.85    +
5         0.83    +
6         0.80    -
7         0.76    -
8         0.53    -
9         0.43    -
10        0.25    -
Example
How to construct an ROC curve

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +           0    1/5
2         0.93    +           0    2/5
3         0.87    +           0    3/5
4         0.85    +           0    4/5
5         0.83    +           0    1
6         0.80    -           1/5  1
7         0.76    -           2/5  1
8         0.53    -           3/5  1
9         0.43    -           4/5  1
10        0.25    -           1    1
Example
How to construct an ROC curve

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +           0    1/5
2         0.93    -           1/5  1/5
3         0.87    +           1/5  2/5
4         0.85    -           2/5  2/5
5         0.83    +           2/5  3/5
6         0.80    -           3/5  3/5
7         0.76    +           3/5  4/5
8         0.53    -           4/5  4/5
9         0.43    +           4/5  1
10        0.25    -           1    1
Example
ROC Interpretation
AUC (Area Under the Curve)
AUC stands for "Area under the ROC Curve."
It measures the entire two-dimensional area underneath the
entire ROC curve from (0,0) to (1,1)
It provides an aggregate measure of performance across all
possible classification thresholds.
AUC ranges in value from 0 to 1.
A model whose predictions are 100% wrong has an AUC of 0.0, which means it has the worst measure of separability.
A model whose predictions are 100% correct has an AUC of 1.0, which means it has a good measure of separability.
When AUC is 0.5, the model has no class separation capacity.
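A minimal sketch of computing AUC, assuming scikit-learn (same ten invented instances as in the ROC construction example):

```python
from sklearn.metrics import roc_auc_score

y_true  = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]   # 1 = positive class
y_score = [0.85, 0.53, 0.87, 0.85, 0.85, 0.95, 0.76, 0.93, 0.43, 0.25]

# Aggregate measure of performance across all possible thresholds
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # 1.0 = perfect separability, 0.5 = random guessing
```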
AUC (Area Under the Curve)
https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and-recall-visually-explained/
AUC (Area Under the Curve) Interpretation
Example
The red distribution curve is of the positive class (patients with disease) and the green distribution curve is of the negative class (patients with no disease).
This is an ideal situation. When the two curves don't overlap at all, the model has an ideal measure of separability.
It is perfectly able to distinguish between the positive class and the negative class.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
When AUC is 0.7, it means there is a 70% chance that the
model will be able to distinguish between positive class and
negative class.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
This is the worst situation. When AUC is approximately 0.5, the
model has no discrimination capacity to distinguish between
positive class and negative class.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
When AUC is approximately 0, the model is actually
reciprocating the classes. It means the model is predicting a
negative class as a positive class and vice versa.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
Which of the following ROC curves produce AUC values
greater than 0.5?
a
b
c
d
e
https://developers.google.com/machine-learning/crash-course/classification/check-your-understanding-roc-and-auc
A -> This ROC curve has an AUC between 0.5 and 1.0, meaning
it ranks a random positive example higher than a random
negative example more than 50% of the time. Real-world binary
classification AUC values generally fall into this range.
E->Best possible ROC curve, as it ranks all positives above all
negatives. It has an AUC of 1.0.
In practice, if you have a "perfect" classifier with an AUC of
1.0, you should be suspicious, as it likely indicates a bug in
your model. For example, you may have overfit to your training
data, or the label data may be replicated in one of your features.
ROC vs. PRC
The main difference between ROC curves and precision-recall
curves is that the number of true-negative results is not used for
making a PRC
Curve                                      x-axis (Concept: Calculation)         y-axis (Concept: Calculation)
Precision-recall (PRC)                     Recall: TP / (TP + FN)                Precision: TP / (TP + FP)
Receiver Operating Characteristics (ROC)   False Positive Rate: FP / (FP + TN)   Recall / Sensitivity / True Positive Rate: TP / (TP + FN)
Intersection over Union (IoU) for object detection
http://arogozhnikov.github.io/2015/10/05/roc-curve.html
Is object detection classification or regression?
http://arogozhnikov.github.io/2015/10/05/roc-curve.html
Intersection over union (IoU) is an evaluation metric used to
measure the accuracy of an object detector on a particular
dataset.
It is a number from 0 to 1 that specifies the amount of overlap
between the predicted and ground truth bounding box.
It is known to be the most popular evaluation metric for tasks
such as segmentation, object detection and tracking.
What is Intersection over Union?
Definition
https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686
What is Intersection over Union?
Which one is best?
https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686
An IoU of 0 means that there is no overlap between the boxes.
An IoU of 1 means that the union of the boxes is the same as their intersection, indicating that they overlap completely.
The lower the IoU, the worse the prediction result.
Intersection over Union
https://towardsdatascience.com/iou-a-better-detection-evaluation-metric-45a511185be1
In order to apply Intersection over Union to evaluate an
(arbitrary) object detector we need:
The ground-truth bounding boxes (i.e., the hand labeled
bounding boxes that specify where in the image our object is).
The predicted bounding boxes from our model.
Intersection over Union
Your goal is to take the:
Training images
Bounding boxes
construct an object detector, and then evaluate its performance on the testing set.
An Intersection over Union score > 0.5 is normally considered a "good" prediction.
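A minimal sketch of the IoU computation for two axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the example boxes are invented:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)  # 0 when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

ground_truth = (50, 50, 150, 150)  # hand-labeled box
prediction   = (60, 60, 160, 160)  # model's predicted box
print(f"IoU = {iou(ground_truth, prediction):.2f}")  # ~0.68, > 0.5 is "good"
```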
Confidence Intervals
http://arogozhnikov.github.io/2015/10/05/roc-curve.html
Confidence, in statistics, is a way to describe probability.
Confidence interval is the range of values you expect your
estimate to fall between if you redo your test, within a certain
level of confidence.
For example: if you construct a confidence interval with a 95%
confidence level, you are confident that 95 out of 100 times the
estimate will fall between the upper and lower values specified
by the confidence interval.
Confidence Interval for Accuracy
Definition
Confidence level = 1 − α
https://www.scribbr.com/statistics/confidence-interval/
Prediction can be regarded as a Bernoulli trial
A Bernoulli trial has 2 possible outcomes
Possible outcomes for prediction: correct or wrong
Collection of Bernoulli trials has a Binomial distribution:
x ~ Bin(N, p) x: number of correct predictions
Example: Toss a fair coin 50 times, how many heads would turn up?
Expected number of heads = N × p = 50 × 0.5 = 25
Classification Accuracy = x / N
Given x (# of correct predictions) or, equivalently, acc = x / N, and N (# of test instances), can we predict p (the true accuracy of the model)?
Confidence Interval for Accuracy
For large test sets (N > 30), acc has a normal distribution with mean p and variance p(1 − p) / N.
Confidence Interval for p:
p = [2 × N × acc + (Zα/2)² ± Zα/2 × √((Zα/2)² + 4 × N × acc − 4 × N × acc²)] / [2 × (N + (Zα/2)²)]
Confidence Interval for Accuracy
Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:
N = 100, acc = 0.8
Let 1 − α = 0.95 (95% confidence)
From the probability table, Zα/2 = 1.96
Plugging into the interval formula gives p in [0.711, 0.866].

N:         50     100    500    1000   5000
p(lower):  0.670  0.711  0.763  0.774  0.789
p(upper):  0.888  0.866  0.833  0.824  0.811

1 − α:  0.99  0.98  0.95  0.90
Z:      2.58  2.33  1.96  1.65

(Standard Normal distribution)
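The example as a short sketch implementing the interval formula above (N = 100, acc = 0.8, Zα/2 = 1.96 from the slide):

```python
from math import sqrt

def accuracy_confidence_interval(acc, n, z):
    """Confidence interval for the true accuracy p given observed acc on n instances."""
    center = 2 * n * acc + z**2
    spread = z * sqrt(z**2 + 4 * n * acc - 4 * n * acc**2)
    denom = 2 * (n + z**2)
    return (center - spread) / denom, (center + spread) / denom

low, high = accuracy_confidence_interval(acc=0.8, n=100, z=1.96)
print(f"95% CI for p: [{low:.3f}, {high:.3f}]")  # about [0.711, 0.866]
```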
Performance Metrics for Multiclass AI
Turn it into Binary Classification

One vs. One: for K classes, build K(K−1)/2 binary classifiers (here 4 × 3 / 2 = 6):
Binary Classification Problem 1: red vs. blue
Binary Classification Problem 2: red vs. green
Binary Classification Problem 3: red vs. yellow
Binary Classification Problem 4: blue vs. green
Binary Classification Problem 5: blue vs. yellow
Binary Classification Problem 6: green vs. yellow

One vs. All (Rest): build K binary classifiers:
Binary Classification Problem 1: red vs [blue, green, yellow]
Binary Classification Problem 2: blue vs [red, green, yellow]
Binary Classification Problem 3: green vs [red, blue, yellow]
Binary Classification Problem 4: yellow vs [red, blue, green]
Performance Metrics for Multiclass AI
Turn it into Binary Classification
One vs. All (Rest)
One vs. One
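A minimal sketch of the one-vs-all and one-vs-one strategies, assuming scikit-learn (the four-class data is synthetic, standing in for red/blue/green/yellow):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem
X, y = make_classification(n_samples=400, n_classes=4, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One vs. All: K binary classifiers (4 here)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print(classification_report(y_te, ovr.predict(X_te)))  # per-class precision/recall

# One vs. One: K(K-1)/2 binary classifiers (6 here)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print("One-vs-One accuracy:", round(ovo.score(X_te, y_te), 3))
```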
Breakout Session
One vs. All (Rest)
One vs. One
Formulas (using the confusion-matrix counts a = TP, b = FN, c = FP, d = TN):

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + FP + TN + FN)
Precision (p) = a / (a + c)
Recall (r) = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
True positive rate = a / (a + b)
False positive rate = c / (c + d)
[Embedded assignment worksheet (Assignment 3, GEN101 Introductory Artificial Intelligence): given a test set of 20 instances with predicted probabilities P(+|A) and true classes (the scenario depends on the student's major, e.g., cancer grades Benign(+) vs. Malignant(-) for Biomedical Engineering), apply a threshold at each unique value of P(+|A), count TP, FP, TN, and FN, compute Precision, Recall (Sensitivity/TPR), FPR, Specificity/TNR, Accuracy, and F1 at each threshold, construct the Precision-Recall and ROC curves, and calculate the AUC.]
1. Insert your name and student ID under Name and ID.
2. Select your major from the drop-down list.
This will generate a unique set of data for you only.
For a successful completion of the assignment, you need to fill all bordered white cells.
Note that any similarity detected will be reported to the Office of Academic Integrity (OAI).
[ROC chart: FPR (x-axis) vs. TPR (y-axis)]
3. Apply a threshold at each unique value of P(+|A) to find the predicted class (Predicted (AI)) for each instance.
Note: Make sure you do not change the data in the colored cells.
Hint: You may copy and paste the data (by value) to another sheet, then sort it and begin working on the performance evaluation. Once done, you may fill in the bordered white cells in this sheet with your final answers.
4. Count the number of TP, FP, TN, and FN at each threshold and then calculate the different performance evaluation measures for each instance. The formulas can be found in the performance evaluation slides on Blackboard.
5. Use the calculated values to plot the Precision-Recall curve and the Receiver Operating Characteristics (ROC) curve.
6. Find the area under the curve (AUC).
7. Once completed, save the file and submit it on Blackboard.
Assignment 3
Guidelines
Step 1: Study ALL Topics Covered So Far
Topic 7: Performance Evaluation
Advice: Redo all activities done in class and uploaded on Blackboard
Step 2: On Thursday May 6 at 12:01 am, download the Excel file from the Assignments Folder
GEN101 – Assignment 3.xlsm
Step 3: Download and open the Excel file
The filename: GEN101 – Assignment 3.xlsm
Disclaimer: No macros are needed. Use a regular laptop, not a mobile or tablet. Your computer, Mac or PC, must have an Intel chip. Apple M1 chips (2021 MacBook Pros and Airs) do not yet support certain Excel features.
Step 4: Fill your name and ID, and select your major.
The question and data depend on your ID and major. No two students will be getting the same data and question. Any similarity will be reported to the Office of Academic Integrity. Your assignment is unique to you.
Step 5: Solve the Problem
Follow the post-it note instructions and fill all the white cells (boxes). If a cell is empty and white, you are expected to fill it. Use Excel to make the calculations. Do not insert extra rows or columns. Save your work!
Step 6: Submit Your Excel File Uncompressed before 11:59 pm on Saturday
Step 7: Wait for Grades
• Grades are final
• This assignment is individual. No cooperation is allowed with anyone
• Late assignments receive a -50% penalty
• Assignments are not accepted after the last day of classes and will receive NO credit
• Anti-cheating measures are embedded within the Excel file. Cheating will be reported
• You can submit multiple attempts
• Respondus Lockdown and Monitor will not be used for this assignment


GEN101IntroductoryArtificial Intelligence Arti

  • 1. GEN101 Introductory Artificial Intelligence Artificial Intelligence Performance Evaluation College of Engineering 1 Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to compare the relative performance among competing models?
  • 2. 2 We have already seen: the Confusion Matrix Most widely-used metric: Metrics for Performance Evaluation We will focus on the predictive capability of a model Rather than how fast it takes to classify or build models, scalability, etc.PREDICTED CLASS (AI)ACTUAL CLASS (Human)Class=YesClass=NoClass=Yesa (TP)b (FN)Class=Noc (FP)d (TN) TP: True Positive FP: False Positive TN: True Negative FN: False Negative Limitation of Accuracy Consider a 2-class problem Number of Class 0 (e.g., Benign) examples = 9990 Number of Class 1 (e.g., Malignant) examples = 10 If model predicts everything to be class 0, accuracy is
  • 3. 9990/10000 = 99.9 % Accuracy is misleading because model does not detect any class 1 example 4 Other MeasuresPREDICTED CLASSACTUAL CLASSClass=YesClass=NoClass=Yesa (TP)b (FN)Class=Noc (FP)d (TN) Precision-Recall Two other metrics that are often used to quantify model performance are precision and recall. Precision is defined as the number of true positives divided by the total number of positive predictions. It quantifies what percentage of the positive predictions were correct How correct your model’s positive predictions were. Recall/Sensitivity is defined as the number of true positives divided by the total number of true positives and false negatives (all actual positives) It quantifies what percentage of the actual positives you were
  • 4. able to identify How sensitive your model was in identifying positives. https://medium.com/@douglaspsteen/precision-recall-curves- d32e5b290248 6 Precision-Recall Precision = Recall = Sensitivity = Specificity = Out of all the examples that were predicted as positive, how many are correct? How precise was the AI? Out of all the positive examples, how many were caught (recalled)? Did the AI miss? Out of all the people that do have the disease, how many got positive test results? Did the AI miss anyone? Out of all the people that do NOT have the disease, how many got negative results?
  • 5. https://towardsdatascience.com/should-i-look-at-precision- recall-or-specificity-sensitivity-3946158aace1 7 Precision-Recall Example In this example: If the classifier predicts negative, you can trust it, the example is negative. Our AI does not do a mistake in negative. It is sensitive. However, pay attention, if the example is negative, you can’t be sure it will predict it as negative (specificity=78%). If the classifier predicts positive, you can’t trust it (precision=33%) However, if the example is positive, you can trust the classifier will find it (will not miss it) (recall=100%). Precision: Is the green box pure? Recall/Sensitivity: Did I lose any of the green balls to the red box? Did I miss any greens? Am I so sensitive to greens that I do not miss them? Specificity: Are all the red balls in the red box? 8 Precision-Recall
  • 6. Example In this example: Since the population is imbalanced: The precision is relatively high The recall is 100% because all the positive examples are predicted as positive. The specificity is 0% because no negative example is predicted as negative. 9 Precision-Recall Example In this example: If it predicts that an example is positive, you can trust it — it is positive. However, if it predicts it is negative, you can’t trust it, the chances are that it is still positive. Can be a useful classifier 10
  • 7. Precision-Recall Example In this example: The classifier detects all the positive examples as positive It also detects all negative examples as negative. All the measures are at 100%. 11 Why so many measures? Which is more important? Because it depends False positive is expensive. False alarm is a nightmare! Precision is more important than recall False negative is expensive. Missing one is a nightmare! Recall is more important than precision 12 Precision-Recall Curves The precision-recall curve is used for evaluating the performance of binary classification algorithms.
  • 8. They provide a graphical representation of a classifier’s performance across many thresholds, rather than a single value It is constructed by calculating and plotting the precision against the recall for a single classifier at a variety of thresholds. It helps to visualize how the choice of threshold affects classifier performance and can even help us select the best threshold for a specific problem. 13 Precision-Recall Curves Interpretation https://medium.com/@douglaspsteen/precision-recall-curves- d32e5b290248 14 Precision-Recall Curves A model that produces a precision-recall curve that is closer to the top-right corner is better than a model that produces a precision-recall curve that is skewed towards the bottom of the plot.
  • 9. https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and- recall-visually-explained/ 15 Precision-Recall Curves Example Which of the following P-R curves produce represent a perfect classifier? (a) (b) (c) https://towardsdatascience.com/gaining-an-intuitive- understanding-of-precision-and-recall-3b9df37804a7 (b) -> perfect classifier (a) -> very bad classifier (random classifier) 16 Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?
  • 10. Methods for Model Comparison How to compare the relative performance among competing models? 17 Methods for Performance Evaluation Performance of a model may depend on other factors besides the learning algorithm: How to obtain a reliable estimate of performance? Methods of Estimation: 18 Class distribution Size of training and test sets Holdout Reserve 2/3 for training and 1/3 for testing Random subsampling
  • 11. Repeated holdout Cross validation Partition data into k disjoint subsets k-fold: train on k-1 partitions, test on the remaining one Leave-one-out: k=n Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to compare the relative performance among competing models? 19
  • 12. ROC (Receiver Operating Characteristic) Developed in 1950s for signal detection theory to analyze noisy signals Characterize the trade-off between positive hits and false alarms ROC curve plots TP rate (y-axis) against FP rate (x-axis) Performance of each classifier represented as a point on ROC curve changing the threshold of the algorithm, or sample distribution changes the location of the point TP rate (TPR) = FP rate (TPR) = 20 ROC (Receiver Operating Characteristic) 1-dimensional data set containing 2 classes (positive and negative) - any points located at x > t is classified as positive 21 At threshold t:
  • 13. TPR=0.5, FPR=0.12 ROC (Receiver Operating Characteristic) (TPR,FPR): (0,0): declare everything to be negative class (1,1): declare everything to be positive class (0,1): ideal Diagonal line: Random guessing Below diagonal line: prediction is opposite of the true class Using ROC for Model Comparison No model consistently outperform the other M1 is better for small FPR M2 is better for large FPR Area Under the ROC curve Ideal: Area = 1 Random guess: Area = 0.5
  • 14. How to construct an ROC curveInstanceP(+|A)True Class10.85+20.53+30.87-40.85-50.85-60.95+70.76- 80.93+90.43-100.25+ Use a classifier that produces a probability for each test instance P(+|A) for each test A Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) and count the number of TP, FP, TN, FN at each threshold TP rate (TPR) = FP rate (TPR) = How to construct an ROC curveInstanceP(+|A)True Class10.95+20.93+30.87-40.85-50.85-60.85+70.76- 80.53+90.43-100.25+ Use a classifier that produces a probability for each test instance P(+|A) for each test A Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) and count the number of TP, FP, TN, FN at each threshold TP rate (TPR) = FP rate (TPR) = How to construct an ROC curve Use a classifier that produces a probability for each test
  • 15. instance P(+|A) for each test A Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) and count the number of TP, FP, TN, FN at each threshold TP rate (TPR) = FP rate (TPR) = #Threshold >=True Class (Human)AITPFPTNFN10.95++105420.93+-205330.87-- 214340.85--50.85--60.85+-332270.76--341280.53+-441190.43-- 4501100.25+-5500 26 How to construct an ROC curveInstanceP(+|A)True ClassTPFPTNFNFPRTPR10.95+105401/520.93+205302/530.87- 21431/52/540.85-50.85-60.85+33223/53/570.76- 34124/53/580.53+44114/54/590.43-450114/5100.25+550011 Breakout Session How to construct an ROC curveInstanceP(+|A)True ClassFPRTPR10.95+20.93+30.87+40.85+50.83+60.80-70.76- 80.53-90.43-100.25-
  • 16. Example (solution to the breakout exercise):

Instance | P(+|A) | True Class | FPR | TPR
1 | 0.95 | + | 0 | 1/5
2 | 0.93 | + | 0 | 2/5
3 | 0.87 | + | 0 | 3/5
4 | 0.85 | + | 0 | 4/5
5 | 0.83 | + | 0 | 1
6 | 0.80 | - | 1/5 | 1
7 | 0.76 | - | 2/5 | 1
8 | 0.53 | - | 3/5 | 1
9 | 0.43 | - | 4/5 | 1
10 | 0.25 | - | 1 | 1

Example (a second labeling, with interleaved classes):

Instance | P(+|A) | True Class | FPR | TPR
1 | 0.95 | + | 0 | 1/5
2 | 0.93 | - | 1/5 | 1/5
3 | 0.87 | + | 1/5 | 2/5
4 | 0.85 | - | 2/5 | 2/5
5 | 0.83 | + | 2/5 | 3/5
6 | 0.80 | - | 3/5 | 3/5
7 | 0.76 | + | 3/5 | 4/5
8 | 0.53 | - | 4/5 | 4/5
9 | 0.43 | + | 4/5 | 1
10 | 0.25 | - | 1 | 1
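The threshold sweep is easy to automate. This short sketch reproduces the TPR/FPR columns for the 10-instance example on the construction slides; the scores and labels are copied from the sorted table above:

# Threshold-sweep construction of the ROC points.
scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ['+', '+', '-', '-', '-', '+', '-', '+', '-', '+']

P = labels.count('+')   # total positives
N = labels.count('-')   # total negatives

for t in sorted(set(scores), reverse=True):
    preds = ['+' if s >= t else '-' for s in scores]
    tp = sum(p == '+' and l == '+' for p, l in zip(preds, labels))
    fp = sum(p == '+' and l == '-' for p, l in zip(preds, labels))
    print(f"t={t:.2f}  TPR={tp/P:.1f}  FPR={fp/N:.1f}")

Plotting the resulting (FPR, TPR) pairs, plus the starting point (0, 0), traces out the ROC curve.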
  • 17. ROC Interpretation. AUC (Area Under the Curve): AUC stands for "Area under the ROC Curve." It measures the two-dimensional area underneath the ROC curve from (0,0) to (1,1). It provides an aggregate measure of performance across all possible classification thresholds. AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0, the worst possible measure of separability; a model whose predictions are 100% correct has an AUC of 1.0, a perfect measure of separability. When AUC is 0.5, the model has no class-separation capacity. 33 AUC (Area Under the Curve) https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and-recall-visually-explained/ 34 AUC (Area Under the Curve) Interpretation Example: the red distribution curve is the positive class (patients with the disease) and the green distribution curve is the negative
  • 18. class (patients with no disease). This is the ideal situation: when the two curves do not overlap at all, the model has an ideal measure of separability; it is perfectly able to distinguish between the positive class and the negative class. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 35 AUC (Area Under the Curve) Interpretation Example: when AUC is 0.7, there is a 70% chance that the model will be able to distinguish between the positive class and the negative class. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 36 AUC (Area Under the Curve) Interpretation Example: this is the worst situation. When AUC is approximately 0.5, the model has no discrimination capacity to distinguish between the positive class and the negative class.
  • 19. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 37 AUC (Area Under the Curve) Interpretation Example: when AUC is approximately 0, the model is actually reciprocating the classes: it is predicting the negative class as positive and vice versa. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 38 AUC (Area Under the Curve) Interpretation Example: which of the following ROC curves produce AUC values greater than 0.5?
  • 20. a b c d e https://developers.google.com/machine-learning/crash-course/classification/check-your-understanding-roc-and-auc A -> This ROC curve has an AUC between 0.5 and 1.0, meaning it ranks a random positive example higher than a random negative example more than 50% of the time. Real-world binary classification AUC values generally fall into this range. E -> The best possible ROC curve, as it ranks all positives above all negatives; it has an AUC of 1.0. In practice, if you have a "perfect" classifier with an AUC of 1.0, you should be suspicious, as it likely indicates a bug in your model. For example, you may have overfit to your training data, or the label data may be replicated in one of your features. 39 ROC vs. PRC: the main difference between ROC curves and precision-recall curves is that the number of true-negative results is not used for making a PRC.

Curve | x-axis concept | x-axis calculation | y-axis concept | y-axis calculation
Precision-recall (PRC) | Recall | TP / (TP + FN) | Precision | TP / (TP + FP)
Receiver Operating Characteristics (ROC) | False Positive Rate | FP / (FP + TN) | Recall / Sensitivity / True Positive Rate | TP / (TP + FN)
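Because the PRC ignores true negatives, ROC AUC and PR AUC can tell different stories on imbalanced data. A small sketch with synthetic scores (all numbers here are invented for illustration):

# ROC AUC vs. area under the precision-recall curve on imbalanced data.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(50), np.zeros(950)])      # only 5% positives
y_score = np.concatenate([rng.normal(0.7, 0.2, 50),        # positives score higher on average
                          rng.normal(0.4, 0.2, 950)])

print("ROC AUC:", roc_auc_score(y_true, y_score))
print("PR  AUC:", average_precision_score(y_true, y_score))  # uses TP, FP, FN only

On data this imbalanced, the PR AUC is typically much lower than the ROC AUC, which is why precision-recall curves are often preferred when positives are rare.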
  • 21. 40 Intersection over Union (IoU) for object detection http://arogozhnikov.github.io/2015/10/05/roc-curve.html 41 Is object detection classification or regression? (It is both: predicting the class label is classification, while predicting the bounding-box coordinates is regression.) http://arogozhnikov.github.io/2015/10/05/roc-curve.html 42 Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. It is a number from 0 to 1 that specifies the amount of overlap between the predicted and ground-truth bounding boxes. It is known as the most popular evaluation metric for tasks such as segmentation, object detection, and tracking.
  • 22. What is Intersection over Union? Definition https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686 43 What is Intersection over Union? Which one is best? https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686 44 An IoU of 0 means that there is no overlap between the boxes. An IoU of 1 means that the union of the boxes is the same as their intersection, indicating that they overlap completely. The lower the IoU, the worse the prediction result.
  • 23. Intersection over Union https://towardsdatascience.com/iou-a-better-detection-evaluation-metric-45a511185be1 45 In order to apply Intersection over Union to evaluate an (arbitrary) object detector, we need: the ground-truth bounding boxes (i.e., the hand-labeled bounding boxes that specify where in the image our object is), and the predicted bounding boxes from our model. Intersection over Union: your goal is to take the training images and bounding boxes, construct an object detector, and then evaluate its performance on the testing set. An Intersection over Union score > 0.5 is normally considered a
  • 24. “good” prediction. 48
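A minimal IoU implementation for axis-aligned boxes; the (x1, y1, x2, y2) corner format and the example boxes below are assumptions for illustration:

# IoU between two axis-aligned boxes given as (x1, y1, x2, y2).
def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)    # intersection / union

ground_truth = (50, 50, 150, 150)   # made-up hand-labeled box
predicted    = (60, 60, 160, 160)   # made-up detector output
print(f"IoU = {iou(ground_truth, predicted):.3f}")  # ~0.681, above the 0.5 "good" cutoff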
  • 25. Confidence Intervals http://arogozhnikov.github.io/2015/10/05/roc-curve.html 49 Confidence, in statistics, is a way to describe probability. A confidence interval is the range of values you expect your estimate to fall between if you redo your test, within a certain level of confidence. For example, if you construct a confidence interval with a 95% confidence level, you are confident that 95 out of 100 times the estimate will fall between the upper and lower values specified by the interval. Confidence Interval for Accuracy. Definition: confidence level = 1 - α. https://www.scribbr.com/statistics/confidence-interval/ 50 A prediction can be regarded as a Bernoulli trial. A Bernoulli trial has 2 possible outcomes; for a prediction, the possible outcomes are correct or wrong. A collection of Bernoulli trials has a binomial distribution: x ~ Bin(N, p), where x is the number of correct predictions. Example: toss a fair coin 50 times; how many heads would turn up? Expected number of heads = N × p = 50 × 0.5 = 25.
  • 26. Classification accuracy: given x, the number of correct predictions, and N, the number of test instances, acc = x / N. Can we estimate p, the true accuracy of the model? Confidence Interval for Accuracy: for large test sets (N > 30), acc has a normal distribution with mean p and variance p(1 - p) / N. Solving P( -Z_(α/2) <= (acc - p) / sqrt(p(1 - p) / N) <= Z_(α/2) ) = 1 - α for p gives the confidence interval

p = ( 2·N·acc + Z² ± Z·sqrt(Z² + 4·N·acc - 4·N·acc²) ) / ( 2·(N + Z²) ), where Z = Z_(α/2).

Example: consider a model that produces an accuracy of 80% when evaluated on 100 test instances: N = 100, acc = 0.8. Let 1 - α = 0.95 (95% confidence). From the standard normal table, Z_(α/2) = 1.96, giving the interval [0.711, 0.866]. Standard normal values:

1 - α | 0.99 | 0.98 | 0.95 | 0.90
Z_(α/2) | 2.58 | 2.33 | 1.96 | 1.65

Repeating the calculation for different test-set sizes (acc = 0.8, 95% confidence):

N | 50 | 100 | 500 | 1000 | 5000
p (lower) | 0.670 | 0.711 | 0.763 | 0.774 | 0.789
p (upper) | 0.888 | 0.866 | 0.833 | 0.824 | 0.811
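The interval formula above is easy to check numerically. This sketch reproduces the N = 100, acc = 0.8 example; the function name accuracy_ci is ours, not from any library:

# Confidence interval for accuracy, using the formula from the slide.
from math import sqrt

def accuracy_ci(acc, n, z=1.96):
    denom = 2 * (n + z**2)
    centre = 2 * n * acc + z**2
    spread = z * sqrt(z**2 + 4 * n * acc - 4 * n * acc**2)
    return (centre - spread) / denom, (centre + spread) / denom

lo, hi = accuracy_ci(acc=0.8, n=100, z=1.96)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")   # ~[0.711, 0.867], matching the table up to rounding

Trying larger n values (500, 1000, 5000) reproduces the rest of the table: the interval tightens around 0.8 as the test set grows.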
  • 27. Performance Metrics for Multiclass AI: turn the problem into binary classification. One vs. One: Binary Classification Problem 1: red vs. blue; Problem 2: red vs. green; Problem 3: red vs. yellow; Problem 4: blue vs. green; Problem 5: blue vs. yellow; Problem 6: green vs. yellow. One vs. All (Rest): Binary Classification Problem 1: red vs. [blue, green, yellow]; Problem 2: blue vs. [red, green, yellow]; Problem 3: green vs. [red, blue, yellow]; Problem 4: yellow vs. [red, blue, green]. One vs. One requires K(K - 1)/2 binary classifiers for K classes (6 for the four colors above), while One vs. All requires K. Performance Metrics for Multiclass AI: turn it into binary classification with One vs. All (Rest) or One vs. One.
  • 29. One vs. All (Rest) 57
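One way to realize the one-vs-rest view in code: scikit-learn's per-class averaging treats each class as its own binary problem. The color labels below are invented to mirror the four-color example above:

# Per-class (one-vs-rest) precision and recall for a multiclass problem.
from sklearn.metrics import precision_score, recall_score

y_true = ['red', 'blue', 'green', 'red', 'yellow', 'blue', 'green', 'red']
y_pred = ['red', 'blue', 'red',   'red', 'yellow', 'green', 'green', 'blue']

# average=None returns one score per class (classes in sorted order),
# i.e. one binary "this class vs. the rest" problem each.
print(precision_score(y_true, y_pred, average=None, zero_division=0))
print(recall_score(y_true, y_pred, average=None, zero_division=0))

# 'macro' averages the per-class scores into a single summary number.
print(precision_score(y_true, y_pred, average='macro', zero_division=0))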
  • 34. Precision-Recall Curve: worked example (Course: GEN101, Introductory Artificial Intelligence). Given the following test set of 20 cancer grades (Benign (+) and Malignant (-)), calculate the different performance measures (precision, recall, accuracy, ...), construct the ROC, and then calculate the AUC. Instances are sorted by P(+|A) in decreasing order; TP, FP, TN, FN are the cumulative counts obtained when the threshold is set at that instance's score:

Instance | P(+|A) | True Class (Human) | TP | FP | TN | FN | Precision | Recall (Sensitivity)/TPR | FPR | Specificity/TNR | ACC | F1
1 | 0.98 | Benign | 1 | 0 | 11 | 8 | 1.00 | 0.11 | 0.00 | 1.00 | 0.60 | 0.15
2 | 0.94 | Malignant | 1 | 1 | 10 | 8 | 0.50 | 0.11 | 0.09 | 0.91 | 0.55 | 0.15
3 | 0.93 | Benign | 2 | 1 | 10 | 7 | 0.67 | 0.22 | 0.09 | 0.91 | 0.60 | 0.27
4 | 0.90 | Malignant | 2 | 2 | 9 | 7 | 0.50 | 0.22 | 0.18 | 0.82 | 0.55 | 0.27
5 | 0.87 | Benign | 3 | 3 | 8 | 6 | 0.50 | 0.33 | 0.27 | 0.73 | 0.55 | 0.35
6 | 0.87 | Malignant | 3 | 3 | 8 | 6 | 0.50 | 0.33 | 0.27 | 0.73 | 0.55 | 0.35
7 | 0.85 | Benign | 4 | 3 | 8 | 5 | 0.57 | 0.44 | 0.27 | 0.73 | 0.60 | 0.42
8 | 0.83 | Malignant | 4 | 4 | 7 | 5 | 0.50 | 0.44 | 0.36 | 0.64 | 0.55 | 0.42
9 | 0.78 | Malignant | 4 | 5 | 6 | 5 | 0.44 | 0.44 | 0.45 | 0.55 | 0.50 | 0.42
10 | 0.72 | Malignant | 4 | 6 | 5 | 5 | 0.40 | 0.44 | 0.55 | 0.45 | 0.45 | 0.42
11 | 0.71 | Benign | 6 | 6 | 5 | 3 | 0.50 | 0.67 | 0.55 | 0.45 | 0.55 | 0.52
12 | 0.71 | Benign | 6 | 6 | 5 | 3 | 0.50 | 0.67 | 0.55 | 0.45 | 0.55 | 0.52
13 | 0.69 | Malignant | 6 | 8 | 3 | 3 | 0.43 | 0.67 | 0.73 | 0.27 | 0.45 | 0.52
14 | 0.69 | Malignant | 6 | 8 | 3 | 3 | 0.43 | 0.67 | 0.73 | 0.27 | 0.45 | 0.52
15 | 0.36 | Malignant | 6 | 9 | 2 | 3 | 0.40 | 0.67 | 0.82 | 0.18 | 0.40 | 0.52
16 | 0.35 | Malignant | 6 | 10 | 1 | 3 | 0.38 | 0.67 | 0.91 | 0.09 | 0.35 | 0.52
17 | 0.34 | Benign | 7 | 10 | 1 | 2 | 0.41 | 0.78 | 0.91 | 0.09 | 0.40 | 0.56
18 | 0.32 | Benign | 8 | 10 | 1 | 1 | 0.44 | 0.89 | 0.91 | 0.09 | 0.45 | 0.59
19 | 0.12 | Benign | 9 | 10 | 1 | 0 | 0.47 | 1.00 | 0.91 | 0.09 | 0.50 | 0.62
20 | 0.10 | Malignant | 9 | 11 | 0 | 0 | 0.45 | 1.00 | 1.00 | 0.00 | 0.45 | 0.62

AUC = 0.5205 (the workbook computes it as a sum of areas A1 to A6 under the ROC curve: 0.01 + 0.04 + 0.025 + 0.1215 + 0.234 + 0.09 = 0.5205).

Receiver Operating Characteristics curve plot: FPR (x-axis) vs. TPR (y-axis). Precision vs. Recall curve plot: Recall (x-axis) vs. Precision (y-axis). The workbook also contains the per-major data sets (Architecture, Aviation, Biomedical Engineering, Business, Chemical Engineering, Civil Engineering, Computer Engineering, Cybersecurity, Electrical Engineering, HR, Industrial Engineering, Information Technology, Interior Design, Mechanical Engineering, Software Engineering) used to generate each student's unique question.

Assignment workbook instructions: 1. Insert your name and student ID under Name and ID. 2. Select your major from the drop-down list; this generates a unique set of data for you only. To complete the assignment successfully, you need to fill all bordered white cells. Note that any similarity detected will be reported to the Office of Academic Integrity (OAI). 3. Apply a threshold at each unique value of P(+|A) to find the predicted class (Predicted (AI)) for each instance. Make sure you do not change the data in the colored cells. Hint: you may copy and paste the data (by value) to another sheet, sort it, and begin working on the performance evaluation; once done, fill in the bordered white cells in this sheet with your final answers. 4. Count the number of TP, FP, TN, and FN at each threshold, and then calculate the different performance evaluation measures for each instance. The formulas can be found in the performance evaluation slides on Blackboard. 5. Use the calculated values to plot the Precision-Recall curve and the Receiver Operating Characteristics (ROC) curve. 6. Find the area under the curve (AUC). 7. Once completed, save the file and submit it on Blackboard.
  • 40. Assignment 3
  • 41. Guidelines. Step 1: Study ALL topics covered so far, including Topic 7: Performance Evaluation. Advice: redo all activities done in class and uploaded on Blackboard. Step 2: On Thursday, May 6 at 12:01 am, download the Excel file GEN101 – Assignment 3.xlsm from the Assignments folder. Step 3: Open the Excel file (filename: GEN101 – Assignment 3.xlsm). Disclaimer: no macros are needed. Use a regular laptop, not a mobile or tablet. Your computer, Mac or PC, must have an Intel chip; Apple M1 chips (2021 MacBook Pros and Airs) do not yet support certain Excel features. Step 4: Fill in your name and ID, and select your major. Write ID.
  • 42. Write Name. The question and data depend on your ID and major; no two students will get the same data and question. Any similarity will be reported to the Office of Academic Integrity. Your assignment is unique to you. Select Major. Observe. Step 5: Solve the problem. Follow the post-it note instructions and fill all the white cells (boxes); if a cell is empty and white, you are expected to fill it. Use Excel to make the
  • 43. calculations. Do not insert extra rows or columns. Save your work! Step 6: Submit your Excel file, uncompressed, before 11:59 pm on Saturday. Step 7: Wait for grades. Grades are final. This assignment is individual; no cooperation is allowed with anyone. Late assignments receive a -50% penalty. Assignments are not accepted after the last day of classes and will receive no credit. Anti-cheating measures are embedded within the Excel file; cheating will be reported. You can submit multiple attempts. Respondus Lockdown and Monitor will not be used for this assignment.