GEN101
Introductory
Artificial Intelligence
Artificial Intelligence Performance Evaluation
College of Engineering
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Performance Evaluation
How to obtain reliable estimates?
Methods for Model Comparison
How to compare the relative performance among competing models?
Metrics for Performance Evaluation
We will focus on the predictive capability of a model, rather than how long it takes to classify or build models, scalability, etc.
We have already seen the Confusion Matrix:

                          PREDICTED CLASS (AI)
                          Class=Yes   Class=No
ACTUAL CLASS   Class=Yes  a (TP)      b (FN)
(Human)        Class=No   c (FP)      d (TN)

Most widely-used metric: Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + FP + TN + FN)
TP: True Positive
FP: False Positive
TN: True Negative
FN: False Negative
Limitation of Accuracy
Consider a 2-class problem
Number of Class 0 (e.g., Benign) examples = 9990
Number of Class 1 (e.g., Malignant) examples = 10
If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
Accuracy is misleading because the model does not detect a single class 1 example
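To make this concrete, here is a minimal Python sketch of the same calculation (the always-class-0 "model" is hypothetical, with the 9990/10 split from the example above):

```python
# Hypothetical imbalanced test set: 9990 benign (class 0), 10 malignant (class 1)
y_true = [0] * 9990 + [1] * 10

# A trivial "model" that predicts everything to be class 0
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
print(f"Accuracy: {correct / len(y_true):.1%}")  # 99.9%, yet no malignant case is detected
```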
Other Measures

                          PREDICTED CLASS
                          Class=Yes   Class=No
ACTUAL CLASS   Class=Yes  a (TP)      b (FN)
               Class=No   c (FP)      d (TN)
Precision-Recall
Two other metrics that are often used to quantify model
performance are precision and recall.
Precision is defined as the number of true positives divided by
the total number of positive predictions.
It quantifies what percentage of the positive predictions were
correct
How correct your model’s positive predictions were.
Recall/Sensitivity is defined as the number of true positives
divided by the total number of true positives and false negatives
(all actual positives)
It quantifies what percentage of the actual positives you were
able to identify
How sensitive your model was in identifying positives.
https://medium.com/@douglaspsteen/precision-recall-curves-d32e5b290248
Precision-Recall
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Out of all the examples that were predicted as positive, how
many are correct? How precise was the AI?
Out of all the positive examples, how many were caught (recalled)? Did the AI miss any?
Out of all the people that do have the disease, how many got
positive test results? Did the AI miss anyone?
Out of all the people that do NOT have the disease, how many
got negative results?
https://towardsdatascience.com/should-i-look-at-precision-recall-or-specificity-sensitivity-3946158aace1
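As a minimal sketch of the four formulas above (the TP/FP/TN/FN counts are invented for illustration):

```python
# Hypothetical confusion-matrix counts
TP, FP, TN, FN = 30, 10, 50, 10

precision   = TP / (TP + FP)  # how precise were the positive predictions?
recall      = TP / (TP + FN)  # how many actual positives were caught (recalled)?
sensitivity = TP / (TP + FN)  # sensitivity is another name for recall
specificity = TN / (TN + FP)  # how many actual negatives got negative results?

print(f"Precision = {precision:.2f}, Recall = {recall:.2f}, "
      f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")
```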
Precision-Recall
Example
In this example:
If the classifier predicts negative, you can trust it: the example is negative. Our AI does not make mistakes on negatives. It is sensitive. However, pay attention: if the example is negative, you can't be sure it will be predicted as negative (specificity = 78%).
If the classifier predicts positive, you can't trust it (precision = 33%).
However, if the example is positive, you can trust the classifier to find it (it will not miss it) (recall = 100%).
Precision: Is the green box pure?
Recall/Sensitivity: Did I lose any of the green balls to the red
box? Did I miss any greens? Am I so sensitive to greens that I
do not miss them?
Specificity: Are all the red balls in the red box?
Precision-Recall
Example
In this example:
Since the population is imbalanced:
The precision is relatively high
The recall is 100% because all the positive examples are
predicted as positive.
The specificity is 0% because no negative example is predicted
as negative.
Precision-Recall
Example
In this example:
If it predicts that an example is positive, you can trust it — it is
positive.
However, if it predicts it is negative, you can't trust it; the chances are that it is still positive.
It can still be a useful classifier.
Precision-Recall
Example
In this example:
The classifier detects all the positive examples as positive
It also detects all negative examples as negative.
All the measures are at 100%.
Why so many measures? Which is more important?
Because it depends:
If a false positive is expensive (a false alarm is a nightmare!), precision is more important than recall.
If a false negative is expensive (missing one is a nightmare!), recall is more important than precision.
Precision-Recall Curves
The precision-recall curve is used for evaluating the performance of binary classification algorithms.
It provides a graphical representation of a classifier's performance across many thresholds, rather than a single value.
It is constructed by calculating and plotting the precision
against the recall for a single classifier at a variety of
thresholds.
It helps to visualize how the choice of threshold affects
classifier performance and can even help us select the best
threshold for a specific problem.
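As a sketch of how such a curve is produced in practice, assuming scikit-learn and matplotlib are available (the labels and scores reuse the ten-instance example that appears later in the ROC section):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# True labels (1 = positive) and classifier scores P(+|A)
y_true  = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]
y_score = [0.85, 0.53, 0.87, 0.85, 0.85, 0.95, 0.76, 0.93, 0.43, 0.25]

# Precision and recall at every distinct threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve")
plt.show()
```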
Precision-Recall Curves Interpretation
https://medium.com/@douglaspsteen/precision-recall-curves-d32e5b290248
Precision-Recall Curves
A model that produces a precision-recall curve that is closer to
the top-right corner is better than a model that produces a
precision-recall curve that is skewed towards the bottom of the
plot.
https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and-recall-visually-explained/
Precision-Recall Curves
Example
Which of the following P-R curves represents a perfect classifier?
(a)
(b)
(c)
https://towardsdatascience.com/gaining-an-intuitive-understanding-of-precision-and-recall-3b9df37804a7
(b) -> perfect classifier
(a) -> very bad classifier (random classifier)
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Performance Evaluation
How to obtain reliable estimates?
Methods for Model Comparison
How to compare the relative performance among competing models?
Methods for Performance Evaluation
Performance of a model may depend on other factors besides the learning algorithm:
Class distribution
Size of training and test sets
How to obtain a reliable estimate of performance?
Methods of Estimation:
Holdout
Reserve 2/3 for training and 1/3 for testing
Random subsampling
Repeated holdout
Cross validation
Partition data into k disjoint subsets
k-fold: train on k-1 partitions, test on the remaining one
Leave-one-out: k=n
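A minimal sketch of these estimation methods, assuming scikit-learn (holdout via train_test_split, then 5-fold cross validation; the data set and classifier are placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Holdout: reserve 2/3 for training and 1/3 for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("Holdout accuracy:", round(model.score(X_te, y_te), 3))

# k-fold cross validation: train on k-1 partitions, test on the remaining one
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kfold)
print("5-fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```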
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Performance Evaluation
How to obtain reliable estimates?
Methods for Model Comparison
How to compare the relative performance among competing models?
ROC (Receiver Operating Characteristic)
Developed in the 1950s for signal detection theory to analyze noisy signals
Characterizes the trade-off between positive hits and false alarms
ROC curve plots TP rate (y-axis) against FP rate (x-axis)
Performance of each classifier is represented as a point on the ROC curve
Changing the threshold of the algorithm or the sample distribution changes the location of the point
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)
ROC (Receiver Operating Characteristic)
1-dimensional data set containing 2 classes (positive and negative)
Any point located at x > t is classified as positive
At threshold t: TPR = 0.5, FPR = 0.12
ROC (Receiver Operating Characteristic)
(TPR,FPR):
(0,0): declare everything
to be negative class
(1,1): declare everything
to be positive class
(1,0): ideal
Diagonal line:
Random guessing
Below diagonal line:
prediction is opposite of the true class
Using ROC for Model Comparison
No model consistently outperforms the other
M1 is better for small FPR
M2 is better for large FPR
Area Under the ROC curve
Ideal:
Area = 1
Random guess:
Area = 0.5
How to construct an ROC curve

Instance  P(+|A)  True Class
1         0.85    +
2         0.53    +
3         0.87    -
4         0.85    -
5         0.85    -
6         0.95    +
7         0.76    -
8         0.93    +
9         0.43    -
10        0.25    +
Use a classifier that produces a probability for each test
instance P(+|A) for each test A
Sort the instances according to P(+|A) in decreasing order
Apply threshold at each unique value of P(+|A) and count the
number of TP, FP, TN, FN at each threshold
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)
How to construct an ROC curve

Instance  P(+|A)  True Class
1         0.95    +
2         0.93    +
3         0.87    -
4         0.85    -
5         0.85    -
6         0.85    +
7         0.76    -
8         0.53    +
9         0.43    -
10        0.25    +
Use a classifier that produces a probability for each test
instance P(+|A) for each test A
Sort the instances according to P(+|A) in decreasing order
Apply threshold at each unique value of P(+|A) and count the
number of TP, FP, TN, FN at each threshold
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)
How to construct an ROC curve
Use a classifier that produces a probability for each test
instance P(+|A) for each test A
Sort the instances according to P(+|A) in decreasing order
Apply threshold at each unique value of P(+|A) and count the
number of TP, FP, TN, FN at each threshold
TP rate (TPR) = TP / (TP + FN)
FP rate (FPR) = FP / (FP + TN)

#   Threshold >=  True Class (Human)  AI  TP  FP  TN  FN
1   0.95          +                   +   1   0   5   4
2   0.93          +                   -   2   0   5   3
3   0.87          -                   -   2   1   4   3
4   0.85          -                   -
5   0.85          -                   -
6   0.85          +                   -   3   3   2   2
7   0.76          -                   -   3   4   1   2
8   0.53          +                   -   4   4   1   1
9   0.43          -                   -   4   5   0   1
10  0.25          +                   -   5   5   0   0
How to construct an ROC curve

Instance  P(+|A)  True Class  TP  FP  TN  FN  FPR  TPR
1         0.95    +           1   0   5   4   0    1/5
2         0.93    +           2   0   5   3   0    2/5
3         0.87    -           2   1   4   3   1/5  2/5
4         0.85    -
5         0.85    -
6         0.85    +           3   3   2   2   3/5  3/5
7         0.76    -           3   4   1   2   4/5  3/5
8         0.53    +           4   4   1   1   4/5  4/5
9         0.43    -           4   5   0   1   1    4/5
10        0.25    +           5   5   0   0   1    1
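The same construction as a short Python sketch, using the ten instances from the table above: apply a threshold at each unique value of P(+|A), predict positive for every score at or above it, and count TP and FP to get TPR and FPR.

```python
# (P(+|A), true class) pairs, already sorted by score in decreasing order
scored = [(0.95, '+'), (0.93, '+'), (0.87, '-'), (0.85, '-'), (0.85, '-'),
          (0.85, '+'), (0.76, '-'), (0.53, '+'), (0.43, '-'), (0.25, '+')]

P = sum(1 for _, c in scored if c == '+')  # total positives (5)
N = sum(1 for _, c in scored if c == '-')  # total negatives (5)

for t in sorted({s for s, _ in scored}, reverse=True):
    tp = sum(1 for s, c in scored if s >= t and c == '+')  # hits at this threshold
    fp = sum(1 for s, c in scored if s >= t and c == '-')  # false alarms
    print(f"threshold >= {t:.2f}: TPR = {tp}/{P}, FPR = {fp}/{N}")
```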
Breakout Session
How to construct an ROC curve

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +
2         0.93    +
3         0.87    +
4         0.85    +
5         0.83    +
6         0.80    -
7         0.76    -
8         0.53    -
9         0.43    -
10        0.25    -
Example
How to construct an ROC curve

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +           0    1/5
2         0.93    +           0    2/5
3         0.87    +           0    3/5
4         0.85    +           0    4/5
5         0.83    +           0    1
6         0.80    -           1/5  1
7         0.76    -           2/5  1
8         0.53    -           3/5  1
9         0.43    -           4/5  1
10        0.25    -           1    1
Example
How to construct an ROC curve

Instance  P(+|A)  True Class  FPR  TPR
1         0.95    +           0    1/5
2         0.93    -           1/5  1/5
3         0.87    +           1/5  2/5
4         0.85    -           2/5  2/5
5         0.83    +           2/5  3/5
6         0.80    -           3/5  3/5
7         0.76    +           3/5  4/5
8         0.53    -           4/5  4/5
9         0.43    +           4/5  1
10        0.25    -           1    1
Example
ROC Interpretation
AUC (Area Under the Curve)
AUC stands for "Area under the ROC Curve."
It measures the entire two-dimensional area underneath the
entire ROC curve from (0,0) to (1,1)
It provides an aggregate measure of performance across all
possible classification thresholds.
AUC ranges in value from 0 to 1.
A model whose predictions are 100% wrong has an AUC of 0.0, which means it has the worst measure of separability.
A model whose predictions are 100% correct has an AUC of 1.0, which means it has a good measure of separability.
When AUC is 0.5, the model has no class separation capacity.
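A minimal sketch of computing AUC, assuming scikit-learn (same ten invented instances as in the ROC construction example):

```python
from sklearn.metrics import roc_auc_score

y_true  = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]   # 1 = positive class
y_score = [0.85, 0.53, 0.87, 0.85, 0.85, 0.95, 0.76, 0.93, 0.43, 0.25]

# Aggregate measure of performance across all possible thresholds
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # 1.0 = perfect separability, 0.5 = random guessing
```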
AUC (Area Under the Curve)
https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and-recall-visually-explained/
AUC (Area Under the Curve) Interpretation
Example
The red distribution curve is of the positive class (patients with disease) and the green distribution curve is of the negative class (patients with no disease).
This is an ideal situation. When the two curves don't overlap at all, the model has an ideal measure of separability.
It is perfectly able to distinguish between the positive class and the negative class.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
When AUC is 0.7, it means there is a 70% chance that the
model will be able to distinguish between positive class and
negative class.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
This is the worst situation. When AUC is approximately 0.5, the
model has no discrimination capacity to distinguish between
positive class and negative class.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
When AUC is approximately 0, the model is actually
reciprocating the classes. It means the model is predicting a
negative class as a positive class and vice versa.
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
AUC (Area Under the Curve) Interpretation
Example
Which of the following ROC curves produce AUC values
greater than 0.5?
a
b
c
d
e
https://developers.google.com/machine-learning/crash-course/classification/check-your-understanding-roc-and-auc
A -> This ROC curve has an AUC between 0.5 and 1.0, meaning
it ranks a random positive example higher than a random
negative example more than 50% of the time. Real-world binary
classification AUC values generally fall into this range.
E->Best possible ROC curve, as it ranks all positives above all
negatives. It has an AUC of 1.0.
In practice, if you have a "perfect" classifier with an AUC of
1.0, you should be suspicious, as it likely indicates a bug in
your model. For example, you may have overfit to your training
data, or the label data may be replicated in one of your features.
ROC vs. PRC
The main difference between ROC curves and precision-recall
curves is that the number of true-negative results is not used for
making a PRC
Curve                                      x-axis (Concept: Calculation)         y-axis (Concept: Calculation)
Precision-recall (PRC)                     Recall: TP / (TP + FN)                Precision: TP / (TP + FP)
Receiver Operating Characteristics (ROC)   False Positive Rate: FP / (FP + TN)   Recall / Sensitivity / True Positive Rate: TP / (TP + FN)
Intersection over Union (IoU) for object detection
http://arogozhnikov.github.io/2015/10/05/roc-curve.html
Is object detection classification or regression?
http://arogozhnikov.github.io/2015/10/05/roc-curve.html
Intersection over union (IoU) is an evaluation metric used to
measure the accuracy of an object detector on a particular
dataset.
It is a number from 0 to 1 that specifies the amount of overlap
between the predicted and ground truth bounding box.
It is known to be the most popular evaluation metric for tasks
such as segmentation, object detection and tracking.
What is Intersection over Union?
Definition
https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686
What is Intersection over Union?
Which one is best?
https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686
An IoU of 0 means that there is no overlap between the boxes.
An IoU of 1 means that the union of the boxes is the same as their intersection, indicating that they overlap completely.
The lower the IoU, the worse the prediction result.
Intersection over Union
https://towardsdatascience.com/iou-a-better-detection-evaluation-metric-45a511185be1
In order to apply Intersection over Union to evaluate an
(arbitrary) object detector we need:
The ground-truth bounding boxes (i.e., the hand labeled
bounding boxes that specify where in the image our object is).
The predicted bounding boxes from our model.
Intersection over Union
Your goal is to take the:
Training images
Bounding boxes
construct an object detector, and then evaluate its performance on the testing set.
An Intersection over Union score > 0.5 is normally considered a "good" prediction.
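A minimal sketch of the IoU computation for two axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the example boxes are invented:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)  # 0 when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

ground_truth = (50, 50, 150, 150)  # hand-labeled box
prediction   = (60, 60, 160, 160)  # model's predicted box
print(f"IoU = {iou(ground_truth, prediction):.2f}")  # ~0.68, > 0.5 is "good"
```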
Confidence Intervals
http://arogozhnikov.github.io/2015/10/05/roc-curve.html
Confidence, in statistics, is a way to describe probability.
Confidence interval is the range of values you expect your
estimate to fall between if you redo your test, within a certain
level of confidence.
For example: if you construct a confidence interval with a 95%
confidence level, you are confident that 95 out of 100 times the
estimate will fall between the upper and lower values specified
by the confidence interval.
Confidence Interval for Accuracy
Definition
Confidence level = 1 − α
https://www.scribbr.com/statistics/confidence-interval/
Prediction can be regarded as a Bernoulli trial
A Bernoulli trial has 2 possible outcomes
Possible outcomes for prediction: correct or wrong
Collection of Bernoulli trials has a Binomial distribution:
x ~ Bin(N, p) x: number of correct predictions
Example: Toss a fair coin 50 times, how many heads would turn up?
Expected number of heads = N × p = 50 × 0.5 = 25
Classification Accuracy = x / N
Given x (# of correct predictions) or, equivalently, acc = x / N, and N (# of test instances), can we predict p (the true accuracy of the model)?
Confidence Interval for Accuracy
For large test sets (N > 30), acc has a normal distribution with mean p and variance p(1 − p) / N.
Confidence Interval for p:
p = [2 × N × acc + (Zα/2)² ± Zα/2 × √((Zα/2)² + 4 × N × acc − 4 × N × acc²)] / [2 × (N + (Zα/2)²)]
Confidence Interval for Accuracy
Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:
N = 100, acc = 0.8
Let 1 − α = 0.95 (95% confidence)
From the probability table, Zα/2 = 1.96
Plugging into the interval formula gives p in [0.711, 0.866].

N:         50     100    500    1000   5000
p(lower):  0.670  0.711  0.763  0.774  0.789
p(upper):  0.888  0.866  0.833  0.824  0.811

1 − α:  0.99  0.98  0.95  0.90
Z:      2.58  2.33  1.96  1.65

(Standard Normal distribution)
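The example as a short sketch implementing the interval formula above (N = 100, acc = 0.8, Zα/2 = 1.96 from the slide):

```python
from math import sqrt

def accuracy_confidence_interval(acc, n, z):
    """Confidence interval for the true accuracy p given observed acc on n instances."""
    center = 2 * n * acc + z**2
    spread = z * sqrt(z**2 + 4 * n * acc - 4 * n * acc**2)
    denom = 2 * (n + z**2)
    return (center - spread) / denom, (center + spread) / denom

low, high = accuracy_confidence_interval(acc=0.8, n=100, z=1.96)
print(f"95% CI for p: [{low:.3f}, {high:.3f}]")  # about [0.711, 0.866]
```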
Performance Metrics for Multiclass AI
Turn it into Binary Classification

One vs. One: for K classes, build K(K−1)/2 binary classifiers (here 4 × 3 / 2 = 6):
Binary Classification Problem 1: red vs. blue
Binary Classification Problem 2: red vs. green
Binary Classification Problem 3: red vs. yellow
Binary Classification Problem 4: blue vs. green
Binary Classification Problem 5: blue vs. yellow
Binary Classification Problem 6: green vs. yellow

One vs. All (Rest): build K binary classifiers:
Binary Classification Problem 1: red vs [blue, green, yellow]
Binary Classification Problem 2: blue vs [red, green, yellow]
Binary Classification Problem 3: green vs [red, blue, yellow]
Binary Classification Problem 4: yellow vs [red, blue, green]
Performance Metrics for Multiclass AI
Turn it into Binary Classification
One vs. All (Rest)
One vs. One
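A minimal sketch of the one-vs-all and one-vs-one strategies, assuming scikit-learn (the four-class data is synthetic, standing in for red/blue/green/yellow):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem
X, y = make_classification(n_samples=400, n_classes=4, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One vs. All: K binary classifiers (4 here)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print(classification_report(y_te, ovr.predict(X_te)))  # per-class precision/recall

# One vs. One: K(K-1)/2 binary classifiers (6 here)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
print("One-vs-One accuracy:", round(ovo.score(X_te, y_te), 3))
```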
Breakout Session
One vs. All (Rest)
One vs. One
Formulas (using the confusion-matrix counts a = TP, b = FN, c = FP, d = TN):

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + FP + TN + FN)
Precision (p) = a / (a + c)
Recall (r) = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
True positive rate = a / (a + b)
False positive rate = c / (c + d)
[Embedded assignment worksheet (Assignment 3, GEN101 Introductory Artificial Intelligence): given a test set of 20 instances with predicted probabilities P(+|A) and true classes (the scenario depends on the student's major, e.g., cancer grades Benign(+) vs. Malignant(-) for Biomedical Engineering), apply a threshold at each unique value of P(+|A), count TP, FP, TN, and FN, compute Precision, Recall (Sensitivity/TPR), FPR, Specificity/TNR, Accuracy, and F1 at each threshold, construct the Precision-Recall and ROC curves, and calculate the AUC.]
1. Insert your name and student ID under Name and ID.
2. Select your major from the drop-down list.
This will generate a unique set of data for you only.
For a successful completion of the assignment, you need to fill all bordered white cells.
Note that any similarity detected will be reported to the Office of Academic Integrity (OAI).
[ROC chart: FPR (x-axis) vs. TPR (y-axis)]
3. Apply a threshold at each unique value of P(+|A) to find the predicted class (Predicted (AI)) for each instance.
Note: Make sure you do not change the data in the colored cells.
Hint: You may copy and paste the data (by value) to another sheet, then sort it and begin working on the performance evaluation. Once done, you may fill in the bordered white cells in this sheet with your final answers.
4. Count the number of TP, FP, TN, and FN at each threshold and then calculate the different performance evaluation measures for each instance. The formulas can be found in the performance evaluation slides on Blackboard.
5. Use the calculated values to plot the Precision-Recall curve and the Receiver Operating Characteristics (ROC) curve.
6. Find the area under the curve (AUC).
7. Once completed, save the file and submit it on Blackboard.
Assignment 3
Guidelines
Step 1: Study ALL Topics Covered So Far
Topic 7: Performance Evaluation
Advice: Redo all activities done in class and uploaded on Blackboard
Step 2: On Thursday May 6 at 12:01 am, download the Excel file from the Assignments Folder
GEN101 – Assignment 3.xlsm
Step 3: Download and open the Excel file
The filename: GEN101 – Assignment 3.xlsm
Disclaimer: No macros are needed. Use a regular laptop, not a mobile or tablet. Your computer, Mac or PC, must have an Intel chip. Apple M1 chips (2021 MacBook Pros and Airs) do not yet support certain Excel features.
Step 4: Fill your name and ID, and select your major.
The question and data depend on your ID and major. No two students will be getting the same data and question. Any similarity will be reported to the Office of Academic Integrity. Your assignment is unique to you.
Step 5: Solve the Problem
Follow the post-it note instructions and fill all the white cells (boxes). If a cell is empty and white, you are expected to fill it. Use Excel to make the calculations. Do not insert extra rows or columns. Save your work!
Step 6: Submit Your Excel File Uncompressed before 11:59 pm on Saturday
Step 7: Wait for Grades
• Grades are final
• This assignment is individual. No cooperation is allowed with anyone
• Late assignments receive a -50% penalty
• Assignments are not accepted after the last day of classes and will receive NO credit
• Anti-cheating measures are embedded within the Excel file. Cheating will be reported
• You can submit multiple attempts
• Respondus Lockdown and Monitor will not be used for this assignment


GEN101IntroductoryArtificial Intelligence Arti

  • 1. GEN101 Introductory Artificial Intelligence Artificial Intelligence Performance Evaluation College of Engineering 1 Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to compare the relative performance among competing models?
  • 2. 2 We have already seen: the Confusion Matrix Most widely-used metric: Metrics for Performance Evaluation We will focus on the predictive capability of a model Rather than how fast it takes to classify or build models, scalability, etc.PREDICTED CLASS (AI)ACTUAL CLASS (Human)Class=YesClass=NoClass=Yesa (TP)b (FN)Class=Noc (FP)d (TN) TP: True Positive FP: False Positive TN: True Negative FN: False Negative Limitation of Accuracy Consider a 2-class problem Number of Class 0 (e.g., Benign) examples = 9990 Number of Class 1 (e.g., Malignant) examples = 10 If model predicts everything to be class 0, accuracy is
  • 3. 9990/10000 = 99.9 % Accuracy is misleading because model does not detect any class 1 example 4 Other MeasuresPREDICTED CLASSACTUAL CLASSClass=YesClass=NoClass=Yesa (TP)b (FN)Class=Noc (FP)d (TN) Precision-Recall Two other metrics that are often used to quantify model performance are precision and recall. Precision is defined as the number of true positives divided by the total number of positive predictions. It quantifies what percentage of the positive predictions were correct How correct your model’s positive predictions were. Recall/Sensitivity is defined as the number of true positives divided by the total number of true positives and false negatives (all actual positives) It quantifies what percentage of the actual positives you were
  • 4. able to identify How sensitive your model was in identifying positives. https://medium.com/@douglaspsteen/precision-recall-curves- d32e5b290248 6 Precision-Recall Precision = Recall = Sensitivity = Specificity = Out of all the examples that were predicted as positive, how many are correct? How precise was the AI? Out of all the positive examples, how many were caught (recalled)? Did the AI miss? Out of all the people that do have the disease, how many got positive test results? Did the AI miss anyone? Out of all the people that do NOT have the disease, how many got negative results?
  • 5. https://towardsdatascience.com/should-i-look-at-precision- recall-or-specificity-sensitivity-3946158aace1 7 Precision-Recall Example In this example: If the classifier predicts negative, you can trust it, the example is negative. Our AI does not do a mistake in negative. It is sensitive. However, pay attention, if the example is negative, you can’t be sure it will predict it as negative (specificity=78%). If the classifier predicts positive, you can’t trust it (precision=33%) However, if the example is positive, you can trust the classifier will find it (will not miss it) (recall=100%). Precision: Is the green box pure? Recall/Sensitivity: Did I lose any of the green balls to the red box? Did I miss any greens? Am I so sensitive to greens that I do not miss them? Specificity: Are all the red balls in the red box? 8 Precision-Recall
  • 6. Example In this example: Since the population is imbalanced: The precision is relatively high The recall is 100% because all the positive examples are predicted as positive. The specificity is 0% because no negative example is predicted as negative. 9 Precision-Recall Example In this example: If it predicts that an example is positive, you can trust it — it is positive. However, if it predicts it is negative, you can’t trust it, the chances are that it is still positive. Can be a useful classifier 10
  • 7. Precision-Recall Example In this example: The classifier detects all the positive examples as positive It also detects all negative examples as negative. All the measures are at 100%. 11 Why so many measures? Which is more important? Because it depends False positive is expensive. False alarm is a nightmare! Precision is more important than recall False negative is expensive. Missing one is a nightmare! Recall is more important than precision 12 Precision-Recall Curves The precision-recall curve is used for evaluating the performance of binary classification algorithms.
  • 8. They provide a graphical representation of a classifier’s performance across many thresholds, rather than a single value It is constructed by calculating and plotting the precision against the recall for a single classifier at a variety of thresholds. It helps to visualize how the choice of threshold affects classifier performance and can even help us select the best threshold for a specific problem. 13 Precision-Recall Curves Interpretation https://medium.com/@douglaspsteen/precision-recall-curves- d32e5b290248 14 Precision-Recall Curves A model that produces a precision-recall curve that is closer to the top-right corner is better than a model that produces a precision-recall curve that is skewed towards the bottom of the plot.
  • 9. https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and- recall-visually-explained/ 15 Precision-Recall Curves Example Which of the following P-R curves produce represent a perfect classifier? (a) (b) (c) https://towardsdatascience.com/gaining-an-intuitive- understanding-of-precision-and-recall-3b9df37804a7 (b) -> perfect classifier (a) -> very bad classifier (random classifier) 16 Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?
  • 10. Methods for Model Comparison How to compare the relative performance among competing models? 17 Methods for Performance Evaluation Performance of a model may depend on other factors besides the learning algorithm: How to obtain a reliable estimate of performance? Methods of Estimation: 18 Class distribution Size of training and test sets Holdout Reserve 2/3 for training and 1/3 for testing Random subsampling
  • 11. Repeated holdout Cross validation Partition data into k disjoint subsets k-fold: train on k-1 partitions, test on the remaining one Leave-one-out: k=n Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to compare the relative performance among competing models? 19
  • 12. ROC (Receiver Operating Characteristic) Developed in 1950s for signal detection theory to analyze noisy signals Characterize the trade-off between positive hits and false alarms ROC curve plots TP rate (y-axis) against FP rate (x-axis) Performance of each classifier represented as a point on ROC curve changing the threshold of the algorithm, or sample distribution changes the location of the point TP rate (TPR) = FP rate (TPR) = 20 ROC (Receiver Operating Characteristic) 1-dimensional data set containing 2 classes (positive and negative) - any points located at x > t is classified as positive 21 At threshold t:
  • 13. TPR=0.5, FPR=0.12 ROC (Receiver Operating Characteristic) (TPR,FPR): (0,0): declare everything to be negative class (1,1): declare everything to be positive class (0,1): ideal Diagonal line: Random guessing Below diagonal line: prediction is opposite of the true class Using ROC for Model Comparison No model consistently outperform the other M1 is better for small FPR M2 is better for large FPR Area Under the ROC curve Ideal: Area = 1 Random guess: Area = 0.5
  • 14. How to construct an ROC curveInstanceP(+|A)True Class10.85+20.53+30.87-40.85-50.85-60.95+70.76- 80.93+90.43-100.25+ Use a classifier that produces a probability for each test instance P(+|A) for each test A Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) and count the number of TP, FP, TN, FN at each threshold TP rate (TPR) = FP rate (TPR) = How to construct an ROC curveInstanceP(+|A)True Class10.95+20.93+30.87-40.85-50.85-60.85+70.76- 80.53+90.43-100.25+ Use a classifier that produces a probability for each test instance P(+|A) for each test A Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) and count the number of TP, FP, TN, FN at each threshold TP rate (TPR) = FP rate (TPR) = How to construct an ROC curve Use a classifier that produces a probability for each test
  • 15. instance P(+|A) for each test A Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) and count the number of TP, FP, TN, FN at each threshold TP rate (TPR) = FP rate (TPR) = #Threshold >=True Class (Human)AITPFPTNFN10.95++105420.93+-205330.87-- 214340.85--50.85--60.85+-332270.76--341280.53+-441190.43-- 4501100.25+-5500 26 How to construct an ROC curveInstanceP(+|A)True ClassTPFPTNFNFPRTPR10.95+105401/520.93+205302/530.87- 21431/52/540.85-50.85-60.85+33223/53/570.76- 34124/53/580.53+44114/54/590.43-450114/5100.25+550011 Breakout Session How to construct an ROC curveInstanceP(+|A)True ClassFPRTPR10.95+20.93+30.87+40.85+50.83+60.80-70.76- 80.53-90.43-100.25-
  • 16. Example (solution to the breakout exercise):

Instance | P(+|A) | True Class | FPR | TPR
1 | 0.95 | + | 0 | 1/5
2 | 0.93 | + | 0 | 2/5
3 | 0.87 | + | 0 | 3/5
4 | 0.85 | + | 0 | 4/5
5 | 0.83 | + | 0 | 1
6 | 0.80 | - | 1/5 | 1
7 | 0.76 | - | 2/5 | 1
8 | 0.53 | - | 3/5 | 1
9 | 0.43 | - | 4/5 | 1
10 | 0.25 | - | 1 | 1

Example (a second labeling, with interleaved classes):

Instance | P(+|A) | True Class | FPR | TPR
1 | 0.95 | + | 0 | 1/5
2 | 0.93 | - | 1/5 | 1/5
3 | 0.87 | + | 1/5 | 2/5
4 | 0.85 | - | 2/5 | 2/5
5 | 0.83 | + | 2/5 | 3/5
6 | 0.80 | - | 3/5 | 3/5
7 | 0.76 | + | 3/5 | 4/5
8 | 0.53 | - | 4/5 | 4/5
9 | 0.43 | + | 4/5 | 1
10 | 0.25 | - | 1 | 1
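The threshold sweep is easy to automate. This short sketch reproduces the TPR/FPR columns for the 10-instance example on the construction slides; the scores and labels are copied from the sorted table above:

# Threshold-sweep construction of the ROC points.
scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ['+', '+', '-', '-', '-', '+', '-', '+', '-', '+']

P = labels.count('+')   # total positives
N = labels.count('-')   # total negatives

for t in sorted(set(scores), reverse=True):
    preds = ['+' if s >= t else '-' for s in scores]
    tp = sum(p == '+' and l == '+' for p, l in zip(preds, labels))
    fp = sum(p == '+' and l == '-' for p, l in zip(preds, labels))
    print(f"t={t:.2f}  TPR={tp/P:.1f}  FPR={fp/N:.1f}")

Plotting the resulting (FPR, TPR) pairs, plus the starting point (0, 0), traces out the ROC curve.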
  • 17. ROC Interpretation. AUC (Area Under the Curve): AUC stands for "Area under the ROC Curve." It measures the two-dimensional area underneath the ROC curve from (0,0) to (1,1). It provides an aggregate measure of performance across all possible classification thresholds. AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0, the worst possible measure of separability; a model whose predictions are 100% correct has an AUC of 1.0, a perfect measure of separability. When AUC is 0.5, the model has no class-separation capacity. 33 AUC (Area Under the Curve) https://paulvanderlaken.com/2019/08/16/roc-auc-precision-and-recall-visually-explained/ 34 AUC (Area Under the Curve) Interpretation Example: the red distribution curve is the positive class (patients with the disease) and the green distribution curve is the negative
  • 18. class (patients with no disease). This is the ideal situation: when the two curves do not overlap at all, the model has an ideal measure of separability; it is perfectly able to distinguish between the positive class and the negative class. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 35 AUC (Area Under the Curve) Interpretation Example: when AUC is 0.7, there is a 70% chance that the model will be able to distinguish between the positive class and the negative class. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 36 AUC (Area Under the Curve) Interpretation Example: this is the worst situation. When AUC is approximately 0.5, the model has no discrimination capacity to distinguish between the positive class and the negative class.
  • 19. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 37 AUC (Area Under the Curve) Interpretation Example: when AUC is approximately 0, the model is actually reciprocating the classes: it is predicting the negative class as positive and vice versa. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 38 AUC (Area Under the Curve) Interpretation Example: which of the following ROC curves produce AUC values greater than 0.5?
  • 20. a b c d e https://developers.google.com/machine-learning/crash-course/classification/check-your-understanding-roc-and-auc A -> This ROC curve has an AUC between 0.5 and 1.0, meaning it ranks a random positive example higher than a random negative example more than 50% of the time. Real-world binary classification AUC values generally fall into this range. E -> The best possible ROC curve, as it ranks all positives above all negatives; it has an AUC of 1.0. In practice, if you have a "perfect" classifier with an AUC of 1.0, you should be suspicious, as it likely indicates a bug in your model. For example, you may have overfit to your training data, or the label data may be replicated in one of your features. 39 ROC vs. PRC: the main difference between ROC curves and precision-recall curves is that the number of true-negative results is not used for making a PRC.

Curve | x-axis concept | x-axis calculation | y-axis concept | y-axis calculation
Precision-recall (PRC) | Recall | TP / (TP + FN) | Precision | TP / (TP + FP)
Receiver Operating Characteristics (ROC) | False Positive Rate | FP / (FP + TN) | Recall / Sensitivity / True Positive Rate | TP / (TP + FN)
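Because the PRC ignores true negatives, ROC AUC and PR AUC can tell different stories on imbalanced data. A small sketch with synthetic scores (all numbers here are invented for illustration):

# ROC AUC vs. area under the precision-recall curve on imbalanced data.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(50), np.zeros(950)])      # only 5% positives
y_score = np.concatenate([rng.normal(0.7, 0.2, 50),        # positives score higher on average
                          rng.normal(0.4, 0.2, 950)])

print("ROC AUC:", roc_auc_score(y_true, y_score))
print("PR  AUC:", average_precision_score(y_true, y_score))  # uses TP, FP, FN only

On data this imbalanced, the PR AUC is typically much lower than the ROC AUC, which is why precision-recall curves are often preferred when positives are rare.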
  • 21. 40 Intersection over Union (IoU) for object detection http://arogozhnikov.github.io/2015/10/05/roc-curve.html 41 Is object detection classification or regression? (It is both: predicting the class label is classification, while predicting the bounding-box coordinates is regression.) http://arogozhnikov.github.io/2015/10/05/roc-curve.html 42 Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. It is a number from 0 to 1 that specifies the amount of overlap between the predicted and ground-truth bounding boxes. It is known as the most popular evaluation metric for tasks such as segmentation, object detection, and tracking.
  • 22. What is Intersection over Union? Definition https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686 43 What is Intersection over Union? Which one is best? https://towardsdatascience.com/intersection-over-union-iou-calculation-for-evaluating-an-image-segmentation-model-8b22e2e84686 44 An IoU of 0 means that there is no overlap between the boxes. An IoU of 1 means that the union of the boxes is the same as their intersection, indicating that they overlap completely. The lower the IoU, the worse the prediction result.
  • 23. Intersection over Union https://towardsdatascience.com/iou-a-better-detection-evaluation-metric-45a511185be1 45 In order to apply Intersection over Union to evaluate an (arbitrary) object detector, we need: the ground-truth bounding boxes (i.e., the hand-labeled bounding boxes that specify where in the image our object is), and the predicted bounding boxes from our model. Intersection over Union: your goal is to take the training images and bounding boxes, construct an object detector, and then evaluate its performance on the testing set. An Intersection over Union score > 0.5 is normally considered a
  • 24. “good” prediction. 48
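A minimal IoU implementation for axis-aligned boxes; the (x1, y1, x2, y2) corner format and the example boxes below are assumptions for illustration:

# IoU between two axis-aligned boxes given as (x1, y1, x2, y2).
def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)    # intersection / union

ground_truth = (50, 50, 150, 150)   # made-up hand-labeled box
predicted    = (60, 60, 160, 160)   # made-up detector output
print(f"IoU = {iou(ground_truth, predicted):.3f}")  # ~0.681, above the 0.5 "good" cutoff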
  • 25. Confidence Intervals http://arogozhnikov.github.io/2015/10/05/roc-curve.html 49 Confidence, in statistics, is a way to describe probability. A confidence interval is the range of values you expect your estimate to fall between if you redo your test, within a certain level of confidence. For example, if you construct a confidence interval with a 95% confidence level, you are confident that 95 out of 100 times the estimate will fall between the upper and lower values specified by the interval. Confidence Interval for Accuracy. Definition: confidence level = 1 - α. https://www.scribbr.com/statistics/confidence-interval/ 50 A prediction can be regarded as a Bernoulli trial. A Bernoulli trial has 2 possible outcomes; for a prediction, the possible outcomes are correct or wrong. A collection of Bernoulli trials has a binomial distribution: x ~ Bin(N, p), where x is the number of correct predictions. Example: toss a fair coin 50 times; how many heads would turn up? Expected number of heads = N × p = 50 × 0.5 = 25.
  • 26. Classification accuracy: given x, the number of correct predictions, and N, the number of test instances, acc = x / N. Can we estimate p, the true accuracy of the model? Confidence Interval for Accuracy: for large test sets (N > 30), acc has a normal distribution with mean p and variance p(1 - p) / N. Solving P( -Z_(α/2) <= (acc - p) / sqrt(p(1 - p) / N) <= Z_(α/2) ) = 1 - α for p gives the confidence interval

p = ( 2·N·acc + Z² ± Z·sqrt(Z² + 4·N·acc - 4·N·acc²) ) / ( 2·(N + Z²) ), where Z = Z_(α/2).

Example: consider a model that produces an accuracy of 80% when evaluated on 100 test instances: N = 100, acc = 0.8. Let 1 - α = 0.95 (95% confidence). From the standard normal table, Z_(α/2) = 1.96, giving the interval [0.711, 0.866]. Standard normal values:

1 - α | 0.99 | 0.98 | 0.95 | 0.90
Z_(α/2) | 2.58 | 2.33 | 1.96 | 1.65

Repeating the calculation for different test-set sizes (acc = 0.8, 95% confidence):

N | 50 | 100 | 500 | 1000 | 5000
p (lower) | 0.670 | 0.711 | 0.763 | 0.774 | 0.789
p (upper) | 0.888 | 0.866 | 0.833 | 0.824 | 0.811
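The interval formula above is easy to check numerically. This sketch reproduces the N = 100, acc = 0.8 example; the function name accuracy_ci is ours, not from any library:

# Confidence interval for accuracy, using the formula from the slide.
from math import sqrt

def accuracy_ci(acc, n, z=1.96):
    denom = 2 * (n + z**2)
    centre = 2 * n * acc + z**2
    spread = z * sqrt(z**2 + 4 * n * acc - 4 * n * acc**2)
    return (centre - spread) / denom, (centre + spread) / denom

lo, hi = accuracy_ci(acc=0.8, n=100, z=1.96)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")   # ~[0.711, 0.867], matching the table up to rounding

Trying larger n values (500, 1000, 5000) reproduces the rest of the table: the interval tightens around 0.8 as the test set grows.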
  • 27. Performance Metrics for Multiclass AI: turn the problem into binary classification. One vs. One: Binary Classification Problem 1: red vs. blue; Problem 2: red vs. green; Problem 3: red vs. yellow; Problem 4: blue vs. green; Problem 5: blue vs. yellow; Problem 6: green vs. yellow. One vs. All (Rest): Binary Classification Problem 1: red vs. [blue, green, yellow]; Problem 2: blue vs. [red, green, yellow]; Problem 3: green vs. [red, blue, yellow]; Problem 4: yellow vs. [red, blue, green]. One vs. One requires K(K - 1)/2 binary classifiers for K classes (6 for the four colors above), while One vs. All requires K. Performance Metrics for Multiclass AI: turn it into binary classification with One vs. All (Rest) or One vs. One.
  • 29. One vs. All (Rest) 57
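One way to realize the one-vs-rest view in code: scikit-learn's per-class averaging treats each class as its own binary problem. The color labels below are invented to mirror the four-color example above:

# Per-class (one-vs-rest) precision and recall for a multiclass problem.
from sklearn.metrics import precision_score, recall_score

y_true = ['red', 'blue', 'green', 'red', 'yellow', 'blue', 'green', 'red']
y_pred = ['red', 'blue', 'red',   'red', 'yellow', 'green', 'green', 'blue']

# average=None returns one score per class (classes in sorted order),
# i.e. one binary "this class vs. the rest" problem each.
print(precision_score(y_true, y_pred, average=None, zero_division=0))
print(recall_score(y_true, y_pred, average=None, zero_division=0))

# 'macro' averages the per-class scores into a single summary number.
print(precision_score(y_true, y_pred, average='macro', zero_division=0))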
  • 34. Precision-Recall Curve: worked example (Course: GEN101, Introductory Artificial Intelligence). Given the following test set of 20 cancer grades (Benign (+) and Malignant (-)), calculate the different performance measures (precision, recall, accuracy, ...), construct the ROC, and then calculate the AUC. Instances are sorted by P(+|A) in decreasing order; TP, FP, TN, FN are the cumulative counts obtained when the threshold is set at that instance's score:

Instance | P(+|A) | True Class (Human) | TP | FP | TN | FN | Precision | Recall (Sensitivity)/TPR | FPR | Specificity/TNR | ACC | F1
1 | 0.98 | Benign | 1 | 0 | 11 | 8 | 1.00 | 0.11 | 0.00 | 1.00 | 0.60 | 0.15
2 | 0.94 | Malignant | 1 | 1 | 10 | 8 | 0.50 | 0.11 | 0.09 | 0.91 | 0.55 | 0.15
3 | 0.93 | Benign | 2 | 1 | 10 | 7 | 0.67 | 0.22 | 0.09 | 0.91 | 0.60 | 0.27
4 | 0.90 | Malignant | 2 | 2 | 9 | 7 | 0.50 | 0.22 | 0.18 | 0.82 | 0.55 | 0.27
5 | 0.87 | Benign | 3 | 3 | 8 | 6 | 0.50 | 0.33 | 0.27 | 0.73 | 0.55 | 0.35
6 | 0.87 | Malignant | 3 | 3 | 8 | 6 | 0.50 | 0.33 | 0.27 | 0.73 | 0.55 | 0.35
7 | 0.85 | Benign | 4 | 3 | 8 | 5 | 0.57 | 0.44 | 0.27 | 0.73 | 0.60 | 0.42
8 | 0.83 | Malignant | 4 | 4 | 7 | 5 | 0.50 | 0.44 | 0.36 | 0.64 | 0.55 | 0.42
9 | 0.78 | Malignant | 4 | 5 | 6 | 5 | 0.44 | 0.44 | 0.45 | 0.55 | 0.50 | 0.42
10 | 0.72 | Malignant | 4 | 6 | 5 | 5 | 0.40 | 0.44 | 0.55 | 0.45 | 0.45 | 0.42
11 | 0.71 | Benign | 6 | 6 | 5 | 3 | 0.50 | 0.67 | 0.55 | 0.45 | 0.55 | 0.52
12 | 0.71 | Benign | 6 | 6 | 5 | 3 | 0.50 | 0.67 | 0.55 | 0.45 | 0.55 | 0.52
13 | 0.69 | Malignant | 6 | 8 | 3 | 3 | 0.43 | 0.67 | 0.73 | 0.27 | 0.45 | 0.52
14 | 0.69 | Malignant | 6 | 8 | 3 | 3 | 0.43 | 0.67 | 0.73 | 0.27 | 0.45 | 0.52
15 | 0.36 | Malignant | 6 | 9 | 2 | 3 | 0.40 | 0.67 | 0.82 | 0.18 | 0.40 | 0.52
16 | 0.35 | Malignant | 6 | 10 | 1 | 3 | 0.38 | 0.67 | 0.91 | 0.09 | 0.35 | 0.52
17 | 0.34 | Benign | 7 | 10 | 1 | 2 | 0.41 | 0.78 | 0.91 | 0.09 | 0.40 | 0.56
18 | 0.32 | Benign | 8 | 10 | 1 | 1 | 0.44 | 0.89 | 0.91 | 0.09 | 0.45 | 0.59
19 | 0.12 | Benign | 9 | 10 | 1 | 0 | 0.47 | 1.00 | 0.91 | 0.09 | 0.50 | 0.62
20 | 0.10 | Malignant | 9 | 11 | 0 | 0 | 0.45 | 1.00 | 1.00 | 0.00 | 0.45 | 0.62

AUC = 0.5205 (the workbook computes it as a sum of areas A1 to A6 under the ROC curve: 0.01 + 0.04 + 0.025 + 0.1215 + 0.234 + 0.09 = 0.5205).

Receiver Operating Characteristics curve plot: FPR (x-axis) vs. TPR (y-axis). Precision vs. Recall curve plot: Recall (x-axis) vs. Precision (y-axis). The workbook also contains the per-major data sets (Architecture, Aviation, Biomedical Engineering, Business, Chemical Engineering, Civil Engineering, Computer Engineering, Cybersecurity, Electrical Engineering, HR, Industrial Engineering, Information Technology, Interior Design, Mechanical Engineering, Software Engineering) used to generate each student's unique question.

Assignment workbook instructions: 1. Insert your name and student ID under Name and ID. 2. Select your major from the drop-down list; this generates a unique set of data for you only. To complete the assignment successfully, you need to fill all bordered white cells. Note that any similarity detected will be reported to the Office of Academic Integrity (OAI). 3. Apply a threshold at each unique value of P(+|A) to find the predicted class (Predicted (AI)) for each instance. Make sure you do not change the data in the colored cells. Hint: you may copy and paste the data (by value) to another sheet, sort it, and begin working on the performance evaluation; once done, fill in the bordered white cells in this sheet with your final answers. 4. Count the number of TP, FP, TN, and FN at each threshold, and then calculate the different performance evaluation measures for each instance. The formulas can be found in the performance evaluation slides on Blackboard. 5. Use the calculated values to plot the Precision-Recall curve and the Receiver Operating Characteristics (ROC) curve. 6. Find the area under the curve (AUC). 7. Once completed, save the file and submit it on Blackboard.
  • 40. Assignment 3
  • 41. Guidelines. Step 1: Study ALL topics covered so far, including Topic 7: Performance Evaluation. Advice: redo all activities done in class and uploaded on Blackboard. Step 2: On Thursday, May 6 at 12:01 am, download the Excel file GEN101 – Assignment 3.xlsm from the Assignments folder. Step 3: Open the Excel file (filename: GEN101 – Assignment 3.xlsm). Disclaimer: no macros are needed. Use a regular laptop, not a mobile or tablet. Your computer, Mac or PC, must have an Intel chip; Apple M1 chips (2021 MacBook Pros and Airs) do not yet support certain Excel features. Step 4: Fill in your name and ID, and select your major. Write ID.
  • 42. Write Name. The question and data depend on your ID and major; no two students will get the same data and question. Any similarity will be reported to the Office of Academic Integrity. Your assignment is unique to you. Select Major. Observe. Step 5: Solve the problem. Follow the post-it note instructions and fill all the white cells (boxes); if a cell is empty and white, you are expected to fill it. Use Excel to make the
  • 43. calculations. Do not insert extra rows or columns. Save your work! Step 6: Submit your Excel file, uncompressed, before 11:59 pm on Saturday. Step 7: Wait for grades. Grades are final. This assignment is individual; no cooperation is allowed with anyone. Late assignments receive a -50% penalty. Assignments are not accepted after the last day of classes and will receive no credit. Anti-cheating measures are embedded within the Excel file; cheating will be reported. You can submit multiple attempts. Respondus Lockdown and Monitor will not be used for this assignment.