Energy Systems
Modeling and Optimization
Dr. Mohamad Kharseh
Office: G 342
mohamad.kharseh@aurak.ac.ae
Lec. 10: Making Assumptions
Important Functions in Excel
 Frequency f for each class: FREQUENCY(data_array, bins_array)
 The mean: AVERAGE(number1, number2, …)
 The median: MEDIAN(number1, number2, …)
 The mode: MODE(number1, number2, …)
 Standardized value: $z = \dfrac{a - \mu}{\sigma}$
 P(x≤a) = F(a) = NORMSDIST(z) = NORM.DIST(a, mean, standard_dev, TRUE)
 Z-score: zα/2 = NORMSINV(1−α/2)
 T-score: tα/2 = T.INV(1−α/2, n−1)
 The error margin E:
o E = CONFIDENCE(α, σ, n), σ known and n≥30
o E = CONFIDENCE.T(α, s, n), σ unknown or n<30
Available and Missing Variables
When modeling a system, encountering missing data is common.
 What shall a modeler do in the case of unknown or missing information?
Available and Missing Variables
 When dealing with missing data, it is critical to make sound assumptions so that
the model remains accurate.
 One common strategy for handling such situations is to calculate the average of the
available data for similar existing systems (i.e., to create sample data).
 Use this average as a reasonable estimate for the missing value.
However, be cautious
 Variability: if the collected data vary significantly (high standard deviation),
relying solely on the historical average may not be accurate.
 Instead, the modeler uses a confidence interval to define the range of values of the
missing data.
 In such a case, two approaches exist:
 Normal distribution and Z-test: the standard deviation of the population is known and the sample size is
at least 30.
 Student's t-distribution and t-test: the standard deviation of the population is unknown or the sample size is
smaller than 30.
Continuous Random Variables
 Continuous random variables play a crucial role in probability and statistics, dealing with
scenarios where the variable can take on any value within a specific range. Unlike discrete
variables (which have whole number values), continuous variables represent a spectrum of
possibilities.
Probability
 Probability statement describes the likelihood that a particular value occurs.
 The likelihood is quantified by assigning a number from the interval [0, 1] to the set of
values (or a percentage from 0 to 100%).
 A probability is usually expressed in terms of a random variable, e.g., P(x)=80%.
 Higher numbers indicate that the set of values is more likely.
Probability Distribution Types
Probability Density Function Rules
 The probability distribution of x is described by a density curve.
o f (x) is the probability density function (pdf)
 The probability density cannot be negative, f(x)≥0
 The total area under the curve must be 1.
 The probability of a continuous random variable is not defined at specific values.
 If x is continuous, then for any number c, P(x = c) = 0.
 Instead, it is defined over an interval of value, P(a≤ x ≤b).
 P(a≤ x ≤b) is the shaded area below the pdf
$P(a \le X \le b) = \int_a^b f(x)\,dx$
Cumulative Distribution Function
 F(x) is the cumulative distribution function (cdf), with $\dfrac{dF(x)}{dx} = f(x)$
 The probability of any value of x below x0 equals the area under the density curve to the left
of x0:
$P(x \le x_0) = P(x < x_0) = F(x_0) = \int_{-\infty}^{x_0} f(x)\,dx$
$P(x \ge x_0) = 1 - F(x_0) = 1 - \int_{-\infty}^{x_0} f(x)\,dx$
 For any two numbers a and b with a < b:
$P(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx$
Exercise: Reaction Time
The following cumulative distribution function approximates the time until a chemical reaction
is completed (in milliseconds, ms):
$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - e^{-0.01x} & \text{for } 0 \le x \end{cases}$
 What is the probability density function?
 What proportion of reactions is complete within 200 ms?
The density is the derivative of the cumulative distribution function:
$f(x) = \dfrac{dF(x)}{dx} = \begin{cases} 0 & \text{for } x < 0 \\ 0.01\,e^{-0.01x} & \text{for } 0 \le x \end{cases}$
$P(X < 200) = F(200) = 1 - e^{-2} = 0.8647$
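The reaction-time exercise can also be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative. It evaluates F(200) and compares a finite-difference slope of F with the analytical density.

```python
import math


def reaction_cdf(x: float) -> float:
    """F(x) = 1 - exp(-0.01 x) for x >= 0 and 0 otherwise (x in ms)."""
    return 1.0 - math.exp(-0.01 * x) if x >= 0 else 0.0


def reaction_pdf(x: float) -> float:
    """f(x) = dF/dx = 0.01 exp(-0.01 x) for x >= 0 and 0 otherwise."""
    return 0.01 * math.exp(-0.01 * x) if x >= 0 else 0.0


def numeric_derivative(func, x: float, h: float = 1e-5) -> float:
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


# Proportion of reactions completed within 200 ms.
p_200 = reaction_cdf(200.0)

# The slope of the cdf at 50 ms should match the pdf at 50 ms.
slope_at_50 = numeric_derivative(reaction_cdf, 50.0)

print(f"P(X < 200) = {p_200:.4f}")
print(f"dF/dx at 50 ms = {slope_at_50:.6f}, f(50) = {reaction_pdf(50.0):.6f}")
```

The finite-difference check is only a sanity test that the density really is the derivative of the cdf; the exercise itself needs nothing more than F(200).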
Normal Distribution
 The famous "bell curve" distribution!
 Key characteristics:
o Symmetrical around the mean (μ).
The total area under the curve is 1.0, so half is above the mean, half is below
o Standard deviation (σ) controls the spread of the curve.
Larger σ indicates a wider spread of values.
o It is usually denoted by N(μ, σ)
 Equation
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Cumulative Distribution Function of Normal Distribution
 $F(x_0) = \int_{-\infty}^{x_0} f(x)\,dx$
 P(x ≤ a) = F(a) = NORM.DIST(a, mean, standard_dev, TRUE)
 P(x ≥ a) = 1 − F(a)
 P(a ≤ x ≤ b) = F(b) − F(a)
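Outside Excel, the three relations above can be reproduced with the error function. This is a minimal Python sketch, assuming the identity F(a) = 0.5·(1 + erf(z/√2)); the helper names and the N(64, 2.7) inputs are illustrative.

```python
import math


def normal_cdf(a: float, mean: float, sd: float) -> float:
    """P(x <= a) for a normal distribution, computed through the error function."""
    z = (a - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval(a: float, b: float, mean: float, sd: float) -> float:
    """P(a <= x <= b) = F(b) - F(a)."""
    return normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd)


# Illustrative values: a normal distribution with mean 64 and standard deviation 2.7.
p_below_mean = normal_cdf(64.0, 64.0, 2.7)               # half the area lies below the mean
p_upper_tail = 1.0 - normal_cdf(66.7, 64.0, 2.7)         # P(x >= mean + 1 sd)
p_within_one_sd = normal_interval(61.3, 66.7, 64.0, 2.7)

print(f"{p_below_mean:.4f} {p_upper_tail:.4f} {p_within_one_sd:.4f}")
```

The first function plays the role of NORM.DIST(a, mean, standard_dev, TRUE); the other two quantities follow from it by subtraction.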
Standard Normal Distribution
 The standard normal distribution, also called the z-distribution, is a special normal
distribution where the mean is 0 and the standard deviation is 1.
 The CDF doesn't have a simple closed-form expression and usually requires tables.
o In Excel: P(x≤𝑎)=𝐹(𝑎)=NORMSDIST(z)
Unusual Values and Standard Deviation
 Unusual values occur outside the range −2 ≤ z ≤ 2 (or µ − 2σ ≤ x ≤ µ + 2σ)
Example 1: Young Women’s Heights
The height of young women can be defined as a continuous random variable (Y) whose
probability distribution is N(64, 2.7).
A. What is the probability that a randomly chosen young woman has a height between 68 and
70 inches? P(68 ≤ Y ≤ 70) = ???
$z_1 = \dfrac{68 - 64}{2.7} = 1.4815$
$z_2 = \dfrac{70 - 64}{2.7} = 2.2222$
 P(1.48 ≤ Z ≤ 2.22) = P(Z ≤ 2.22) − P(Z ≤ 1.48) = 0.9869 − 0.9308 = 0.0561
There is about a 5.6% chance that a randomly chosen
young woman has a height between 68 and 70 inches.
Example 1: Young Women’s Heights
The height of young women can be defined as a continuous random variable (Y) whose
probability distribution is N(64, 2.7).
B. At 71 inches tall, is Mrs. Daniel unusually tall? P(Y ≤ 71) = ???
$z = \dfrac{71 - 64}{2.7} = 2.5926 > 2$
P(Y ≤ 71) = 0.995
Yes, Mrs. Daniel is unusually tall: her z-score exceeds 2, and 99.5% of the
population is shorter than her.
Example 2: Time for Charging
The average battery takes 60 minutes (μ) to become fully charged, with a standard deviation (σ) of
10 minutes. We can model the charging time with a normal distribution N(μ, σ).
A. What percentage of batteries take between 50 and 75 minutes to become fully charged?
B. If a manufacturer claims that its battery takes only 32 minutes to charge, would you
consider this claim unusual?
C. Determine the time x for which the probability that a battery charges in less than x is 0.98.
Solution
B. A z-score of -2.8 indicates that the claimed charging time is 2.8 standard
deviations below the mean.
 In a normal distribution, about 95% of values fall within 2 standard deviations
of the mean. Values beyond this range are infrequent (about 5% in total, or
roughly 2.5% in each tail), so the claim is unusual.
C. We used Goal Seek in Excel to determine the value of x.
μ = 60.0, σ = 10
Part  x         z        Result
A     50.0      -1.00    P(x<50) = 0.1587 (check with ERF: 0.158655)
A     75.0      1.50     P(x<75) = 0.9332 (check with ERF: 0.9332)
A                        P(50<x<75) = 77.5%
B     32        -2.8000  P(x<32) = 0.3%
C     80.56573  2.0566   P(x<80.56573) = 98.0% (Goal Seek target: 0.98)
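The charging-time results can be reproduced without a spreadsheet. This is a minimal Python sketch, assuming the standard-library class `statistics.NormalDist` (Python 3.8+); the variable names are illustrative. It uses the same x values as the solution table (50, 75 and 32 minutes) and replaces Goal Seek with the closed-form inverse cdf.

```python
from statistics import NormalDist

# Charging time modeled as N(mu = 60 min, sigma = 10 min), as in the example.
charging = NormalDist(mu=60.0, sigma=10.0)

# Part A: share of batteries charging in between 50 and 75 minutes.
p_between = charging.cdf(75.0) - charging.cdf(50.0)

# Part B: z-score and lower-tail probability of the claimed 32 minutes.
z_claim = (32.0 - charging.mean) / charging.stdev
p_claim = charging.cdf(32.0)

# Part C: time below which 98% of batteries are fully charged.
x_98 = charging.inv_cdf(0.98)

print(f"P(50 < x < 75) = {p_between:.1%}")
print(f"z = {z_claim:.2f}, P(x < 32) = {p_claim:.1%}")
print(f"x at 98% = {x_98:.2f} min")
```

The inverse cdf gives about 80.54 minutes, close to the Goal Seek result of 80.57 in the table; Goal Seek stops once it is within its tolerance of the 0.98 target.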
Example 3: Exam Scores
The scores on the Engineering Statistics Midterm exam can be modeled by a normal
distribution.
A. What is the probability that a randomly chosen engineering student scored between 75 and
90 points on the exam?
B. What is the probability of a student scoring less than 60?
C. What is the probability of a student scoring more than 90?
Student ID Overall Grade
2022005779 48
2021004896 49
2022005709 53
2022005577 56
2022005690 60
2022005436 60
2022005600 67
2022005480 69
2022005802 70
2018003821 72
2021004786 74
2021004893 74
2022005687 74
2022005560 74
2022005359 74
2022005597 75
2022005446 75
2021005070 76
2022005590 76
2022005710 76
2022005479 77
2022005757 78
2022005580 78
2022005581 78
2020004723 78
2022005565 78
2022005402 78
2022005618 79
2022005625 80
2022005401 80
2022005533 80
2022005616 81
2022005350 81
2021005055 81
2017003079 83
2022005700 85
2022005433 85
2022005685 85
2022005558 86
2021004872 86
2022005448 87
2022005678 88
2022005462 88
2021005252 88
2022005636 90
2022005691 91
2023005883 92
2022005535 94
2022005464 95
2021005126 96
2022005413 96
2022005620 97
2022005426 98
2022005444 98
2021004912 100
2022005425 100
2022005663 100
Solution
The scores are approximately normally distributed with a mean (μ) of 80.12
and a standard deviation (σ) of 12.35 points.
μ = 80.1, σ = 12.34741
Part  x     z        Result
A     75.0  -0.4149  P(x<75) = 0.3391 (check with ERF: 0.339112)
A     90.0  0.7999   P(x<90) = 0.7881 (check with ERF: 0.7881)
A                    P(75<x<90) = 44.902%
B     60    -1.6297  P(x<60) = 5.2%
C     90    0.7999   P(x>90) = 21.19% (observed in the grade table: 12 of 57 = 21.05%)
Example 4
In an electronics lab, PV panels are manufactured with a target capacity of 100 W. However,
due to slight variations in the manufacturing process, the actual capacity of each panel can be
within ±2% of the target value. Assume this variation can be modeled using a normal distribution.
A. What is the probability of a panel’s capacity being below 98 W?
B. What is the probability of a panel’s capacity being below 102 W?
C. What is the probability of a panel meeting the target within the stated accuracy?
D. What is the probability of a panel’s capacity exceeding 104 W?
Central Limit Theorem
 The Central Limit Theorem (CLT) is a fundamental concept in statistics that describes the
behavior of sample means drawn from a population, regardless of the shape of the
population's distribution.
 The CLT states that as the sample size increases, the distribution of the sample means
(the average of the values in each sample) tends towards a normal distribution.
o This is true even if the original population distribution is not normal (e.g., skewed, uniform).
 The normal distribution of the sample mean has
o a mean of $\mu_{\bar{x}} = \mu$
o and a standard error of the sample mean (its standard deviation) of $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
From the Book
 If we are sampling from a population that has an unknown probability distribution, the
distribution of the sample mean will still be approximately normal with mean μ and
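variance σ²/n if the sample size n is large. The statement is as follows: if $\bar{x}$ is the
mean of a random sample of size n taken from a population with mean μ and finite variance σ²,
then the limiting form of the distribution of $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ as n → ∞
is the standard normal distribution.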
Applications of the CLT
 The CLT allows us to apply statistical methods and tools that rely on normal distributions to
data from populations that might not be normally distributed themselves.
o This is incredibly useful because the normal distribution is well-understood and has many
established properties, making it easier to perform calculations and draw inferences from data.
 The CLT allows statisticians to make inferences about population parameters based on
sample data, even when the population distribution is unknown or non-normal.
Example 5
A factory produces metal widgets. The weights of these widgets are known to follow
a uniform distribution between 10 grams and 12 grams.
How does the variability of the average weight change with different sample sizes?
 Solution:
o If we take a small sample (e.g., 3 widgets), the average weight of that sample could be
anywhere between 10 grams and 12 grams, depending on which specific widgets were
chosen. The variability of the means of such small samples will be high.
o According to the CLT, as the sample size increases (e.g., 30 widgets or more), the
distribution of sample means will approach a normal distribution. The variability of these
sample means will become smaller, even though the original weight distribution is
uniform.
# widgets weight σ(n=3) σ(n=30)
1 12 1 0.803
2 10
3 11
4 12
5 10
6 11
7 10
8 12
9 10
10 11
11 12
12 11
13 11
14 11
15 10
16 12
17 11
18 10
19 12
20 11
21 12
22 12
23 12
24 12
25 10
26 11
27 11
28 12
29 11
30 10
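The effect of sample size on the spread of the sample mean can be illustrated by simulation. This is a minimal Python sketch, not the spreadsheet calculation of the table above: it assumes widget weights drawn from a continuous uniform distribution on [10, 12] g, a fixed random seed, and 5000 repeated samples per sample size (all illustrative choices). The theoretical standard error is σ/√n with σ = (12 − 10)/√12 for a uniform distribution.

```python
import math
import random
from statistics import mean, stdev


def sample_mean_spread(sample_size: int, trials: int, rng: random.Random) -> float:
    """Standard deviation of the means of `trials` samples of uniform weights."""
    means = [
        mean([rng.uniform(10.0, 12.0) for _ in range(sample_size)])
        for _ in range(trials)
    ]
    return stdev(means)


rng = random.Random(0)  # fixed seed so that the run is repeatable

# Standard deviation of a uniform distribution on [10, 12].
population_sigma = (12.0 - 10.0) / math.sqrt(12.0)

spread_n3 = sample_mean_spread(3, 5000, rng)
spread_n30 = sample_mean_spread(30, 5000, rng)
theory_n3 = population_sigma / math.sqrt(3)
theory_n30 = population_sigma / math.sqrt(30)

print(f"n = 3:  simulated {spread_n3:.3f}, sigma/sqrt(n) = {theory_n3:.3f}")
print(f"n = 30: simulated {spread_n30:.3f}, sigma/sqrt(n) = {theory_n30:.3f}")
```

The simulated spread of the means shrinks roughly by a factor of √10 between n = 3 and n = 30, as the standard-error formula predicts.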
Example 6
 An electronics company manufactures resistors that have a mean resistance of 100 ohms
and a standard deviation of 10 ohms. Find the probability that a random sample of n = 25
resistors will have an average resistance of less than 95 ohms.
 Solution
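One way to work this out is to apply the CLT results stated earlier: the sample mean has standard error σ/√n, and the probability follows from the standard normal cdf. This is a minimal Python sketch, assuming the sample mean is normally distributed; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 100.0     # population mean resistance (ohms)
sigma = 10.0   # population standard deviation (ohms)
n = 25         # sample size

# Standard error of the sample mean: sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)

# z-score of a sample mean of 95 ohms.
z = (95.0 - mu) / standard_error

# P(sample mean < 95) from the standard normal distribution.
p_below_95 = NormalDist().cdf(z)

print(f"standard error = {standard_error}, z = {z}, P = {p_below_95:.4f}")
```

With a standard error of 2 ohms, 95 ohms lies 2.5 standard errors below the mean, so the probability is about 0.6%.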
Confidence Interval (CI)
Introduction
 Suppose you are studying the heights of students at AURAK.
o You take a random sample from the population and establish a mean height of x̄ = 170 cm.
o The mean of x̄ = 170 cm is a point estimate of the population mean.
o A point estimate by itself is of limited usefulness because it does not reveal the uncertainty associated
with the estimate.
 What's missing is the degree of uncertainty in this single sample.
 Namely, if you take another sample of students, you are very likely
to end up with a mean height that differs from 170 cm.
Confidence Interval (CI)
 CI is a statistical method used to estimate a population parameter (e.g., mean, proportion)
with a certain level of confidence.
 It provides a range of values that are likely to contain the true population parameter.
 The range is expressed as a lower and upper bound, often denoted by +/- a margin of error
around the sample parameter (e.g., sample mean for population mean).
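Development Of The Confidence Interval
 We know that the sample mean x̄ is normally distributed with mean μ and variance σ²/n.
 The z-score is given by:
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
 A confidence interval estimate for μ is an interval of the form
L ≤ μ ≤ U
where the end-points L and U are computed from the sample data.
Determining the end-points L and U
 Suppose that we can determine values of L and U such that the following probability
statement is true:
$P(L \le \mu \le U) = 1 - \alpha$
1−α is called the confidence coefficient.
 Because Z has a standard normal distribution, we can write
$P\left(-z_{\alpha/2} \le \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$
 This can be rewritten as
$P\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$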
Guidelines for determining the Interval
• σ is known and n≥30
 E = CONFIDENCE(α,σ,n) = $z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
 zα/2 = NORMSINV(1−α/2)
• σ is unknown or n<30
 E = CONFIDENCE.T(α,s,n) = $t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
 tα/2 = T.INV(1−α/2, d.f.), with d.f. = n−1
 The mean of the population is given by μ = x̄ ± E
Example 7: Modelling Solar PV System
You are creating a simulation model of a solar PV system for a residential
house. For this purpose, the energy consumption of households is needed.
You collect a random sample of 25 households; the energy consumption
values are listed below.
1. Construct a 95% confidence interval for the monthly energy
consumption of a house.
2. To be on the safe side, the solar PV system should be sized to cover
70% of the population. What energy consumption should be assumed for the
house?
House #  Energy consumption
1 1572
2 1552
3 1431
4 1595
5 1500
6 1493
7 1449
8 1459
9 1506
10 1426
11 1515
12 1575
13 1524
14 1551
15 1432
16 1496
17 1586
18 1562
19 1508
20 1405
21 1533
22 1558
23 1464
24 1482
25 1402
Solution
n = 25, x̄ = 1503.04, S = 58.08003
1. 95% confidence interval, P(L ≤ μ ≤ U) = 0.95
Bound  x         z        Result
L      1480.27   -1.9600  P(x<1480.27) = 2.5%
U      1525.807  1.9600   P(x<1525.807) = 97.5%
                          P(1480.27<x<1525.807) = 95%
Different form
Method   Score     E              L     U
z-based  1.959964  Ez = 22.76695  1480  1526
t-based            Et = 23.97426  1479  1527
2. Selection of the value of the energy consumption
x = 1509.13, z = 0.5244, P(x<1509.13) = 70.0%
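The interval in part 1 can be recomputed directly from the 25 readings. This is a minimal Python sketch using only the standard library; because the standard library has no t-distribution, the critical value for 24 degrees of freedom is entered as a table constant (an assumption of this sketch, equivalent to T.INV(0.975, 24) in Excel).

```python
import math
from statistics import NormalDist, mean, stdev

# Energy consumption of the 25 sampled households (values from the table).
consumption = [
    1572, 1552, 1431, 1595, 1500, 1493, 1449, 1459, 1506, 1426,
    1515, 1575, 1524, 1551, 1432, 1496, 1586, 1562, 1508, 1405,
    1533, 1558, 1464, 1482, 1402,
]

n = len(consumption)
x_bar = mean(consumption)      # sample mean
s = stdev(consumption)         # sample standard deviation (n - 1 in the denominator)

confidence_level = 0.95
alpha = 1.0 - confidence_level

# Critical values: z from the standard normal, t from a table (24 d.f., two-sided 95%).
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
T_CRIT_DF24 = 2.0639

standard_error = s / math.sqrt(n)
e_z = z_crit * standard_error        # margin of error, z-based
e_t = T_CRIT_DF24 * standard_error   # margin of error, t-based

ci_z = (x_bar - e_z, x_bar + e_z)
ci_t = (x_bar - e_t, x_bar + e_t)

print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"z-based: E = {e_z:.2f}, CI = ({ci_z[0]:.2f}, {ci_z[1]:.2f})")
print(f"t-based: E = {e_t:.2f}, CI = ({ci_t[0]:.2f}, {ci_t[1]:.2f})")
```

The t-based interval is slightly wider than the z-based one, which is the expected behaviour for a sample of fewer than 30 readings.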
Error vs. Confidence
 There is a trade-off between acceptable error (or
required precision) and confidence.
o When you are required to be precise, you are less
confident.
o When greater error is allowed, you can be more confident.
 When in Doubt: If you're unsure about the population
standard deviation or the sample size is small, it's safer to use
a t-score test to ensure the robustness of the statistical
inference.
Z-distribution vs. t-distribution
Example 8: T-test
 Repeat Example 7 using CL=98%.
 What is the error margin in this case?
 Repeat Example 7 using the t-test.
 What is the error margin in this case?
Case          E          L     U
z-based, 95%  Ez = 22.8  1480  1526
t-based, 95%  Et = 24.0  1479  1527
z-based, 98%  Ez = 27.0  1476  1530
Example 9: 3DMark TimeSpy
You have conducted a 3DMark TimeSpy test on your laptop and have compared the result with
other laptops with the same GPU and CPU specifications. You have collected the results of the
top 100 of these tests.
1. Calculate the mean, median, and mode of the 3DMark TimeSpy scores. What do these
measures tell you about the distribution of the scores?
2. Calculate the standard deviation of the scores. What does this tell you about the variability
of the scores?
Solution
1. the mean, median, and mode:
 The mean, median, and mode can tell us about the distribution of the scores.
o If the mean, median, and mode are all close to each other, it suggests that the scores are symmetrically distributed.
o If the mean is greater than the mode, it suggests that the scores are right-skewed (i.e., there are a few very high
scores).
o If the mean is less than the median, it suggests that the scores are left-skewed (i.e., there are a few very low scores).
2. The standard deviation, s, tells us about the variability of the scores.
o If s is small (s/𝑥 < 10%), it means that most of the scores are close to the mean, indicating that the performance of
the laptops is quite consistent.
o If s is large (s/𝑥 > 10%), it means that the scores are spread out over a wider range, indicating more variability in
performance.
 Since the results reveal that s/𝑥=7%, one can say that the performance of the laptops is quite consistent.
S         Mean  Median  Mode
340.5119  5238  5114    5113
Example 10
A battery manufacturer wishes to investigate the cycle life of its batteries. A sample of 10
batteries used for 5000 cycles revealed a sample mean of 12% degradation in battery performance
with a standard deviation of 2%. Construct a 95 percent confidence interval for the population
mean.
1. Would it be reasonable for the manufacturer to claim that after 5000 cycles the degradation in
battery performance is 10%?
2. Compute the confidence level (C.L.) required for 10% degradation to become an accepted value.
Solution
1. The value of 10% is not in the confidence interval. Hence, we conclude that the population
mean is unlikely to be 10%.
2. The required CL is 99%.
n = 10, x̄ = 12, S = 2, CL = 95%, α = 5%
Method   Score        E                 L     U
z-based  1.959963985  Ez = 1.239590065  11    13
t-based               Et = 1.43         10.6  13.4
Determining Sample Size
Confidence Level and Precision of Estimation
 Our choice of confidence level (CL) is essentially arbitrary.
o If we choose a confidence level of CL=95%, the length of the confidence interval (CI) is $2(1.96\,\sigma/\sqrt{n}) = 3.92\,\sigma/\sqrt{n}$.
o If we choose a higher confidence level, say 99%, the length of the CI is $2(2.58\,\sigma/\sqrt{n}) = 5.16\,\sigma/\sqrt{n}$.
 The 99% interval is longer, which is why we are more confident with 99% than with 95%.
Procedures Of Making Assumption
 Select your sample of n readings.
 Determine the mean value of the sample, x̄.
 Determine the standard deviation of the sample, s.
 Select your confidence level: 90%, 95%, or 99%.
 Calculate the z-score (σ is known and n≥30) or the t-score (σ is unknown or n<30).
o As the sample size increases (exceeding 30), Z-test and T-test results converge.
 Determine the margin of error, E.
 Then, the mean value of the population, μ, is: μ = x̄ ± E
Choice Of Sample Size
 The length of a confidence interval is a measure of the precision of estimation.
 The precision is inversely related to the confidence level.
 This means that in using x̄ to estimate μ, the error is less than or equal to E:
$|\bar{x} - \mu| \le E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
 3 factors determine the size of a sample:
o The level of confidence selected.
o The maximum allowable error.
o The variation in the population.
Sample Size
 It is desirable to obtain a confidence interval that is short enough for decision-making purposes
and that also has adequate confidence.
 One way to achieve this is by choosing the sample size “n” to be large enough to give a CI of
specified length or precision with given confidence
 Given a confidence level and a maximum error of estimate (error margin), E, the minimum
sample size n needed to estimate the population mean, is:
$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2 \quad \text{or} \quad n = \left(\dfrac{t_{\alpha/2}\,s}{E}\right)^2$
 Where:
o E is the allowable error
o z is the z-score (and t the t-score) corresponding to the selected level of confidence
o s is the standard deviation (of the sample)
o σ is the standard deviation (of the population)
Notice That:
 As the desired length of the interval “2E” decreases, the required sample size “n” increases
for a fixed value of “σ” and specified confidence.
 As “σ” increases, the required sample size n increases for a fixed desired length 2E and
specified confidence.
 As the confidence level decreases, the required sample size “n” decreases for fixed desired
length “2E” and standard deviation “σ”.
Example 11
A random sample of 32 textbook prices is taken from a local college bookstore. The mean of
the sample is x̄ = 74.22, and the sample standard deviation is s = 23.44.
1. What is the error margin at a confidence level of 99%?
2. How many books must be included in your sample if you want to be 99% confident that the
sample mean is within $5 of the population mean?
3. Repeat question 2 assuming the standard deviation is s = 24.44.
4. Repeat question 2 using a 95% confidence level.
Solution
zc = 2.575, x̄ = 74.22, s = 23.44, E = 5
$n = \left(\dfrac{z_c\,s}{E}\right)^2 = \left(\dfrac{2.575 \times 23.44}{5}\right)^2 \approx 145.7$. Always round up.
2. You should include at least 146 books in your sample.
Question  n    x̄      S      CL   α   z     Ez
1         32   74.22  23.44  99%  1%  2.58  10.67
2         146  74.22  23.44  99%  1%  2.58  5.00
3         159  74.22  24.44  99%  1%  2.58  4.993
4         85   74.22  23.44  95%  5%  1.96  4.983
In question 3, a 4% increase in S raises the required n by 9%.
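The sample-size formula lends itself to a small helper. This is a minimal Python sketch, assuming the z-based form n = (z·σ/E)² rounded up; the function name is illustrative, and the z-score comes from the standard library's `NormalDist` rather than the rounded table value 2.575.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, max_error: float, confidence_level: float) -> int:
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_error."""
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / max_error) ** 2)  # always round up


# Questions 2 to 4 of the textbook-price example (error margin of $5).
n_q2 = required_sample_size(23.44, 5.0, 0.99)
n_q3 = required_sample_size(24.44, 5.0, 0.99)
n_q4 = required_sample_size(23.44, 5.0, 0.95)

print(n_q2, n_q3, n_q4)
```

Rounding up rather than to the nearest integer guarantees that the resulting margin of error stays at or below the target.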
Example 12: from my Students’ SDP
The students want to estimate the cooling requirement of a building. The U-value of the wall is
needed but unknown for the building under investigation. A sample of 9 buildings reveals the
following values of U.
1. What is the length of the CI at a confidence level of 95%?
2. What are the endpoints of the confidence interval of the U-value?
3. What is the required sample size to keep the error below 3%, with CL=95%?
Sample  U-value (W/m2.K)
1 1.80
2 1.85
3 1.90
4 2.00
5 2.05
6 2.10
7 2.25
8 2.30
9 2.40
Solution
n = 9, mean = 2.07, standard deviation = 0.2093, CL = 95.00%, α = 5.00%
                        t-based  z-based
Score                   2.306    1.9600
E                       0.1609   0.1368
E (% of mean)           7.8%     6.6%
Q1  CI length           0.322    0.27
Q2  Umin                1.911    1.935
Q2  Umax                2.233    2.209
    P(x1≤x≤x2)          95.0%
Q3  Required E (%)      3%
Q3  Target E (W/m2.K)   0.062167
Q3  n                   61       44
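The t-based column of this solution can be reproduced from the nine U-values. This is a minimal Python sketch using the standard library; the t-score 2.306 (8 degrees of freedom, 95%) is taken from the solution table as a constant, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# U-values of the 9 sampled buildings (W/m2.K), from the table.
u_values = [1.80, 1.85, 1.90, 2.00, 2.05, 2.10, 2.25, 2.30, 2.40]

T_CRIT_DF8 = 2.306  # t-score for 8 d.f. at CL = 95%, as listed in the solution

n = len(u_values)
u_mean = mean(u_values)
u_std = stdev(u_values)  # sample standard deviation

# Q1 and Q2: margin of error, interval length, and endpoints.
margin_t = T_CRIT_DF8 * u_std / math.sqrt(n)
ci_length = 2.0 * margin_t
u_min, u_max = u_mean - margin_t, u_mean + margin_t

# Q3: sample size that keeps the error below 3% of the mean.
target_error = 0.03 * u_mean
required_n = math.ceil((T_CRIT_DF8 * u_std / target_error) ** 2)

print(f"E = {margin_t:.4f}, CI = ({u_min:.3f}, {u_max:.3f}), length = {ci_length:.3f}")
print(f"target E = {target_error:.6f}, required n = {required_n}")
```

The required sample size reuses the t-score for 8 degrees of freedom, as the solution table does; with a larger sample the t-score would be smaller, so this estimate is on the conservative side.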
Example 13
The plan is to design a PV system to supply the energy required by mosques. Therefore, the energy
consumption is needed. However, the energy consumption of the mosque under study is not
known. The area and volume of the mosque are 250 m2 and 750 m3, respectively.
Based on data collected from other mosques, and using a confidence level of 95%:
1. How many mosques need to be included in the survey to estimate the average energy consumption
with an error of 10%?
2. What is the interval of the energy consumption?
Area (m2)  Volume (m3)  Total Annual Current Consumption (A)  Total Annual Power Consumption (kW)
867.6 6940.8 170,068 37415
198 1584 28299 6226
216 1512 54718 12038
375.24 1200.768 116021 25525
453.3136 2946.5384 83707 18416
168.4535 1094.94775 85344 18776
386.1685 3282.43225 152925 33644
118.4625 770.00625 67142 14771
342.21 2908.785 81040 17829
330 2310 50539 11119
Solution
Sample  Energy (kWh/m2)
1       43.1
2       31.4
3       55.7
4       68.0
5       40.6
6       111.5
7       87.1
8       124.7
9       52.1
10      33.7
n = 10.00, mean = 64.80, standard deviation = 31.0658
CL = 90.00%, α = 10.00%
                   t-based  z-based
Score              1.833    1.645
Error, CL=0.90     18.0083  16.15882
Error (% of mean)  27.8%
EAmin = 46.793, EAmax = 82.809
Observation = 2000.000, U≤Observation: 100.0%
Target Error (%) = 10%
Target Error, CL=0.90 = 6.4801
z-score = 1.645
Required n = 63
Exercises: From the Book 8-14 & 8-16
 The life in hours of a 75-watt light bulb is known to be normally distributed with σ = 25
hours. A random sample of 20 bulbs has a mean life of x̄ = 1014 hours.
(a) Construct a 95% two-sided confidence interval on the mean life.
(b) Suppose that we wanted the error in estimating the mean life from the two-sided confidence
interval to be five hours at 95% confidence. What sample size should be used?
Solution
n = 20, x̄ = 1014, σ = 25, CL = 95%, α = 5%, z = 1.96
(a) Ez = 10.96, CI length = 21.91, L = 1003.043, U = 1024.957
(b) Target Error = 5.0000, z-score = 1.960, Required n = 97
Thank You…
UNIT - IV - Air Compressors and its Performance
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

Lec. 10: Making Assumptions of Missing data

  • 1. Energy Systems Modeling and Optimization Dr. Mohamad Kharseh Office: G 342 mohamad.kharseh@aurak.ac.ae
  • 2. Lec. 10: Making Assumptions
  • 3. Important Functions in Excel
      - Frequency f for each class: FREQUENCY(data_array, bins_array)
      - The mean: AVERAGE(number1, number2, …)
      - The median: MEDIAN(number1, number2, …)
      - The mode: MODE(number1, number2, …)
      - Standardized value: $z = \frac{a - \mu}{\sigma}$
      - P(x ≤ a) = F(a) = NORMSDIST(z) = NORM.DIST(a, mean, standard_dev, TRUE)
      - Z-score: zα/2 = NORMSINV(1 − α/2), which is the magnitude of NORMSINV(α/2)
      - T-score: tα/2 = T.INV(1 − α/2, n − 1), which is the magnitude of T.INV(α/2, n − 1)
      - The error margin E:
          - E = CONFIDENCE(α, σ, n), for n ≥ 30
          - E = CONFIDENCE.T(α, s, n), for n < 30
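For readers working outside Excel, the following is a minimal sketch of rough Python counterparts to some of the functions above, using only the standard library. The helper names (`norm_dist_cdf`, `z_critical`, `confidence`) and the `sample` readings are hypothetical and chosen for illustration; they are not part of the lecture.

```python
import math
import statistics
from statistics import NormalDist

sample = [48, 49, 53, 56, 60, 60, 67]   # hypothetical readings for illustration

mean = statistics.mean(sample)           # counterpart of AVERAGE
median = statistics.median(sample)       # counterpart of MEDIAN
mode = statistics.mode(sample)           # counterpart of MODE

std_normal = NormalDist()                # mean 0, standard deviation 1


def norm_dist_cdf(a, mu, sigma):
    """Counterpart of NORM.DIST(a, mu, sigma, TRUE): P(x <= a)."""
    return NormalDist(mu, sigma).cdf(a)


def z_critical(alpha):
    """Upper critical value z_{alpha/2}, i.e. NORMSINV(1 - alpha/2)."""
    return std_normal.inv_cdf(1.0 - alpha / 2.0)


def confidence(alpha, sigma, n):
    """Counterpart of CONFIDENCE(alpha, sigma, n): z_{alpha/2} * sigma / sqrt(n)."""
    return z_critical(alpha) * sigma / math.sqrt(n)


print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"z for alpha = 0.05: {z_critical(0.05):.4f}")
print(f"E for alpha = 0.05, sigma = 25, n = 20: {confidence(0.05, 25, 20):.2f}")
```

The standard library has no Student's t-distribution, so a counterpart of T.INV or CONFIDENCE.T would need a third-party package or a table value.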
  • 4. Available and Missing Variables
      When modeling a system, encountering missing data is common.
      - What should a modeler do when information is unknown or missing?
  • 5. Available and Missing Variables
      - When dealing with missing data, it is critical to make sound assumptions so that the model remains accurate.
      - One common strategy for handling such situations is to calculate the average of the available data for similar existing systems (i.e., to create sample data).
      - Use this average as a reasonable estimate for the missing value.
  • 6. However, be cautious
      - Variability: if the data collected from the experiment vary significantly (high standard deviation), relying solely on historical averages may not be accurate.
      - Instead, the modeler uses a confidence interval to define the range within which the missing value lies.
      - In such a case, two approaches exist:
          - Normal distribution and z-test: the standard deviation of the population is known and the sample size is 30 or more.
          - Normal distribution and t-test: the standard deviation of the population is unknown or the sample size is smaller than 30.
  • 7. Continuous Random Variables
      - Continuous random variables play a crucial role in probability and statistics. They describe scenarios where the variable can take on any value within a specific range. Unlike discrete variables (which take separate, countable values such as whole numbers), continuous variables represent a spectrum of possibilities.
  • 8. Probability
      - A probability statement describes the likelihood that a particular value occurs.
      - The likelihood is quantified by assigning a number from the interval [0, 1] to the set of values (or a percentage from 0 to 100%).
      - A probability is usually expressed in terms of a random variable, e.g., P(x) = 80%.
      - Higher numbers indicate that the set of values is more likely.
  • 10. Probability Density Function Rules
      - The probability distribution of x is described by a density curve.
          - f(x) is the probability density function (pdf).
      - The probability density cannot be negative: f(x) ≥ 0.
      - The total area under the curve must be 1.
      - For a continuous random variable, probability is not assigned to individual values.
          - If x is continuous, then for any number c, P(x = c) = 0.
          - Instead, probability is defined over an interval of values, P(a ≤ x ≤ b).
      - P(a ≤ x ≤ b) is the shaded area below the pdf:
        $P(a \le X \le b) = \int_a^b f(x)\,dx$
  • 11. Cumulative Distribution Function
      - F(x) is the cumulative distribution function (cdf), with $\frac{dF(x)}{dx} = f(x)$.
      - The probability of any value of x below x0 equals the area under the density curve to the left of x0:
        $P(x \le x_0) = P(x < x_0) = F(x_0) = \int_{-\infty}^{x_0} f(x)\,dx$
        $P(x \ge x_0) = 1 - F(x_0) = 1 - \int_{-\infty}^{x_0} f(x)\,dx$
      - For any two numbers a and b with a < b:
        $P(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx$
  • 12. Exercise: Reaction Time
      The following cumulative distribution function approximates the time until a chemical reaction is completed (in milliseconds, ms):
        $F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - e^{-0.01x} & \text{for } x \ge 0 \end{cases}$
      - What is the probability density function?
        $f(x) = \frac{dF(x)}{dx} = \begin{cases} 0 & \text{for } x < 0 \\ 0.01\,e^{-0.01x} & \text{for } x \ge 0 \end{cases}$
      - What proportion of reactions is complete within 200 ms?
        $P(X < 200) = F(200) = 1 - e^{-2} = 0.8647$
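The reaction-time exercise can be checked numerically. The following is a minimal Python sketch; the function names `reaction_cdf`, `reaction_pdf`, and `numeric_derivative` are illustrative and do not come from the slides. It evaluates F(200) and compares the analytic pdf with a central-difference derivative of the cdf.

```python
import math

RATE = 0.01  # rate constant of the cdf, per millisecond (from the exercise)


def reaction_cdf(x: float) -> float:
    """F(x) = 1 - exp(-0.01 x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0


def reaction_pdf(x: float) -> float:
    """f(x) = dF/dx = 0.01 exp(-0.01 x) for x >= 0, and 0 otherwise."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def numeric_derivative(func, x: float, h: float = 1e-5) -> float:
    """Central-difference approximation of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


p_within_200 = reaction_cdf(200.0)
print(f"P(X <= 200 ms) = {p_within_200:.4f}")
print(f"pdf at 50 ms: analytic {reaction_pdf(50.0):.6f}, "
      f"numeric {numeric_derivative(reaction_cdf, 50.0):.6f}")
```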
  • 13. Normal Distribution
      - The famous "bell curve" distribution!
      - Key characteristics:
          - Symmetrical around the mean (μ). The total area under the curve is 1.0, so half is above the mean and half is below.
          - The standard deviation (σ) controls the spread of the curve. A larger σ indicates a wider spread of values.
          - It is usually denoted N(μ, σ).
      - Equation: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  • 14. Cumulative Distribution Function of Normal Distribution
      - $F(x_0) = \int_{-\infty}^{x_0} f(x)\,dx$
      - P(x ≤ a) = F(a) = NORM.DIST(a, mean, standard_dev, TRUE)
      - P(x ≥ a) = 1 − F(a)
      - P(a ≤ x ≤ b) = F(b) − F(a)
  • 15. Standard Normal Distribution
      - The standard normal distribution, also called the z-distribution, is a special normal distribution where the mean is 0 and the standard deviation is 1.
      - The CDF does not have a simple closed-form expression and usually requires tables.
          - In Excel: P(x ≤ a) = F(a) = NORMSDIST(z)
  • 18. Unusual Values and Standard Deviation
      - Unusual values occur outside the range −2 ≤ z ≤ 2 (or μ − 2σ ≤ x ≤ μ + 2σ).
  • 19. Example 1: Young Women’s Heights
      The height of young women can be defined as a continuous random variable (Y) whose probability distribution is N(64, 2.7).
      A. What is the probability that a randomly chosen young woman has a height between 68 and 70 inches? P(68 ≤ Y ≤ 70) = ?
        $z_1 = \frac{68 - 64}{2.7} = 1.4815$, $z_2 = \frac{70 - 64}{2.7} = 2.2222$
        P(1.48 ≤ Z ≤ 2.22) = P(Z ≤ 2.22) − P(Z ≤ 1.48) = 0.9869 − 0.9308 = 0.0561
      There is about a 5.6% chance that a randomly chosen young woman has a height between 68 and 70 inches.
  • 20. Example 1: Young Women’s Heights
      The height of young women can be defined as a continuous random variable (Y) whose probability distribution is N(64, 2.7).
      B. At 71 inches tall, is Mrs. Daniel unusually tall? P(Y ≤ 71) = ?
        $z = \frac{71 - 64}{2.7} = 2.5926 > 2$, P value: 0.995
      Yes, Mrs. Daniel is unusually tall because 99.5% of the population is shorter than her.
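Both parts of Example 1 can be reproduced without a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.0, sigma=2.7)  # N(64, 2.7) from Example 1, in inches

# Part A: P(68 <= Y <= 70) = F(70) - F(68)
p_between = heights.cdf(70.0) - heights.cdf(68.0)

# Part B: z-score and cumulative probability for a height of 71 inches
z_71 = (71.0 - heights.mean) / heights.stdev
p_below_71 = heights.cdf(71.0)
is_unusual = abs(z_71) > 2.0  # the "unusual value" rule |z| > 2

print(f"P(68 <= Y <= 70) = {p_between:.4f}")
print(f"z(71) = {z_71:.4f}, P(Y <= 71) = {p_below_71:.4f}, unusual: {is_unusual}")
```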
  • 21. Example: Time for Charging
      The average battery takes 60 minutes (μ) to fully charge, with a standard deviation (σ) of 10 minutes. We can model the charging time with a normal distribution N(μ, σ).
      A. What percentage of batteries take between 45 and 75 minutes to fully charge?
      B. If a manufacturer claims that his battery takes only 32 minutes to charge, would you consider this claim unusual?
      C. Determine the charging time x for which the probability P(X ≤ x) is 0.98.
  • 22. Solution
      B. A z-score of −2.8 indicates that the charging time is 2.8 standard deviations below the mean.
      - In normal distributions, most values fall within 2 standard deviations of the mean (around 95%). Values beyond this range are less frequent (around 5% in total, about 2.5% in each tail).
      C. We used Goal Seek in Excel to determine the value of x.
      Example 2 (μ = 60.0, σ = 10):
        Part  x          z         Probability           ERF check
        A     50.0       -1.00     P(x<50) = 0.1587      0.1586553
        A     75.0        1.50     P(x<75) = 0.9332      0.9332
        A                          P(50<x<75) = 77.5%    77.5%
        B     32         -2.8000   P(x<32) = 0.3%
        C     80.56573    2.0566   P(x) = 98.0%          0.98 (Goal Seek)
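The charging-time example can also be evaluated directly in Python, as in the sketch below (standard library only; variable names are illustrative). Two points are worth noting: the solution table evaluates the interval from 50 to 75 minutes while the exercise statement asks for 45 to 75 minutes, so the sketch computes both; and part C uses the inverse cdf in place of Goal Seek, which gives a value close to, but not identical with, the iterative result in the table.

```python
from statistics import NormalDist

charge_time = NormalDist(mu=60.0, sigma=10.0)  # charging time in minutes

# Part A: interval probabilities (table uses 50..75, statement asks 45..75)
p_50_to_75 = charge_time.cdf(75.0) - charge_time.cdf(50.0)
p_45_to_75 = charge_time.cdf(75.0) - charge_time.cdf(45.0)

# Part B: z-score and left-tail probability of the 32-minute claim
z_32 = (32.0 - charge_time.mean) / charge_time.stdev
p_below_32 = charge_time.cdf(32.0)

# Part C: time x such that P(X <= x) = 0.98, via the inverse cdf
x_98 = charge_time.inv_cdf(0.98)

print(f"P(50 < x < 75) = {p_50_to_75:.4f}")
print(f"P(45 < x < 75) = {p_45_to_75:.4f}")
print(f"z(32) = {z_32:.2f}, P(x < 32) = {p_below_32:.4f}")
print(f"x with P(X <= x) = 0.98: {x_98:.2f} minutes")
```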
  • 23. Example 3: Exam Scores
      The scores on the Engineering Statistics Midterm exam can be modeled by a normal distribution.
      A. What is the probability that a randomly chosen engineering student scored between 75 and 90 points on the exam?
      B. What is the probability of a student scoring less than 60?
      C. What is the probability of a student scoring more than 90?
  • 24. Student ID and Overall Grade
        2022005779 48  | 2021004896 49  | 2022005709 53
        2022005577 56  | 2022005690 60  | 2022005436 60
        2022005600 67  | 2022005480 69  | 2022005802 70
        2018003821 72  | 2021004786 74  | 2021004893 74
        2022005687 74  | 2022005560 74  | 2022005359 74
        2022005597 75  | 2022005446 75  | 2021005070 76
        2022005590 76  | 2022005710 76  | 2022005479 77
        2022005757 78  | 2022005580 78  | 2022005581 78
        2020004723 78  | 2022005565 78  | 2022005402 78
        2022005618 79  | 2022005625 80  | 2022005401 80
        2022005533 80  | 2022005616 81  | 2022005350 81
        2021005055 81  | 2017003079 83  | 2022005700 85
        2022005433 85  | 2022005685 85  | 2022005558 86
        2021004872 86  | 2022005448 87  | 2022005678 88
        2022005462 88  | 2021005252 88  | 2022005636 90
        2022005691 91  | 2023005883 92  | 2022005535 94
        2022005464 95  | 2021005126 96  | 2022005413 96
        2022005620 97  | 2022005426 98  | 2022005444 98
        2021004912 100 | 2022005425 100 | 2022005663 100
      Solution
      Scores are approximately normally distributed with a mean (μ) of 80.12 and a standard deviation (σ) of 12.35 points.
      μ = 80.1, σ = 12.34741
        Part  x      z         Probability             Check
        A     75.0   -0.4149   P(x<75) = 0.3391        ERF 0.339112
        A     90.0    0.7999   P(x<90) = 0.7881        ERF 0.7881
        A                      P(75<x<90) = 44.902%    44.902%
        B     60     -1.6297   P(x<60) = 5.2%
        C     90      0.7999   P(x>90) = 21.19%        21.05%
  • 25. Example 4
      In an electronics lab, PV panels are manufactured with a target capacity of 100 W. However, due to slight variations in the manufacturing process, the actual capacity of each panel can be ±2% from the target value. Assume this variation can be modeled using a normal distribution.
      A. What is the probability of a panel’s capacity being below 98 W?
      B. What is the probability of a panel’s capacity being below 102 W?
      C. What is the probability of a panel meeting the target with the stated accuracy?
      D. What is the probability of a panel’s capacity exceeding 104 W?
  • 28. Central Limit Theorem
      - The Central Limit Theorem (CLT) is a fundamental concept in statistics that describes the behavior of sample means drawn from a population, regardless of the shape of the population's distribution.
      - The CLT states that as the sample size increases, the distribution of the sample means (the average of the values in each sample) tends towards a normal distribution.
          - This is true even if the original population distribution is not normal (e.g., skewed, uniform).
      - The normal distribution of the sample mean has
          - a mean of $\mu_{\bar{x}} = \mu$
          - and a standard error of the sample mean (standard deviation) of $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
  • 29. From the Book
      - If we are sampling from a population that has an unknown probability distribution, the distribution of the sample mean will still be approximately normal with mean μ and variance σ²/n if the sample size n is large.
      - The statement is as follows: as n increases, the distribution of $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ approaches the standard normal distribution.
  • 30. Applications of the CLT
      - The CLT allows us to apply statistical methods and tools that rely on normal distributions to data from populations that might not be normally distributed themselves.
          - This is incredibly useful because the normal distribution is well understood and has many established properties, making it easier to perform calculations and draw inferences from data.
      - The CLT allows statisticians to make inferences about population parameters based on sample data, even when the population distribution is unknown or non-normal.
  • 31. Example 5
      A factory produces metal widgets. The weights of these widgets are known to follow a uniform distribution between 10 grams and 12 grams. How does the variability of the average weight change with different sample sizes?
      - Solution:
          - If we take a small sample (e.g., 3 widgets), the average weight of that sample could be anywhere between 10 grams and 12 grams, depending on which specific widgets were chosen. The variability of the means of such small samples will be high.
          - According to the CLT, as the sample size increases (e.g., 30 widgets or more), the distribution of sample means will approach a normal distribution. The variability of these sample means will become smaller, even though the original weight distribution was uniform.
      Weights (g) of widgets 1 to 30:
        12, 10, 11, 12, 10, 11, 10, 12, 10, 11,
        12, 11, 11, 11, 10, 12, 11, 10, 12, 11,
        12, 12, 12, 12, 10, 11, 11, 12, 11, 10
      σ (n = 3) = 1; σ (n = 30) = 0.803
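The shrinking variability of the sample mean can be illustrated with a small simulation. The sketch below is not part of the lecture: it assumes continuous uniform weights between 10 g and 12 g (the table lists whole-gram values), an arbitrary number of trials, and a fixed random seed. It compares the simulated standard deviation of the sample means with the CLT prediction σ/√n.

```python
import math
import random
import statistics

LOW, HIGH = 10.0, 12.0                      # uniform weight range in grams
SIGMA = (HIGH - LOW) / math.sqrt(12.0)      # std. dev. of a continuous uniform
TRIALS = 5000                               # number of simulated samples (assumed)


def std_of_sample_means(n: int, rng: random.Random) -> float:
    """Standard deviation of TRIALS sample means, each from n uniform draws."""
    means = [
        statistics.fmean([rng.uniform(LOW, HIGH) for _ in range(n)])
        for _ in range(TRIALS)
    ]
    return statistics.stdev(means)


rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n in (3, 30):
    simulated = std_of_sample_means(n, rng)
    predicted = SIGMA / math.sqrt(n)        # CLT standard error, sigma / sqrt(n)
    results[n] = (simulated, predicted)
    print(f"n = {n:2d}: simulated {simulated:.4f}, sigma/sqrt(n) {predicted:.4f}")
```

Increasing the sample size from 3 to 30 should reduce the spread of the sample means by a factor of about √10.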
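  • 32. Example 6
      - An electronics company manufactures resistors that have a mean resistance of 100 ohms and a standard deviation of 10 ohms. Find the probability that a random sample of n = 25 resistors will have an average resistance of less than 95 ohms.
      - Solution: the sample mean has standard error $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = 2$, so $z = \frac{95 - 100}{2} = -2.5$ and $P(\bar{x} < 95) = P(Z < -2.5) = 0.0062$.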
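A short numerical check of Example 6, written as a sketch with Python's standard library (variable names are illustrative): the sample mean is treated as normal with standard error σ/√n, as the CLT suggests.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100.0, 10.0, 25          # mean (ohms), std. dev. (ohms), sample size
standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n) = 2 ohms
z = (95.0 - mu) / standard_error        # z-score of a sample mean of 95 ohms
p_mean_below_95 = NormalDist().cdf(z)   # standard normal cdf at z

print(f"z = {z:.2f}, P(sample mean < 95) = {p_mean_below_95:.4f}")
```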
  • 35. Introduction
      - Suppose you are studying the heights of students at AURAK.
          - You take a random sample from the population and establish a mean height of x̄ = 170 cm.
          - The mean of x̄ = 170 cm is a point estimate of the population mean.
          - A point estimate by itself is of limited usefulness because it does not reveal the uncertainty associated with the estimate.
      - What is missing is the degree of uncertainty in this single sample.
      - Namely, if you take another sample of students, you will very likely end up with a mean height that differs from 170 cm.
  • 36. Confidence Interval (CI)
      - A CI is a statistical method used to estimate a population parameter (e.g., mean, proportion) with a certain level of confidence.
      - It provides a range of values that is likely to contain the true population parameter.
      - The range is expressed as a lower and an upper bound, often denoted by ± a margin of error around the sample statistic (e.g., the sample mean for the population mean).
  • 37. Development of the Confidence Interval
      - We know that the sample mean x̄ is normally distributed with mean μ and variance σ²/n.
      - The z-score is given by: $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
      - A confidence interval estimate for μ is an interval of the form L ≤ μ ≤ U, where the end-points L and U are computed from the sample data.
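  • 38. Determining the End-points L and U
      - Suppose that we can determine values of L and U such that the following probability statement is true:
        $P(L \le \mu \le U) = 1 - \alpha$
        1 − α is called the confidence coefficient.
      - Because $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution, we can write
        $P\left(-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$
      - This can be rewritten as
        $P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$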
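  • 39. Guidelines for Determining the Interval
      - σ is known and n ≥ 30:
          - E = CONFIDENCE(α, σ, n) = $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
          - zα/2 = NORMSINV(1 − α/2), which is the magnitude of NORMSINV(α/2)
      - σ is unknown or n < 30:
          - E = CONFIDENCE.T(α, s, n) = $t_{\alpha/2}\frac{s}{\sqrt{n}}$
          - tα/2 = T.INV(1 − α/2, d.f.), which is the magnitude of T.INV(α/2, d.f.)
      - The mean of the population is given by μ = x̄ ± E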
  • 40. Example 7: Modelling Solar PV System
      You are creating a simulation model of a solar PV system for a residential house. For this purpose, the energy consumption of households is needed. You collect a random sample of 25 households and record the following energy consumption values.
      1. Construct a 95% confidence interval for the monthly energy consumption of a house.
      2. Assume that, to be on the safe side, you size the solar PV system to cover 70% of the population. What household energy consumption should be assumed?
      House number: energy consumption
         1: 1572 |  2: 1552 |  3: 1431 |  4: 1595 |  5: 1500
         6: 1493 |  7: 1449 |  8: 1459 |  9: 1506 | 10: 1426
        11: 1515 | 12: 1575 | 13: 1524 | 14: 1551 | 15: 1432
        16: 1496 | 17: 1586 | 18: 1562 | 19: 1508 | 20: 1405
        21: 1533 | 22: 1558 | 23: 1464 | 24: 1482 | 25: 1402
  • 41. Solution
      Sample statistics: n = 25, x̄ = 1503.040, S = 58.08003099
      1. 95% confidence interval, P(L ≤ μ ≤ U) = 0.95:
        L = 1480.27, z = -1.9600, P(x < 1480.27304620705) = 2.5%
        U = 1525.807, z = 1.9600, P(x < 1525.80695379295) = 97.5%
        P(1480.27304620705 < x < 1525.80695379295) = 95%
      Different form:
        z = 1.959964: Ez = 22.76695, L = 1480, U = 1526
        t-based:      Et = 23.97426, L = 1479, U = 1527
      2. Selection of the value of the energy consumption:
        x̄ = 1503.040, S = 58.08003099, x = 1509.13, z = 0.5244, P(x < 1509.13145519591) = 70.0%
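The z-based interval of Example 7 can be reproduced from the raw data. The following is a minimal Python sketch using only the standard library; the function name `z_interval` is hypothetical. It follows the slide's "different form" calculation, E = z·s/√n with the sample standard deviation s. The t-based margin is not computed because the standard library has no t-distribution.

```python
import math
import statistics
from statistics import NormalDist

consumption = [
    1572, 1552, 1431, 1595, 1500, 1493, 1449, 1459, 1506, 1426,
    1515, 1575, 1524, 1551, 1432, 1496, 1586, 1562, 1508, 1405,
    1533, 1558, 1464, 1482, 1402,
]  # the 25 household values listed in Example 7


def z_interval(data, confidence):
    """Return (mean, s, margin, lower, upper) for a z-based confidence interval."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)                       # sample standard deviation
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # z_{alpha/2}
    margin = z * s / math.sqrt(n)                    # E = z * s / sqrt(n)
    return mean, s, margin, mean - margin, mean + margin


mean, s, margin, lower, upper = z_interval(consumption, 0.95)
print(f"mean = {mean:.2f}, s = {s:.2f}, E = {margin:.2f}")
print(f"95% CI: {lower:.2f} <= mu <= {upper:.2f}")
```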
  • 42. Error vs. Confidence
      - There is a trade-off between acceptable error (or required precision) and confidence.
          - When you are required to be precise, you are less confident.
          - When greater error is allowed, you can be more confident.
      - When in doubt: if you are unsure about the population standard deviation or the sample size is small, it is safer to use a t-test to ensure the robustness of the statistical inference.
      Figure: Z-distribution vs. t-distribution
  • 43. Example 8: T-test
      - Repeat Example 7 using CL = 98%.
      - What is the error margin in this case?
      - Repeat Example 1 using a t-test.
      - What is the error margin in this case?
      Results:
        Ez = 22.8, L = 1480, U = 1526
        Et = 24.0, L = 1479, U = 1527
        Ez = 27.0, L = 1476, U = 1530
  • 44. Example 9: 3DMark TimeSpy
      You have conducted a 3DMark TimeSpy test on your laptop and have compared the result with other laptops with the same GPU and CPU specifications. You have collected the results of the top 100 of these tests.
      1. Calculate the mean, median, and mode of the 3DMark TimeSpy scores. What do these measures tell you about the distribution of the scores?
      2. Calculate the standard deviation of the scores. What does this tell you about the variability of the scores?
  • 45. Solution
      1. The mean, median, and mode can tell us about the distribution of the scores.
          - If the mean, median, and mode are all close to each other, it suggests that the scores are symmetrically distributed.
          - If the mean is greater than the mode, it suggests that the scores are right-skewed (i.e., there are a few very high scores).
          - If the mean is less than the median, it suggests that the scores are left-skewed (i.e., there are a few very low scores).
      2. The standard deviation, s, tells us about the variability of the scores.
          - If s is small (s/x̄ < 10%), most of the scores are close to the mean, indicating that the performance of the laptops is quite consistent.
          - If s is large (s/x̄ > 10%), the scores are spread out over a wider range, indicating more variability in performance.
      - Since the results reveal that s/x̄ = 7%, one can say that the performance of the laptops is quite consistent.
      Results: S = 340.5119, Mean = 5238, Median = 5114, Mode = 5113
  • 46. Example 10
      A battery manufacturer wishes to investigate the life of its batteries. A sample of 10 batteries used for 5000 cycles revealed a sample mean of 12% degradation in battery performance with a standard deviation of 2%. Construct a 95 percent confidence interval for the population mean.
      1. Would it be legitimate for the manufacturer to claim that after 5000 cycles the degradation in battery performance is 10%?
      2. Compute the confidence level (CL) required for 10% degradation to be an accepted value.
  • 47. Solution
      1. The value of 10% is not in the confidence interval. Hence, we conclude that the population mean is unlikely to be 10%.
      2. The required CL is 99%.
      n = 10, x̄ = 12, S = 2, CL = 95%, α = 5%, z = 1.959963985
        Ez = 1.239590065, L = 11, U = 13
        Et = 1.43, L = 10.6, U = 13.4
  • 49. Confidence Level and Precision of Estimation
      - Our choice of confidence level (CL) is essentially arbitrary.
          - If we choose a confidence level of, say, CL = 95%, the length of the confidence interval (CI) is $2\left(1.96\,\sigma/\sqrt{n}\right) = 3.92\,\sigma/\sqrt{n}$.
          - If we choose a higher level of confidence, say, 99%, the length of the CI is $2\left(2.58\,\sigma/\sqrt{n}\right) = 5.16\,\sigma/\sqrt{n}$.
      - The 99% interval is longer, which is why we are more confident with 99% than with 95%.
  • 50. Procedure for Making an Assumption
      - Select your sample of n readings.
      - Determine the mean value of the sample, x̄.
      - Determine the standard deviation of the sample, s.
      - Select your confidence level: 90%, 95%, or 99%.
      - Calculate the z-score (σ is known and n ≥ 30) or the t-score (σ is unknown or n < 30).
          - As the sample size increases (exceeding 30), z-test and t-test results converge.
      - Determine the margin of error, E.
      - Then, the mean value of the population, μ, is: μ = x̄ ± E
  • 51. Choice of Sample Size
      - The length of a confidence interval is a measure of the precision of estimation.
      - The precision is inversely related to the confidence level.
      - This means that in using x̄ to estimate μ, the error is less than or equal to E:
        $|\bar{x} - \mu| \le E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
      - Three factors determine the size of a sample:
          - The level of confidence selected.
          - The maximum allowable error.
          - The variation in the population.
  • 52. Sample Size
      - It is desirable to obtain a confidence interval that is short enough for decision-making purposes and that also has adequate confidence.
      - One way to achieve this is by choosing the sample size n to be large enough to give a CI of specified length or precision with a given confidence.
      - Given a confidence level and a maximum error of estimate (error margin) E, the minimum sample size n needed to estimate the population mean is:
        $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$ or $n = \left(\frac{t_{\alpha/2}\,s}{E}\right)^2$
      - Where:
          - E is the allowable error
          - z is the z-score corresponding to the selected level of confidence
          - s is the standard deviation (of the sample)
          - σ is the standard deviation (of the population)
  • 53. Notice That:
      - As the desired length of the interval 2E decreases, the required sample size n increases for a fixed value of σ and a specified confidence.
      - As σ increases, the required sample size n increases for a fixed desired length 2E and a specified confidence.
      - As the confidence level decreases, the required sample size n decreases for a fixed desired length 2E and standard deviation σ.
        $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$ or $n = \left(\frac{t_{\alpha/2}\,s}{E}\right)^2$
  • 54. Example 11
      A random sample of 32 textbook prices is taken from a local college bookstore. The mean of the sample is x̄ = 74.22, and the sample standard deviation is s = 23.44.
      1. What is the error margin at a confidence level of 99%?
      2. How many books must be included in your sample if you want to be 99% confident that the sample mean is within $5 of the population mean?
      3. Repeat question 2 assuming the standard deviation is s = 24.44.
      4. Repeat question 2 using a 95% confidence level.
  • 55. Solution
      With zc = 2.575, x̄ = 74.22, and s = 23.44:
        $n = \left(\frac{z_c\, s}{E}\right)^2 = \left(\frac{2.575 \times 23.44}{5}\right)^2 \approx 145.7$
      Always round up. 2. You should include at least 146 books in your sample.
        Question   1        2        3               4
        n          32       146      159 (+9%)       85
        x̄          74.22    74.22    74.22           74.22
        S          23.44    23.44    24.44 (+4%)     23.44
        CL         99%      99%      99%             95%
        α          1%       1%       1%              5%
        z          2.58     2.58     2.58            1.96
        Ez         10.67    5.00     4.993           4.983
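The sample-size calculations of Example 11 can be sketched in Python as follows (standard library only). The function names `margin_of_error` and `required_sample_size` are hypothetical, and the sketch uses the z-score form of the formula with the sample standard deviation in place of σ, as the solution does.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Upper critical value z_{alpha/2} for a two-sided confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def margin_of_error(std_dev, n, confidence):
    """E = z_{alpha/2} * std_dev / sqrt(n)."""
    return z_score(confidence) * std_dev / math.sqrt(n)


def required_sample_size(std_dev, max_error, confidence):
    """Minimum n so that z_{alpha/2} * std_dev / sqrt(n) <= max_error."""
    return math.ceil((z_score(confidence) * std_dev / max_error) ** 2)  # round up


margin_q1 = margin_of_error(23.44, 32, 0.99)          # question 1
n_q2 = required_sample_size(23.44, 5.0, 0.99)         # question 2
n_q3 = required_sample_size(24.44, 5.0, 0.99)         # question 3
n_q4 = required_sample_size(23.44, 5.0, 0.95)         # question 4

print(f"Q1: E = {margin_q1:.2f}")
print(f"Q2: n = {n_q2}, Q3: n = {n_q3}, Q4: n = {n_q4}")
```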
  • 56. Example 12: From My Students’ SDP
      The students want to estimate the cooling requirement of a building. The U-value of the wall is needed but unknown for the building under investigation. A sample of 9 buildings reveals the following values of U.
      1. What is the length of the CI at a confidence level of 95%?
      2. What are the endpoints of the confidence interval of the U-value?
      3. What sample size is required to keep the error below 3%, with CL = 95%?
      Sample: U-value (W/m2.K)
        1: 1.80 | 2: 1.85 | 3: 1.90 | 4: 2.00 | 5: 2.05
        6: 2.10 | 7: 2.25 | 8: 2.30 | 9: 2.40
  • 57. Solution
      n = 9, mean = 2.07, standard deviation = 0.2093, CL = 95.00%, α = 5.00%
                                 t-based          z-based
        Score                    2.306            1.9600
        Error margin E           0.1609 (7.8%)    0.1368 (6.6%)
        Q1: length of CI         0.322            0.27
        Q2: endpoint Umin        1.911            1.935
        Q2: endpoint Umax        2.233            2.209
        P(x1 ≤ x ≤ x2)           95.0%
        Q3: required E (%)       3%
        Q3: target E             0.062167 W/m2.K
        Q3: required n           61               44
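The U-value solution can be retraced in a few lines of Python. This is a sketch under stated assumptions: the t-score 2.306 and z-score 1.960 are taken from the solution table rather than computed (the standard library has no t-distribution), and the same t-score is reused when sizing the sample, which follows the table. Variable names are illustrative.

```python
import math
import statistics

u_values = [1.80, 1.85, 1.90, 2.00, 2.05, 2.10, 2.25, 2.30, 2.40]  # W/m2.K
T_SCORE = 2.306   # t-score for CL = 95% and 8 degrees of freedom (from the table)
Z_SCORE = 1.960   # z-score for CL = 95% (from the table)

n = len(u_values)
mean = statistics.fmean(u_values)
s = statistics.stdev(u_values)                    # sample standard deviation

margin_t = T_SCORE * s / math.sqrt(n)             # Q1: half-length of the CI
ci_length = 2.0 * margin_t                        # Q1: length of the CI
u_min, u_max = mean - margin_t, mean + margin_t   # Q2: endpoints

target_error = 0.03 * mean                        # Q3: 3% of the sample mean
n_required_t = math.ceil((T_SCORE * s / target_error) ** 2)
n_required_z = math.ceil((Z_SCORE * s / target_error) ** 2)

print(f"mean = {mean:.3f}, s = {s:.4f}, E_t = {margin_t:.4f}")
print(f"CI length = {ci_length:.3f}, U in [{u_min:.3f}, {u_max:.3f}]")
print(f"required n: {n_required_t} (t-score), {n_required_z} (z-score)")
```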
  • 58. Example 13
      The plan is to design a PV system to supply the required energy for mosques. Therefore, the energy consumption is required. However, the energy consumption of the mosque under study is not known. The area and volume of the mosque are 250 m2 and 750 m3, respectively. Based on the data collected from other mosques, and using a confidence level of 95%:
      1. How many mosques need to be surveyed to estimate the average energy consumption with an error of 10%?
      2. What is the interval of the energy consumption?
        Area       Volume       Total Annual Current Consumption (A)   Total Annual Power Consumption (kW)
        867.6      6940.8       170,068                                37415
        198        1584         28299                                  6226
        216        1512         54718                                  12038
        375.24     1200.768     116021                                 25525
        453.3136   2946.5384    83707                                  18416
        168.4535   1094.94775   85344                                  18776
        386.1685   3282.43225   152925                                 33644
        118.4625   770.00625    67142                                  14771
        342.21     2908.785     81040                                  17829
        330        2310         50539                                  11119
  • 59. Solution
      Sample: Energy (kWh/m2)
        1: 43.1  | 2: 31.4 | 3: 55.7  | 4: 68.0 | 5: 40.6
        6: 111.5 | 7: 87.1 | 8: 124.7 | 9: 52.1 | 10: 33.7
      n = 10.00, mean = 64.80, standard deviation = 31.0658
      Observation = 2000.000, U ≤ Observation = 100.0%
      CL = 90.00%, α = 10.00%, t-score = 1.833, z-score = 1.645
      Error, CL = 0.90: 18.0083 (t-score), 16.15882 (z-score); relative error 27.8%
      EAmin = 46.793, EAmax = 82.809
      Target error (%) = 10%; target error, CL = 0.90: 6.4801; z-score = 1.645; required n = 63
  • 60. Exercises: From the Book 8-14 & 8-16
      - The life in hours of a 75-watt light bulb is known to be normally distributed with σ = 25 hours. A random sample of 20 bulbs has a mean life of x̄ = 1014 hours.
        (a) Construct a 95% two-sided confidence interval on the mean life.
        (b) Suppose that we wanted the error in estimating the mean life from the two-sided confidence interval to be five hours at 95% confidence. What sample size should be used?
  • 61. Solution
      (a) n = 20, x̄ = 1014, σ = 25, CL = 95%, α = 5%, z = 1.96
        Ez = 10.96, CI length = 21.913, L = 1003.043, U = 1024.957
      (b) Target error (%) = 10%, target error = 5.0000, z-score = 1.960, required n = 97