This document provides an overview of a probability and statistics course, including the textbook, topics covered, evaluation scheme, and sample chapter content. The course covers topics such as descriptive statistics, probability theory, random variables, sampling distributions, and inferential statistics. Evaluation is based on a midterm exam, report, activities, and final exam. The sample chapter discusses descriptive statistics, data types, measures of central tendency (mean, median, mode), measures of dispersion (range, variance, standard deviation), and data representation methods.
Lecture 3 Measures of Central Tendency and Dispersion.pptxshakirRahman10
Objectives:
Define measures of central tendency (mean, median, and mode)
Define measures of dispersion (variance and standard deviation).
Compute the measures of central tendency and Dispersion.
Learn the application of mean and standard deviation using Empirical rule and Tchebyshev’s theorem.
Measures of Central Tendency:
A measure of the central tendency is a value about which the observations tend to cluster.
In other words it is a value around which a data set is centered.
The three most common measures of central tendency are mean, median and mode.
Why is it needed?
To summarize the data.
It provides a typical value that gives a picture of the entire data set.
Mean:
It is the arithmetic average of a set of numbers and the most common measure of central tendency.
It is computed by summing all values in the data set and dividing the sum by the number of values in the data set.
Properties:
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including extreme values.
Formula:
For n values x_1, ..., x_n: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, that is, the sum of all values divided by the number of values.
Median:
Mid-point or Middle value of the data when the measurements are arranged in ascending order.
A point that divides the data into two equal parts.
Computational Procedure:
Arrange the observations in an ascending order.
If there is an odd number of terms, the median is the middle value. If there is an even number of terms, the median is the average of the middle two terms.
Mode:
The mode is the observation that occurs most frequently in the data set.
There can be more than one mode in a data set, or there may be no mode at all.
The mode is also applicable to nominal data.
Comparison of Measures of Central Tendency in Skewed Distributions:
Positively skewed: the majority of the data values fall to the left of the mean and cluster at the lower end of the distribution, and the tail is to the right. The mean, median and mode are different. When a distribution has a few extremely high scores, the mean is greater than the median.
Negatively skewed: the majority of the data values fall to the right of the mean and cluster at the upper end of the distribution, and the tail is to the left.
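The effect of skew on the mean and median can be checked numerically. The sketch below uses made-up scores (an assumption, not data from the lecture): a few extremely high values pull the mean above the median, and the mirrored data set does the opposite.

```python
from statistics import mean, median

# Hypothetical scores: most values are low, two are extremely high.
right_skewed = [2, 3, 3, 4, 4, 5, 5, 6, 30, 40]
# Mirror image of the same data: most values are high, two are extremely low.
left_skewed = [41 - x for x in right_skewed]


def describe(values):
    """Return the mean and median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}


pos = describe(right_skewed)
neg = describe(left_skewed)
print("positively skewed:", pos)
print("negatively skewed:", neg)
```

In the first data set the mean lies above the median, and in the mirrored set it lies below it.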
Chapter 7
Hypothesis Testing Procedures
Learning Objectives
• Define null and research hypothesis, test statistic, level of significance and decision rule
• Distinguish between Type I and Type II errors and discuss the implications of each
• Explain the difference between one- and two-sided tests of hypothesis
• Estimate and interpret p-values
• Explain the relationship between confidence interval estimates and p-values in drawing inferences
• Perform analysis of variance by hand
• Appropriately interpret the results of analysis of variance tests
• Distinguish between one and two factor analysis of variance tests
• Perform chi-square tests by hand
• Appropriately interpret the results of chi-square tests
• Identify the appropriate hypothesis testing procedures based on type of outcome variable and number of samples
Hypothesis Testing
• Research hypothesis is generated about
unknown population parameter
• Sample data are analyzed and determined to
support or refute the research hypothesis
Hypothesis Testing Procedures
Step 1
Null hypothesis (H0):
No difference, no change
Research hypothesis (H1):
What investigator
believes to be true
Hypothesis Testing Procedures
Step 2
Collect sample data and determine whether sample
data support research hypothesis or not.
For example, in a test for μ, evaluate the sample mean $\bar{X}$.
Hypothesis Testing Procedures
Step 3
• Set up decision rule to decide when to believe null
versus research hypothesis
• Depends on level of significance, α = P(Reject H0 | H0 is true)
Hypothesis Testing Procedures
Steps 4 and 5
• Summarize sample information in test statistic (e.g.,
Z value)
• Draw conclusion by comparing test statistic to
decision rule. Provide final assessment as to whether
H1 is likely true given the observed data.
P-values
• P-values represent the exact significance of the
data
• Estimate p-values when rejecting H0 to
summarize significance of the data (can
approximate with statistical tables, can get
exact value with statistical computing
package)
• P-value is the smallest α where we still reject H0
Hypothesis Testing Procedures
1. Set up null and research hypotheses, select α
2. Select test statistic
3. Set up decision rule
4. Compute test statistic
5. Draw conclusion & summarize significance
Errors in Hypothesis Tests
Hypothesis Testing for μ
• Continuous outcome
• 1 Sample
H0: μ = μ0
H1: μ > μ0, μ < μ0, μ ≠ μ0
Test Statistic
n > 30: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ (find critical value in Table 1C)
n < 30: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ (find critical value in Table 2, df = n-1)
Example 7.2.
Hypothesis Testing for μ
The National Center for Health Statistics (NCHS)
reports the mean total cholesterol for adults is 203. Is
the mean total cholesterol in Framingham Heart
Study participants significantly different?
In 3310 participants the mean is 200.3 with a standard
deviation of 36.8.
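As a rough check of this example, the Z statistic for a one-sample test of the mean can be computed directly from the numbers given. This is a sketch only: the two-sided alternative follows from the wording "significantly different", while the 5% level of significance (critical value 1.96) is an assumption, since the source text is cut off before stating it. The function names are illustrative.

```python
import math


def one_sample_z(sample_mean, mu0, sd, n):
    """Z = (sample mean - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values from the example: mu0 = 203, sample mean 200.3, s = 36.8, n = 3310.
z = one_sample_z(sample_mean=200.3, mu0=203, sd=36.8, n=3310)
p = two_sided_p_value(z)

critical = 1.96  # assumed: two-sided test at alpha = 0.05
reject_h0 = abs(z) >= critical

print(f"Z = {z:.2f}, p = {p:.6f}, reject H0: {reject_h0}")
```

With n = 3310 the standard error is small, so even a difference of 2.7 units gives a large |Z|.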
Example 7.2.
Hypothesis Test ...
2. Text book
• Probability & Statistics for Engineers &
Scientists, Ronald E. Walpole, 9th edition 2012,
Pearson
Dr Yehya Mesalam
3. Brief list of Course topics
1. Introduction to statistics and data analysis.
2. Introduction to probability theory.
3. Random variables and probability distributions.
4. Mathematical Expectation
5. Some discrete probability distribution.
6. Some continuous probability distribution.
7. Functions of Random Variables
8. Fundamental sampling distributions and data
descriptions.
4. Evaluation Scheme
• Midterm Exam 40
• Report and Activities 20
• Midterm, Report and Activities together 60 = 60%
• Final exam 40 = 40%
• Total 100 = 100%
6. What is Statistics?
• Statistics is the science of collecting,
organizing, summarizing, and analyzing
information to draw conclusions or answer
questions.
• Statistics is a way to get information from data.
It is the science of uncertainty.
7. Steps of Statistical Practice
• Preparation: Set clearly defined goals and questions of interest for the investigation
• Data collection: Make a plan of which data to collect
and how to collect it
• Data analysis: Apply appropriate statistical methods
to extract information from the data
• Data interpretation: Interpret the information and
draw conclusions
8. Statistical Methods
• Descriptive statistics include the collection, presentation and description of numerical data.
• Inferential statistics include making inferences and decisions from the collected data using appropriate statistical methods.
• Model building includes developing prediction
equations to understand a complex system.
9. Descriptive Statistics
• Descriptive statistics involves the arrangement,
summary, and presentation of data, to enable
meaningful interpretation, and to support decision
making.
• Descriptive statistics methods make use of
– graphical techniques
– numerical descriptive measures.
• The methods presented apply both to
– the entire population
– the sample
10. Basic Definitions
• Population: The collection of all items of interest in a
particular study.
• Variable: A characteristic of interest about each
element of a population or sample.
• Statistic: A descriptive measure of a sample
• Sample: A set of data drawn from the population;
a subset of the population available for observation
• Parameter: A descriptive measure of the population,
e.g., mean
11. Collecting Data
• Target Population: The population about
which we want to draw inferences.
• Sampled Population: The actual population
from which the sample has been taken.
13. Types of data - examples
Numerical data
Age - income
55 75000
42 68000
. .
. .
Weight gain
+10
+5
.
.
Nominal
Person Marital status
1 married
2 single
3 single
. .
. .
Computer Brand
1 IBM
2 Dell
3 IBM
. .
. .
14. Types of data - examples
Numerical data
Age - income
55 75000
42 68000
. .
. .
Weight gain
+10
+5
.
.
Nominal data
A descriptive statistic for nominal data is the proportion of data that falls into each category.
IBM Dell Compaq Other Total
25 11 8 6 50
50% 22% 16% 12%
16. Types of Variables
•Qualitative variables (what, which type…)
measure a quality or characteristic on each
experimental unit. (categorical data)
•Examples:
•Hair color (black, brown, blonde…)
•Make of car (Dodge, Honda, Ford…)
•Gender (male, female)
•State of birth (Iowa, Arizona,….)
17. Types of Variables
•Quantitative variables (How big, how
many) measure a numerical quantity on each
experimental unit. (denoted by x)
Discrete if it can assume only a finite or
countable number of values.
Continuous if it can assume the infinitely
many values corresponding to the points
on a line interval.
18. Graphing Qualitative Variables
• Use a data distribution to describe:
– What values of the variable have been measured
– How often each value has occurred
• “How often” can be measured 3 ways:
–Frequency
–Relative frequency = Frequency/n
–Percent frequency = Relative frequency* 100
19. Example
• A bag contains 25 colored balls:
• Raw Data: [figure: the 25 colored balls, not reproduced]
• Statistical Table (tally marks not reproduced):
Color | Frequency | Relative Frequency | Percent
Red | 3 | 3/25 = .12 | 12%
Blue | 6 | 6/25 = .24 | 24%
Green | 4 | 4/25 = .16 | 16%
Orange | 5 | 5/25 = .20 | 20%
Brown | 3 | 3/25 = .12 | 12%
Yellow | 4 | 4/25 = .16 | 16%
20. Graphs
Bar Chart
[figure: bar chart of Frequency (0 to 6) by Color for Green, Orange, Blue, Red, Yellow, Brown]
Pie Chart
Angle = Relative Frequency times 360
[figure: pie chart with slices Green 16.0%, Orange 20.0%, Blue 24.0%, Red 12.0%, Yellow 16.0%, Brown 12.0%]
Pareto Chart
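The pie-chart rule (angle = relative frequency times 360) can be applied to the colored-ball frequencies from the statistical table. The following is a minimal sketch; the function name `pie_angles` is illustrative.

```python
# Frequencies from the statistical table for the 25 colored balls.
frequencies = {"Red": 3, "Blue": 6, "Green": 4, "Orange": 5, "Brown": 3, "Yellow": 4}
n = sum(frequencies.values())


def pie_angles(freqs):
    """Angle in degrees of each pie slice: relative frequency times 360."""
    total = sum(freqs.values())
    return {category: 360 * f / total for category, f in freqs.items()}


angles = pie_angles(frequencies)
for color, angle in angles.items():
    print(f"{color}: relative frequency {frequencies[color] / n:.2f}, angle {angle:.1f} degrees")
```

The slice angles add up to 360 degrees because the relative frequencies add up to 1.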
21. Example
A sample of 30 persons who often consume donuts were asked what variety of donuts was their favourite. The responses from these 30 persons were as follows:
glazed filled other plain glazed other
frosted filled filled glazed other frosted
glazed plain other glazed glazed filled
frosted plain other other frosted filled
filled other frosted glazed glazed filled
Construct a frequency distribution table for these data.
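One way to build the requested frequency distribution is to count the responses programmatically. This sketch uses Python's `collections.Counter`; the relative and percent columns follow the definitions given earlier (relative frequency = frequency/n, percent = relative frequency times 100), and the table layout is only an illustration.

```python
from collections import Counter

# The 30 responses, copied from the example.
responses = """
glazed filled other plain glazed other
frosted filled filled glazed other frosted
glazed plain other glazed glazed filled
frosted plain other other frosted filled
filled other frosted glazed glazed filled
""".split()

counts = Counter(responses)
n = len(responses)

print(f"{'Variety':<10}{'Frequency':>10}{'Relative':>10}{'Percent':>9}")
for variety, freq in counts.most_common():
    print(f"{variety:<10}{freq:>10}{freq / n:>10.3f}{100 * freq / n:>8.1f}%")
```

The frequencies should sum to 30, which is a quick check that no response was missed.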
24. Graphical Presentation of Qualitative Data
A graph made of bars whose heights represent the
frequencies of respective categories is called a bar
graph.
25. Graphical Presentation of Qualitative Data
A circle divided into portions that represent the
relative frequencies or percentages of a population
or a sample belonging to different categories is
called a pie chart.
30. Dot Plot
Draw the dot plot for the following data, then calculate the mean, median, and mode:
0.86, 0.49, 0.46, 0.52, 0.62, 0.79, 0.75, 0.47, 0.26, 0.43
Mean Calculation:
$\sum x = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + \dots + x_n = 5.65$
$\bar{x} = \frac{\sum x}{n} = \frac{5.65}{10} = 0.565$
31. Dot Plot
Median Calculation:
Rearrange the data:
0.26, 0.43, 0.46, 0.47, 0.49, 0.52, 0.62, 0.75, 0.79, 0.86
n = 10, so the median is the average of the values in ordered positions 5 and 6.
Median = (0.49 + 0.52)/2 = 0.505
Mode: No mode
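The three measures for this data set can be verified with a short script. This is a sketch using the standard `statistics` module; the helper `modes` is an illustrative function that returns an empty list when every value occurs only once, matching the "no mode" convention used in these slides.

```python
from collections import Counter
from statistics import mean, median

data = [0.86, 0.49, 0.46, 0.52, 0.62, 0.79, 0.75, 0.47, 0.26, 0.43]


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]


print(f"mean   = {mean(data):.3f}")
print(f"median = {median(data):.3f}")
print(f"mode   = {modes(data) or 'no mode'}")
```

The same helper returns one value for data with a single mode and several values when two or more values tie for the highest frequency.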
32. Stem and Leaf Displays
In a stem-and-leaf display of quantitative data, each
value is divided into two portions – a stem and a leaf.
The leaves for each stem are shown separately in a
display.
33. Example
The following are the scores of 30 college students on a statistics test:
75 69 83 52 72 84 80 81 77 96
61 64 65 76 71 79 86 87 71 79
72 87 68 92 93 50 57 95 92 98
Construct a stem-and-leaf display.
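A stem-and-leaf display for these two-digit scores can be sketched as follows, taking the tens digit as the stem and the units digit as the leaf. That split is an assumption that suits this data set; other data (such as three- or four-digit values) would need a different split. The function name is illustrative.

```python
from collections import defaultdict

scores = [
    75, 69, 83, 52, 72, 84, 80, 81, 77, 96,
    61, 64, 65, 76, 71, 79, 86, 87, 71, 79,
    72, 87, 68, 92, 93, 50, 57, 95, 92, 98,
]


def stem_and_leaf(values):
    """Group sorted two-digit values by tens digit (stem); leaves are units digits."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)


display = stem_and_leaf(scores)
for stem in sorted(display):
    print(f"{stem} | {' '.join(str(leaf) for leaf in display[stem])}")
```

Sorting before grouping gives a ranked display, in which the leaves of each stem appear in increasing order.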
37. Example
The following data give the monthly rents paid by a sample of 30 households selected from a small town.
Construct a stem-and-leaf display for these data.
880 1210 1151 1081 985 630 721 1231 1175 1075
932 952 1023 850 1100 775 825 1140 1235 1000
750 750 915 1140 965 1191 1370 960 1035 1280
42. Mean
The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set. Thus,
Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
where $\sum x$ is the sum of all values; N is the population size; n is the sample size; $\mu$ is the population mean; $\bar{x}$ is the sample mean.
43. Mean
1. Most common measure of central tendency
2. Acts as ‘balance point’
3. Affected by extreme values (‘outliers’)
4. Denoted $\bar{x}$, where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$
44. Example
Find the mean of cash donations made by these eight persons:
319, 199, 110, 63, 21, 315, 26, 63
Solution
$\sum x = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 319 + 199 + 110 + 63 + 21 + 315 + 26 + 63 = 1116$
$\bar{x} = \frac{\sum x}{n} = \frac{1116}{8} = 139.5$, that is, 139.5 million dollars
45. Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$
46. Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence
• n is odd: Order = $\frac{n+1}{2}$
• n is even: Order = $\frac{n}{2}$, $\frac{n}{2} + 1$
4. Not affected by extreme values
47. Median Example
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5
Positioning Point = $\frac{n+1}{2} = \frac{5+1}{2} = 3.0$
Median = 22.6
48. Median Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
Positioning Point = $\frac{n}{2} = \frac{6}{2} = 3$, so positions 3 and 4
Median = $\frac{7.7 + 8.9}{2} = 8.30$
49. Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data
50. Mode Example
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43
51. Range
1. Measure of dispersion
2. Difference between largest & smallest observations
Range = R = Max Value – Min Value
3. Ignores how data are distributed
[figure: two dot plots on an axis from 7 to 10 with different distributions; Range = 10 – 7 = 3 for both]
52. Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{x}$ or μ)
[figure: dot plot on an axis marked 4, 6, 8, 10, 12 with $\bar{x} = 8.3$]
54. Variance and Standard Deviation
Basic Formulas for the Variance and Standard Deviation for
Ungrouped Data
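$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ and $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ and $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
where σ² is the population variance, s² is the sample variance, σ is the population standard deviation, and s is the sample standard deviation.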
55. Variance and Standard Deviation
Short-cut Formulas for the Variance and Standard Deviation for
Ungrouped Data
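$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ and $\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
$s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$ and $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
where σ² is the population variance, s² is the sample variance, σ is the population standard deviation, and s is the sample standard deviation.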
56. Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 8.3$
$s^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \dots + (7.7 - 8.3)^2}{6 - 1} = 6.368$
57. Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$\sum x = 49.8$, $\sum x^2 = 445.18$, $n = 6$
$s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)} = \frac{6 \times 445.18 - (49.8)^2}{6 \times 5} = 6.368$
$s = \sqrt{6.368} = 2.523$
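Both routes to the sample variance (the basic formula and the short-cut formula) should give the same result. The following Python sketch checks this for the raw data used in the two variance examples; the function names are illustrative.

```python
import math


def sample_variance_basic(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


raw_data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
s2_basic = sample_variance_basic(raw_data)
s2_short = sample_variance_shortcut(raw_data)
print(round(s2_basic, 3), round(s2_short, 3))  # the two formulas agree
print(round(math.sqrt(s2_basic), 3))           # sample standard deviation
```

The short-cut form needs only the running totals Σx and Σx², which is why it is convenient for hand calculation.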
58. Summary of Variation Measures
Measure | Formula | Description
Range | $X_{Max} - X_{Min}$ | Total spread
Standard Deviation (Sample) | $\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ | Dispersion about sample mean
Standard Deviation (Population) | $\sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu_x)^2}{N}}$ | Dispersion about population mean
Variance (Sample) | $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ | Squared dispersion about sample mean
59. Box-and-whisker Plot
A box-and-whisker plot, also called a box plot, displays the five-number summary of a set of data.
The five-number summary is:
• The minimum value
• First quartile (Q1)
• Median
• Third quartile (Q3)
• The maximum value
60. Box-and-whisker Plot
[Figure: box plot labeled, from left to right, minimum, lower quartile, median, upper quartile, maximum.]
In a box plot, we draw a box from the first quartile to the third quartile. A vertical line goes through the box at the median. The whiskers go from each quartile to the minimum or maximum.
62. Solution
1. Order the test scores from least to greatest:
78, 85, 88, 88, 89, 90, 92
2. Find the median of the test scores: 88
3. Find the quartiles.
The first quartile (Q1) is the median of the data points to the left of the median: Q1 = 85
The third quartile (Q3) is the median of the data points to the right of the median: Q3 = 90
63. Solution
4. Complete the five-number summary by finding the min and the max.
Min = 78
Max = 92
Ordered data: 78 85 88 88 89 90 92
Min = 78, Q1 = 85, Median = 88, Q3 = 90, Max = 92
64. Example
Use the given data to make a box-and-whisker plot.
31, 23, 33, 35, 26, 24, 31, 29
65. Solution
Order the data from least to greatest. Then find the minimum, lower quartile, median, upper quartile, and maximum.
23 24 26 29 31 31 33 35
minimum: 23
lower quartile: $\frac{24 + 26}{2} = 25$
median: $\frac{29 + 31}{2} = 30$
upper quartile: $\frac{31 + 33}{2} = 32$
maximum: 35
66. Solution
Draw a number line and plot a point above each value. Draw the box and whiskers.
[Figure: box-and-whisker plot of 23 24 26 29 31 31 33 35 above a number line running from 22 to 38.]
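The five-number summary used in both box plot examples can be sketched in Python. This follows the convention on the slides (each quartile is the median of the data points on one side of the median, with the median itself excluded when n is odd); other textbooks and libraries use different quartile conventions, so results may differ elsewhere. The function names are illustrative.

```python
def _median_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]                                   # left of the median
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # right of the median
    return (ordered[0], _median_sorted(lower), _median_sorted(ordered),
            _median_sorted(upper), ordered[-1])


test_scores = [78, 85, 88, 88, 89, 90, 92]
sample = [31, 23, 33, 35, 26, 24, 31, 29]
print(five_number_summary(test_scores))
print(five_number_summary(sample))
```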
67. Frequency Histograms
• Divide the range of the data into 5-12
subintervals of equal length.
• Calculate the approximate width of the
subinterval as Range/number of subintervals.
• Round the approximate width up to a
convenient value.
• Sturges' rule for the number of subintervals: K = 1 + 3.3 log10(N).
• Create a statistical table including the
subintervals, their frequencies and relative
frequencies.
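The steps above for choosing the number and width of the subintervals can be sketched in Python. The function names are illustrative, and rounding the width up to the next whole number is an assumption standing in for "a convenient value".

```python
import math


def sturges_classes(n):
    """Sturges' rule: K = 1 + 3.3 * log10(N)."""
    return 1 + 3.3 * math.log10(n)


def class_width(min_value, max_value, k):
    """Approximate width = range / number of subintervals, rounded up.

    Rounding up to an integer is an assumption; any convenient value
    at least as large as the approximate width would do.
    """
    return math.ceil((max_value - min_value) / k)


k = sturges_classes(100)        # suggested number of classes for N = 100
width = class_width(30, 99, 7)  # 7 classes covering 30 to 99
print(round(k, 1), width)
```

For N = 100 the rule suggests between 7 and 8 classes, which is in line with the 5 to 12 subintervals recommended above.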
69. Example
• Using 7 equal intervals with the lowest starting at 30, compute the mean and the variance using the short-cut method.
• Calculate the mode and median (analytically and graphically).
• Estimate the value below which 75% of the values fall.
74. Solution
• Determine the Min value = 31
• Determine the Max value = 99
• Calculate the range = Max – Min
• But the starting point is given as 30, so use Min = 30
• Range = 99 – 30 = 69
• Interval length C = Range / No. of intervals
C = 69 / 7 ≈ 9.86, rounded up to 10
94. Solution
L.L U.L L. B U. B
30 39 29.5 39.5
40 49 39.5 49.5
50 59 49.5 59.5
60 69 59.5 69.5
70 79 69.5 79.5
80 89 79.5 89.5
90 99 89.5 99.5
L.L and U.L are the lower limit and upper limit of the class.
L.B and U.B are the lower boundary and upper boundary of the class.
The histogram must be drawn on the boundaries, not on the limits.
L.B for class i $= \frac{L.L_i + U.L_{i-1}}{2}$
U.B for class i $= \frac{L.L_{i+1} + U.L_i}{2}$
L.B 1 = (30 + 29)/2 = 29.5
U.B 1 = (39 + 40)/2 = 39.5
L.B 2 = (40 + 39)/2 = 39.5
Then L.B i = U.B i-1, or U.B i = L.B i+1
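The boundary rule above can be sketched in Python. The function name `class_boundaries` is illustrative, and the sketch assumes the gap between one class's upper limit and the next class's lower limit is the same throughout the table (here 40 − 39 = 1), so each boundary lies half a gap outside the limit.

```python
def class_boundaries(limits):
    """Convert (lower limit, upper limit) pairs into (lower, upper) boundaries.

    Assumes at least two classes and a constant gap between consecutive
    classes, so L.B = L.L - gap/2 and U.B = U.L + gap/2.
    """
    gap = limits[1][0] - limits[0][1]  # e.g. 40 - 39 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in limits]


limits = [(30 + 10 * i, 39 + 10 * i) for i in range(7)]  # 30-39, ..., 90-99
boundaries = class_boundaries(limits)
for (ll, ul), (lb, ub) in zip(limits, boundaries):
    print(ll, ul, lb, ub)
```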
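106. Median
Class limit   fi   Less than F
30-39   11   11
40-49   12   23
50-59   16   39
60-69   23   62
70-79   17   79
80-89   11   90
90-99   10   100
Sum   100
$\tilde{X} = L_{med} + C \times \frac{\frac{n}{2} - F_{med-1}}{f_{med}}$
The median class is 60-69, so $L_{med}$ = 60 - 0.5 = 59.5, C = 10, $F_{med-1}$ = 39 and $f_{med}$ = 23.
Median = 59.5 + 10(50 - 39)/23 = 64.28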
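The grouped-median formula can be sketched in Python as follows. The function name and argument layout are illustrative; the sketch locates the median class as the first class whose cumulative frequency reaches n/2 and then interpolates inside it.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: L_med + C * (n/2 - F_before) / f_med."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= half:
            return lower + width * (half - cumulative) / f
        cumulative += f
    raise ValueError("median class not found")


lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [11, 12, 16, 23, 17, 11, 10]
med = grouped_median(lower_boundaries, frequencies, width=10)
print(round(med, 2))
```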
117. Ogives (Less Than & More Than)
Lower Boundary   Less Than   More Than
29.5   0   100
39.5   11   89
49.5   23   77
59.5   39   61
69.5   62   38
79.5   79   21
89.5   90   10
99.5   100   0
More than = n - Less than
More than + Less than = n
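The Less Than and More Than columns follow directly from the class frequencies. A minimal Python sketch, with an illustrative function name:

```python
from itertools import accumulate


def ogive_tables(frequencies):
    """Return (less_than, more_than) cumulative frequencies at each boundary."""
    n = sum(frequencies)
    less_than = [0] + list(accumulate(frequencies))  # running total of f
    more_than = [n - c for c in less_than]           # More than = n - Less than
    return less_than, more_than


frequencies = [11, 12, 16, 23, 17, 11, 10]
less_than, more_than = ogive_tables(frequencies)
print(less_than)
print(more_than)
```

Each list has one more entry than there are classes, because the curves are plotted at every class boundary from 29.5 to 99.5.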
121. Ogives (Less Than & More Than)
[Figure: "Ogives" chart of cumulative frequency (0 to 110) against lower boundary (29.5 to 99.5), showing the Less Than and More Than curves.]
The median is read at cumulative frequency n/2 = 50, where the Less Than and More Than curves intersect.
122. Ogives (Less Than & More Than)
[Figure: "Ogives" chart of cumulative frequency (0 to 110) against lower boundary (29.5 to 99.5), showing the Less Than and More Than curves.]
Estimate the value below which 75% of the values fall.
n = 100 corresponds to 100%, so 75% corresponds to a cumulative frequency of 75.
At frequency value = 75, draw a horizontal line that cuts the Less Than and More Than curves, then determine the required values.
75% of the sample lies above the value 51.
75% of the sample lies below the value 77.
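Reading a value off the Less Than ogive is a linear interpolation between two neighbouring boundaries. The Python sketch below reproduces the graphical readings; the function name is illustrative, and the value above which 75% of the sample lies is obtained as the value below which 25% lies, which is an equivalent restatement rather than the construction shown on the slide.

```python
def value_at_cumulative(boundaries, less_than, target):
    """Interpolate the boundary value where the Less Than ogive reaches target."""
    for i in range(1, len(boundaries)):
        low_c, high_c = less_than[i - 1], less_than[i]
        if low_c <= target <= high_c and high_c > low_c:
            fraction = (target - low_c) / (high_c - low_c)
            return boundaries[i - 1] + fraction * (boundaries[i] - boundaries[i - 1])
    raise ValueError("target outside the range of the ogive")


boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
less_than = [0, 11, 23, 39, 62, 79, 90, 100]

below_75 = value_at_cumulative(boundaries, less_than, 75)  # 75% fall below this
above_75 = value_at_cumulative(boundaries, less_than, 25)  # 75% fall above this
median = value_at_cumulative(boundaries, less_than, 50)
print(round(below_75), round(above_75), round(median, 2))
```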
123. Short Cut Method
$\bar{X} = X_0 + C \times \frac{\sum f_i d_i}{n}$
$S^2 = C^2 \times \frac{n \sum f_i d_i^2 - \left(\sum f_i d_i\right)^2}{n(n - 1)}$
124. Short Cut Method
L.L   U.L   f   F   f relative   d   f*d   f*d²
30   39   11   11   0.11   -3   -33
40   49   12   23   0.12   -2   -24
50   59   16   39   0.16   -1   -16
60   69   23   62   0.23   0   0
70   79   17   79   0.17   1   17
80   89   11   90   0.11   2   22
90   99   10   100   0.1   3   30
Sum   100   1   -4
Mean ($\bar{X}$) = 64.1
Variance (S²) = 317.010101
S.D (s) = 17.80477748
C.V = $\frac{s}{\bar{X}}$ = 0.277765639
$\bar{X} = X_0 + C \times \frac{\sum f_i d_i}{n}$ = 64.5 + 10(-4/100) = 64.1
125. Short Cut Method
L.L   U.L   f   F   f relative   d   f*d   f*d²
30   39   11   11   0.11   -3   -33   99
40   49   12   23   0.12   -2   -24   48
50   59   16   39   0.16   -1   -16   16
60   69   23   62   0.23   0   0   0
70   79   17   79   0.17   1   17   17
80   89   11   90   0.11   2   22   44
90   99   10   100   0.1   3   30   90
Sum   100   1   -4   314
Mean ($\bar{X}$) = 64.1
Variance (S²) = 317.010101
S.D (s) = 17.80477748
C.V = 0.277765639
$S^2 = C^2 \times \frac{n \sum f_i d_i^2 - \left(\sum f_i d_i\right)^2}{n(n - 1)} = 10^2 \times \frac{100 \times 314 - (-4)^2}{100 \times 99} = 317.010101$
126. Short Cut Method
L.L   U.L   f   F   f relative   d   f*d   f*d²
30   39   11   11   0.11   0   0
40   49   12   23   0.12   1   12
50   59   16   39   0.16   2   32
60   69   23   62   0.23   3   69
70   79   17   79   0.17   4   68
80   89   11   90   0.11   5   55
90   99   10   100   0.1   6   60
Sum   100   1   296
Mean ($\bar{X}$) = 64.1
Variance (S²) = 317.010101
S.D (s) = 17.80477748
C.V = 0.277765639
$\bar{X} = X_0 + C \times \frac{\sum f_i d_i}{n}$ = 34.5 + 10(296/100) = 64.1
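The short-cut (coding) method can be sketched in Python to confirm that the choice of origin X0 does not change the result. The function name and argument names are illustrative; d holds the coded deviations (class midpoint minus X0, divided by C).

```python
import math


def coded_mean_variance(frequencies, d, origin, width):
    """Mean and sample variance of grouped data by the short-cut method."""
    n = sum(frequencies)
    sum_fd = sum(f * di for f, di in zip(frequencies, d))
    sum_fd2 = sum(f * di * di for f, di in zip(frequencies, d))
    mean = origin + width * sum_fd / n
    variance = width ** 2 * (n * sum_fd2 - sum_fd ** 2) / (n * (n - 1))
    return mean, variance


frequencies = [11, 12, 16, 23, 17, 11, 10]

# Origin at the midpoint of class 60-69 (X0 = 64.5), d = -3 ... 3
mean_a, var_a = coded_mean_variance(frequencies, range(-3, 4), 64.5, 10)
# Origin at the midpoint of class 30-39 (X0 = 34.5), d = 0 ... 6
mean_b, var_b = coded_mean_variance(frequencies, range(0, 7), 34.5, 10)

sd = math.sqrt(var_a)
print(round(mean_a, 1), round(var_a, 4), round(sd, 4), round(sd / mean_a, 4))
print(round(mean_b, 1), round(var_b, 4))
```

Picking the origin near the centre of the table keeps the coded values small, which is the main practical benefit when the calculation is done by hand.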
127. Example
Complete the following table, and then find the mean, mode, median, variance and CV.
Draw the histogram, frequency polygon, and relative frequency histogram.
Class limits (L - U)   x_i   f_i   F_i   d_i   f_i d_i   f_i d_i²
50 - ?   ?   8   ?   ?   ?   ?
? - ?   ?   10   ?   ?   ?   ?
? - ?   ?   ?   34   0   ?   ?
? - ?   ?   ?   ?   ?   14   ?
? - ?   ?   10   ?   ?   ?   ?
? - ?   ?   ?   ?   ?   ?   ?
? - 119   ?   ?   65   ?   ?   16
128. Solution
The table has seven classes of equal width C. The first class starts at 50 and the last class ends at 119, so an eighth class would start at 120:
120 = 50 + 7C, then C = 10
132. Solution
Class limits (L - U)   x_i   f_i   F_i   d_i   f_i d_i   f_i d_i²
50 - 59   54.5   8   8   -2   -16   32
60 - 69   64.5   10   18   -1   -10   10
70 - 79   74.5   16   34   0   0   0
80 - 89   84.5   14   48   1   14   14
90 - 99   94.5   10   58   2   20   40
100 - 109   104.5   6   64   3   18   54
110 - 119   114.5   1   65   4   4   16
Sum: $\sum f_i d_i = 30$, $\sum f_i d_i^2 = 166$
134. Short Cut Method
$\bar{X} = X_0 + C \times \frac{\sum f_i d_i}{n}$
$S^2 = C^2 \times \frac{n \sum f_i d_i^2 - \left(\sum f_i d_i\right)^2}{n(n - 1)}$
Mean = 74.5 + 10 (30/65) = 79.12
Variance = (10)² [65 × 166 – (30)²] / [65 × 64] = 237.74
S.D = (237.74)^0.5 = 15.42
135. Shape
1. Describes how data are distributed
2. Measures of shape
• Skew: describes the lack of symmetry
[Figure: three distributions. Left-skewed: Mean < Median. Symmetric: Mean = Median. Right-skewed: Median < Mean.]
137. Moment
• Coefficient of Skewness: $\beta_1 = \frac{m_3}{\sqrt{m_2^3}}$
$\beta_1 = 0$: Normal distribution (symmetric, Mean = Median)
$\beta_1 > 0$: Skewness to right (right-skewed, Median < Mean)
$\beta_1 < 0$: Skewness to left (left-skewed, Mean < Median)
See page 38
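138. Moment
• Coefficient of Kurtosis: $\beta_2 = \frac{m_4}{m_2^2}$
$\beta_2 = 3$: Normal distribution
$\beta_2 > 3$: Leptokurtic
$\beta_2 < 3$: Platykurtic
[Figure: symmetric curves comparing the normal, leptokurtic, and platykurtic shapes.]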
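The moment coefficients of skewness and kurtosis can be sketched in Python. The slides do not define m_k explicitly, so this sketch assumes m_k is the k-th central moment of the data, computed with divisor n; the function names are illustrative.

```python
def central_moment(values, k):
    """Assumed definition: m_k = sum((x - mean)^k) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment coefficient of skewness: m3 / sqrt(m2^3)."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / m2 ** 1.5


def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2^2 (equals 3 for a normal distribution)."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / m2 ** 2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 10]
left_skewed = [1, 10, 10, 10]
print(skewness(symmetric), skewness(right_skewed) > 0, skewness(left_skewed) < 0)
print(kurtosis(symmetric))  # below 3, so flatter than the normal curve
```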
139. Example
• From the given graph, complete the following table, draw the histogram and polygon, determine the mode and median graphically, and calculate the mean, median, mode, variance, standard deviation, and coefficient of variation.
Class limits   x_i   f_i   f_r
140. Solution
Class limits   x_i   f_i   f_r   d   fd
12 - 16   14   6   6/80   -3   -18
17 - 21   19   8   8/80   -2   -16
22 - 26   24   14   14/80   -1   -14
27 - 31   29   24   24/80   0   0
32 - 36   34   14   14/80   1   14
37 - 41   39   8   8/80   2   16
42 - 46   44   6   6/80   3   18
Sum   80   1   0
$\bar{X}$ = 29 + 5(0/80) = 29
142. Example
• Complete the table, compute the mean and variance, and find the mode and median analytically and graphically.
Class limit   Frequency   Relative frequency   Boundaries   Cumulative frequency
? - ?   ?   ?   More than ?   100
20 - ?   ?   ?   More than 19.95   92
? - ?   17   ?   More than ?   ?
? - ?   ?   ?   More than ?   46
? - ?   ?   0.12   More than 37.95   ?
? - ?   ?   ?   More than ?   5
Sum   ?   ?   More than ?   ?
143. Solution
Class limit   Frequency   Relative frequency   Boundaries   Cumulative frequency
14 - 19.9   8   0.08   More than 13.95   100
20 - 25.9   29   0.29   More than 19.95   92
26 - 31.9   17   0.17   More than 25.95   63
32 - 37.9   29   0.29   More than 31.95   46
38 - 43.9   12   0.12   More than 37.95   17
44 - 49.9   5   0.05   More than 43.95   5
Sum   100   1   More than 49.95   0
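Completing a table like this one relies on the fact that each class frequency is the drop in the More Than cumulative frequency from one boundary to the next. A minimal Python sketch, with an illustrative function name:

```python
def frequencies_from_more_than(more_than):
    """Class frequency = More Than value at its lower boundary minus the next one."""
    return [a - b for a, b in zip(more_than, more_than[1:])]


more_than = [100, 92, 63, 46, 17, 5, 0]  # at boundaries 13.95, 19.95, ..., 49.95
freqs = frequencies_from_more_than(more_than)
n = more_than[0]                         # all observations lie above the first boundary
relative = [f / n for f in freqs]
print(freqs)
print(relative)
```

The same subtraction run in reverse (a running sum from the last class upward) rebuilds the cumulative column from the frequencies.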
144. Solution
L.L   U.L   f   d   f*d   f*d²
14   19.9   8   -3   -24   72
20   25.9   29   -2   -58   116
26   31.9   17   -1   -17   17
32   37.9   29   0   0   0
38   43.9   12   1   12   12
44   49.9   5   2   10   20
Sum   100   -77   237
Mean $\bar{X}$ = 34.95 + 6(-77/100) = 30.33
Variance S² = 64.62181818
S.D = 8.038769693