Outliers
• An outlier is an extremely high or an extremely low
data value when compared with the rest of the data
values.
• Procedure for Identifying Outliers
• Step 1 Arrange the data in order and find Q1 and Q3.
• Step 2 Find the interquartile range: IQR = Q3 - Q1.
• Step 3 Multiply the IQR by 1.5: 1.5 * IQR.
• Step 4 Subtract the value obtained in step 3 from Q1
and add the value to Q3.
• Step 5 Check the data set for any data value that is
smaller than Q1 - 1.5*(IQR) or larger than Q3 +
1.5*(IQR).
• Check the following data set for outliers. 5, 6,
12, 13, 15, 18, 22, 50
• Solution
• The data value 50 is extremely suspect. These
are the steps in checking for an outlier.
• Step 1 Find Q1 and Q3.
• Q1 is 9 and Q3 is 20.
• Step 2 Find the interquartile range (IQR),
which is Q3 - Q1.
• IQR = (Q3 - Q1 ) =20 - 9 =11
• Step 3 Multiply this value by 1.5.
• 1.5 * (11) = 16.5
• Step 4 Subtract the value obtained in step 3
from Q1, and add the value in step 3 to Q3.
• 9 - 16.5 = -7.5 and
• 20 + 16.5 =36.5
• Step 5 Check the data set for any data values
that fall outside the interval from - 7.5 to
36.5.
• The value 50 is outside this interval; hence, it
can be considered an outlier.
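The five steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function names `quartiles` and `iqr_outliers` are illustrative, and Q1 and Q3 are assumed to be the medians of the lower and upper halves of the sorted data, which reproduces Q1 = 9 and Q3 = 20 for this data set (other quartile conventions can give slightly different values).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values the middle value is left
    out of both halves. Requires at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Return ((lower limit, upper limit), list of outliers)."""
    q1, q3 = quartiles(values)          # Step 1
    iqr = q3 - q1                       # Step 2
    step = k * iqr                      # Step 3
    low, high = q1 - step, q3 + step    # Step 4
    # Step 5: keep the values that fall outside the interval
    return (low, high), [x for x in values if x < low or x > high]

fences, outliers = iqr_outliers([5, 6, 12, 13, 15, 18, 22, 50])
print(fences, outliers)  # (-7.5, 36.5) [50]
```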
• Reasons why outliers may occur.
1. The data value may have resulted from a
measurement or observational error. The
variable is measured incorrectly.
2. The data value may have resulted from a
recording error.
3. The data value may have been obtained from
a subject that is not in the defined population.
4. The data value might be a legitimate value
that occurred by chance (although the
probability is extremely small).
Exploratory Data Analysis (EDA)
• In EDA, data are organized using a stem and leaf plot.
• The measure of central tendency used in EDA is the
median. The measure of variation used in EDA is the
interquartile range, Q3 - Q1.
• In EDA the data are represented graphically using a
boxplot (sometimes called a box-and-whisker plot).
• The purpose of exploratory data analysis is to examine
data to find out what information can be discovered
about the data such as the center and the spread.
Stem and Leaf Plots
• The stem and leaf plot is a method of organizing
data and is a combination of sorting and graphing.
• It has the advantage over a grouped frequency
distribution of retaining the actual data while
showing them in graphical form.
• A stem and leaf plot is a data plot that uses part of
the data value as the stem and part of the data
value as the leaf to form groups or classes.
Example
• At an outpatient testing center, the number of
cardiograms performed each day for 20 days is
shown. Construct a stem and leaf plot for the
data.
• 25 31 20 32 13
• 14 43 02 57 23
• 36 32 33 32 44
• 32 52 44 51 45
• Step 1
• Arrange the data in order:
• 02, 13, 14, 20, 23, 25, 31, 32, 32, 32,
• 32, 33, 36, 43, 44, 44, 45, 51, 52, 57
• Step 2
• Separate the data according to the first digit.
• 02
• 13, 14
• 20, 23, 25
• 31, 32, 32, 32, 32, 33, 36
• 43, 44, 44, 45
• 51, 52, 57
• Step 3
• A display can be made by using the leading digit as the stem
and the trailing digit as the leaf.
• For example, for the value 32, the leading digit, 3, is the
stem and the trailing digit, 2, is the leaf. For the value 14,
the 1 is the stem and the 4 is the leaf.
The plot also shows that the testing center performed from a minimum of 2 cardiograms to a
maximum of 57 cardiograms in any one day. If there are no data values in a class, you
should write the stem number and leave the leaf row blank. Do not put a zero in the
leaf row.
• Leading digit (stem) | Trailing digit (leaf)
• 0 | 2
• 1 | 3 4
• 2 | 0 3 5
• 3 | 1 2 2 2 2 3 6
• 4 | 3 4 4 5
• 5 | 1 2 7
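The three construction steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the helper name `stem_and_leaf` is hypothetical, the data are non-negative integers, the stem is the value without its last digit, and the leaf is the last digit. A stem with no data is kept with an empty leaf row, as the note above recommends.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return a list of (stem, leaves) rows for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):            # Step 1: arrange the data in order
        leaves[v // 10].append(v % 10)  # Steps 2 and 3: split into stem and leaf
    # Include every stem between the smallest and largest, even if empty
    stems = range(min(leaves), max(leaves) + 1)
    return [(s, leaves.get(s, [])) for s in stems]

cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]
rows = stem_and_leaf(cardiograms)
for stem, leaf_row in rows:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaf_row)}")
```

Running the sketch on the cardiogram data prints one row per stem from 0 to 5, matching the plot above.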
Boxplot
• A boxplot is a graph of a data set obtained by
drawing a horizontal line from the minimum
data value to Q1, drawing a horizontal line
from Q3 to the maximum data value, and
drawing a box whose vertical sides pass
through Q1 and Q3 with a vertical line inside
the box passing through the median or Q2.
A boxplot can be used to graphically represent the data set.
These plots involve five specific values:
1. The lowest value of the data set (i.e., minimum)
2. Q1
3. The median
4. Q3
5. The highest value of the data set (i.e., maximum)
These values are called a five-number summary of the data
set.
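As a hedged illustration, the five-number summary can be computed with a few lines of Python. The function name `five_number_summary` is an assumption for this sketch, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, as in the outlier example earlier (5, 6, 12, 13, 15, 18, 22, 50).

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data. Requires at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return data[0], q1, median(data), q3, data[-1]

summary = five_number_summary([5, 6, 12, 13, 15, 18, 22, 50])
print(summary)  # (5, 9.0, 14.0, 20.0, 50)
```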
Procedure for constructing a boxplot
• 1. Find the five-number summary for the data
values, that is, the maximum and minimum data
values, Q1 and Q3, and the median.
• 2. Draw a horizontal axis with a scale such that it
includes the maximum and minimum data values.
• 3. Draw a box whose vertical sides go through Q1
and Q3, and draw a vertical line through the
median.
• 4. Draw a line from the minimum data value to
the left side of the box and a line from the
maximum data value to the right side of the box.
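[Figure: boxplot on a horizontal axis, labeled Min (smallest value), Q1, M, Q3, and Max (largest value). The box spans the first and third quartiles; the second quartile is the median, M.]
[Figure: boxplot with fences. Brackets mark the lower fence and the upper fence, and an outlier is marked with *. The left whisker ends at the smallest data value greater than the lower fence (the minimum, unless the minimum is an outlier); the right whisker ends at the largest data value less than the upper fence (the maximum, unless the maximum is an outlier).]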
Five-number summary
• Step 5 Draw a scale for the data on the x axis.
• Step 6 Locate the lowest value, Q1, median, Q3,
and the highest value on the scale.
• Step 7 Draw a box around Q1 and Q3, draw a
vertical line through the median, and connect the
upper value and the lower value to the box.
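The numbers needed to draw the plot can be computed before any drawing is done. The sketch below is one possible way to do this, assuming the fence-based variant in which the whiskers stop at the most extreme data values that are not outliers and the outliers are plotted separately. The name `boxplot_parts` and the dictionary keys are illustrative, and the data set is the one from the outlier example.

```python
from statistics import median

def boxplot_parts(values, k=1.5):
    """Return the box, whisker ends, and outliers for a fence-based boxplot.

    Assumptions: quartiles are medians of the lower and upper halves,
    and the fences lie k * IQR beyond Q1 and Q3.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    lower_fence = q1 - k * (q3 - q1)
    upper_fence = q3 + k * (q3 - q1)
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "box": (q1, median(data), q3),         # sides of the box and the median line
        "whiskers": (inside[0], inside[-1]),   # ends of the two horizontal lines
        "outliers": outliers,                  # points drawn individually
    }

parts = boxplot_parts([5, 6, 12, 13, 15, 18, 22, 50])
print(parts)
```

For this data set the right whisker ends at 22 rather than 50, because 50 lies beyond the upper fence of 36.5.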
Information Obtained from a Boxplot
• If the median is near the center of the box and
each horizontal line is of approximately equal
length, then the distribution is roughly symmetric.
• If the median is to the left of the center of the box
or the right line is substantially longer than the
left line, then the distribution is skewed right.
• If the median is to the right of the center of the
box or the left line is substantially longer than the
right line, then the distribution is skewed left.
Why Use a Boxplot?
• A boxplot provides an alternative to a histogram, a dotplot, and a stem-and-
leaf plot. Among the advantages of a boxplot over a histogram are ease of
construction and convenient handling of outliers.
• In addition, the construction of a boxplot does not involve subjective
judgements, as does a histogram. That is, two individuals will construct the
same boxplot for a given set of data - which is not necessarily true of a
histogram, because the number of classes and the class endpoints must be
chosen. On the other hand, the boxplot lacks the details the histogram
provides.
• Dotplots and stemplots retain the identity of the individual observations; a
boxplot does not. Many sets of data are more suitable for display as
boxplots than as stemplots. Both boxplots and stemplots are useful for
making side-by-side comparisons.
Example 1
Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored)
in their August 1989 issue. The calorie counts are given below.
Construct a boxplot for the data.
342 377 319 353 295 234 294 286
377 182 310 439 111 201 182 197
209 147 190 151 131 151
Example 1 - Answer
Q1 = 182 Q2 = 221.5 Q3 = 319
Min = 111 Max = 439 Range = 328
IQR = 137 UF = 524.5 LF = -23.5
[Figure: boxplot of the data on a horizontal axis labeled Calories, scaled from 100 to 500.]
Example 2
The weights of 20 randomly selected juniors at MSHS are recorded below:
a) Construct a boxplot of the data
b) Determine if there are any mild or extreme outliers.
121 126 130 132 143 137 141 144 148 205
125 128 131 133 135 139 141 147 153 213
Example 2 - Answer
Q1 = 130.5 Q2 = 138 Q3 = 145.5
Min = 121 Max = 213 Range = 92
IQR = 15 UF = 168 LF = 108
[Figure: boxplot of the data on a horizontal axis labeled Weight, scaled from 100 to 260, with two points marked * as extreme outliers (more than 3 IQR above Q3).]
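The distinction between mild and extreme outliers can be checked with a short sketch. It assumes the common convention that a mild outlier lies more than 1.5 IQR but no more than 3 IQR beyond a quartile, while an extreme outlier lies more than 3 IQR beyond it; the source states only the 3 IQR rule explicitly. The name `classify_outliers` is illustrative.

```python
from statistics import median

def classify_outliers(values):
    """Return (mild, extreme) outliers using 1.5 * IQR and 3 * IQR fences.

    Assumption: quartiles are medians of the lower and upper halves.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # LF and UF
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # limits for extreme outliers
    extreme = [x for x in data if x < outer[0] or x > outer[1]]
    mild = [x for x in data
            if (x < inner[0] or x > inner[1]) and x not in extreme]
    return mild, extreme

weights = [121, 126, 130, 132, 143, 137, 141, 144, 148, 205,
           125, 128, 131, 133, 135, 139, 141, 147, 153, 213]
mild, extreme = classify_outliers(weights)
print(mild, extreme)  # [] [205, 213]
```

Both flagged weights exceed Q3 + 3(IQR) = 145.5 + 45 = 190.5, so under this convention neither is merely mild.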
Example 3
The following are the scores of 12 members of a women's golf team in
tournament play:
a) Construct a boxplot of the data.
b) Are there any mild or extreme outliers?
c) Find the mean and standard deviation.
d) Based on the mean and median, describe the distribution.
89 90 87 95 86 81
111 108 83 88 91 79
Example 3 - Answer
Q1 = 84.5 Q2 = 88.5 Q3 = 93
Min = 79 Max = 111 Range = 32
IQR = 93 - 84.5 = 8.5 UF = 105.75 LF = 71.75
[Figure: boxplot of the data on a horizontal axis labeled Golf Scores, scaled from 78 to 126.]
Outliers: 108 and 111 exceed the upper fence of 105.75. Both are mild, since neither exceeds Q3 + 3(IQR) = 118.5.
Mean = 90.67 St Dev = 9.85
Distribution appears to be skewed right (mean > median and a long upper tail)
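The mean, median, and standard deviation quoted above can be reproduced with Python's standard `statistics` module. This sketch assumes the sample standard deviation (divisor n - 1), which is the version that matches the reported 9.85; the variable names are illustrative.

```python
from statistics import mean, median, stdev

scores = [89, 90, 87, 95, 86, 81, 111, 108, 83, 88, 91, 79]

avg = mean(scores)     # arithmetic mean
mid = median(scores)   # middle of the sorted scores
sd = stdev(scores)     # sample standard deviation (divides by n - 1)

print(round(avg, 2), mid, round(sd, 2))  # 90.67 88.5 9.85

# A mean above the median is consistent with a right-skewed distribution
print(avg > mid)  # True
```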
Example 4
Comparative Boxplots: The scores of 18 first-year college women on the
Survey of Study Habits and Attitudes (this psychological test measures
motivation, study habits and attitudes toward school) are given below:
154 109 137 115 152 140 154 178 101
103 126 126 137 165 165 129 200 148
The college also administered the test to 20 first-year college men. Their
scores are also given:
108 140 114 91 180 115 126 92 169 146
109 132 75 88 113 151 70 115 187 104
Compare the two distributions by constructing boxplots. Are there any
outliers in either group? Are there any noticeable differences or
similarities between the two groups?
Example 4 - Answer
Statistic | Women | Men
Q1 | 126 | 98
Q2 | 138.5 | 114.5
Q3 | 154 | 143
Min | 101 | 70
Max | 200 | 187
Range | 99 | 117
IQR | 28 | 45
UF | 196 | 210.5
LF | 84 | 30.5
[Figure: side-by-side boxplots titled "Comparing Men and Women Study Habits and Attitudes" on a common axis scaled from 60 to 220; one outlier is marked *.]
Women's median is greater and they have less variability (spread) in their scores;
the women's distribution is more symmetric while the men's is skewed right.
Women have an outlier (200), while the men do not.
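The comparison can be checked numerically with a small sketch. The helper name `summarize` and its dictionary keys are assumptions made for illustration, and the quartiles are again taken as medians of the lower and upper halves of each sorted group.

```python
from statistics import median

def summarize(values, k=1.5):
    """Return the median, IQR, fences, and outliers of one group."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1
    lf, uf = q1 - k * iqr, q3 + k * iqr
    return {
        "median": median(data),
        "iqr": iqr,
        "fences": (lf, uf),
        "outliers": [x for x in data if x < lf or x > uf],
    }

women = [154, 109, 137, 115, 152, 140, 154, 178, 101,
         103, 126, 126, 137, 165, 165, 129, 200, 148]
men = [108, 140, 114, 91, 180, 115, 126, 92, 169, 146,
       109, 132, 75, 88, 113, 151, 70, 115, 187, 104]

w, m = summarize(women), summarize(men)
print("Women:", w)
print("Men:  ", m)
```

Under these assumptions the women's group has the larger median and the smaller IQR, and only the women's score of 200 falls outside its fences.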
Probability
• Definition: probability is the chance of an event occurring.
• It is the basis of inferential statistics.
Probability experiments
• A probability experiment is a chance process that leads to well-
defined results called outcomes.
• An outcome is the result of a single trial of a probability experiment.
• A sample space is the set of all possible outcomes of a probability
experiment.
• Some sample spaces for various probability experiments are shown
here.
• Experiment | Sample space
• Toss one coin | Head, tail
• Roll a die | 1, 2, 3, 4, 5, 6
• Answer a true/false question | True, false
• Toss two coins | Head-head, tail-tail, head-tail, tail-head
Gender of Children
• Find the sample space for the gender of the
children if a family has three children. Use
B for boy and G for girl.
• Solution
• There are two genders, male and female, and
each child could be either gender.
• Hence, there are eight possibilities, as shown
here: BBB BBG BGB GBB GGG GGB GBG BGG
A tree diagram
• A tree diagram is a device consisting of line segments
emanating from a starting point and also from each outcome
point. It is used to determine all possible outcomes of a
probability experiment.
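Following every branch of a tree diagram amounts to listing every combination of choices, one per stage. As an illustrative sketch, the standard-library function `itertools.product` can enumerate the sample space for the three-children example; the variable name `sample_space` is an assumption.

```python
from itertools import product

# Each child is a stage of the tree with two branches: B (boy) or G (girl).
# product("BG", repeat=3) walks every path from the root to a final outcome.
sample_space = ["".join(path) for path in product("BG", repeat=3)]

print(sample_space)
# ['BBB', 'BBG', 'BGB', 'BGG', 'GBB', 'GBG', 'GGB', 'GGG']
print(len(sample_space))  # 8
```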
Simple and compound events
• An outcome is the result of a single trial of a
probability experiment.
• An event consists of a set of outcomes of a
probability experiment.
An event can be one outcome or more than one outcome.
For example, if a die is rolled and a 6 shows, this result is
called an outcome, since it is a result of a single trial.
An event with one outcome is called a simple event.
The event of getting an odd number when a die is rolled is
called a compound event, since it consists of three
outcomes or three simple events (1, 3, 5)
A compound event consists of two or more outcomes or
simple events.
There are three basic interpretations of probability:
1. Classical probability
2. Empirical or relative frequency probability
3. Subjective probability
Classical probability uses sample spaces to
determine the numerical probability that an
event will happen.
Classical probability assumes that all outcomes in
the sample space are equally likely to occur.
Equally likely events are events that have the same
probability of occurring. For example, when a single die is
rolled, each outcome has the same probability of
occurring.
Drawing Cards
Find the probability of getting a red ace when a card is drawn at random
from an ordinary deck of cards.
Solution
Since there are 52 cards and there are 2 red aces, namely, the ace of
hearts and the ace of diamonds, P(red ace) = 2/52 = 1/26.
Example 4–6 Gender of Children
If a family has three children, find the probability that two of the three
children are girls.
Solution
The sample space for the gender of the children for a family that has
three children has eight outcomes, that is, BBB, BBG, BGB, GBB,
GGG, GGB, GBG, and BGG.
Since there are three ways to have two girls, namely, GGB, GBG, and
BGG, P(two girls) = 3/8
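Both worked examples follow the same pattern: count the outcomes in the event and divide by the number of outcomes in the sample space. The sketch below illustrates this with exact fractions. The function name `classical_probability` and the way the deck is encoded are assumptions for illustration, and the calculation is valid only when all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(E) = (outcomes in E) / (outcomes in the sample space).

    `event` is a function that returns True for outcomes belonging to E.
    Assumes every outcome in the sample space is equally likely.
    """
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))

# Drawing a red ace from an ordinary 52-card deck
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]
p_red_ace = classical_probability(
    lambda card: card[0] == "A" and card[1] in ("hearts", "diamonds"), deck)

# Exactly two girls among three children
children = ["".join(p) for p in product("BG", repeat=3)]
p_two_girls = classical_probability(lambda o: o.count("G") == 2, children)

print(p_red_ace, p_two_girls)  # 1/26 3/8
```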
Probability rules
• Probability Rule 1
• The probability of any event E is a number (either a fraction or
decimal) between and including 0 and 1. This is denoted by 0 ≤
P(E) ≤ 1. Rule 1 states that probabilities cannot be negative or
greater than 1.
• Probability Rule 2
• If an event E cannot occur (i.e., the event contains no members in
the sample space), its probability is 0.
• Rolling a Die
• When a single die is rolled, find the probability of getting a 9.
• Since the sample space is 1, 2, 3, 4, 5, and 6, it is impossible to get
a 9. Hence, the probability is P(9) = 0.
• Probability Rule 3
• If an event E is certain, then the probability of E is 1.
• Rolling a Die
• When a single die is rolled, what is the probability of getting a
number less than 7?
• Solution
• Since all outcomes—1, 2, 3, 4, 5, and 6—are less than 7, the
probability is P(number less than 7) = 6/6= 1
• The event of getting a number less than 7 is certain.
• In other words, probability values range from 0 to 1. When the
probability of an event is close to 0, its occurrence is highly
unlikely.
• When the probability of an event is near 0.5, there is about a 50-
50 chance that the event will occur; and when the probability of
an event is close to 1, the event is highly likely to occur.
• Probability Rule 4
• The sum of the probabilities of all the
outcomes in the sample space is 1.
• For example, in the roll of a fair die, each
outcome in the sample space has a probability
of 1/6. Hence, the sum of the probabilities of the
outcomes is 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6/6 = 1.
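The four rules can be checked on the fair-die example with a short sketch that uses exact fractions. The names `die` and `prob` are illustrative, and each face is assumed to have probability 1/6.

```python
from fractions import Fraction

# Fair die: each of the six faces is assumed to have probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Add the probabilities of the faces for which event(face) is True."""
    return sum((p for face, p in die.items() if event(face)), Fraction(0))

p_odd = prob(lambda face: face % 2 == 1)
p_nine = prob(lambda face: face == 9)
p_less_than_7 = prob(lambda face: face < 7)
total = sum(die.values())

print(0 <= p_odd <= 1)   # Rule 1: True
print(p_nine)            # Rule 2: 0 (impossible event)
print(p_less_than_7)     # Rule 3: 1 (certain event)
print(total)             # Rule 4: 1
```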

More Related Content

What's hot

Statistics Math project class 10th
Statistics Math project class 10thStatistics Math project class 10th
Statistics Math project class 10thRiya Singh
 
Chapter 8 Measure of Dispersion of Data
Chapter 8 Measure of Dispersion of DataChapter 8 Measure of Dispersion of Data
Chapter 8 Measure of Dispersion of DataMISS ESTHER
 
Data presentation
Data presentationData presentation
Data presentationMaiBabes17
 
Statistics in nursing research
Statistics in nursing researchStatistics in nursing research
Statistics in nursing researchNursing Path
 
Chapter 3 260110 044503
Chapter 3 260110 044503Chapter 3 260110 044503
Chapter 3 260110 044503guest25d353
 
Data presentation Lecture
Data presentation Lecture Data presentation Lecture
Data presentation Lecture AB Rajar
 
Chapter 2
Chapter 2Chapter 2
Chapter 2Lem Lem
 
Methods of data presention
Methods of data presentionMethods of data presention
Methods of data presentionDr Vaibhav Gupta
 
Maths statistcs class 10
Maths statistcs class 10   Maths statistcs class 10
Maths statistcs class 10 Rc Os
 
Statistik Chapter 2
Statistik Chapter 2Statistik Chapter 2
Statistik Chapter 2WanBK Leo
 
Business Statistics Chapter 2
Business Statistics Chapter 2Business Statistics Chapter 2
Business Statistics Chapter 2Lux PP
 
Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE Smitha Sumod
 
Chapter03
Chapter03Chapter03
Chapter03rwmiller
 
Aed1222 lesson 6 2nd part
Aed1222 lesson 6 2nd partAed1222 lesson 6 2nd part
Aed1222 lesson 6 2nd partnurun2010
 
Elementary Statistics
Elementary Statistics Elementary Statistics
Elementary Statistics jennytuazon01630
 
MEASURES OF DISPERSION OF UNGROUPED DATA
MEASURES OF DISPERSION OF UNGROUPED DATAMEASURES OF DISPERSION OF UNGROUPED DATA
MEASURES OF DISPERSION OF UNGROUPED DATAMISS ESTHER
 
Statistics
StatisticsStatistics
Statisticsitutor
 

What's hot (20)

2.2 Histograms
2.2 Histograms2.2 Histograms
2.2 Histograms
 
Statistics Math project class 10th
Statistics Math project class 10thStatistics Math project class 10th
Statistics Math project class 10th
 
Chapter 8 Measure of Dispersion of Data
Chapter 8 Measure of Dispersion of DataChapter 8 Measure of Dispersion of Data
Chapter 8 Measure of Dispersion of Data
 
Data presentation
Data presentationData presentation
Data presentation
 
Statistics in nursing research
Statistics in nursing researchStatistics in nursing research
Statistics in nursing research
 
Chapter 3 260110 044503
Chapter 3 260110 044503Chapter 3 260110 044503
Chapter 3 260110 044503
 
Data presentation Lecture
Data presentation Lecture Data presentation Lecture
Data presentation Lecture
 
Histograms
HistogramsHistograms
Histograms
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
Methods of data presention
Methods of data presentionMethods of data presention
Methods of data presention
 
Maths statistcs class 10
Maths statistcs class 10   Maths statistcs class 10
Maths statistcs class 10
 
Statistics
StatisticsStatistics
Statistics
 
Statistik Chapter 2
Statistik Chapter 2Statistik Chapter 2
Statistik Chapter 2
 
Business Statistics Chapter 2
Business Statistics Chapter 2Business Statistics Chapter 2
Business Statistics Chapter 2
 
Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE
 
Chapter03
Chapter03Chapter03
Chapter03
 
Aed1222 lesson 6 2nd part
Aed1222 lesson 6 2nd partAed1222 lesson 6 2nd part
Aed1222 lesson 6 2nd part
 
Elementary Statistics
Elementary Statistics Elementary Statistics
Elementary Statistics
 
MEASURES OF DISPERSION OF UNGROUPED DATA
MEASURES OF DISPERSION OF UNGROUPED DATAMEASURES OF DISPERSION OF UNGROUPED DATA
MEASURES OF DISPERSION OF UNGROUPED DATA
 
Statistics
StatisticsStatistics
Statistics
 

Similar to Revisionf2

box plot or whisker plot
box plot or whisker plotbox plot or whisker plot
box plot or whisker plotShubham Patel
 
lecture 1 Slides.pptx
lecture 1 Slides.pptxlecture 1 Slides.pptx
lecture 1 Slides.pptxSADAF53170
 
Measures of Relative Standing and Boxplots
Measures of Relative Standing and BoxplotsMeasures of Relative Standing and Boxplots
Measures of Relative Standing and BoxplotsLong Beach City College
 
3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysis3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysismlong24
 
Biostatistics CH Lecture Pack
Biostatistics CH Lecture PackBiostatistics CH Lecture Pack
Biostatistics CH Lecture PackShaun Cochrane
 
3. Descriptive statistics.pdf
3. Descriptive statistics.pdf3. Descriptive statistics.pdf
3. Descriptive statistics.pdfYomifDeksisaHerpa
 
Measures of-variation
Measures of-variationMeasures of-variation
Measures of-variationJhonna Barrosa
 
2. week 2 data presentation and organization
2. week 2 data presentation and organization2. week 2 data presentation and organization
2. week 2 data presentation and organizationrenz50
 
lecture 3 Slides.pptx
lecture 3 Slides.pptxlecture 3 Slides.pptx
lecture 3 Slides.pptxSADAF53170
 
ap_stat_1.3.ppt
ap_stat_1.3.pptap_stat_1.3.ppt
ap_stat_1.3.pptfghgjd
 
1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdfthaersyam
 
Rt graphical representation
Rt graphical representationRt graphical representation
Rt graphical representationRinchen
 
Statistics Slides.pdf
Statistics Slides.pdfStatistics Slides.pdf
Statistics Slides.pdfYasirAli74993
 
3Measurements of health and disease_MCTD.pdf
3Measurements of health and disease_MCTD.pdf3Measurements of health and disease_MCTD.pdf
3Measurements of health and disease_MCTD.pdfAmanuelDina
 
Biostatistics cource for clinical pharmacy
Biostatistics cource for clinical pharmacyBiostatistics cource for clinical pharmacy
Biostatistics cource for clinical pharmacyBatizemaryam
 
Dot Plots and Box Plots.pptx
Dot Plots and Box Plots.pptxDot Plots and Box Plots.pptx
Dot Plots and Box Plots.pptxVaishnaviElumalai
 
Biostatistics_descriptive stats.pptx
Biostatistics_descriptive stats.pptxBiostatistics_descriptive stats.pptx
Biostatistics_descriptive stats.pptxMohammedAbdela7
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptxCalvinAdorDionisio
 
Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)Ron_Eick
 

Similar to Revisionf2 (20)

box plot or whisker plot
box plot or whisker plotbox plot or whisker plot
box plot or whisker plot
 
lecture 1 Slides.pptx
lecture 1 Slides.pptxlecture 1 Slides.pptx
lecture 1 Slides.pptx
 
Measures of Relative Standing and Boxplots
Measures of Relative Standing and BoxplotsMeasures of Relative Standing and Boxplots
Measures of Relative Standing and Boxplots
 
3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysis3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysis
 
Biostatistics CH Lecture Pack
Biostatistics CH Lecture PackBiostatistics CH Lecture Pack
Biostatistics CH Lecture Pack
 
3. Descriptive statistics.pdf
3. Descriptive statistics.pdf3. Descriptive statistics.pdf
3. Descriptive statistics.pdf
 
outliers
outliersoutliers
outliers
 
Measures of-variation
Measures of-variationMeasures of-variation
Measures of-variation
 
2. week 2 data presentation and organization
2. week 2 data presentation and organization2. week 2 data presentation and organization
2. week 2 data presentation and organization
 
lecture 3 Slides.pptx
lecture 3 Slides.pptxlecture 3 Slides.pptx
lecture 3 Slides.pptx
 
ap_stat_1.3.ppt
ap_stat_1.3.pptap_stat_1.3.ppt
ap_stat_1.3.ppt
 
1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf1.0 Descriptive statistics.pdf
1.0 Descriptive statistics.pdf
 
Rt graphical representation
Rt graphical representationRt graphical representation
Rt graphical representation
 
Statistics Slides.pdf
Statistics Slides.pdfStatistics Slides.pdf
Statistics Slides.pdf
 
3Measurements of health and disease_MCTD.pdf
3Measurements of health and disease_MCTD.pdf3Measurements of health and disease_MCTD.pdf
3Measurements of health and disease_MCTD.pdf
 
Biostatistics cource for clinical pharmacy
Biostatistics cource for clinical pharmacyBiostatistics cource for clinical pharmacy
Biostatistics cource for clinical pharmacy
 
Dot Plots and Box Plots.pptx
Dot Plots and Box Plots.pptxDot Plots and Box Plots.pptx
Dot Plots and Box Plots.pptx
 
Biostatistics_descriptive stats.pptx
Biostatistics_descriptive stats.pptxBiostatistics_descriptive stats.pptx
Biostatistics_descriptive stats.pptx
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptx
 
Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)Statistics (Measures of Dispersion)
Statistics (Measures of Dispersion)
 

Recently uploaded

Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSĂ©rgio Sacani
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.k64182334
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSĂ©rgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravitySubhadipsau21168
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 

Recently uploaded (20)

Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow đź’‹ Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified Gravity
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 

Revisionf2

  • 1.
  • 2. Outliers • An outlier is an extremely high or an extremely low data value when compared with the rest of the data values. • Procedure for Identifying Outliers • Step 1 Arrange the data in order and find Q1 and Q3. • Step 2 Find the interquartile range: IQR = Q3 - Q1. • Step 3 Multiply the IQR by 1.5. = IQR *1.5 • Step 4 Subtract the value obtained in step 3 from Q1 and add the value to Q3. • Step 5 Check the data set for any data value that is smaller than Q1 - 1.5*(IQR) or larger than Q3 + 1.5*(IQR).
  • 3. • Check the following data set for outliers. 5, 6, 12, 13, 15, 18, 22, 50 • Solution • The data value 50 is extremely suspect. These are the steps in checking for an outlier. • Step 1 Find Q1 and Q3. • Q1 is 9 and Q3 is 20. • Step 2 Find the interquartile range (IQR), which is Q3 - Q1. • IQR = (Q3 - Q1 ) =20 - 9 =11
  • 4. • Step 3 Multiply this value by 1.5. • 1.5 * (11) = 16.5 • Step 4 Subtract the value obtained in step 3 from Q1, and add the value in step 3 to Q3. • 9 - 16.5 = -7.5 and • 20 + 16.5 =36.5 • Step 5 Check the data set for any data values that fall outside the interval from - 7.5 to 36.5. • The value 50 is outside this interval; hence, it can be considered an outlier.
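The five steps above can be sketched in Python. This is a minimal illustration, not the slides' own code: the function names `quartiles` and `iqr_outliers` are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data, which is the convention that gives Q1 = 9 and Q3 = 20 here. Other software may use a different quartile rule and return slightly different fences.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values; the middle value of an odd-sized
    data set is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) for the k*IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1                      # Step 2
    step = k * iqr                     # Step 3
    low, high = q1 - step, q3 + step   # Step 4
    flagged = [x for x in values if x < low or x > high]  # Step 5
    return low, high, flagged

data = [5, 6, 12, 13, 15, 18, 22, 50]
low, high, outliers = iqr_outliers(data)
print(low, high, outliers)  # -7.5 36.5 [50]
```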
  • 5. • Reasons why outliers may occur. 1. The data value may have resulted from a measurement or observational error. The variable is measured incorrectly. 2. The data value may have resulted from a recording error. 3. The data value may have been obtained from a subject that is not in the defined population 4. The data value might be a legitimate value that occurred by chance (although the probability is extremely small).
  • 6. Exploratory Data Analysis (EDA) • Here we organize data using a stem and leaf plot. • The measure of central tendency used in EDA is the median. The measure of variation used in EDA is the interquartile range Q3 - Q1. • In EDA the data are represented graphically using a boxplot (sometimes called a box-and-whisker plot). • The purpose of exploratory data analysis is to examine data to find out what information can be discovered about the data, such as the center and the spread.
  • 7. Stem and Leaf Plots • The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing. • It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphical form. • A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.
  • 8. Example • At an outpatient testing center, the number of cardiograms performed each day for 20 days is shown. Construct a stem and leaf plot for the data. • 25 31 20 32 13 • 14 43 02 57 23 • 36 32 33 32 44 • 32 52 44 51 45
  • 9. • Step 1 • Arrange the data in order: • 02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57 • Step 2 • Separate the data according to the first digit: • 02 • 13, 14 • 20, 23, 25 • 31, 32, 32, 32, 32, 33, 36 • 43, 44, 44, 45 • 51, 52, 57 • Step 3 • A display can be made by using the leading digit as the stem and the trailing digit as the leaf. • For example, for the value 32, the leading digit, 3, is the stem and the trailing digit, 2, is the leaf. For the value 14, the 1 is the stem and the 4 is the leaf.
  • 10. The plot also shows that the testing center treated from a minimum of 2 patients to a maximum of 57 patients in any one day. If there are no data values in a class, you should write the stem number and leave the leaf row blank. Do not put a zero in the leaf row. • Leading digit (stem) | Trailing digit (leaf) • 0 | 2 • 1 | 3 4 • 2 | 0 3 5 • 3 | 1 2 2 2 2 3 6 • 4 | 3 4 4 5 • 5 | 1 2 7
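The stem and leaf construction for the cardiogram data can be reproduced with a short Python sketch. The helper name `stem_and_leaf` is hypothetical, and the code assumes non-negative whole numbers with the tens digit as the stem and the units digit as the leaf. Stems with no data are kept with an empty leaf row, as the slide recommends.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Keep empty stems so gaps in the data remain visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

plot = stem_and_leaf(cardiograms)
for stem, row in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in row))
```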
  • 11. Boxplot • A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median, or Q2.
  • 12. A boxplot can be used to graphically represent the data set. These plots involve five specific values: 1. The lowest value of the data set (i.e., minimum) 2. Q1 3. The median 4. Q3 5. The highest value of the data set (i.e., maximum) These values are called a five-number summary of the data set.
  • 13. Procedure for constructing a boxplot • 1. Find the five-number summary for the data values, that is, the maximum and minimum data values, Q1 and Q3, and the median. • 2. Draw a horizontal axis with a scale such that it includes the maximum and minimum data values. • 3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the median. • 4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.
  • 14. [Figure: annotated boxplot of the five-number summary (Min, Q1, M, Q3, Max), showing the first, second and third quartiles, where the second quartile is the median, M. The left whisker ends at the smallest data value greater than the lower fence (Min unless Min is an outlier); the right whisker ends at the largest data value less than the upper fence (Max unless Max is an outlier). Outliers are marked with *.]
  • 15. • Step 5 Draw a scale for the data on the x axis. • Step 6 Locate the lowest value, Q1, the median, Q3, and the highest value on the scale. • Step 7 Draw a box around Q1 and Q3, draw a vertical line through the median, and connect the upper value and the lower value to the box.
  • 16. Information Obtained from a Boxplot • If the median is near the center of the box and each horizontal line is of approximately equal length, then the distribution is roughly symmetric • If the median is to the left of the center of the box or the right line is substantially longer than the left line, then the distribution is skewed right • If the median is to the right of the center of the box or the left line is substantially longer than the right line, then the distribution is skewed left
  • 17. Why Use a Boxplot? • A boxplot provides an alternative to a histogram, a dotplot, and a stem-and-leaf plot. Among the advantages of a boxplot over a histogram are ease of construction and convenient handling of outliers. • In addition, the construction of a boxplot does not involve subjective judgements, as does a histogram. That is, two individuals will construct the same boxplot for a given set of data, which is not necessarily true of a histogram, because the number of classes and the class endpoints must be chosen. On the other hand, the boxplot lacks the details the histogram provides. • Dotplots and stemplots retain the identity of the individual observations; a boxplot does not. Many sets of data are more suitable for display as boxplots than as stemplots. Both boxplots and stemplots are useful for making side-by-side comparisons.
  • 18. Example 1 Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue. Construct a boxplot for the data below. 342 377 319 353 295 234 294 286 377 182 310 439 111 201 182 197 209 147 190 151 131 151
  • 19. Example 1 - Answer • Q1 = 182 • Q2 = 221.5 • Q3 = 319 • Min = 111 • Max = 439 • Range = 328 • IQR = 137 • UF = 524.5 • LF = -23.5 • [Figure: boxplot on a Calories axis running from 100 to 500]
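The five-number summary in this answer can be checked with a small Python sketch. The function name `five_number_summary` is hypothetical, and quartiles are again assumed to be the medians of the lower and upper halves of the sorted data, which matches the values on the slide.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return data[0], q1, median(data), q3, data[-1]

calories = [342, 377, 319, 353, 295, 234, 294, 286, 377, 182, 310,
            439, 111, 201, 182, 197, 209, 147, 190, 151, 131, 151]

low, q1, q2, q3, high = five_number_summary(calories)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr
print(low, q1, q2, q3, high)        # 111 182 221.5 319 439
print(iqr, lower_fence, upper_fence)  # 137 -23.5 524.5
```

Since every value lies between the two fences, both whiskers extend all the way to the minimum and maximum.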
  • 20. Example 2 The weights of 20 randomly selected juniors at MSHS are recorded below: a) Construct a boxplot of the data b) Determine if there are any mild or extreme outliers. 121 126 130 132 143 137 141 144 148 205 125 128 131 133 135 139 141 147 153 213
  • 21. Example 2 - Answer • Q1 = 130.5 • Q2 = 138 • Q3 = 145.5 • Min = 121 • Max = 213 • Range = 92 • IQR = 15 • UF = 168 • LF = 108 • [Figure: boxplot on a Weight axis running from 100 to 260; two points marked * are extreme outliers (more than 3 IQR from Q3)]
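The mild versus extreme distinction can be sketched as follows. The slide only states the extreme rule (more than 3 IQR from Q3); treating values between 1.5 IQR and 3 IQR outside the box as mild is the usual convention and is an assumption here. The function name `classify_outliers` is hypothetical, and Q1 and Q3 are taken from the answer above rather than recomputed.

```python
def classify_outliers(values, q1, q3):
    """Split values lying outside the box into (mild, extreme) outliers."""
    iqr = q3 - q1
    mild, extreme = [], []
    for x in sorted(values):
        distance = max(q1 - x, x - q3)  # how far x lies outside the box
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme

weights = [121, 126, 130, 132, 143, 137, 141, 144, 148, 205,
           125, 128, 131, 133, 135, 139, 141, 147, 153, 213]

mild, extreme = classify_outliers(weights, q1=130.5, q3=145.5)
print(mild, extreme)  # [] [205, 213]
```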
  • 22. Example 3 The following are the scores of 12 members of a women’s golf team in tournament play: a) Construct a boxplot of the data. b) Are there any mild or extreme outliers? c) Find the mean and standard deviation. d) Based on the mean and median, describe the distribution. 89 90 87 95 86 81 111 108 83 88 91 79
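  • 23. Example 3 - Answer • Q1 = 84.5 • Q2 = 88.5 • Q3 = 93 • Min = 79 • Max = 111 • Range = 32 • IQR = 93 - 84.5 = 8.5 • UF = 93 + 1.5(8.5) = 105.75 • LF = 84.5 - 1.5(8.5) = 71.75 • The scores 108 and 111 lie above the upper fence, so they are mild outliers; neither exceeds Q3 + 3(8.5) = 118.5, so there are no extreme outliers. • Mean = 90.67 • St Dev = 9.85 • The distribution appears to be skewed right (mean > median, with the two high scores stretching the right tail). • [Figure: boxplot on a Golf Scores axis running from 78 to 126]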
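Parts c) and d) can be checked with Python's standard `statistics` module. This is only a sketch; it assumes the slide reports the sample standard deviation (divisor n - 1), which is what `stdev` computes and what reproduces the 9.85 shown.

```python
from statistics import mean, median, stdev

scores = [89, 90, 87, 95, 86, 81, 111, 108, 83, 88, 91, 79]

avg = mean(scores)    # arithmetic mean
sd = stdev(scores)    # sample standard deviation (divides by n - 1)
med = median(scores)  # average of the two middle scores

print(round(avg, 2), round(sd, 2), med)  # 90.67 9.85 88.5
print("mean > median:", avg > med)       # suggests a right-skewed distribution
```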
  • 24. Example 4 Comparative Boxplots: The scores of 18 first-year college women on the Survey of Study Habits and Attitudes (this psychological test measures motivation, study habits and attitudes toward school) are given below: 154 109 137 115 152 140 154 178 101 103 126 126 137 165 165 129 200 148 • The college also administered the test to 20 first-year college men. Their scores are also given: 108 140 114 91 180 115 126 92 169 146 109 132 75 88 113 151 70 115 187 104 • Compare the two distributions by constructing boxplots. Are there any outliers in either group? Are there any noticeable differences or similarities between the two groups?
  • 25. Example 4 - Answer • Women: Q1 = 126, Q2 = 138.5, Q3 = 154, Min = 101, Max = 200, Range = 99, IQR = 28, UF = 196, LF = 126 - 1.5(28) = 84 • Men: Q1 = 98, Q2 = 114.5, Q3 = 143, Min = 70, Max = 187, Range = 117, IQR = 45, UF = 210.5, LF = 30.5 • [Figure: comparative boxplots titled "Comparing Men and Women Study Habits and Attitudes" on an axis running from 60 to 220; the women's plot has one point marked *] • The women’s median is greater and they have less variability (spread) in their scores; the women’s distribution is more symmetric while the men’s is skewed right. The women have an outlier (200, which is above their upper fence of 196), while the men do not.
  • 26. Probability • Definition: the chance of an event occurring. • It is the basis of inferential statistics.
  • 27. Probability experiments • A probability experiment is a chance process that leads to well-defined results called outcomes. • An outcome is the result of a single trial of a probability experiment. • A sample space is the set of all possible outcomes of a probability experiment. • Some sample spaces for various probability experiments are shown here. • Experiment: Sample space • Toss one coin: head, tail • Roll a die: 1, 2, 3, 4, 5, 6 • Answer a true/false question: true, false • Toss two coins: head-head, tail-tail, head-tail, tail-head
  • 28. Gender of Children • Find the sample space for the gender of the children if a family has three children. Use B for boy and G for girl. • Solution • There are two genders, male and female, and each child could be either gender. • Hence, there are eight possibilities, as shown here: BBB, BBG, BGB, GBB, GGG, GGB, GBG, BGG
  • 29. A tree diagram • A tree diagram is a device consisting of line segments issued from a starting point and also from the outcome point. It is used to determine all possible outcomes of a probability experiment.
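A tree diagram for the three-children example branches into B or G at each child, and the same enumeration can be sketched with `itertools.product` from the Python standard library. The variable name `sample_space` is illustrative only.

```python
from itertools import product

# Each child is B or G; product() walks every branch of the tree.
sample_space = ["".join(kids) for kids in product("BG", repeat=3)]

print(len(sample_space))  # 8
print(sample_space)
```

The order of the outcomes differs from the slide's listing, but the set of eight outcomes is the same.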
  • 30. Simple and compound events • An outcome : the result of a single trial of a probability experiment. • An event consists of a set of outcomes of a probability experiment.
  • 31. An event can be one outcome or more than one outcome. For example, if a die is rolled and a 6 shows, this result is called an outcome, since it is a result of a single trial. An event with one outcome is called a simple event. The event of getting an odd number when a die is rolled is called a compound event, since it consists of three outcomes or three simple events (1, 3, 5). A compound event consists of two or more outcomes or simple events.
  • 32. There are three basic interpretations of probability: 1. Classical probability 2. Empirical or relative frequency probability 3. Subjective probability
  • 33. Classical probability uses sample spaces to determine the numerical probability that an event will happen. Classical probability assumes that all outcomes in the sample space are equally likely to occur. Equally likely events are events that have the same probability of occurring. For example, when a single die is rolled, each outcome has the same probability of occurring.
  • 34.
  • 35. Drawing Cards • Find the probability of getting a red ace when a card is drawn at random from an ordinary deck of cards. • Solution • Since there are 52 cards and there are 2 red aces, namely, the ace of hearts and the ace of diamonds, P(red ace) = 2/52 = 1/26. • Example 4–6 Gender of Children • If a family has three children, find the probability that two of the three children are girls. • Solution • The sample space for the gender of the children for a family that has three children has eight outcomes, that is, BBB, BBG, BGB, GBB, GGG, GGB, GBG, and BGG. Since there are three ways to have two girls, namely, GGB, GBG, and BGG, P(two girls) = 3/8.
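Both answers follow from counting favorable outcomes over equally likely outcomes, which can be sketched in Python with exact fractions. The helper `classical_probability` and the way the deck is encoded as (rank, suit) pairs are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(E) = favorable outcomes / total outcomes, all equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

# Ordinary 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

p_red_ace = classical_probability(
    lambda card: card[0] == "A" and card[1] in ("hearts", "diamonds"), deck)

# Three children, each B or G.
children = ["".join(c) for c in product("BG", repeat=3)]
p_two_girls = classical_probability(lambda o: o.count("G") == 2, children)

print(p_red_ace, p_two_girls)  # 1/26 3/8
```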
  • 36. Probability rules • Probability Rule 1 • The probability of any event E is a number (either a fraction or decimal) between and including 0 and 1. This is denoted by 0 ≤ P(E) ≤ 1. Rule 1 states that probabilities cannot be negative or greater than 1. • Probability Rule 2 • If an event E cannot occur (i.e., the event contains no members in the sample space), its probability is 0. • Rolling a Die • When a single die is rolled, find the probability of getting a 9. • Since the sample space is 1, 2, 3, 4, 5, and 6, it is impossible to get a 9. Hence, the probability is P(9) = 0.
  • 37. • Probability Rule 3 • If an event E is certain, then the probability of E is 1. • Rolling a Die • When a single die is rolled, what is the probability of getting a number less than 7? • Solution • Since all outcomes (1, 2, 3, 4, 5, and 6) are less than 7, the probability is P(number less than 7) = 6/6 = 1. • The event of getting a number less than 7 is certain. • In other words, probability values range from 0 to 1. When the probability of an event is close to 0, its occurrence is highly unlikely. • When the probability of an event is near 0.5, there is about a 50-50 chance that the event will occur; and when the probability of an event is close to 1, the event is highly likely to occur.
  • 38. • Probability Rule 4 • The sum of the probabilities of all the outcomes in the sample space is 1. • For example, in the roll of a fair die, each outcome in the sample space has a probability of 1/6. Hence, the sum of the probabilities of the outcomes is 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6/6 = 1.
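Rules 2, 3 and 4 can be checked for a fair die with a short Python sketch. Representing the die as a dictionary of exact fractions, and the variable names used, are assumptions made for illustration.

```python
from fractions import Fraction

# Fair die: each of the six faces has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

total = sum(die.values())                                      # Rule 4
p_nine = sum(p for face, p in die.items() if face == 9)        # Rule 2: impossible event
p_less_than_7 = sum(p for face, p in die.items() if face < 7)  # Rule 3: certain event

print(total, p_nine, p_less_than_7)  # 1 0 1
```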