Statistical Hydrology
MAL1303/MKAG1273
Graphical Data Analysis
Dr. Shamsuddin Shahid
Associate Professor
Department of Hydraulics and Hydrology
Faculty of Civil Engineering
Room No.: M46-332;
Phone: 07-5531624; Mobile: 0182051586
Email: sshahid@utm.my
11/23/2015 Shamsuddin Shahid, FKA, UTM
Skewness
• One absolute measure of skewness is the difference between the mean and the mode. Such a measure is not truly meaningful because it depends on the units of measurement.
• The simplest relative measure of skewness is Pearson's coefficient of skewness:
$$\text{Pearson's coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
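Pearson's coefficient can be computed directly from its definition. The following is a minimal sketch, assuming the sample standard deviation (n - 1 divisor) and using the groundwater depth values from the dot plot example in this lecture; the function name `pearson_skewness` is illustrative only.

```python
from statistics import mean, mode, stdev

def pearson_skewness(values):
    """Pearson's coefficient of skewness: (mean - mode) / standard deviation."""
    # Assumption: the sample standard deviation (n - 1 divisor) is used.
    return (mean(values) - mode(values)) / stdev(values)

# Groundwater depths (m) at 12 locations, from the dot plot example.
depths = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]

coefficient = pearson_skewness(depths)
print(round(coefficient, 3))  # a positive value suggests positive (right) skew
```

Because the difference is divided by the standard deviation, the result is unit-free, which is what makes it comparable across data sets.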
• Pearson's skewness coefficient generally varies between -3.0 and +3.0.
• There is no universally accepted range of skewness for judging the distribution of data.
• A common rule of thumb treats values between -1 and +1 as acceptable for a normal distribution (-2 to +2 is often used too).
Kurtosis
• Kurtosis measures how peaked the histogram is.
• Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution.
• With the definition below (excess kurtosis), the kurtosis of a normal distribution is 0 (zero).
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$$
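The kurtosis formula (sum of fourth-power deviations divided by n s^4, minus 3) can be sketched as follows. This assumes that s is the sample standard deviation; the slide does not say which divisor is intended, so treat that as an assumption.

```python
from statistics import mean, stdev

def excess_kurtosis(values):
    """Sum of (x - mean)^4 divided by (n * s^4), minus 3."""
    n = len(values)
    xbar = mean(values)
    s = stdev(values)  # assumption: sample SD (n - 1 divisor)
    fourth_power_sum = sum((x - xbar) ** 4 for x in values)
    return fourth_power_sum / (n * s ** 4) - 3

sample = [1, 2, 3, 4, 5]
print(round(excess_kurtosis(sample), 3))  # -1.912, i.e. flatter than normal
```

A negative result indicates a flatter (platykurtic) shape, and a positive result a more peaked (leptokurtic) one.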
• Platykurtic: when the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution.
• Leptokurtic: when the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked distribution.
• Kurtosis is based on the size of a distribution's tails. Negative kurtosis (platykurtic) indicates a distribution with short tails; positive kurtosis (leptokurtic) indicates a distribution with relatively long tails.
[Figure: leptokurtic and platykurtic curves]
The coefficient of kurtosis is the most important measure of kurtosis. It is based on the second and fourth moments:
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
where
$$\mu_2 = \frac{\sum f (x - \bar{x})^2}{N} \quad \text{(second moment)}$$
$$\mu_4 = \frac{\sum f (x - \bar{x})^4}{N} \quad \text{(fourth moment)}$$
• If $\beta_2 - 3 > 0$, the distribution is leptokurtic.
• If $\beta_2 - 3 < 0$, the distribution is platykurtic.
• If $\beta_2 - 3 = 0$, the distribution is mesokurtic (normal).
A kurtosis value of +/-1 is considered very good for most uses, but +/-2 is also usually acceptable.
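The moment-based coefficient can be illustrated with a short sketch for frequency data. The helper names `central_moment`, `beta2` and `classify_kurtosis` are hypothetical, and the tolerance used to decide "equal to 3" is an assumption made only for this example.

```python
def central_moment(values, freqs, order):
    """Frequency-weighted central moment: sum of f*(x - mean)^order divided by N."""
    n_total = sum(freqs)
    xbar = sum(f * x for x, f in zip(values, freqs)) / n_total
    return sum(f * (x - xbar) ** order for x, f in zip(values, freqs)) / n_total

def beta2(values, freqs):
    """Coefficient of kurtosis: fourth moment over squared second moment."""
    return central_moment(values, freqs, 4) / central_moment(values, freqs, 2) ** 2

def classify_kurtosis(values, freqs, tol=1e-9):
    # tol is an illustrative tolerance for treating beta2 - 3 as zero.
    excess = beta2(values, freqs) - 3
    if excess > tol:
        return "leptokurtic"
    if excess < -tol:
        return "platykurtic"
    return "mesokurtic"

values = [1, 2, 3, 4, 5]
freqs = [1, 1, 1, 1, 1]
print(round(beta2(values, freqs), 3), classify_kurtosis(values, freqs))
```

For these five equally frequent values the second moment is 2 and the fourth moment is 6.8, so the coefficient is 1.7 and the data are classed as platykurtic.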
Chebyshev's Rule
Chebyshev's theorem
According to Chebyshev's theorem, at least $1 - 1/k^2$ of the measurements will fall within [Mean - k*SD] to [Mean + k*SD], for any k > 1. For example, with k = 2, at least 75% of the measurements fall within two standard deviations of the mean.
Empirical rule
Given a set of n measurements possessing a mound-shaped histogram:
• the interval $\bar{x} \pm s$ contains approximately 68% of the measurements
• the interval $\bar{x} \pm 2s$ contains approximately 95% of the measurements
• the interval $\bar{x} \pm 3s$ contains approximately 99.7% of the measurements.
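Chebyshev's theorem guarantees that at least 1 - 1/k^2 of any data set lies within k standard deviations of the mean (for k > 1), while the empirical rule gives approximate percentages for mound-shaped data. The sketch below compares the guaranteed bound with the observed fraction for the groundwater depth values used later in this lecture; the function names are illustrative.

```python
from statistics import mean, stdev

def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values inside mean +/- k * SD."""
    m, s = mean(values), stdev(values)
    return sum(1 for x in values if abs(x - m) <= k * s) / len(values)

depths = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), fraction_within(depths, k))
```

The observed fraction can never fall below the Chebyshev bound, but for roughly mound-shaped data it is usually much closer to the empirical-rule percentages.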
Outlier
An outlier is an observation that lies an abnormal distance from other
values in a random sample from a population.
Outlier or an outlying observation, is one that appears to deviate
markedly from other members of the sample in which it occurs.
Outliers can have many anomalous causes:
• A physical apparatus for taking measurements may have suffered a transient malfunction.
• There may have been an error in data transmission or transcription.
• Outliers may arise from changes in system behaviour, fraudulent behaviour, human error, or instrument error, or simply through natural deviations in populations.
• A sample may have been contaminated with elements from outside the population being examined.
Identification of Outliers
There is no rigid mathematical definition of what constitutes an outlier.
Determining whether or not an observation is an outlier is ultimately a
subjective exercise.
Type 1 - Determine the outliers with no prior knowledge of the data. This is
essentially a learning approach. The approach processes the data as a
static distribution, pinpoints the most remote points, and flags them as
potential outliers.
Type 2 – Using model-based methods which assume that the data are from
a normal distribution, and identify observations which are deemed
"unlikely" based on mean and standard deviation.
• Chauvenet's criterion
• Grubbs' test
• Dixon's Q test
Chauvenet's criterion
• A value is measured experimentally in several trials as 9, 10, 10, 10,
11, and 50.
• The mean is 16.7 and the standard deviation 16.34.
• Value 50 differs from 16.7 by 33.3, slightly more than two standard
deviations.
• The probability of taking data more than two standard deviations from
the mean is roughly 0.05.
• Six measurements were taken, so the statistic value (data size
multiplied by the probability) is 0.05×6 = 0.3.
• Because 0.3 < 0.5, according to Chauvenet's criterion, the measured
value of 50 should be discarded (leaving a new mean of 10, with
standard deviation 0.7).
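The worked example can be reproduced with a short sketch. It assumes the sample standard deviation and uses the exact two-sided normal tail probability (via `math.erfc`) instead of the rounded value of 0.05 used above, so the product differs slightly from 0.3 while the conclusion is the same.

```python
import math
from statistics import mean, stdev

def chauvenet_outliers(values):
    """Flag values for which n * P(|Z| > z) falls below 0.5."""
    n = len(values)
    m, s = mean(values), stdev(values)  # assumption: sample SD (n - 1 divisor)
    flagged = []
    for x in values:
        z = abs(x - m) / s
        # Two-sided probability of a normal deviate at least this far from the mean.
        p = math.erfc(z / math.sqrt(2))
        if n * p < 0.5:
            flagged.append(x)
    return flagged

trials = [9, 10, 10, 10, 11, 50]
print(chauvenet_outliers(trials))  # [50]
```

Only the value 50 is flagged; the remaining five values lie well within one standard deviation of the mean.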
Grubbs' test
Grubbs' test detects one outlier at a time.
If G_calculated > G_table, reject the questionable point.
Example: 9, 10, 10, 10, 11, and 50
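The slide's formula for G did not survive extraction. The sketch below assumes the usual form of the Grubbs statistic, the largest absolute deviation from the mean divided by the sample standard deviation. The critical value `G_TABLE` is an assumption (a commonly tabulated two-sided 5% value for n = 6) and should be checked against the table actually in use.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return the most extreme value and G = |x - mean| / s for it."""
    m, s = mean(values), stdev(values)
    suspect = max(values, key=lambda x: abs(x - m))
    return suspect, abs(suspect - m) / s

trials = [9, 10, 10, 10, 11, 50]
G_TABLE = 1.887  # assumed critical value for n = 6; verify against your table

suspect, g = grubbs_statistic(trials)
decision = "reject" if g > G_TABLE else "retain"
print(suspect, round(g, 2), decision)
```

Since the test handles one outlier at a time, it would be reapplied to the remaining data after each rejection.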
Dixon's Q test, or simply the Q test
To apply a Q test, arrange the data in order of increasing values and calculate Q as defined:
$$Q = \frac{\text{gap}}{\text{range}}$$
where gap is the absolute difference between the outlier in question and the closest number to it, and range is the difference between the largest and smallest values.
If Q_calculated > Q_table, reject the questionable point.
Example: 9, 10, 10, 10, 11, and 50
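A minimal sketch of the Q calculation for this example follows, taking Q as the gap divided by the range of the data. The critical value `Q_TABLE` is an assumption (a commonly tabulated 95% value for n = 6) and should be replaced by the value from the table actually in use.

```python
def dixon_q(values):
    """Return (Q for the smallest value, Q for the largest value)."""
    data = sorted(values)
    data_range = data[-1] - data[0]
    q_low = (data[1] - data[0]) / data_range     # gap below the second-smallest value
    q_high = (data[-1] - data[-2]) / data_range  # gap above the second-largest value
    return q_low, q_high

trials = [9, 10, 10, 10, 11, 50]
Q_TABLE = 0.625  # assumed critical value for n = 6; verify against your table

q_low, q_high = dixon_q(trials)
print(round(q_low, 3), round(q_high, 3))
print("reject 50" if q_high > Q_TABLE else "retain 50")
```

Here the gap for the value 50 is 39 and the range is 41, so Q is about 0.95.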
Common Characteristics of Water Resources Data
1. A lower bound of zero. No negative values are possible.
2. Outliers occur regularly; outliers on the high side are especially common in water resources data.
3. Non-normal distribution of data
4. Positive skewness is common.
5. Data reported only as below or above some threshold
(censored data). Examples include concentrations below one or
more detection limits, annual flood above a level, etc.
6. Seasonal patterns. Values tend to be higher or lower in certain
seasons of the year.
7. Positive autocorrelation. Consecutive observations tend to be
strongly correlated with each other. High values tend to follow
high values and low values tend to follow low values.
8. Dependence on other uncontrolled variables. Water discharge
from a well highly depends on hydraulic conductivity, sediment
grain size, or some other variable.
Graphical Data Analysis
1. Data type
2. Mean, median and Mode
3. Data quality control
4. Outliers
5. Nature of Hydrological Data
Graphical Data Analysis
1. Histogram
2. Scatter Plot
3. Box-Plot
4. Quantile Plot
5. Q-Q Plots
6. Enhancement of data presentation
7. Presentation of multivariate data.
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to maximize insight into a data set, uncover underlying structure, extract important variables, detect outliers and anomalies, test underlying assumptions, etc.
EDA is not a set of techniques but an attitude/philosophy about how a data analysis should be carried out.
EDA is a philosophy as to how we dissect a data set: what we look for, how we look, and how we interpret.
Summarization and Exploratory Data Analysis
Data Summarization
A summary analysis is simply a numeric reduction of a data set. It is quite passive. Quite commonly, its purpose is simply to arrive at a few key statistics (for example, mean and standard deviation), which may then either replace the data set or be added to the data set in the form of a summary table.
Exploratory Data Analysis
In contrast, EDA has as its broadest goal the desire to gain insight into the engineering/scientific process behind the data. EDA uses the data to peer into the heart of the process that generated the data. There is an archival role in research for summary statistics, but there is an enormously larger role for the EDA approach.
Exploratory data analysis
Exploratory data analysis depends mostly on graphical analysis. The particular graphical techniques employed in EDA are often quite simple, consisting of:
• Plotting the raw data (histograms, scatter plots, etc.)
• Plotting simple statistics of the raw data, such as mean plots, standard deviation plots, box plots, and main effects plots.
Why Graphical Data Analysis?
Data analysis and interpretation cannot be completely automated, particularly when making crucial modeling choices. The analyst must use judgment and make decisions that require familiarity with the data, the site, and the questions that need to be answered.
The analysis of data typically starts by plotting the data and calculating statistics that describe important characteristics of the sample.
Looking only at tabulated data is of little help. The human eye, however, can recognize patterns from graphical displays of the data.
We perform such an exploratory analysis to:
1. familiarize ourselves with the data and
2. detect patterns of regularity.
Exploratory data analysis
The summary statistics used in exploratory data analysis, such as medians and IQRs, are said to be resistant statistics. A resistant statistic is less affected by outliers than a nonresistant statistic. The mean and standard deviation are examples of nonresistant statistics.
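The difference between resistant and nonresistant statistics can be seen by adding a single outlier to a small sample. This sketch reuses the outlier example values from earlier in the lecture and assumes the "inclusive" quartile method of Python's `statistics.quantiles`; other quartile conventions give somewhat different IQR values for such small samples.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Nonresistant (mean, SD) and resistant (median, IQR) summaries."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # assumed quartile method
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": q3 - q1,
    }

clean = [9, 10, 10, 10, 11]
with_outlier = clean + [50]

for label, data in (("without 50", clean), ("with 50", with_outlier)):
    print(label, {name: round(value, 2) for name, value in summarize(data).items()})
```

The mean and standard deviation change drastically when 50 is added, while the median is unchanged and the IQR moves only slightly.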
Distribution of Data: Histogram
• A histogram is used to graphically summarize the distribution of a data set
• A histogram divides the range of values in a data set into intervals
• Over each interval is placed a bar whose height represents the frequency of
data values in the interval.
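The three steps above (divide the range into intervals, count the values in each, draw a bar per interval) can be sketched in plain Python. The bin start, bin width and number of bins are choices made for this illustration, applied to the groundwater depth values from the dot plot example.

```python
def histogram_counts(values, start, width, n_bins):
    """Count the values falling in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in values:
        index = int((x - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

depths = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]
START, WIDTH, N_BINS = 5, 5, 4  # illustrative binning choices

counts = histogram_counts(depths, START, WIDTH, N_BINS)
for i, count in enumerate(counts):
    low = START + i * WIDTH
    # Each '#' is one observation, so bar length equals the frequency.
    print(f"{low:>2} to <{low + WIDTH:<2} | " + "#" * count)
```

The counts are 2, 6, 2 and 2, so the tallest bar sits over the 10 to 15 m interval.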
[Figure: histograms from 50 years of annual average river discharge data; panels: Negatively Skewed, Positively Skewed]
[Figure: platykurtic and leptokurtic distributions]
For the normal distribution the kurtosis coefficient is 3, and the normal distribution is said to be mesokurtic. If a distribution has a relatively greater concentration of probability near the mean than the normal distribution, the kurtosis coefficient will be greater than 3 and the distribution is said to be leptokurtic. If a distribution has a relatively smaller concentration of probability near the mean, the kurtosis coefficient will be less than 3 and the distribution is said to be platykurtic.
Scatter Plot
A scatter plot is useful for studying the association between two
interval variables. It is a plot of the values of one variable against
the other.
A scatter plot can be used to:
• suggest a relationship between the two variables, for instance a linear or quadratic relation;
• identify patterns or clusters in the data;
• detect outliers.
Scatter Plot and Data Association
At each depth, two variables are measured: temperature and nitrogen concentration. This gives two scatter plots:
(i) Depth vs. Temperature;
(ii) Depth vs. Nitrogen Concentration.
In the first graph, temperature increases with depth as a general tendency. This corresponds to a positive association.
In the second graph, nitrogen concentration decreases with depth. This corresponds to a negative association.
[Figures: Scatter Plot and Data Association; Scatter Plot and Data Pattern; Scatter Plot and Outliers]
Dot Plot
A dot chart or dot plot is a statistical chart consisting of a group of data points plotted on a simple scale.
• Dot plots are one of the simplest statistical plots, and are suitable for small to moderate sized data sets.
• Dot plots are used for continuous, quantitative, univariate data.
Groundwater depths (meter) at 12 locations are given below:
9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10
Use a dot plot for EDA.
• Dot plots are useful for highlighting clusters and gaps, as well as outliers.
• Their other advantage is the conservation of numerical information.
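A text-based dot plot of these groundwater depths can be produced with a few lines of Python. This is only a sketch for integer-valued data; the function name `dot_plot` and the vertical layout (one row per value) are choices made for illustration.

```python
from collections import Counter

def dot_plot(values):
    """Return one text row per integer value, with one 'o' per observation."""
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        # Values that never occur still get a row, which makes gaps visible.
        rows.append(f"{value:>3} | " + "o" * counts[value])
    return "\n".join(rows)

depths = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]
print(dot_plot(depths))
```

The cluster around 9 to 13 m and the gaps at 14, 16 to 17 and 19 to 20 m stand out, and every original value can still be read off the plot.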
Groundwater depth (meter):
9, 11, 18, 7, 12, 21, 15, 13, 23, 14, 17, 10
• The data distribution is often not clear from a dot plot.
• Dot plots cannot be used for large data sets.
Stem-leaf Plot
A stem-leaf plot is a technique for presenting quantitative data in a graphical format. It is similar to a histogram and assists in visualizing the shape of a distribution.
A basic stem-leaf plot contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.
Groundwater depth (ft) at 16 locations:
39, 31, 18, 7, 22, 21, 35, 12, 23, 13, 23, 10, 41, 27, 24, 9
Unlike histograms, stem-leaf plots retain the original data to at least two significant digits, and put the data in order.
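A stem-leaf plot for these 16 groundwater depths can be sketched as follows, taking the tens digit as the stem and the units digit as the leaf. That split is an assumption suited to two-digit data; other data would need a different stem unit.

```python
from collections import defaultdict

def stem_leaf(values):
    """Return stem-leaf rows with tens digits as stems and units digits as leaves."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

depths_ft = [39, 31, 18, 7, 22, 21, 35, 12, 23, 13, 23, 10, 41, 27, 24, 9]
for row in stem_leaf(depths_ft):
    print(row)
# 0 | 7 9
# 1 | 0 2 3 8
# 2 | 1 2 3 3 4 7
# 3 | 1 5 9
# 4 | 1
```

The rows form a sideways histogram, and each original value can be recovered by joining a stem with one of its leaves.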
A stem-leaf plot can also be broken into more than one group.
Dot Plot and Stem-leaf Plot: Comparison
Data set 1: 9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10
Data set 2: 9, 11, 18, 7, 12, 21, 15, 13, 23, 14, 17, 10
[Figures: dot plot and stem-leaf plot of each data set]
Stem-leaf Plot
• Stem-leaf plots are useful for displaying the relative density and shape of the data, giving the reader a quick overview of the distribution.
• They retain (most of) the raw numerical data, often with perfect integrity.
• They are also useful for highlighting outliers and finding the mode.
• However, stem-leaf plots are only useful for moderately sized data sets (around 15-150 data points).
• With very small data sets a stem-leaf plot can be of little use, as a reasonable number of data points is required to establish definitive distribution properties.
Box Plot
A box plot is a graph of a data set that depicts the five-number summary in a visual way.
The five-number summary of a data set consists of the five values {min value, Q1, Q2, Q3, max value}:
1. the smallest observation,
2. lower quartile (Q1),
3. median (Q2),
4. upper quartile (Q3), and
5. largest observation.
• A box plot is also useful for comparing data sets.
• It is sometimes referred to as a box-and-whisker plot.
Constructing a Box Plot
• Draw a horizontal measurement scale.
• Place a rectangle above this axis: the left edge is the lower quartile (lower fourth, Q1) and the right edge is the upper quartile (upper fourth, Q3).
• Place a vertical line inside the rectangle at the location of the median (Q2).
• Draw "whiskers" out from either end of the rectangle to the smallest and largest observations that are not outliers.
The five-number summary of river discharge values is given below:
Minimum discharge: 51.0 cumec
First quartile (Q1): 60.75 cumec
Median: 63.0 cumec
Third quartile (Q3): 65.0 cumec
Maximum discharge: 70.0 cumec
Simple Box Plot
In a simple box plot, the whiskers extend to the maximum and minimum data points.
Standard Box Plot
In a standard box plot, each whisker extends to the last observation within one step of the box, where one step is 1.5 times the height of the box (1.5 times the interquartile range). Observations between one and two steps from the box in either direction, if present, are plotted individually with an asterisk ("outside values"). Observations farther than two steps beyond the box, if present, are distinguished by plotting them with a small circle ("far-out values").
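The step rule for a standard box plot can be sketched using the river discharge five-number summary given earlier (Q1 = 60.75, Q3 = 65.0, minimum 51.0, maximum 70.0 cumec). The function names and category labels are illustrative.

```python
def box_plot_steps(q1, q3):
    """Return the step (1.5 * IQR) and the one-step and two-step limits."""
    step = 1.5 * (q3 - q1)
    one_step = (q1 - step, q3 + step)
    two_steps = (q1 - 2 * step, q3 + 2 * step)
    return step, one_step, two_steps

def classify(x, q1, q3):
    """Classify an observation by its distance from the box, measured in steps."""
    step = 1.5 * (q3 - q1)
    distance = max(q1 - x, x - q3, 0.0)  # zero for values inside the box
    if distance <= step:
        return "within whisker range"
    if distance <= 2 * step:
        return "outside value"
    return "far-out value"

Q1, Q3 = 60.75, 65.0
print(box_plot_steps(Q1, Q3))  # (6.375, (54.375, 71.375), (48.0, 77.75))
print(classify(51.0, Q1, Q3))  # outside value
print(classify(70.0, Q1, Q3))  # within whisker range
```

With these values the minimum discharge of 51.0 cumec lies between one and two steps below the box, so a standard box plot would draw it as an individual point, not as the end of the whisker.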
Truncated Box Plot
In a truncated box plot, the whiskers are drawn only to the 90th and 10th percentiles of the data set. The largest 10 percent and smallest 10 percent of the data are not shown. It is used only when the extreme 20 percent of the data are not of interest.
Interpretation of Box Plot
[Figure: box plots for normally distributed (symmetrical), right-skewed, and left-skewed data]
Information Contained in a Box Plot
• Location and spread
• A comparative box plot is more informative
Interpretation of Box Plot
• A box plot can be used for any size of data set.
• Only summary values and spread are visible.
• Numerical information of the data is lost.
Information Obtained from a Box Plot
• If the median is near the center of the box, the distribution is
approximately symmetric.
• If the median falls to the left of the center of the box, the distribution
is positively skewed.
• If the median falls to the right of the center, the distribution is
negatively skewed.
• If the whisker lines are about the same length, the distribution is
approximately symmetric.
• If the right whisker line is larger than the left whisker line, the
distribution is positively skewed.
• If the left whisker line is larger than the right whisker line, the
distribution is negatively skewed.
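The rules above can be applied mechanically to a five-number summary. In this sketch the tolerance `tol`, which decides what counts as "near the center" or "about the same length", is an assumption chosen for illustration; the slide gives no numeric threshold.

```python
def skew_from_median(q1, med, q3, tol=0.1):
    """Judge skewness from the median's position inside the box."""
    offset = (med - (q1 + q3) / 2) / (q3 - q1)  # relative offset from box center
    if abs(offset) <= tol:
        return "approximately symmetric"
    return "positively skewed" if offset < 0 else "negatively skewed"

def skew_from_whiskers(low, q1, q3, high, tol=0.1):
    """Judge skewness from the relative lengths of the two whiskers."""
    left, right = q1 - low, high - q3
    if left + right == 0:
        return "approximately symmetric"
    difference = (right - left) / (left + right)
    if abs(difference) <= tol:
        return "approximately symmetric"
    return "positively skewed" if difference > 0 else "negatively skewed"

# River discharge five-number summary from the box plot example (cumec).
low, q1, med, q3, high = 51.0, 60.75, 63.0, 65.0, 70.0
print(skew_from_median(q1, med, q3))
print(skew_from_whiskers(low, q1, q3, high))
```

For the discharge example the median sits almost at the center of the box, but the left whisker is nearly twice as long as the right one, so the two rules can disagree and both should be inspected.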
Quantile Plot
To construct a quantile plot, the cumulative frequencies of the sample data are plotted against the quantiles of the standardized theoretical distribution.
[Figure: quantile plots for symmetric, right-skewed, left-skewed, and uniform data]
Q-Q Plot
The Quantile-Quantile plot (Q-Q plot) is a graphical method for diagnosing differences between a theoretical probability distribution and the distribution of a sample, or for comparing two sample distributions.
This is a scatterplot with the quantiles of the variable on the horizontal axis and the expected normal scores on the vertical axis.
Q-Q plots can be of two types:
1. Normal Q-Q plot: The normal Q-Q plot graphically compares the distribution of a given variable to the normal distribution.
2. Q-Q plot: The Q-Q plot graphically compares the distributions of two variables.
Why normal Q-Q Plot?
• A normal distribution is often a reasonable model for the data.
• Without inspecting the data, however, it is risky to assume a normal distribution.
• There are a number of graphs that can be used to check the deviations of the data from the normal distribution. A histogram is one example: if the data are normal, the histogram should reveal a bell-shaped curve.
• The most useful tool for assessing normality is a quantile-quantile or Q-Q plot.
• The Q-Q plot is also an important graphical method for identifying outliers.
• The Q-Q plot can also be used to identify the shape of the data distribution, skewness, etc.
Normal Q-Q Plots
Construction
1. Order the n observations from smallest to largest and give each a rank (i) according to its position.
2. Calculate the quantile value corresponding to each observation using a plotting-position formula.
3. Obtain the theoretical normal quantile corresponding to each calculated quantile value from the normal distribution table.
4. Plot the pairs on a two-coordinate system:
• x-axis: theoretical (distribution) quantiles
• y-axis: sample quantiles
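The construction steps above can be sketched in Python. The slide's own formula table did not survive extraction, so the plotting positions here are written in the general form p = (i - a) / (n + 1 - 2a), using the commonly quoted constants for the formulas named in this lecture (Weibull, Blom, Cunnane, Gringorten, Hazen); treat these constants as assumptions to be checked against the original table. `NormalDist().inv_cdf` from the standard library stands in for the normal distribution table.

```python
from statistics import NormalDist

# Assumed constants "a" in p = (i - a) / (n + 1 - 2a).
PLOTTING_POSITIONS = {
    "Weibull": 0.0,       # i / (n + 1)
    "Blom": 0.375,        # (i - 0.375) / (n + 0.25)
    "Cunnane": 0.4,       # (i - 0.4) / (n + 0.2)
    "Gringorten": 0.44,   # (i - 0.44) / (n + 0.12)
    "Hazen": 0.5,         # (i - 0.5) / n
}

def normal_qq_pairs(values, a=0.0):
    """Return (theoretical normal quantile, sample value) pairs."""
    data = sorted(values)                        # step 1: order and rank
    n = len(data)
    pairs = []
    for i, x in enumerate(data, start=1):
        p = (i - a) / (n + 1 - 2 * a)            # step 2: plotting position
        z = NormalDist().inv_cdf(p)              # step 3: normal quantile
        pairs.append((z, x))                     # step 4: (x-axis, y-axis)
    return pairs

depths = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]
for name, a in PLOTTING_POSITIONS.items():
    z_first, x_first = normal_qq_pairs(depths, a)[0]
    print(f"{name:<10} smallest value {x_first} plotted at z = {z_first:.3f}")
```

The formulas differ mainly in where they place the smallest and largest observations, which is why the tails of the resulting plots look slightly different from one formula to another.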
[Table: formulas used to calculate the quantiles]
[Figures: normal Q-Q plots using the Weibull formula; the Blom, Cunnane and Gringorten formulas; and the Hazen formula]
Interpretation of Normal Q-Q Plot
11/23/2015 Shamsuddin Shahid, FKA, UTM
You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
• All but a few points fall on a line: outliers in the data.
• Left end of pattern is below the line; right end of pattern is above the line: long tails at both ends of the data distribution.
For the normal distribution:
• 68% within 1 SD of the mean
• 95% within 2 SDs
• 99.7% within 3 SDs
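The percentages quoted for the normal distribution can be checked from the standard normal cumulative distribution function, since P(|Z| < k) = Φ(k) - Φ(-k). A minimal sketch using only the Python standard library follows; the function name is illustrative.

```python
from statistics import NormalDist


def fraction_within(k):
    """Probability that a normal variate lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} SD: {100 * fraction_within(k):.1f}%")
```

The computed values are approximately 68.3%, 95.4%, and 99.7%, which the slide rounds to 68%, 95%, and 99.7%.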
• Left end of pattern is above the line; right end of pattern is below the line: short tails at both ends of the data distribution.
• Curved pattern with slope increasing from left to right: data distribution is skewed to the right.
Interpretation of Probability Plot
Description of Point Pattern | Possible Interpretation
all but a few points fall on a line | outliers in the data
left end of pattern is below the line; right end of pattern is above the line | long tails at both ends of the data distribution
left end of pattern is above the line; right end of pattern is below the line | short tails at both ends of the data distribution
curved pattern with slope increasing from left to right | data distribution is skewed to the right
curved pattern with slope decreasing from left to right | data distribution is skewed to the left
staircase pattern (plateaus and gaps) | data have been rounded or are discrete
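As a rough numerical check of the "slope increasing from left to right" pattern, the sketch below builds normal Q-Q points for an idealized right-skewed sample and compares the average slope over the left and right halves of the plot. The sample (exact quantiles of an exponential distribution), the use of Hazen plotting positions, and all helper names are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def hazen_positions(n):
    """Hazen plotting positions (i - 0.5) / n for i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def normal_qq_points(data):
    """Pair sorted observations with standard normal quantiles."""
    ordered = sorted(data)
    z = NormalDist()
    return [(z.inv_cdf(p), x) for p, x in zip(hazen_positions(len(ordered)), ordered)]


def half_slopes(points):
    """Average slope over the left half and over the right half of the Q-Q points."""
    mid = len(points) // 2

    def slope(a, b):
        return (b[1] - a[1]) / (b[0] - a[0])

    return slope(points[0], points[mid]), slope(points[mid], points[-1])


if __name__ == "__main__":
    n = 21
    # Idealized right-skewed sample: exponential quantiles at the Hazen positions.
    skewed = [-math.log(1.0 - p) for p in hazen_positions(n)]
    left, right = half_slopes(normal_qq_points(skewed))
    print(f"left slope {left:.2f}, right slope {right:.2f}")
```

For this right-skewed sample the right-half slope exceeds the left-half slope; negating the sample reverses the inequality, which corresponds to the skewed-to-the-left row of the table.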
Q-Q Plot
A Q-Q plot is similar to a probability plot, except that instead of comparing sample quantiles with theoretical normal quantiles, the quantiles of two data sets are compared.
It gives an idea of the dispersion of one set of observations relative to the other.
This makes it possible to compare two sets of observations and draw useful interpretations.
Q-Q Plot
Annual average rainfall at 15 stations for the years 1960-1980 is given in the first column (A). Annual average rainfall at the same 15 stations for the years 1981-2000 is given in the second column (B). Use a Q-Q plot to examine how the rainfall pattern has changed over time.
Construction of Q-Q Plot
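A minimal sketch of the construction for two samples of equal size, such as the 15-station example: sort each column and pair the values of equal rank. The rainfall numbers below are placeholders invented for illustration, not the data from the slide, and `qq_pairs` is a hypothetical helper name.

```python
def qq_pairs(a, b):
    """Pair the sorted values of two equal-length samples, rank by rank."""
    if len(a) != len(b):
        raise ValueError("this simple sketch needs samples of equal size")
    return list(zip(sorted(a), sorted(b)))


if __name__ == "__main__":
    # Placeholder annual rainfall values (mm); NOT the data from the lecture.
    period_a = [1820, 2105, 1990, 2310, 1760]
    period_b = [1900, 2250, 2040, 2480, 1795]
    for qa, qb in qq_pairs(period_a, period_b):
        print(f"{qa:6d} {qb:6d}")
```

Plotting column A on the horizontal axis against column B on the vertical axis, with the line y = x as a reference, gives the Q-Q plot. When the two samples differ in size, the quantiles of the larger sample are usually interpolated to match the smaller one, which this sketch does not attempt.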
The Q-Q Plot
• If the two distributions being compared are identical, the Q-Q plot
follows the 45° line y = x.
• If the two distributions agree after linearly transforming the values
in one of the distributions, then the Q-Q plot follows some line, but
not necessarily the line y = x.
• If the general trend of the Q-Q plot is flatter than the line y = x, the
distribution plotted on the horizontal axis is more dispersed than
the distribution plotted on the vertical axis.
• Conversely, if the general trend of the Q-Q plot is steeper than the
line y = x, the distribution plotted on the vertical axis is more
dispersed than the distribution plotted on the horizontal axis.
• Q-Q plots are often arced, or "S" shaped, indicating that one of
the distributions is more skewed than the other, or that one of the
distributions has heavier tails than the other.
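One way to put a number on the "flatter or steeper than y = x" reading is to fit a least-squares line through the Q-Q points and inspect its slope. The sketch below assumes the points are (horizontal quantile, vertical quantile) pairs; the helper name and the toy data are illustrative.

```python
def qq_line(points):
    """Least-squares slope and intercept of vertical quantiles on horizontal quantiles."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    horizontal = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Vertical sample is a linear transform of the horizontal one.
    vertical = [2.0 * x + 1.0 for x in horizontal]
    slope, intercept = qq_line(list(zip(horizontal, vertical)))
    print(slope, intercept)
```

Here the fitted slope is 2 and the intercept is 1: the points follow a straight line other than y = x, and because the line is steeper than y = x, the sample on the vertical axis is the more dispersed of the two.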
Graphical Representation of Multivariate Data
• Multivariate data analysis techniques are appropriate when more than one response is measured.
• Multivariate data are difficult to visualize accurately because of the multidimensional nature of the problem.
• In a multivariate approach, each response variable adds another dimension to the analysis problem.
Some examples of graphical techniques used to display multivariate data in two dimensions include:
• Profile Plot
• Area Plot
• Kite Graph
• Star Plot
• Glyph Plot
Profile Plot
A profile plot uses a series of vertical axes presented consecutively along the base (x-axis) of the plot. Any number of response variables can be considered, with varying scales of measurement. The response variables are arranged along the base of the plot.
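Because each response variable sits on its own vertical axis, one common way to draw the profiles is to rescale every variable to a shared 0 to 1 range before connecting the points of each observation. The sketch below assumes min-max scaling per variable; the slides do not state which scaling is used, and the station labels and values are hypothetical.

```python
def profile_coordinates(records):
    """Rescale each variable (column) to [0, 1] so all axes share one plotting height.

    `records` maps an observation label to a list of variable values,
    with the variables in the same order for every observation.
    """
    labels = list(records)
    columns = list(zip(*(records[k] for k in labels)))  # one tuple per variable
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant variable carries no contrast; place it mid-axis.
        scaled_columns.append([0.5 if span == 0 else (v - lo) / span for v in col])
    return {k: [c[i] for c in scaled_columns] for i, k in enumerate(labels)}


if __name__ == "__main__":
    # Hypothetical stations with three variables on different scales.
    stations = {
        "S1": [1800.0, 26.5, 78.0],
        "S2": [2400.0, 27.1, 85.0],
        "S3": [2100.0, 25.9, 81.5],
    }
    for name, heights in profile_coordinates(stations).items():
        print(name, [round(h, 2) for h in heights])
```

Each returned list gives the heights at which one observation's profile crosses the successive vertical axes; joining those heights with line segments produces the profile for that observation.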
Area Plot
Representation of multivariate data
Multiple variables can be displayed
Kite Graph
Four variables can be displayed
Star Plot
Multiple variables can be displayed
Glyph Plots
The color, size, and type of marker, together with a whisker line of varying length, angle, and color, are used to represent different variables.
Representation of Directional Data

Graphical Analysis of Statistical Hydrology Data

  • 1. Statistical Hydrology MAL1303/MKAG1273 Graphical Data Analysis Dr. Shamsuddin Shahid Associate Professor Department of Hydraulics and Hydrology Faculty of Civil Engineering Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: sshahid@utm.my 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 2. 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 3. • One measure of absolute skewness is difference between mean and mode. A measure of such would not be true meaningful because it depends of the units of measurement. • The simplest measure of skewness is the Pearson’s coefficient of skewness: Skewness deviationStandard Mode-Mean skewnessoftcoefficiensPearson'  11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 4. • Skewness coefficient varies between -3.o to +3.0. • There is no acceptable range of skewness to measure the distribution of data. • Some people says that rule of thumb is -1 to +1 being acceptable (-2 to +2 is often used too) for normal distribution. Skewness 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 5.  Kurtosis measures how peaked the histogram is  The kurtosis of a normal distribution is 0 (zero)  Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution 3 )( 4 4     ns xx kurtosis n i i Kurtosis 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 6. • Platykurtic– When the kurtosis < 0, the frequencies throughout the curve are closer to be equal (i.e., the curve is more flat and wide). Thus, negative kurtosis indicates a relatively flat distribution • Leptokurtic– When the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e, the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked distribution • Kurtosis is based on the size of a distribution's tails. Negative kurtosis (platykurtic) – distributions with short tails. Positive kurtosis (leptokurtic) – distributions with relatively long tails leptokurticplatykurtic Kurtosis 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 7. Coefficient of Kurtosis is the most important measure of kurtosis which is based on the second and fourth moments : Kurtosis 2 2 4 2     N xxf 2 2 )(   N xxf 4 4 )(   Where, Second Momentum Fourth Momentum • If 2 -3 > 0, the distribution is leptokurtic. • If , If 2 -3 < 0 the distribution is platykurtic. • If , 2 -3 = 0 the distribution is mesokurtic (normal). A kurtosis value of +/-1 is considered very good for most uses, but +/-2 is also usually acceptable. 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 8. Chebyshev’s theorem According to Chebyshev’s theorem, At least of the measurements will fall within [Mean – (k-1)*SD] to [Mean + (k-1)*SD], where K = 2 Empirical rule Give a set of n measurements possessing a mound-shaped histogram, then the interval X  s contains approximately 68% of the measurements the interval X  2s contains approximately 95% of the measurements the interval X  3s contains approximately 99.7% of the measurements. Chebyshev’s Rule 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 9. Empirical rule Give a set of n measurements possessing a mound-shaped histogram, then the interval X  s contains approximately 68% of the measurements the interval X  2s contains approximately 95% of the measurements the interval X  3s contains approximately 99.7% of the measurements. Chebyshev’s Rule 11/23/2015 Shamsuddin Shahid, FKA, UTM You created this PDF from an application that is not licensed to print to novaPDF printer (http://www.novapdf.com)
  • 10. Outlier
      An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. An outlier, or outlying observation, is one that appears to deviate markedly from other members of the sample in which it occurs.
      Outliers can have many anomalous causes:
      • A physical apparatus for taking measurements may have suffered a transient malfunction.
      • There may have been an error in data transmission or transcription.
      • Outliers may arise from changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations.
      • A sample may have been contaminated with elements from outside the population being examined.
  • 11. Identification of Outliers
      There is no rigid mathematical definition of what constitutes an outlier. Determining whether or not an observation is an outlier is ultimately a subjective exercise.
      Type 1: determine the outliers with no prior knowledge of the data. This is essentially a learning approach: it processes the data as a static distribution, pinpoints the most remote points, and flags them as potential outliers.
      Type 2: use model-based methods, which assume that the data come from a normal distribution and identify observations deemed "unlikely" based on the mean and standard deviation:
      • Chauvenet's criterion
      • Grubbs' test
      • Dixon's Q test
  • 12. Chauvenet's criterion
      • A value is measured experimentally in several trials as 9, 10, 10, 10, 11, and 50.
      • The mean is 16.7 and the (sample) standard deviation is 16.34.
      • The value 50 differs from 16.7 by 33.3, slightly more than two standard deviations.
      • The probability of obtaining a value more than two standard deviations from the mean is roughly 0.05.
      • Six measurements were taken, so the statistic (sample size multiplied by the probability) is 0.05 × 6 = 0.3.
      • Because 0.3 < 0.5, Chauvenet's criterion says the measured value of 50 should be discarded (leaving a new mean of 10, with standard deviation 0.7).
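The worked example above can be reproduced in a few lines. This is a minimal sketch using the standard library; the function name is illustrative, and the two-tailed probability is computed from the exact deviation rather than the rounded figure of 0.05, so the statistic comes out near 0.25 instead of 0.3 while the decision stays the same.

```python
from statistics import NormalDist, mean, stdev


def chauvenet_flags(values):
    """Return one boolean per value: True if Chauvenet's criterion rejects it."""
    n = len(values)
    m, s = mean(values), stdev(values)  # sample standard deviation, as on the slide
    std_normal = NormalDist()
    flags = []
    for x in values:
        z = abs(x - m) / s
        p_two_tailed = 2 * (1 - std_normal.cdf(z))  # P(|Z| > z) under normality
        flags.append(n * p_two_tailed < 0.5)
    return flags


trials = [9, 10, 10, 10, 11, 50]
flags = chauvenet_flags(trials)
kept = [x for x, rejected in zip(trials, flags) if not rejected]
print(flags, kept, mean(kept), round(stdev(kept), 1))
```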
  • 13. Grubbs' test
      Grubbs' test detects one outlier at a time. If G_calculated > G_table, reject the questionable point.
      Example: 9, 10, 10, 10, 11, and 50
  • 14. Grubbs' test
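The Grubbs statistic for the example data can be sketched as follows, using the usual definition G = max|xᵢ − x̄| / s with the sample standard deviation. The critical value `G_TABLE_N6` is an assumption: it is the figure commonly tabulated for n = 6 at the 5% level (two-sided) and should be checked against the table used in the course.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, suspect): the largest standardized deviation and the value producing it."""
    m, s = mean(values), stdev(values)
    suspect = max(values, key=lambda x: abs(x - m))
    return abs(suspect - m) / s, suspect


# ASSUMPTION: tabulated critical value for n = 6, alpha = 0.05 (two-sided).
G_TABLE_N6 = 1.887

trials = [9, 10, 10, 10, 11, 50]
g_calc, suspect = grubbs_statistic(trials)
reject = g_calc > G_TABLE_N6
print(round(g_calc, 2), suspect, reject)
```

Because the test handles one outlier at a time, a second suspect value would be examined by removing the rejected point and repeating the calculation with the critical value for the reduced sample size.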
  • 15. Dixon's Q test, or simply the Q test
      To apply a Q test, arrange the data in order of increasing value and calculate Q = gap / range, where gap is the absolute difference between the outlier in question and the closest number to it, and range is the difference between the largest and smallest values. If Q_calculated > Q_table, reject the questionable point.
      Example: 9, 10, 10, 10, 11, and 50
  • 16. Dixon's Q test
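A minimal sketch of the Q calculation for the example data follows. It computes Q for both the smallest and the largest value; `Q_TABLE_N6` is an assumption (the value commonly tabulated for n = 6 at the 95% confidence level) and should be replaced by the entry from the table in use.

```python
def dixon_q(values):
    """Return (Q_low, Q_high): Q for the smallest and the largest observation."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    q_low = (ordered[1] - ordered[0]) / data_range     # gap below / range
    q_high = (ordered[-1] - ordered[-2]) / data_range  # gap above / range
    return q_low, q_high


# ASSUMPTION: tabulated critical value for n = 6 at the 95% confidence level.
Q_TABLE_N6 = 0.625

trials = [9, 10, 10, 10, 11, 50]
q_low, q_high = dixon_q(trials)
print(round(q_low, 3), round(q_high, 3), q_high > Q_TABLE_N6)
```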
  • 17. Common Characteristics of Water Resources Data
      1. A lower bound of zero. No negative values are possible.
      2. Outliers occur regularly; outliers on the high side are especially common in water resources data.
      3. Non-normal distribution of data.
      4. Positive skewness is common.
      5. Data reported only as below or above some threshold (censored data). Examples include concentrations below one or more detection limits, annual floods above a given level, etc.
      6. Seasonal patterns. Values tend to be higher or lower in certain seasons of the year.
      7. Positive autocorrelation. Consecutive observations tend to be strongly correlated with each other: high values tend to follow high values and low values tend to follow low values.
      8. Dependence on other uncontrolled variables. Water discharge from a well depends strongly on hydraulic conductivity, sediment grain size, or some other variable.
  • 18. Graphical Data Analysis
      1. Data type
      2. Mean, median and mode
      3. Data quality control
      4. Outliers
      5. Nature of hydrological data
  • 20. Graphical Data Analysis
      1. Histogram
      2. Scatter plot
      3. Box plot
      4. Quantile plot
      5. Q-Q plots
      6. Enhancement of data presentation
      7. Presentation of multivariate data
  • 21. Exploratory Data Analysis
      Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to maximize insight into a data set, uncover underlying structure, extract important variables, detect outliers and anomalies, test underlying assumptions, etc.
      EDA is not a set of techniques but an attitude/philosophy about how a data analysis should be carried out. It is a philosophy of how we dissect a data set: what we look for, how we look, and how we interpret.
  • 22. Summarization and Exploratory Data Analysis
      Data summarization: a summary analysis is simply a numeric reduction of a data set. It is quite passive. Commonly, its purpose is simply to arrive at a few key statistics (for example, mean and standard deviation), which may then either replace the data set or be added to it in the form of a summary table.
      Exploratory data analysis: in contrast, EDA has as its broadest goal the desire to gain insight into the engineering/scientific process behind the data. EDA uses the data to peer into the heart of the process that generated them. Summary statistics have an archival role in research, but there is an enormously larger role for the EDA approach.
  • 23. Exploratory data analysis
      Exploratory data analysis depends mostly on graphical analysis. The particular graphical techniques employed in EDA are often quite simple, consisting of:
      • plotting the raw data (histograms, scatter plots, etc.);
      • plotting simple statistics of the raw data, such as mean plots, standard deviation plots, box plots, and main effects plots.
  • 24. Why Graphical Data Analysis?
      Data analysis and interpretation cannot be completely automated, particularly when making crucial modeling choices. The analyst must use judgment and make decisions that require familiarity with the data, the site, and the questions that need to be answered.
      The analysis of data typically starts by plotting the data and calculating statistics that describe important characteristics of the sample. Looking only at tabulated data helps little, whereas the human eye readily recognizes patterns in graphical displays of the data.
      We perform such an exploratory analysis to:
      1. familiarize ourselves with the data, and
      2. detect patterns of regularity.
  • 25. Why Graphical Data Analysis?
  • 26. Why Graphical Data Analysis?
  • 27. Exploratory data analysis
      Summary statistics used in exploratory data analysis, such as medians and IQRs, are said to be resistant statistics. A resistant statistic is less affected by outliers than a nonresistant statistic. The mean and standard deviation are examples of nonresistant statistics.
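The contrast between resistant and nonresistant statistics is easy to see numerically. The sketch below, using only the standard library, compares the mean, standard deviation, and median of the six-measurement example from the outlier slides with and without the extreme value 50; the variable names are illustrative.

```python
from statistics import mean, median, stdev

without_outlier = [9, 10, 10, 10, 11]
with_outlier = without_outlier + [50]

for label, data in (("without 50", without_outlier), ("with 50", with_outlier)):
    # mean and stdev shift strongly; the median does not move
    print(label, round(mean(data), 2), round(stdev(data), 2), median(data))

mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)
```

A single extreme value moves the mean by more than six units and inflates the standard deviation roughly twentyfold, while the median stays at 10.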
  • 28. Distribution of Data: Histogram
      • A histogram is used to graphically summarize the distribution of a data set.
      • A histogram divides the range of values in a data set into intervals.
      • Over each interval is placed a bar whose height represents the frequency of data values in the interval.
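The three steps above amount to counting values per interval. The following is a minimal sketch with a text rendering of the bars; the function name, the half-open intervals [lower, upper), and the bin width of 10 are assumptions chosen for illustration, and the data are the 16 groundwater depths used in the stem-leaf slides of this deck.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins half-open intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside the covered range are ignored
            counts[idx] += 1
    return counts


depths_ft = [39, 31, 18, 7, 22, 21, 35, 12, 23, 13, 23, 10, 41, 27, 24, 9]
counts = histogram_counts(depths_ft, start=0, width=10, n_bins=5)
for i, c in enumerate(counts):
    print(f"{i * 10:>2}-{i * 10 + 10:<2} | " + "#" * c)
```

The choice of bin width changes the apparent shape of the distribution, so it is worth trying more than one width before drawing conclusions.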
  • 29. [Figure: histograms from 50 years of annual average river discharge data; panels: negatively skewed, positively skewed]
  • 30. For the normal distribution the kurtosis coefficient β2 is 3 (equivalently, β2 − 3 = 0), and the normal distribution is said to be mesokurtic. If a distribution has a relatively greater concentration of probability near the mean than the normal distribution, the kurtosis coefficient is greater than 3 and the distribution is said to be leptokurtic. If a distribution has a relatively smaller concentration of probability near the mean, the kurtosis coefficient is less than 3 and the distribution is said to be platykurtic.
      [Figure: platykurtic and leptokurtic distributions]
  • 31. Scatter Plot
      A scatter plot is useful for studying the association between two interval variables. It is a plot of the values of one variable against the other. A scatter plot can be used to:
      • suggest a relationship between the two variables, for instance a linear or quadratic relation;
      • identify patterns or clusters in the data;
      • detect outliers by inspection.
  • 32. Scatter Plot and Data Association
      At each depth two measurements are collected: temperature and nitrogen concentration. This gives two scatter plots: (i) depth vs. temperature; (ii) depth vs. nitrogen concentration.
      In the first graph, temperature generally increases with depth. This corresponds to a positive association.
      In the second graph, nitrogen concentration decreases with depth. This corresponds to a negative association.
  • 33. Scatter Plot and Data Association
  • 34. Scatter Plot and Data Pattern
  • 35. Scatter Plot and Outliers
  • 36. Dot Plot
      A dot chart or dot plot is a statistical chart consisting of a group of data points plotted on a simple scale.
      • Dot plots are one of the simplest statistical plots and are suitable for small to moderate-sized data sets.
      • Dot plots are used for continuous, quantitative, univariate data.
  • 37. Dot Plot
      Groundwater depths (meters) at 12 locations are given below:
      9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10
      Use a dot plot for EDA.
      • Dot plots are useful for highlighting clusters and gaps, as well as outliers.
      • Their other advantage is the conservation of numerical information.
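A text-only dot plot of the twelve groundwater depths can be sketched as follows. The function name and the one-row-per-integer layout are illustrative choices, not the plotting style of the slides; the sketch assumes integer-valued data.

```python
from collections import Counter


def dot_plot(values):
    """Return one text row per integer between min and max, with one dot per observation."""
    counts = Counter(values)
    return [f"{v:>3} | " + "•" * counts[v]
            for v in range(min(values), max(values) + 1)]


depths_m = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]
rows = dot_plot(depths_m)
print("\n".join(rows))
```

The cluster at 9 to 13 m, the repeated value 12, and the gaps above 15 m are visible at a glance, and every original value can still be read off the scale.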
  • 38. Dot Plot
      Groundwater depth (meters): 9, 11, 18, 7, 12, 21, 15, 13, 23, 14, 17, 10
      • The data distribution is often not clear.
      • Dot plots cannot be used for large data sets.
  • 39. Stem-leaf Plot
      A stem-leaf plot is a technique for presenting quantitative data in a graphical format. Like a histogram, it helps in visualizing the shape of a distribution.
      A basic stem-leaf plot contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.
  • 40. Stem-leaf Plot
      Groundwater depth (ft) at 16 locations: 39, 31, 18, 7, 22, 21, 35, 12, 23, 13, 23, 10, 41, 27, 24, 9
      Unlike histograms, stem-leaf plots retain the original data to at least two significant digits and put the data in order.
  • 41. Stem-leaf Plot
      Groundwater depth (ft) at 16 locations: 39, 31, 18, 7, 22, 21, 35, 12, 23, 13, 23, 10, 41, 27, 24, 9
      Each stem can be broken into more than one group.
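For the 16 groundwater depths, a basic stem-leaf display with the tens digit as stem and the units digit as leaf can be sketched as below. The function name is illustrative and the sketch assumes non-negative integers; splitting each stem into more than one group is not implemented here.

```python
from collections import defaultdict


def stem_leaf(values):
    """Return text rows 'stem | leaves', with tens digits as stems and units digits as leaves."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    stems = range(min(groups), max(groups) + 1)  # keep empty stems so gaps stay visible
    return [f"{stem} | " + " ".join(str(leaf) for leaf in groups[stem]) for stem in stems]


depths_ft = [39, 31, 18, 7, 22, 21, 35, 12, 23, 13, 23, 10, 41, 27, 24, 9]
rows = stem_leaf(depths_ft)
print("\n".join(rows))
```

Turned on its side, the display has the same shape as a histogram with a bin width of 10, but the individual values remain readable.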
  • 42. Dot Plot and Stem-leaf Plot: Comparison
      Data: 9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10
  • 44. Dot Plot and Stem-leaf Plot: Comparison
      Data: 9, 11, 18, 7, 12, 21, 15, 13, 23, 14, 17, 10
  • 45. Stem-leaf Plot
      • Stem-leaf plots are useful for displaying the relative density and shape of the data, giving the reader a quick overview of the distribution.
      • They retain (most of) the raw numerical data, often with perfect integrity.
      • They are also useful for highlighting outliers and finding the mode.
      • However, stem-leaf plots are only useful for moderately sized data sets (around 15-150 data points).
      • With very small data sets a stem-leaf plot can be of little use, as a reasonable number of data points is required to establish definitive distribution properties.
  • 46. Box Plot
      A boxplot is a graph of a data set that depicts the five-number summary in a visual way. The five-number summary of a data set consists of the five values {min value, Q1, Q2, Q3, max value}:
      1. the smallest observation,
      2. the lower quartile (Q1),
      3. the median (Q2),
      4. the upper quartile (Q3), and
      5. the largest observation.
      • It is also useful for comparing data sets.
      • It is sometimes referred to as a box-and-whisker plot.
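The five-number summary can be computed directly with the standard library. This is a minimal sketch; the function name is illustrative, and the choice of `method="inclusive"` for the quartiles is an assumption, since several quartile conventions exist and they give slightly different Q1 and Q3 for small samples. The data are the twelve groundwater depths from the dot plot slides.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


depths_m = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]
summary = five_number_summary(depths_m)
print(summary)
```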
  • 47. Constructing a Box Plot
      • Draw a horizontal measurement scale.
      • Place a rectangle above this axis: the left edge is the lower quartile (Q1) and the right edge is the upper quartile (Q3).
      • Place a vertical line inside the rectangle at the location of the median (Q2).
      • Draw "whiskers" out from either end of the rectangle to the smallest and largest observations that are not outliers.
  • 48. Constructing a Box Plot
      The five-number summary of river discharge values is given below:
      • Minimum discharge: 51.0 cumec
      • First quartile (Q1): 60.75 cumec
      • Median: 63.0 cumec
      • Third quartile (Q3): 65.0 cumec
      • Maximum discharge: 70.0 cumec
  • 49. Simple Box Plot
      In a simple box plot, the whiskers extend to the maximum and minimum data points.
  • 50. Standard Box Plot
      In a standard box plot, each whisker extends to the last observation within one step of the box, where a step is 1.5 times the height of the box (1.5 times the interquartile range).
      Observations between one and two steps from the box in either direction, if present, are plotted individually with an asterisk ("outside values").
      Observations farther than two steps beyond the box, if present, are distinguished by plotting them with a small circle ("far-out values").
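The step rule can be applied to the river discharge summary given earlier in this deck (Q1 = 60.75, Q3 = 65.0, minimum 51.0, maximum 70.0 cumec). The sketch below is illustrative: the function name and the returned labels are assumptions, and the value 45.0 is a hypothetical observation added only to show the far-out case.

```python
def classify_observation(x, q1, q3):
    """Classify x by its distance from the box, measured in steps of 1.5 * IQR."""
    step = 1.5 * (q3 - q1)
    distance = max(q1 - x, x - q3, 0.0)  # zero for points inside the box
    if distance <= step:
        return "within whiskers"
    if distance <= 2 * step:
        return "outside value"
    return "far-out value"


q1, q3 = 60.75, 65.0
step = 1.5 * (q3 - q1)
print("step:", step, "inner fences:", q1 - step, q3 + step)
for x in (51.0, 70.0, 45.0):  # 45.0 is hypothetical, not from the slides
    print(x, classify_observation(x, q1, q3))
```

With these numbers the minimum of 51.0 cumec lies between one and two steps below the box, so a standard box plot would draw it as an individual "outside value" rather than as the end of the whisker.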
  • 51. Truncated Box Plot
      In a truncated box plot, the whiskers are drawn only to the 90th and 10th percentiles of the data set. The largest 10 percent and smallest 10 percent of the data are not shown. It is used only when the extreme 20 percent of the data are not of interest.
  • 52. Interpretation of Box Plot
      [Figure panels: normal distribution (symmetrical data), right-skewed, left-skewed]
  • 53. Information Contained in a Box Plot
      • Location and spread.
      • A comparative box plot is more informative.
  • 54. Significance of Box Plot
  • 55. Interpretation of Box Plot
  • 56. Interpretation of Box Plot
  • 57. Interpretation of Box Plot
  • 58. Interpretation of Box Plot
      • A box plot can be used for a data set of any size.
      • Only summary values and spread are visible.
      • The numerical information of the individual data values is lost.
  • 59. Information Obtained from a Box Plot
      • If the median is near the center of the box, the distribution is approximately symmetric.
      • If the median falls to the left of the center of the box, the distribution is positively skewed.
      • If the median falls to the right of the center, the distribution is negatively skewed.
      • If the whisker lines are about the same length, the distribution is approximately symmetric.
      • If the right whisker line is longer than the left whisker line, the distribution is positively skewed.
      • If the left whisker line is longer than the right whisker line, the distribution is negatively skewed.
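The six rules above can be turned into two simple checks on a five-number summary. This is a rough sketch: the function name, the normalized scores, and the tolerance `tol` that decides what counts as "about the same" are all assumptions made for illustration. It is applied to the river discharge summary given earlier in this deck, taking the whiskers to the minimum and maximum as in a simple box plot.

```python
def box_plot_skew_hints(minimum, q1, med, q3, maximum, tol=0.1):
    """Return (hint from median position, hint from whisker lengths)."""
    box = q3 - q1
    left, right = q1 - minimum, maximum - q3
    if box == 0 or left + right == 0:
        raise ValueError("degenerate box or whiskers; hints are undefined")
    # positive score means a longer right side, i.e. positive skew
    median_score = ((q1 + q3) / 2 - med) / box
    whisker_score = (right - left) / (right + left)

    def label(score):
        if abs(score) <= tol:
            return "approximately symmetric"
        return "positively skewed" if score > 0 else "negatively skewed"

    return label(median_score), label(whisker_score)


hints = box_plot_skew_hints(51.0, 60.75, 63.0, 65.0, 70.0)
print(hints)
```

The two checks need not agree, as here: the median sits almost at the center of the box while the left whisker is clearly longer, which suggests a low-side tail or outlier rather than skewness in the bulk of the data.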
  • 60. Quantile Plot
      To construct a quantile plot, the cumulative frequencies of the sample data are plotted against the quantiles of the standardized theoretical distribution.
  • 61. Quantile Plot
  • 62. Quantile Plot
  • 63. Quantile Plot
  • 64. [Figure panels: Symmetric, Right Skewed, Left Skewed, Uniform]
  • 65. Quantile Plot
  • 66. Q-Q Plot
      The Quantile-Quantile plot (Q-Q plot) is a graphical method for diagnosing differences between a theoretical probability distribution and the sample distribution, or for comparing two sample distributions. For a normal Q-Q plot, it is a scatterplot of the sample quantiles of the variable against the expected normal scores; the construction slides that follow place the theoretical quantiles on the horizontal axis and the sample quantiles on the vertical axis.
      A Q-Q plot can be of two types:
      1. Normal Q-Q plot: graphically compares the distribution of a given variable to the normal distribution.
      2. Q-Q plot: graphically compares the distributions of two variables.
  • 67. Why normal Q-Q Plot?
      • A normal distribution is often a reasonable model for the data.
      • Without inspecting the data, however, it is risky to assume a normal distribution.
      • A number of graphs can be used to check the deviation of the data from the normal distribution. A histogram is one example: for normal data, the histogram should reveal a bell-shaped curve.
      • The most useful tool for assessing normality is the quantile-quantile (Q-Q) plot.
      • The Q-Q plot is also an important graphical method for identifying outliers.
      • The Q-Q plot can also be used to identify the shape of the data distribution, skewness, etc.
  • 68. Normal Q-Q Plots: Construction
      1. Order the n observations from smallest to largest and give each a rank (i) according to its position.
      2. Calculate the quantile value (plotting position) corresponding to each observation using a plotting-position formula.
      3. Obtain the theoretical normal quantile corresponding to each calculated quantile value from the normal distribution table.
      4. Plot the pairs on a two-coordinate system:
         x-axis: theoretical (distribution) quantiles
         y-axis: sample quantiles
  • 69. Q-Q Plots
      Formulas used to calculate the quantiles: [formulas shown on slide]
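The construction of a normal Q-Q plot can be sketched with the plotting-position formulas named in the slides that follow (Weibull, Blom, Cunnane, Gringorten, Hazen). The slide's own formula table did not survive in this transcript, so the sketch assumes the standard textbook forms, all of which can be written as p = (i − a) / (n + 1 − 2a) with a = 0, 0.375, 0.4, 0.44 and 0.5 respectively. The function and dictionary names are illustrative, and the data are the twelve groundwater depths from the dot plot slides.

```python
from statistics import NormalDist

# ASSUMPTION: standard textbook constants a in p = (i - a) / (n + 1 - 2a).
PLOTTING_A = {"weibull": 0.0, "blom": 0.375, "cunnane": 0.4,
              "gringorten": 0.44, "hazen": 0.5}


def normal_qq_points(values, formula="weibull"):
    """Return (theoretical normal quantile, sample quantile) pairs for a normal Q-Q plot."""
    a = PLOTTING_A[formula]
    n = len(values)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(sorted(values), start=1):  # i is the rank
        p = (i - a) / (n + 1 - 2 * a)                # plotting position
        points.append((std_normal.inv_cdf(p), x))    # replaces the table lookup
    return points


depths_m = [9, 11, 18, 7, 12, 21, 15, 12, 23, 13, 12, 10]
for name in PLOTTING_A:
    z_first, x_first = normal_qq_points(depths_m, name)[0]
    print(f"{name:<10} smallest value {x_first} plotted at z = {z_first:.3f}")
```

The formulas differ mainly at the extremes of the sample: the Weibull positions pull the end points toward the center, while the Hazen positions push them farthest out.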
  • 70. Normal Q-Q Plot
  • 71. Q-Q Plots
  • 72. Normal Q-Q Plot: Weibull
  • 73. Normal Q-Q Plot: Comparison of Formulas
      Probability plot using the Blom, Cunnane and Gringorten formulas
  • 75. Normal Q-Q Plot: Hazen
  • 76. [Figure panels: Q-Q plot using the Weibull formula; the Blom, Cunnane and Gringorten formulas; the Hazen formula]
  • 77. Interpretation of Normal Q-Q Plot
  • 78. Interpretation of Normal Q-Q Plot
      • All but a few points fall on a line: outliers in the data.
      • Left end of pattern is below the line; right end of pattern is above the line: long tails at both ends of the data distribution.
      For the normal distribution: 68% of values lie within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs.
  • 79. Interpretation of Normal Q-Q Plot
      • Left end of pattern is above the line; right end of pattern is below the line: short tails at both ends of the data distribution.
      • Curved pattern with slope increasing from left to right: data distribution is skewed to the right.
  • 80. Interpretation of Normal Q-Q Plot
  • 81. Interpretation of Probability Plot
      Description of point pattern | Possible interpretation
      All but a few points fall on a line | Outliers in the data
      Left end of pattern is below the line; right end of pattern is above the line | Long tails at both ends of the data distribution
      Left end of pattern is above the line; right end of pattern is below the line | Short tails at both ends of the data distribution
      Curved pattern with slope increasing from left to right | Data distribution is skewed to the right
      Curved pattern with slope decreasing from left to right | Data distribution is skewed to the left
      Staircase pattern (plateaus and gaps) | Data have been rounded or are discrete
  • 82. Q-Q Plot
      A Q-Q plot is similar to a probability plot, except that instead of comparing sample quantiles with theoretical normal quantiles, the quantiles of two samples are compared. It gives an idea of the dispersion of one set of observations relative to the other, and comparing two sets of observations in this way allows some important interpretations.
  • 83. Q-Q Plot
      Annual average rainfall for the years 1960-1980 at 15 stations is given in the first column (A). Annual average rainfall at the same 15 stations for the years 1981-2000 is given in the second column (B). Use a Q-Q plot to see how the rainfall pattern changes over time.
  • 84. Construction of Q-Q Plot
  • 85. The Q-Q Plot
  • 86. The Q-Q Plot
      • If the two distributions being compared are identical, the Q-Q plot follows the 45° line y = x.
      • If the two distributions agree after linearly transforming the values in one of the distributions, then the Q-Q plot follows some line, but not necessarily the line y = x.
      • If the general trend of the Q-Q plot is flatter than the line y = x, the distribution plotted on the horizontal axis is more dispersed than the distribution plotted on the vertical axis.
      • Conversely, if the general trend of the Q-Q plot is steeper than the line y = x, the distribution plotted on the vertical axis is more dispersed than the distribution plotted on the horizontal axis.
      • Q-Q plots are often arced, or "S" shaped, indicating that one of the distributions is more skewed than the other, or that one of the distributions has heavier tails than the other.
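For two samples of equal size, a Q-Q plot pairs the sorted values of one sample with the sorted values of the other, and the slope of the general trend can then be compared with the line y = x. The sketch below is illustrative only: the function names are assumptions, and the rainfall figures are hypothetical, since the station data of the exercise are not reproduced in this transcript.

```python
def qq_pairs(sample_x, sample_y):
    """Pair the order statistics of two equally sized samples."""
    if len(sample_x) != len(sample_y):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(sample_x), sorted(sample_y)))


def trend_slope(pairs):
    """Least-squares slope of the vertical-axis quantiles on the horizontal-axis quantiles."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


# HYPOTHETICAL annual rainfall (mm) at five stations for two periods.
period_a = [1850, 2100, 1990, 2300, 2050]
period_b = [2200, 1700, 2600, 1980, 2100]

pairs = qq_pairs(period_a, period_b)
slope = trend_slope(pairs)
print(pairs)
print("slope:", slope)
```

A slope above 1, as in this made-up case, means the period on the vertical axis is more dispersed than the period on the horizontal axis; a slope near 1 with a shifted intercept would instead indicate a change in level with similar spread.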
  • 87. Graphical Representation of Multivariate Data • Multivariate data analysis techniques are appropriate when more than one response is measured. • Multivariate data are somewhat difficult to visualize accurately because of the multidimensional nature of the problem. • In a multivariate approach, each response variable adds another dimension to the analysis problem. Some examples of graphical techniques used for displaying multivariate data in two dimensions: Profile Plot, Area Plot, Kite Graph, Star Plot, Glyph Plot, etc.
  • 88. Profile Plot A profile plot uses a series of vertical axes presented consecutively along the base (x-axis) of the plot. Any number of response variables can be considered, with varying scales of measurement. The response variables are arranged along the base of the plot.
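Because each response variable sits on its own vertical axis, values measured on different scales have to be mapped to a common axis height before the profiles can be drawn. A minimal sketch, assuming min-max rescaling of each variable to the range 0 to 1 (the slides do not say which scaling is used); the function name `rescale_columns` and the sample values are hypothetical.

```python
def rescale_columns(rows):
    """Rescale every variable (column) to [0, 1] across all observations.

    rows: list of observations, each a list with one value per variable.
    Returns the same layout with each column min-max scaled, so that
    every vertical axis of the profile plot spans the same height.
    """
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; place it mid-axis.
        scaled_columns.append(
            [(v - lo) / span if span else 0.5 for v in col]
        )
    return [list(row) for row in zip(*scaled_columns)]


# Illustrative observations: three variables on very different scales.
observations = [
    [10, 200, 0.5],
    [20, 100, 1.5],
    [30, 150, 1.0],
]

profiles = rescale_columns(observations)
for profile in profiles:
    print(profile)
```

Each rescaled row is one profile: plotting its values against the variable positions 0, 1, 2 along the base and joining them with line segments gives one line of the profile plot.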
  • 89. Area Plot
  • 90. Representation of multivariate data Multiple variables can be displayed
  • 91. Representation of multivariate data Multiple variables can be displayed
  • 92. Representation of multivariate data Multiple variables can be displayed
  • 93. Kite Graph Four variables can be displayed
  • 94. Star Plot Multiple variables can be displayed
  • 95. Glyph Plots The color, size, and type of the marker, together with a whisker line of varying length, angle, and color, are used to represent different variables.
  • 96. Representation of Directional Data
  • 97. Statistical Hydrology MAL1303/MKAG1273 Graphical Data Analysis Dr. Shamsuddin Shahid Associate Professor Department of Hydraulics and Hydrology Faculty of Civil Engineering Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: sshahid@utm.my